Donald Knuth Rips On Unit Tests and More
eldavojohn writes "You may be familiar with Donald Knuth from his famous Art of Computer Programming books but he's also the father of TeX and, arguably, one of the founders of open source. There's an interesting interview where he says a lot of stuff I wouldn't have predicted. One of the first surprises to me was that he didn't seem to be a huge proponent of unit tests. I use JUnit to test parts of my projects maybe 200 times a day but Knuth calls that kind of practice a 'waste of time' and claims 'nothing needs to be "mocked up."' He also states that methods to write software to take advantage of parallel programming hardware (like multi-core systems that we've discussed) are too difficult for him to tackle due to ever-changing hardware. He even goes so far as to vent about his unhappiness toward chipmakers for forcing us into the multicore realm. He pitches his idea of 'literate programming' which I must admit I've never heard of but find it intriguing. At the end, he even remarks on his adage that young people shouldn't do things just because they're trendy. Whether you love him or hate him, he sure has some interesting/flame-bait things to say."
the prize was not US$1000. it started out very small. Knuth did indeed pay out, and indeed doubled it, several times. From wikipedia: "The award per bug started at $2.56 (one "hexadecimal dollar"[24]) and doubled every year until it was frozen at its current value of $327.68. This has not made Knuth poor, however, as there have been very few bugs claimed. In addition, people have been known to frame a check proving they found a bug in TeX instead of cashing it."
That's not literate programming at all. A tad more research on your part is required. I actually remember when "web" in a computing context a literate programming tool rather than that thing you're surfing right now.
Literate Programming interleaves the documentation (written in TeX, naturally) and code into a single document. You then run that (Web) document through one of two processors (Tangle or Weave) to produce code or documentation respectively. The code is then compiled, and the documentation built with your TeX distribution. The documentation includes the nicely formatted source code within.
You can use literate programming in any language you want. I even wrote rules for Microsoft C 7.0's Programmer's Workbench to use it within the MSC environment.
I've frequently thought about going back. Javadoc and/or Sandcastle are poor alternatives.
No, that isn't arguable.
Tex got started in 1977 after Unix (1974), well after SPICE (1973), and about even with BSD.
The snippets have markup to indicate when some snippet needs to come textually before another to keep a compiler happy, but mostly this is figured out automatically. But in general, the resulting C code is in a different order than it appears in the source documentation. For instance, the core algorithm might come first, with all the declarations and other housekeeping at the end. (With documentation about why you're using this supporting library and not that, of course.)
In literate programming, the documentation is the default state and you have to escape it to write code. It's an important difference if you're going to write a lot of it.
Why won't slashdot let me change my terrible username
From my brief look at doxygen, it looks like the biggest difference is semantic. Literate Programming with web is effectively documentation with code bits and metacode to indicate where the code bits should go. This means that the code bits can be (and should be) in the order that makes the most sense for the documentation. This is not necessarily the order that makes the most sense for the code.
Doxygen looks like it just extracts properly formatted comments in code in order to generate documentation. Web extracts properly formatted bits of code in order to generate a semantically correct C file.
Craft Beer Programming T-shirts
It predates it.
And the philosophy is different, literate program is essentially embedding the code in the documentation. Doxygen is more about embedding documentation in the code.
So doxygen gives you fancy comments and a way of generating documentation from them and from the code structure itself. CWEB lets you write the documentation and put the code in it deriving the code structure from the documentation, sample cweb program: http://www-cs-faculty.stanford.edu/~knuth/programs/prime-sieve.w
Literate programming is more suited for "dense" programs, which surprise, surprise is the type of stuff Knuth does.
"...but here we are in 2008 with no punch cards..."
Yes and no. Yes, the physical punch cards are gone, but they live on in financial institutions in the form of Automated Clearing House (ACH) debits and credits which use the 96 column IBM punch card format. So, the next time you use your credit card, ATM card, e-check or pay a bill online through some company's web site, on the backend they are probably using ACH upload files (aka NACHA format) which was based on IBM's 96 column punch card to transfer financial data. Magnetic tape may be used on a contingency basis but it has to have an additional header record, be EBCDIC encoded and use 9 track tape. The IRS and many state tax agencies use ACH transfers, as an option, to refund personal income taxes instead of sending taxpayers a physical check.
I'll forgive you for being a Java developer, but the fathers of C have always cited readability first (The C Programming Language ~1978). They don't call it "literate programming", which is simply a trendy buzzword, but the idea of programming for readability has been around for an extremely long time.
Literate Programming is not about making programming languages incredibly verbose; it's about *describing* your program in a normal, human way, by explaining it step by step and quoting bits and pieces of the program. Sounds ideal from a documentation point of view, right ? only that if that was all there was with Literate Programming, it would be a stupid idea, as documentation has a nasty habit to not follow up with code modification.
... and then remove all the green inputs:
:-P
The really cool idea with LP is that the code snippets you use in the documentation are then weaved together to generate the "real" code of your program. So a LP document is BOTH the documentation and the code. A code snippet can contains references ("include") to other code snippets, and you can adds stuff to an existing code snippet.
Let me show you an example in simple (invented) syntax:
{my program}
{title}My super program{/title}
Blablabla we'd need to have the code organized in the following loop:
{main}:
{for all inputs}:
{filter inputs}
{apply processing on the filtered inputs}
{/}
{/}
The {for all inputs} consist in the following actions:
{for all inputs}:
some code
{/}
The filtering first remove all blue inputs:
{filter inputs}:
remove all blue inputs
{/}
{filter inputs}+:
remove all green inputs
{/}
etc.
{/}
The above is purely to illustrate the idea, the actual CWEB syntax is a bit different. But you can see how, starting with a single source document, you could generate both the code and the documentation of the code, and how you can introduce and explain your code gradually, explaining things in whichever way that makes the most sense (bottom-up, top-down, a mix of those..).
In a way, Doxygen or JavaDoc have similar goals: put documentation and code together and generate documentation. But they take the problem in reverse from what literate programming propose; with Doxygen/JavaDoc, you create your program, put some little snippets of documentation, and you automatically generate a documentation of your code. With LP, you write your documentation describing your program and you generate the program.
Those two approaches produce radically different results -- the "documentation" created by Doxygen/JavaDoc is more a "reference" kind of documentation, and does little to explain the reason of the program, the choice leading to the different functions or classes, or even something as important as explaining the relationships between classes. With some effort it's possible to have such doc system be the basis of nice documentation (Apple Cocoa documentation is great in that aspect for example), but 1/ this requires more work (Cocoa has descriptive documents in addition to the javadoc-like reference) 2/ it really only works well for stuff like libraries and frameworks.
LP is great because the documentation is really meant for humans, not for computers. It's also great because by nature it will produces better documentation and better code. It's not so great as it poorly integrates with the way we do code nowadays, and it poorly integrates with OOP.
But somehow I've always been thinking that there is a fundamentally good idea to explore there, just waiting for better tools/ide to exploit it
(also, the eponymous book from Knuth is a great read)
I think, perhaps, you're missing the point. Go ahead, build a prototype and try out ideas. Do the Brooks thing, and build one to throw away. Work out exactly what it is you want to do via experimentation. None of that contradicts literate programming, or "thinking first": the prototypes, the messing around, that's part of the thinking (stage one really). Once you've gone through your iterations and want to finalise something... well at that point you do have some specs, you should know what you want to build; and at that point you can use literate programming, and you can do some design by contract, and make the finalised version robust, well documented, and easily maintainable and reuasable. Building prototypes is not failing to "think first"; not "thinking first" is shipping the nth iteration of the prototype as is and calling it done.
Craft Beer Programming T-shirts
That's a mischaracterization of literate programming.
The whole idea of literate programming is to basically write good technical documentation -- think (readable) academic CS papers -- that you can in effect execute. What many people do with Mathematica and Maple worksheets is effectively literate programming.
It has nothing to do with what language you use, and is certainly not about making your code more COBOL-esque.
Maybe think of it this way: Good documentation should accurately describe what your code does. In literate programming, the computer code is just the "comments" you add to your documentation so that the computer can execute it.
See this post, for instance.
The GP must have been confused by the example on Wikipedia, which a) wasn't literate programming and b) used a shitty made-up language where "multiplyby" was one of the operators. Literate programming is programming (in your favourite language) with a code-in-documentation approach instead of the usual documentation-in-code approach. So, for example, the flow of your literate program is defined by how best to explain what's happening to a human reader, rather than being constrained by the order the compiler requires. You run your literate program through a tool and it spits out compilable code or pretty documentation.
Remember that MMIX is not designed to be a practical hardware computer architecture. It's designed to illustrate algorithms written in assembly language. It's optimized for humans to read and write, not for computers to execute quickly. I'm glad that he's keeping assembly as part of his books, and that's he's updated them to a 64-bit RISC architecture. Reading MMIX assembly programs is the closest to hardware that some readers will ever get, so he has one chance to show those readers how computers actually work. It had better be as simple as possible for people to understand.
What a fool believes, he sees, no wise man has the power to reason away.
Something like this might help: folding in vim. Emacs probably already has an 11-note chord that does this.
Software patents delenda est.
It's called ravioli code
The world has changed and we all have become metal men.
Craft Beer Programming T-shirts
I think most people who post here don't know what literate programming is. It's more like writing a textbook explaining how your code works, but you can strip away the text and actually have runnable code. This code can be in any language of your choice. It makes sense from Knuth's point of view, but for many of us, we don't write textbooks for a living.
Knuth also doesn't need unit testing because he probably runs (or type checks) the program in his head. Again, for most of us, seeing the program run provides additional assurance that it works. Unit tests also provide a specification of your program. It doesn't have to be just b = f(a). For example, if your code implements a balanced binary search tree, a unit test could check the depth of all subtrees to make sure the tree is balanced. Another unit test would check if the tree is ordered. You can prove by the structure of your program that these properties hold, but a lay-man doesn't want to write proofs for the code he writes, so the second best alternative is to use unit test.
About parallel programming, Knuth is actually right. Many high-performance parallel programs are actually very involved with the underlying architecture. But we can write a high-level essentially-sequential program that uses libraries to compute things like FFT and matrix multiplication in parallel. This tends to be the trend anyways.
I once had a signature.
It is also the use of accurate and descriptive symbol names.
Database database("data.txt");
if (database.empty())
is a lot more readable (i.e. literate) than
DB d("data.txt");
if (d.e())
I will never live for sake of another man, nor ask another man to live for mine.
It's integrated into my IDE (VS2008 Team Edition), so I have no excuse not to - it's just a couple of clicks.