Donald Knuth Rips On Unit Tests and More

Posted by CmdrTaco on Saturday April 26, 2008 @04:46AM from the all-hail-knuth dept.

eldavojohn writes "You may be familiar with Donald Knuth from his famous Art of Computer Programming books, but he's also the father of TeX and, arguably, one of the founders of open source. There's an interesting interview where he says a lot of things I wouldn't have predicted. One of the first surprises to me was that he didn't seem to be a huge proponent of unit tests. I use JUnit to test parts of my projects maybe 200 times a day, but Knuth calls that kind of practice a 'waste of time' and claims 'nothing needs to be "mocked up."' He also states that methods for writing software that takes advantage of parallel hardware (like the multi-core systems we've discussed) are too difficult for him to tackle because the hardware keeps changing. He even goes so far as to vent his unhappiness with chipmakers for forcing us into the multicore realm. He pitches his idea of 'literate programming', which I must admit I'd never heard of but find intriguing. At the end, he even remarks on his adage that young people shouldn't do things just because they're trendy. Whether you love him or hate him, he sure has some interesting/flame-bait things to say."

22 of 567 comments

Re:Did anyone claim the bug prize on TeX? by paulbd · 2008-04-26 05:01 · Score: 4, Informative

The prize was not US$1000; it started out very small. Knuth did indeed pay out, and did indeed double it, several times. From Wikipedia: "The award per bug started at $2.56 (one "hexadecimal dollar"[24]) and doubled every year until it was frozen at its current value of $327.68. This has not made Knuth poor, however, as there have been very few bugs claimed. In addition, people have been known to frame a check proving they found a bug in TeX instead of cashing it."
Re:Literate programming... by Basilius · 2008-04-26 05:10 · Score: 5, Informative

That's not literate programming at all. A tad more research on your part is required. I actually remember when "web" in a computing context meant a literate programming tool rather than the thing you're surfing right now.

Literate Programming interleaves the documentation (written in TeX, naturally) and code into a single document. You then run that (Web) document through one of two processors (Tangle or Weave) to produce code or documentation respectively. The code is then compiled, and the documentation built with your TeX distribution. The documentation includes the nicely formatted source code within.

You can use literate programming in any language you want. I even wrote rules for Microsoft C 7.0's Programmer's Workbench to use it within the MSC environment.

I've frequently thought about going back. Javadoc and/or Sandcastle are poor alternatives.
and, arguably, one of the founders of open source? by xx_chris · 2008-04-26 05:20 · Score: 2, Informative

No, that isn't arguable.

TeX got started in 1977, after Unix (1974), well after SPICE (1973), and about even with BSD.
Documentation is the source by CustomDesigned · 2008-04-26 05:24 · Score: 5, Informative

"So basically it's the same as the XML comments you can put in your .Net or Java code to create JavaDocs, or whatever they are called in .Net, based on the comments in the code?" Not quite. In Javadoc (or the C/C++ equivalent) the C/Java code is the source, and documentation is generated from that. In literate programming, the documentation is the source, and it has code snippets, like you would see in a Knuth textbook.

The snippets have markup to indicate when some snippet needs to come textually before another to keep a compiler happy, but mostly this is figured out automatically. But in general, the resulting C code is in a different order than it appears in the source documentation. For instance, the core algorithm might come first, with all the declarations and other housekeeping at the end. (With documentation about why you're using this supporting library and not that, of course.)
Re:Literate programming... by iMacGuy · 2008-04-26 05:49 · Score: 2, Informative

In literate programming, the documentation is the default state and you have to escape it to write code. It's an important difference if you're going to write a lot of it.

--
Why won't slashdot let me change my terrible username :(
The Summary Exaggerates the Interview by Cal+Paterson · 2008-04-26 05:59 · Score: 5, Informative

Knuth said many of these supposedly outrageous things in passing, and while noting that he is an academic. Most of the claims in the summary vastly exaggerate the strength of what he actually says in the interview. Knuth specifically states:
"there's no reason anybody should care about the opinions of a computer scientist/mathematician like me regarding software development."
Knuth doesn't claim that unit testing is a waste of time for everyone, just that it is a waste of time for him, in his circumstances. This makes sense, considering he follows his own (diametrically opposed) doctrine of "literate programming"; the fact that the summary author had never heard of it should have made him cautious about interpreting Knuth.
Re:Literate programming... by Sancho · 2008-04-26 06:00 · Score: 2, Informative

From my brief look at doxygen, it looks like the biggest difference is semantic. Literate Programming with web is effectively documentation with code bits and metacode to indicate where the code bits should go. This means that the code bits can be (and should be) in the order that makes the most sense for the documentation. This is not necessarily the order that makes the most sense for the code.

Doxygen looks like it just extracts properly formatted comments in code in order to generate documentation. Web extracts properly formatted bits of code in order to generate a semantically correct C file.
Re:Literate programming... by Coryoth · 2008-04-26 06:02 · Score: 2, Informative

"Excuse my ignorance, but please explain how this is different from (or superior to) doxygen or any of the many systems that do just this. I'm not meaning to be rude, I'm just asking."

I think the prime difference is that literate programming allows you to re-order the code; that is, you include snippets of code within the documentation, and attach tags to the snippets that allow them to be reassembled in a different order. That doesn't sound like much, but it means that you can just write the documentation and have code appear as it is relevant to the documentation, rather than having the program structure dictate things. Take a look at some examples (in various languages) to see what I mean. The key here is that documentation is (or should be) first and foremost in the writer's mind, and it is the documentation that dictates presentation structure. This means that you are concentrating on writing the documentation, and will thus write it well, as opposed to concentrating on code and adding documentation as an afterthought if you get around to it. Well, that's the principle at least.

--
Craft Beer Programming T-shirts
Re:Literate programming... by sholden · 2008-04-26 06:02 · Score: 2, Informative

It predates it.

And the philosophy is different: literate programming is essentially embedding the code in the documentation. Doxygen is more about embedding documentation in the code.

So doxygen gives you fancy comments and a way of generating documentation from them and from the code structure itself. CWEB lets you write the documentation and put the code in it, deriving the code structure from the documentation. Sample CWEB program: http://www-cs-faculty.stanford.edu/~knuth/programs/prime-sieve.w

Literate programming is more suited for "dense" programs, which, surprise, surprise, is the type of stuff Knuth does.
Re:he's from another era by Not+The+Real+Me · 2008-04-26 06:30 · Score: 2, Informative

"...but here we are in 2008 with no punch cards..."

Yes and no. Yes, the physical punch cards are gone, but they live on in financial institutions in the form of Automated Clearing House (ACH) debits and credits, which use the 96-column IBM punch card format. So, the next time you use your credit card or ATM card, write an e-check, or pay a bill online through some company's web site, the backend is probably using ACH upload files (aka NACHA format), a format based on IBM's 96-column punch card, to transfer the financial data. Magnetic tape may be used on a contingency basis, but it has to have an additional header record, be EBCDIC-encoded and use 9-track tape. The IRS and many state tax agencies use ACH transfers, as an option, to refund personal income taxes instead of sending taxpayers a physical check.
It's the same philosophy that K&R impart... by galimore · 2008-04-26 06:37 · Score: 3, Informative

I'll forgive you for being a Java developer, but the fathers of C have always cited readability first (The C Programming Language ~1978). They don't call it "literate programming", which is simply a trendy buzzword, but the idea of programming for readability has been around for an extremely long time.
Re:Literate programming... by Nicolas+Roard · 2008-04-26 06:56 · Score: 4, Informative

Literate Programming is not about making programming languages incredibly verbose; it's about *describing* your program in a normal, human way, by explaining it step by step and quoting bits and pieces of the program. Sounds ideal from a documentation point of view, right? Only, if that were all there was to Literate Programming, it would be a stupid idea, as documentation has a nasty habit of not keeping up with code modifications.

The really cool idea with LP is that the code snippets you use in the documentation are then stitched together ("tangled", in WEB's terminology) to generate the "real" code of your program. So an LP document is BOTH the documentation and the code. A code snippet can contain references ("include") to other code snippets, and you can add stuff to an existing code snippet.

Let me show you an example in simple (invented) syntax:

{my program}

{title}My super program{/title}

Blablabla we'd need to have the code organized in the following loop:

{main}:
{for all inputs}:
{filter inputs}
{apply processing on the filtered inputs}
{/}
{/}

The {for all inputs} chunk consists of the following actions:

{for all inputs}:
some code
{/}

The filtering first removes all blue inputs:

{filter inputs}:
remove all blue inputs
{/}

... and then removes all the green inputs:

{filter inputs}+:
remove all green inputs
{/}

etc.

{/}

The above is purely to illustrate the idea, the actual CWEB syntax is a bit different. But you can see how, starting with a single source document, you could generate both the code and the documentation of the code, and how you can introduce and explain your code gradually, explaining things in whichever way that makes the most sense (bottom-up, top-down, a mix of those..).

In a way, Doxygen or JavaDoc have similar goals: put documentation and code together and generate documentation. But they approach the problem in the reverse direction from literate programming: with Doxygen/JavaDoc, you create your program, add some small snippets of documentation, and automatically generate documentation for your code. With LP, you write your documentation describing your program, and you generate the program.

Those two approaches produce radically different results: the "documentation" created by Doxygen/JavaDoc is more of a "reference" kind of documentation, and does little to explain the purpose of the program, the choices leading to the different functions or classes, or even something as important as the relationships between classes. With some effort it's possible to make such a doc system the basis of nice documentation (Apple's Cocoa documentation is a great example of this), but (1) this requires more work (Cocoa has descriptive documents in addition to the javadoc-like reference) and (2) it really only works well for things like libraries and frameworks.

LP is great because the documentation is really meant for humans, not for computers. It's also great because by nature it will produce better documentation and better code. It's not so great in that it integrates poorly with the way we write code nowadays, and it integrates poorly with OOP.

But somehow I've always thought that there is a fundamentally good idea to explore there, just waiting for better tools/IDEs to exploit it :-P

(also, the eponymous book by Knuth is a great read)
Re:Out of favor by Coryoth · 2008-04-26 07:14 · Score: 2, Informative

I think, perhaps, you're missing the point. Go ahead, build a prototype and try out ideas. Do the Brooks thing, and build one to throw away. Work out exactly what it is you want to do via experimentation. None of that contradicts literate programming, or "thinking first": the prototypes, the messing around, that's part of the thinking (stage one, really). Once you've gone through your iterations and want to finalise something... well, at that point you do have some specs, and you should know what you want to build; at that point you can use literate programming, do some design by contract, and make the finalised version robust, well documented, easily maintainable and reusable. Building prototypes is not failing to "think first"; not "thinking first" is shipping the nth iteration of the prototype as is and calling it done.

That's not literate programming! by TerranFury · 2008-04-26 07:26 · Score: 2, Informative

That's a mischaracterization of literate programming.

The whole idea of literate programming is to basically write good technical documentation -- think (readable) academic CS papers -- that you can in effect execute. What many people do with Mathematica and Maple worksheets is effectively literate programming.

It has nothing to do with what language you use, and is certainly not about making your code more COBOL-esque.

Maybe think of it this way: Good documentation should accurately describe what your code does. In literate programming, the computer code is just the "comments" you add to your documentation so that the computer can execute it.

See this post, for instance.
Re:You misunderstand by Anonymous Coward · 2008-04-26 08:07 · Score: 5, Informative

The GP must have been confused by the example on Wikipedia, which a) wasn't literate programming and b) used a shitty made-up language where "multiplyby" was one of the operators. Literate programming is programming (in your favourite language) with a code-in-documentation approach instead of the usual documentation-in-code approach. So, for example, the flow of your literate program is defined by how best to explain what's happening to a human reader, rather than being constrained by the order the compiler requires. You run your literate program through a tool and it spits out compilable code or pretty documentation.
Re:MMIX is poor design by bunratty · 2008-04-26 08:22 · Score: 2, Informative

Remember that MMIX is not designed to be a practical hardware computer architecture. It's designed to illustrate algorithms written in assembly language. It's optimized for humans to read and write, not for computers to execute quickly. I'm glad that he's keeping assembly as part of his books, and that he's updated them to a 64-bit RISC architecture. Reading MMIX assembly programs is the closest to hardware that some readers will ever get, so he has one chance to show those readers how computers actually work. It had better be as simple as possible for people to understand.

--
What a fool believes, he sees, no wise man has the power to reason away.
Re:Literate programming... by Anomolous+Cowturd · 2008-04-26 08:27 · Score: 2, Informative

Something like this might help: folding in vim. Emacs probably already has an 11-note chord that does this.

--
Software patents delenda est.
Re:Spaghetti-O Code by mykdavies · 2008-04-26 09:08 · Score: 4, Informative

It's called ravioli code

--
The world has changed and we all have become metal men.
Re:Shocked by Coryoth · 2008-04-26 10:57 · Score: 2, Informative

"Yes! I like writing code to see how things pan out, it's one of my ways of thinking about what the problems and goals are. But I don't intend that to be the final code: make a cheap throwaway prototype, then make the final product, possibly salvaging bits of the prototype."

Don't get me wrong, I'm not arguing against that. Neither is Knuth. He mentions such scenarios as one of the cases where he does find a use for unit testing. That's part of the "thinking" stage. From what I gather from the interview, Knuth tends to do that with pencil and paper, but you can do it just as well by mucking out some example code and test cases. The point, as you note, is that it isn't your final code: it is at that "final code" step (once you've figured out what you want to do) that literate programming really comes in (and in this sense it is orthogonal to TDD and similar approaches).

Literate programming by pikine · 2008-04-26 11:57 · Score: 2, Informative

I think most people who post here don't know what literate programming is. It's more like writing a textbook explaining how your code works, but you can strip away the text and actually have runnable code. This code can be in any language of your choice. It makes sense from Knuth's point of view, but for many of us, we don't write textbooks for a living.

Knuth also doesn't need unit testing because he probably runs (or type-checks) the program in his head. Again, for most of us, seeing the program run provides additional assurance that it works. Unit tests also provide a specification of your program. It doesn't have to be just b = f(a). For example, if your code implements a balanced binary search tree, a unit test could check the depth of all subtrees to make sure the tree is balanced. Another unit test could check that the tree is ordered. You can prove from the structure of your program that these properties hold, but a layman doesn't want to write proofs for the code he writes, so the second-best alternative is to use unit tests.

About parallel programming, Knuth is actually right. Many high-performance parallel programs are deeply tied to the underlying architecture. But we can write a high-level, essentially sequential program that uses libraries to compute things like FFTs and matrix multiplication in parallel. This seems to be the trend anyway.

--
I once had a signature.
Re:You misunderstand by jwiegley · 2008-04-26 12:04 · Score: 3, Informative

It is also the use of accurate and descriptive symbol names.

Database database("data.txt");
if (database.empty())

is a lot more readable (i.e. literate) than

DB d("data.txt");
if (d.e())

--
I will never live for sake of another man, nor ask another man to live for mine.
Re:He's right by shutdown+-p+now · 2008-04-26 21:20 · Score: 2, Informative

It's integrated into my IDE (VS2008 Team Edition), so I have no excuse not to - it's just a couple of clicks.