Knuth Releases Another Part of Volume 4
junge_m writes "Donald Knuth has released another of his by now famous pre-fascicles to Volume 4 of his epic:
Pre-fascicle 2c is all about 'Generating all Combinations' supplementing his pre-fascicles 2a and 2b.
Furthermore he challenges us all to do more of his daunting exercises and report our success. He thinks we are way too lazy in this respect! So come on slashdot crowd: Do your homework and get the credit from the grandmaster himself!"
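For anyone wondering what "generating all combinations" means in practice, here is a minimal Python sketch that visits every t-element subset of {0, ..., n-1} in lexicographic order. It is not taken from the pre-fascicle and is not claimed to be one of Knuth's algorithms; the function name combinations_lex and the 0-based numbering are assumptions made for illustration.

```python
def combinations_lex(n, t):
    """Yield every t-element subset of {0, ..., n-1} as an increasing tuple,
    in lexicographic order. Hypothetical helper, not from the pre-fascicle."""
    if t < 0 or t > n:
        return
    c = list(range(t))              # smallest combination: 0, 1, ..., t-1
    while True:
        yield tuple(c)
        # Find the rightmost position that has not yet reached its maximum.
        i = t - 1
        while i >= 0 and c[i] == n - t + i:
            i -= 1
        if i < 0:                   # every position is maxed out: finished
            return
        c[i] += 1
        # Reset everything to the right to the smallest legal values.
        for j in range(i + 1, t):
            c[j] = c[j - 1] + 1
```

Calling list(combinations_lex(5, 3)) gives the ten 3-element subsets of {0, 1, 2, 3, 4}, starting at (0, 1, 2) and ending at (2, 3, 4).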
Anyone else think ThinkGeek.net needs some 'Absolut Knuthingness' T-shirts?
Up until now, I've been quite content to have read none of the 3 volumes of TAoCP; with this 4th volume I'm starting to feel poorly read.
The main reason that I have been hesitant to purchase and slog through these books has been the fact that they are written with an outdated assembly language for a nonexistent processor. I realize that the point is to learn these algorithms; however, since I rarely if ever code on that level any longer, is there an alternative? Something using a language like Java, Python, or even, ack!, 'C' would be more to my liking.
No flames please. This is just an honest question.
Thanks
Technically, all programmer types are supposed to be lazy. Personally, I try to keep my code clean and commented because when I come back to it in a month, I know I'll be too lazy to read through it and figure out what I was doing. Also, being lazy about doing work is what leads to reduced algorithmic complexity, right?
Necessity is the mother of invention, but laziness is the father.
He's right - I'm way too lazy to spend more than a few minutes on things like that.
He thinks we are way too lazy in this respect! So come on slashdot crowd: Do your homework and get the credit from the grandmaster himself!
No no no... He's right [yawns, stretches, checks for new /. articles].
fascicle (făs′ĭ-kəl) n.
1. A small bundle.
2. One of the parts of a book published in separate sections. Also called fascicule.
3. Botany. A bundle or cluster of stems, flowers, or leaves.
4. See fasciculus.
I believe the definition used here is #2.
Second, a quick definition of what this is all about: it appears to be a collection of great scientific and programming works to be used as a primer for new programmers.
Hopefully, that allays some of the confusion I was having among others out there.
Knuth's own reply to this question can be seen at http://www-cs-faculty.stanford.edu/~knuth/mmix.html.
To quote:
"Many readers are no doubt thinking, 'Why does Knuth replace MIX by another machine instead of just sticking to a high-level programming language? Hardly anybody uses assemblers these days.'
Such people are entitled to their opinions, and they need not bother reading the machine-language parts of my books. But the reasons for machine language that I gave in the preface to Volume 1, written in the early 1960s, remain valid today:
One of the principal goals of my books is to show how high-level constructions are actually implemented in machines, not simply to show how they are applied. I explain coroutine linkage, tree structures, random number generation, high-precision arithmetic, radix conversion, packing of data, combinatorial searching, recursion, etc., from the ground up.
The programs needed in my books are generally so short that their main points can be grasped easily.
People who are more than casually interested in computers should have at least some idea of what the underlying hardware is like. Otherwise the programs they write will be pretty weird.
Machine language is necessary in any case, as output of many of the software programs I describe.
Expressing basic methods like algorithms for sorting and searching in machine language makes it possible to carry out meaningful studies of the effects of cache and RAM size and other hardware characteristics (memory speed, pipelining, multiple issue, lookaside buffers, the size of cache blocks, etc.) when comparing different schemes.
Moreover, if I did use a high-level language, what language should it be? In the 1960s I would probably have chosen Algol W; in the 1970s, I would then have had to rewrite my books using Pascal; in the 1980s, I would surely have changed everything to C; in the 1990s, I would have had to switch to C++ and then probably to Java. In the 2000s, yet another language will no doubt be de rigueur. I cannot afford the time to rewrite my books as languages go in and out of fashion; languages aren't the point of my books, the point is rather what you can do in your favorite language. My books focus on timeless truths."
~~~~~ BigLig2? You mean there's another one of me?
Apparently very few people cash the (few) checks Don Knuth writes in reward for bugs. Would you rather have $2.56 or a signed check from Mr. Knuth?
256 equals $100. We're being shafted.
"I have opinions of my own, strong opinions, but I don't always agree with them." -- George H. W. Bush
"...But I must confess that I'm also disappointed to have had absolutely no feedback so far on several of the exercises on which I worked hardest when I was preparing this material. Could it be that (1) you've said nothing about them because I somehow managed to get the details perfect? Or is it that (2) you shy away from the more difficult stuff, being unable to spend more than a few minutes on any particular topic?..."
... or could it be (3) that you'd have to be one crack-smoking codemonkey of a nut to spend your spare time doing exercises which (1) require a superbrain, (2) are boring, (3) your superbrain computer science professor already did a week ago to collect the $0.23 award for the errata report.
WinZIP can handle .gz files, and I think PKZIP can as well.
As far as .ps goes, if you have a PostScript printer then dump it straight to that. If not, Adobe Acrobat (full version, not the reader) has a utility that converts .ps to .pdf.
Learning HOW to think is more important than learning WHAT to think.
I just looked this author up at Amazon. Here is some of his previous work:
The Art of Computer Programming, Volumes 1-3
From the Inside Flap
"The bible of all fundamental algorithms and the work that taught many of today's software developers most of what they know about computer programming."-- Byte, Sept 1995
"If you think you're a really good programmer,...read [Knuth's] Art of Computer Programming....You should definitely send me a resume if you can read the whole thing." -- Bill Gates
This does not sound like it is aimed at the core slashdot crowd, based on the Amazon reviews I am reading. Honestly I have never heard of the guy before. He is without a doubt more for the "hard core" among us. Volume 1 seems to have been written in the 1960s, so this guy's been at it a while.
Plenty of reader reviews. Many with comments like:
This timeless classic is bound to make the student (Yes you ought to be a dedicated one..no casual reading here!) proficient in the art and science of constructing programs. -Ganapathy Subramaniam
Be prepared for your brain to do some crunching if you really want to get into this guy's work.
-Pete
(Amazon affiliate link to the book... just so ya know.)
Soccer Goal Plans
Just code it in C and then use gcc targeted at mmix to see the assembly, and compare it with what he wrote (it will rarely, if ever, be the same because of register allocation, but you get the point).
regards
john 'mips64' jones
It's still probably helpful to know where you can obtain and learn to use a MIX VM. Check out the GNU MDK manual, which contains the MIX instruction set and a programming tutorial, as well as documentation on using the VM itself.
For the intro:
In his book series The Art of Computer Programming (published by Addison Wesley), D. Knuth uses an imaginary computer, the MIX, and its associated machine-code and assembly languages to illustrate the concepts and algorithms as they are presented.
The MIX's architecture is a simplified version of those found in real CISC CPUs, and the MIX assembly language (MIXAL) provides a set of primitives that will be very familiar to any person with a minimum experience in assembly programming. The MIX/MIXAL definition is powerful and complete enough to provide a virtual development platform for writing quite complex programs, and close enough to real computers to be worth using when learning programming techniques. At any rate, if you want to learn or improve your programming skills, a MIX development environment would come in handy.
The MDK package aims at providing such a virtual development environment on a GNU box. Thus, MDK offers you a set of utilities to simulate the MIX computer and to write, compile, run and debug MIXAL programs.
I was actually hoping for this.
Cached version of the Knuth document is here.
In addition to Knuth's response (given by another poster), let me add that the assembly language used does not, in practice, detract from the usefulness of the books -- and if one wishes, transcribing any of his algorithms from MIX or MMIX to C is child's play and can be done in one's head while reading them; changing code from MIX or MMIX to an assembly language for real hardware is similarly easy -- a design goal of both.
Further, while MIX may well be showing its age, Knuth's newer MMIX assembly language is anything but outdated; its features should map well to new processors released for years.
I was a junior in college when I first read The Art of Computer Programming, and it really opened up my mind to what computer science is all about. It challenged me to think outside the HLL box. Knuth's work has become a timeless classic because he concentrates on the higher level concepts of computer science that transcend the currently popular architectures.
What really blew me away about Knuth's work is that he implements all of the features found in modern HLL's in his own variant of assembler. Someone who can work through and solve the exercises in his books will find themselves able to write programs in any language, and write them well at that. He does not concern himself with the Language Wars, or the Platform Wars, but instead presents the problems and solutions which are common to all computer systems. Too many programmers have been babied intellectually by their colleges and universities, which taught them how to program in a high level language rather than teaching them the fundamentals underlying computer science. Knuth does a good job in getting down to the underlying problems of computer science without bothering the user with the details of arcane architectures that will soon be obsolete.
For this reason, I look forward to his forthcoming work. I look forward to the new challenges which will expand my mind even farther.
The society for a thought-free internet welcomes you.
My favorite Knuth quote, when he gave a class a snippet of code to use in their program (not verbatim, sorry):
"Be careful with this code; I have only proven it correct, not tested it."
A demonstration of Hacker Nature:
He wasn't happy with the typesetting on his first book, and decided this should be done by computer, so he wrote a markup language for typesetting.
Of course, he wanted to do it right, so this took him... well... about a decade. And when he was done, he had written TeX. He was very pleased; his publishers thought this was odd, as the new typesetting looked worse than the old.
A few years later, high-resolution laser printers became available; TeX already supported them, and lo and behold, the new version did look better.
TeX is a huge monster of a programming language/application. Knuth offered a cash prize of $(2^N) for the Nth unique bug report. TeX is now, like, 20 years old, and that system cost him under $1K.
If programmers were Jedi, he would be Yoda.
If programmers were wizards, he would be Gandalf.
He is the serious, friendly grandfather who can kick the butts of all us whippersnappers. So pay attention!
For other Windows folks, ToastScript is a handy little Java app to read and print PostScript.
"God fights on the side with the best artillery." - Napoleon, Marshal of France - speaking truth to power
I was poking around Knuth's site, looking at the instruction set for MMIX, when I came across this instruction (SR, SRU added for comparison):
3C SR shift right (1) rA
3D SRI Stanford Research Institute (2) rA
3E SRU shift right unsigned (1)
What's that do then?
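As far as I know, MMIX opcodes come in register/immediate pairs, so SRI is presumably just SR with an immediate shift count, and the expansion shown in the table reads like a joke. The real difference in that listing is between SR and SRU: the signed shift copies the sign bit into the vacated positions, while the unsigned shift fills them with zeros. Here is a rough Python sketch of that distinction on 64-bit values; the function names are mine, and MMIX details such as overflow flags or shift counts of 64 and above are deliberately not modelled.

```python
MASK64 = (1 << 64) - 1      # MMIX general registers are 64 bits wide

def sru(x, n):
    """Unsigned (logical) shift right: vacated high bits become zeros."""
    return (x & MASK64) >> n

def sr(x, n):
    """Signed (arithmetic) shift right: vacated high bits copy the sign bit."""
    x &= MASK64
    if x >> 63:             # top bit set: reinterpret the pattern as negative
        x -= 1 << 64
    return (x >> n) & MASK64

print(hex(sr(0x8000000000000000, 4)))   # 0xf800000000000000
print(hex(sru(0x8000000000000000, 4)))  # 0x800000000000000
```

For non-negative inputs the two functions agree; they only diverge once the top bit is set.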
decoding the octacode
Gray binary clusters of subcubes
medians of bit strings
Gray fields
constructing large-gap codes
an infinite Gray path that fills n-dimensional space
loopless generation of fence-poset ideals
Knuth is entitled to his opinion. It is certainly true that compilers tend to hide the details of the execution. And translating the prior volumes would be a lot of work, so for stylistic consistency it makes sense to continue to use MIX.
That said, I feel it's a mistake. I have seen a large subset of C that was implementable via macros as an M68000 assembler language program. This would be a vehicle that would possess all of the virtues of MIX, while being readily understandable by most skilled programmers. (They might not be able to compose in it without study, but they could understand algorithms written in it with reasonable ease.) In addition, the timings would be experimentally verifiable (if you could lay your hands on an old Mac... either the Mac 128 or the Fat Mac [512K]). I forget whether the code was presented in Byte or Dr Dobbs. Byte I think. I'm sure it could have been done better, but it wouldn't need to be done better to provide experimental backing to the theoretical arguments. (I suppose that in a sense the MIX virtual machines also provide this, but I don't consider them any better in principle than any other interpreter, and less good than a compiler.)
N.B.: If you don't insist on an actual hardware version of the machine, such as would be provided by the 68000, then I still don't see any advantage of MIX over a selected subset of C.
Additionally: MIX doesn't have much similarity to a recent-generation CPU chip. There are a multitude of reasons that Assembler has fallen out of fashion. Lack of portability was one of the major ones. But an even better reason has been the increasing complexity of the chips themselves. Hand-optimizing has become increasingly counter-productive for most people. (It's also become generally less necessary, but that's a separate argument.)
As always, however, there will be exceptions. There exist specialists who need this kind of skill, CPU-version-dependent though it be. Somebody needs to design the chips. Somebody needs to design the compiler optimization strategies. But even for these people, MIX will be a suboptimal choice.
I think we've pushed this "anyone can grow up to be president" thing too far.
>Moorish artisans used to purposedly introduce a
>mistake in their tile designs, as it would be
>presumptous for a mere mortal to attempt
>perfection.
I've heard that about Navajo rugs too, but have never been able to verify it.
-fb Everything not expressly forbidden is now mandatory.
So who else noticed that all MMIX processors have a built in serial number?
Yes, I know they are hypothetical...
None of the languages you mention was available at the time when he started writing the books.
And most of the languages you mention will not be modern at the time when the last book comes out.
Most of the languages you mention hide so much of the underlying hardware that it is hard to see how changes in the software can be influenced by the hardware.
On the other hand, there is nothing stopping you from rewriting the examples in Java if you feel up to it.
C is already a language on its way out, Python seems destined to forever be a minor language, and if MS manages to kill Java, in 10 years' time the book will be just another antique.
By using a made-up language the books will never become old and out of fashion.
If you want to write a classic you can't follow today's fashion but must go your own way.
Just saying it like it are.
A friend of mine suggested printing post-it notes with Java code to paste over MIX code in the tAoCP.
Suggesting that Knuth should implement his algorithms in Java is the strongest argument for MIX I've ever heard.
Sometimes it's best to just let stupid people be stupid.
What about inflation Prof. Knuth!?
Knuth routinely doubles the bounty for finding an error in his TAoCP books or in his TeX program. This bounty doubles each time an error is found and corrected.
Will I retire or break 10K?
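The arithmetic behind a doubling bounty is easy to check. This is a rough sketch, assuming the reward starts at the $2.56 mentioned elsewhere in this thread and doubles without any cap; both are assumptions for illustration, not a statement of Knuth's actual policy. Amounts are kept in cents to avoid floating-point rounding, and the function name is made up.

```python
def doublings_to_reach(target_cents, start_cents=256):
    """Return (k, amount): the fewest doublings k such that
    start_cents * 2**k >= target_cents, and the amount reached."""
    k, amount = 0, start_cents
    while amount < target_cents:
        amount *= 2
        k += 1
    return k, amount

k, amount = doublings_to_reach(10_000 * 100)   # $10,000 expressed in cents
print(k, amount / 100)                         # 12 10485.76
```

Under those assumptions, twelve doublings take the reward past $10K.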
As it's been some twenty years since someone came up with a new sort algorithm, and about fifteen years since someone came up with a really new key hashing algorithm, you will find that 98% of the early volumes is still relevant.
Most of the irrelevant 2% involves tweaking algorithms to save memory, so it might still be relevant if you are working on an embedded system.
Old COBOL programmers never die. They just code in C.
...get really turned off by Knuth's over-inflated ego? I don't doubt that the guy is smart, but I can't stand even reading his FAQ, let alone trying to make it through tAoCP. I don't buy into all his bullshit about batch processing, etc... I think he's just a smart guy who's full of himself and likes to be eccentric. And has a lot of people eating out of his hand.
As evidence of this, I point to Dick Feynman, who was incredibly smart, amassed a decent body of work, both in terms of theories created and teaching books/lessons written, but at the same time took time out to play in a native Brazilian band.
I might be a computer scientist, but I look up to Dick Feynman, and I shake my head at Donald Knuth.
_sig_ is away
I can appreciate the common-denominator factor for the MIX language, but nobody programs computers anymore - they program the OPERATING SYSTEM of those computers. The people who write the O/S are the ones actually programming the machine, and once that's done, the application developers (i.e. the majority) just use the services they provide. The last time I heard of anyone programming the machine rather than the OS was in DOS games.
If you need to have an algorithms book written for you in a certain language then you are missing the point of the study of algorithms completely. Algorithms, whether in assembly or a high-level language, still read the same! Stop complaining about having to use your brain. If you can't understand them because they are written in assembly, please take the time to learn it. It will only help your programming. Believe me, I haven't met a competent programmer who couldn't easily switch his/her fundamental algorithms back and forth between high level and low level without breaking a sweat. Besides, it's really what is happening behind the scenes of your C# code anyway.
I certainly know who he is (he wrote the interesting "Society of Mind"), but I always figured him more of a philosophical AI-guy than a heavy-theory-guy. I've taken a lot of courses in AI and game theory, and he is never mentioned along with the greats... What field has he contributed to?
Opinions stated are mine and do not reflect those of the Illuminati
Too too many programmers think a line of code is a line of code. They think strcat is just hunky-dory, and append great gobs of strings this way, without realizing that each append traverses the entire string from the beginning each time. They use malloc and free, new and delete, with abandon, not realizing how much they are thrashing the malloc heap, when local variables on the stack would do just as well 99% of the time.
They DO NOT UNDERSTAND the concept of finite resources, that machine cycles cost time.
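The strcat complaint can be made concrete with a small cost model. The sketch below mimics C's NUL-terminated strings on a Python bytearray and counts how many bytes each approach has to walk over; it is a simulation with made-up function names, not real C library code, and it assumes the pieces contain no zero bytes.

```python
def c_strcat(buf, src):
    """Mimic C's strcat on a NUL-terminated bytearray.
    Returns how many bytes were scanned to find the end."""
    end = 0
    while buf[end] != 0:            # walk from the start to the terminator
        end += 1
    buf[end:end + len(src)] = src   # same-length slice assignment, no resize
    buf[end + len(src)] = 0         # re-terminate
    return end

def join_with_strcat(pieces):
    buf = bytearray(sum(len(p) for p in pieces) + 1)  # zero-filled, room for NUL
    scanned = sum(c_strcat(buf, p) for p in pieces)
    return bytes(buf[:-1]), scanned

def join_with_end_pointer(pieces):
    buf = bytearray(sum(len(p) for p in pieces) + 1)
    end = 0
    for p in pieces:                # remember where the string ends
        buf[end:end + len(p)] = p
        end += len(p)
    return bytes(buf[:-1]), 0       # nothing was ever rescanned

text, scanned = join_with_strcat([b"ab"] * 100)
print(scanned)                      # 9900
```

Appending 100 two-byte pieces rescans 9,900 bytes in total with the strcat style, and the figure grows quadratically with the number of pieces, while keeping an end position rescans nothing.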
I believe the first programming course should be a very few weeks of something akin to LOGO, or BASIC, just to get the concept of bugs and such out of the way, weed out those who can't stand thinking. Then a good grounding in a z80 or some other simple 8 bitter, where counting cycles and bytes is part of the course (learn how expensive those cute index registers really are). Only then, when an understanding of machine cycles and bytes has been established, should students move on to a higher level.
Too many ivory towers out there, too many straight-A-can't-tie-their-shoelaces types.
Infuriate left and right
If you'd been around when TeX came out, you wouldn't be saying that. If you were serious about preparing math-heavy documents, you wouldn't be saying that.
What do you consider a good system now -- docbook? If so, remember that your docbook renderer uses TeX as its backend. If you consider (say) Word a good document preparation system, you have no room to speak on the subject.
TeX is an amazingly reliable, high-quality piece of software -- I doubt that anything on its scale has ever been created with as few bugs. Yes, the input syntax is a bit dated -- but in its day, it kicked ass. No doubt if Knuth were doing it today, it'd kick ass by modern standards as well.
C# didn't exist when he started volume 4. Java didn't exist when he started 1-3. Are you suggesting that he should rewrite each volume every time a new language becomes fashionable?
The point of a bare bones old language is that it favors nothing. If you have to see an idea in C# or Java to understand it, then you are a piss poor programmer.
Infuriate left and right
I was pretty turned off by the assembler too, since I don't do any assembler coding at all. I'm a C guy myself. But the thing here is that he gives the algorithms in a pseudo-code form, implementable in any language. When I look at his stuff, I read his writing, and the algorithms, but I don't even bother with the ASM. Yes, I know I'm missing a lot. But he has vastly more than enough otherwise.
Two examples really get me foaming at the mouth on this. 1977 or so, worked on a 16K 8 bitter, Datapoint 2200, similar to 8080 or z80. One routine set the cursor position from D (x) and E (y) registers. One guy could not think below the Algol level (shows how dated this example is!). Instead of just setting D and E and calling the cursor set routine, which would write D and E to the appropriate I/O registers, he pushed addresses of the values onto the (hardware limited) 16 level stack, then called the routine, which popped off the addresses one by one, loaded the values, wrote them to the I/O registers. On a 16K machine! Simply could not adapt to anything less than a Burroughs mainframe.
Much more recently, worked on some code from the 1990s, where someone had written some semi-cute memory allocate / free routines. You passed a size and the address of a free routine. It allocated 8 bytes extra, storing the free routine address and size there, but returned the address + 8 to skip the semi-hidden header. The main free routine called the saved routine address if not NULL, then freed the actual memory.
Some bozo had to compare two data structures. Instead of simply comparing various fields, he built up a string representation of each. About a dozen times for each, he determined how many extra bytes to add to the built up string, allocated that many more bytes, used strcat and sprintf to append the conversion representation, and finally used strcmp to tell if the strings differed! Then freed the two built up strings. Gaaak! Gave me the shudders every time I saw crap like that, but it was a legacy product, soon to be obsoleted.
Infuriate left and right
You have misinterpreted Knuth's comment on the French language. He does not warn nit-pickers that French has changed since he wrote his text, but rather that the French language used in his quotations has changed since the quotations were originally put in print, and therefore are not necessarily examples of modern French usage or orthography.
As an English example, at the end of Chapter 7 of The TeXbook, he quotes
Some bookes are to bee tasted,
others to bee swallowed,
and some few to bee chewed and disgested.
--FRANCIS BACON, Essayes (1597).
He will presumably not accept "corrections" to the spelling of "bee," "bookes," "disgested," or "Essayes," because they are spelled as Bacon did originally.
Most English speakers freely acknowledge that spelling and usage have changed in the last 400 years. Probably most French nit-pickers are less realistic, but that is a whole other topic.
Moderation Totals: Troll=2, Insightful=1, Funny=2, Total=5.
Not sure if I should be proud or ashamed. Obviously the moderators aren't either. Honestly, I never even considered it would be modded as funny.
Watching the moderations of this post has been an interesting lesson in Slashdot psychology.
-Pete
Soccer Goal Plans
I just had Netscape 7 and Win2K crash after searching Amazon.com for the 3 volume set of TAOCP. Did my profane OS fall over after merely viewing the holy book's entry in an online store?
The Glass is Too Big: My Take on Things
paradesign, thanks for pointing out something that has been irritating me for years.
In my opinion, nobody who posts anything on the web should ever post something in *.ps.gz format again! *.ps.gz format used to be fine a long time ago, when researchers and academics exchanged papers using UNIX boxes. *.ps.gz is terribly Windows-unfriendly. More than that, it's Web-unfriendly. Nothing is a bigger turn-off than encountering an interesting document/paper in the archaic .ps or .ps.gz format.
Even if somebody posts a PostScript file or gzipped PostScript file online, they could at the very least put up a PDF version too. If they can produce a PostScript file and gzip it, they definitely have the technical ability to produce a PDF file with Ghostscript's ps2pdf or Acrobat.
Now before you think I'm a Windows-only person who has no clue about using WinZip or installing Ghostscript, let me assure you that I'm not. I use Linux all the time, but I still prefer my files in PDF. There are a few occasions when I use Windows, such as when I'm at a friend's place or the like. And when I encounter a .ps.gz file, it's a big hassle to install Ghostview and Ghostscript and all that. If it was provided in PDF in the first place, all this unnecessary hassle could have been avoided from the very start.
Of course, the typical geek excuse (or advice depending on how you see it) would be to say "you can always install Winzip, Ghostscript and Ghostview, or some Postscript viewer browser plugin." Please. Users would want their files opened right away. No hassles, no messing around with weird programs, and no installing megabytes worth of stuff just to view a single file.
And even if PDF files are provided, there are many occasions when the PDF files have jagged fonts which don't display well in Acrobat Reader, especially if they've been produced by LaTeX. There are ways to solve that, but that's a different story.
So paradesign, thanks once again for bringing this up.
Moderators, remember that moderation is not a measure of how much you agree with somebody's post.
Sedgewick's book is good, though I haven't actually read the C++ version.
But it's the "Algorithms for Busy Programmers" version that is more geared towards showing you a few choices of algorithms to use to solve your particular problem, instead of showing you algorithms and formal proofs like tAoCP does.
So I'd keep Sedgewick's book on my desk, but when I found something I wanted to really understand, I'd read more about it in Knuth's book.
with fewer* than one byte per year of Knuth's life:
perl -e '@a=1..pop;sub a{@a?map@a=(a(@_,pop@a),@a),@a:print"@_\n";pop}a' 4
* I'm just counting the perl code, not the 'perl -e' and quotes for the shell.
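For those who don't read Perl golf: as far as I can tell, the one-liner prints every permutation of 1..N by repeatedly popping the last element off a shared array, recursing on what is left, and then putting the element back on the front. Below is a longer-winded Python sketch of that same rotation idea. It is my reading of the Perl, not the poster's own explanation, and the function name is made up.

```python
def permutations_by_rotation(n):
    """Return all permutations of 1..n as tuples, using a shared pool that is
    rotated one step per loop iteration (pop from the end, push on the front)."""
    pool = list(range(1, n + 1))
    out = []

    def rec(prefix):
        if not pool:                  # nothing left to place: one permutation done
            out.append(tuple(prefix))
            return
        for _ in range(len(pool)):    # a full cycle of rotations restores the pool
            x = pool.pop()            # take the last remaining element
            rec(prefix + [x])
            pool.insert(0, x)         # put it back on the front

    rec([])
    return out

perms = permutations_by_rotation(4)
print(len(perms), perms[0])           # 24 (4, 3, 2, 1)
```

Each level of the recursion gives every remaining element one turn at the current position, which is why the count comes out to N factorial.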
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
While I have a lot of respect for Feynman, I don't believe he ever designed and built a pipe organ in his own home.
As for the pipe organ, somewhere there's a photo of Knuth sitting at home in front of it. It's not a small instrument. :-)
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Most algorithm texts I see these days have pseudocode, at most.
You remind me of a good point. Most of the people bitching about MIX are only bitching because they can't cut-and-paste the code into their favorite programs. Knuth does provide pseudocode, flowcharts, and the mathematical definitions of all the algorithms in the book! What more can you ask? The MIX code is just one more layer to ground the algorithms to real life.
If one doesn't understand MIX, then he should look at all the other resources available for understanding the algorithm! If he can't understand the mathematics, then he still has the flowcharts and pseudocode. If he can't understand that, even when combined with the lengthy English discussion/explanation that goes along with it, then he shouldn't be reading the book!
Ignorance of something is different from lack of intelligence. The fact that so many slashdot readers don't know about his books is evidence of their *ignorance* of the history of computer science, NOT their intellectual level.
Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.
The pre and post conditions are no more or less trivial than the algorithm they describe.
And given the number of folk who can specify a sort and forget to state that the output must be a permutation of the input I don't think that using formal methods is as trivial as you make it out.
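To spell out the point about sort specifications, here is a minimal Python sketch of a postcondition check. The function name is hypothetical; the idea is simply that "the output is ordered" is only half of the specification, and the other half is that the output is a permutation of the input.

```python
from collections import Counter

def satisfies_sort_spec(inp, out):
    """Check both halves of a sort's postcondition."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    permutation = Counter(inp) == Counter(out)   # same elements, same multiplicities
    return ordered and permutation

print(satisfies_sort_spec([3, 1, 2], [1, 2, 3]))   # True
print(satisfies_sort_spec([3, 1, 2], [1, 1, 1]))   # False: ordered, but not a permutation
```

A "sort" that returns [1, 1, 1] for every three-element input passes the ordering test alone, which is exactly the omission described above.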
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Probably not. Probably even senior software engineer isn't good enough.
If I want to use an algorithm from Knuth's book I will give the book to an engineer and tell him to give me C# or Java classes that implement the ones I might want to use. Privilege of rank.
Seems like you don't want to LEARN, you just want to ape the procedure and never have to exercise the gray cells.
Knuth's books are for learning IDEAS, not copying code blindly.
Infuriate left and right
The pre and post conditions are no more or less trivial than the algorithm they describe.
This statement doesn't stand up to even trivial analysis. The pre- and postconditions set up the purpose of the algorithm and the assumptions behind it -- but for a nontrivial algorithm that's a far different thing than the actual meat; an algorithm with a simple purpose may easily have a complicated execution.
Yes, TeX was quite an achievement. Stable, wonderfully un-buggy, most definitely. Unfortunately, it was good enough to catch on and establish itself before its flaws could be fixed.
My particular take:
1) The Computer Modern fonts are not very pretty unless you have a very high resolution typesetter; even then, the letters look too thin and spidery for my taste. Sure, you can use other fonts. But most users never do, or use even uglier ones, usually sticking with Computer Modern for the formulas. Do any other really complete and compatible math fonts even exist?
2) The programming features were hacked on as an afterthought. The syntax is fine, perhaps even near optimal, for straight mark-up, but for developing actual algorithms, it makes Perl code look self-explanatory. Right up there with Intercal. This means that packages like LaTeX or formats are very hard to modify or extend in any kind of robust way. So everyone uses the default format, borrows one that works from somebody else, or works very hard to roll their own. Making even slight adjustments or fixes is a real nightmare.
3) The whole TeX program is terribly monolithic. Sure, the text description and commentary talks about various stages of TeX, but the code says otherwise. Knuth's optimization of the "inner loop" means there is no intermediate description of the TeX syntax.
The program itself was written in very low-level Pascal. The data structures are defined in terms of byte layouts, with explicit memory management. All sorts of tricky details insinuate through the code. Sure, it runs great even on 1980-era machines, but God help you if you want to re-implement it in a modern high-level language. As far as I know, this has only been done using web2c, which is a hack specially made to translate TeX. What's going to happen when C compilers are as rare as Pascal compilers are today?
4) Likewise for the file formats. Laid out with great care, byte-for-byte. Easy to read, but tough to translate into something higher-level.
My part-time project/dream is a modern re-implementation, where the TeX typesetting algorithms are embedded in a modern (Common Lisp) environment--so you can code TeX formats and macros in a heavy-duty honest-to-god programming language, and have a high-level, truly modular implementation using real data structures that could actually be tweaked and modified to do even funkier typesetting tasks.
Part of me says this will be easy, because something like 60% of TeX: The Program is doing stuff like memory management that Lisp will do for free, and accounting for funky character encodings (EBCDIC and 6-bit Pascal character sets) that probably can be ignored now that almost everyone uses ASCII or is headed toward Unicode. The other part of me says this will be difficult for exactly the same reason.
1) you are absolutely right. But TeX's font support has only lately become convenient enough for most users, and relatively few fonts with TeX-compatible font metrics have become widespread.
2) TeX is hardly Lisp-like. It's more like a macro assembler language. Its internal variables are a fixed number of registers, arranged in several banks, each of one "type," some of which have special behavior; its named variables are really only replacement strings (control sequences) or aliases to built-in registers; and its control structures are heinous.
For a truly Lisp-like language, I would want an unlimited number of variables, support for modern data structures, true parameter passing to user subroutines, full introspection of the state of the TeX process, complete customizability (e.g. being able to write new paragraph and spacing routines as part of TeX modules), and the ability to define new control and data structures.
3) What I mean is that there is no easily accessible representation for "parsed" TeX input. The processing into the internal representation is driven by a few different "modes" of the TeX engine (e.g. vertical mode, horizontal mode, math mode) which consume tokens and produce internal data structures. I would like for users to be able to redefine or add additional modes in add-on modules, rather than having to recode TeX's internals directly.
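To make the "redefinable modes" wish concrete, here is a minimal Python sketch, not anything TeX actually provides. The `Engine` class, its `mode` decorator, and the three toy handlers are all hypothetical names invented for illustration; real TeX modes do far more than this. The idea is only that each mode is an ordinary function in a registry, and the parsed result is a plain list that user code can inspect.

```python
class Engine:
    """Toy token processor whose modes live in a user-extensible registry."""

    def __init__(self):
        self.modes = {}    # mode name -> handler(engine, token) -> next mode name
        self.output = []   # accessible "parsed" representation: (kind, token) pairs

    def mode(self, name):
        """Decorator that registers (or replaces) the handler for a mode."""
        def register(handler):
            self.modes[name] = handler
            return handler
        return register

    def run(self, tokens, start="vertical"):
        self.output = []
        current = start
        for tok in tokens:
            current = self.modes[current](self, tok)
        return self.output


engine = Engine()

@engine.mode("vertical")
def vertical(eng, tok):
    if tok == "\\par":
        return "vertical"
    # Any other token starts a paragraph: hand it to horizontal mode.
    return eng.modes["horizontal"](eng, tok)

@engine.mode("horizontal")
def horizontal(eng, tok):
    if tok == "\\par":
        eng.output.append(("par", None))
        return "vertical"
    if tok == "$":
        return "math"
    eng.output.append(("text", tok))
    return "horizontal"

@engine.mode("math")
def math_mode(eng, tok):
    if tok == "$":
        return "horizontal"
    eng.output.append(("math", tok))
    return "math"

nodes = engine.run(["Hello", "$", "x", "$", "world", "\\par"])
print(nodes)
```

An add-on module could call `@engine.mode("horizontal")` again to swap in its own handler, or register an entirely new mode, without touching the `Engine` class itself.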
How do you change TeX's pagination or paragraph-building? Right now it is a total hack job. You have to generate the right tokens to create the right amount of glue and space to have the built-in routines do what you want, tweaking several internal variables along the way.
Here's an example of what I wanted to do: I was trying to typeset a table where some of the table cells would have multiple text lines. I.e. if the text was long, instead of insisting on a super-wide column, I wanted TeX to break the text into several lines, making the table cell span a larger number of text lines. Then, the other columns with short text would float to the center of this "expanded row."
I don't think you can do that easily. I ended up doing the breaks by hand. Why is that necessary? Why can't I code a table where the vertical direction is as flexible as the horizontal direction, and call the line-breaking algorithm on individual cells? Ideally, I'd like to be able to *easily* code any kind of table-generating macros I want.
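The table layout described above can be sketched in a few lines of Python. This is only an illustration of the idea (run a line breaker on each cell, then center the short cells vertically), working on fixed-width text rather than real typeset boxes. The function name `layout_row` is made up for this example, and `textwrap.wrap` is a greedy breaker standing in for TeX's line-breaking algorithm, which works quite differently.

```python
import textwrap

def layout_row(cells, widths):
    """Wrap each cell to its column width, then vertically center short cells."""
    # Greedy wrapping via textwrap; a stand-in for a real paragraph builder.
    wrapped = [textwrap.wrap(text, width=w) or [""]
               for text, w in zip(cells, widths)]
    height = max(len(lines) for lines in wrapped)

    columns = []
    for lines, w in zip(wrapped, widths):
        top = (height - len(lines)) // 2        # blank lines above the text
        bottom = height - len(lines) - top      # blank lines below the text
        padded = [""] * top + lines + [""] * bottom
        columns.append([line.ljust(w) for line in padded])

    # Transpose the columns into physical output lines.
    return [" | ".join(col[i] for col in columns) for i in range(height)]

row = layout_row(
    ["id 7", "a long description that will not fit on one line"],
    [6, 20],
)
print("\n".join(row))
```

Here the long cell wraps to three lines and the short `id 7` cell lands on the middle one.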
4) what I mean is that there is no built-in data structure representing a DVI file. The reason is that TeX was made to run without using much core memory, so only a page or so of output is held in internal form before being squeezed out of the toothpaste tube into the DVI bucket. Once it is out of the tube, the DVI toothpaste can't be sucked up again. I would like more flexibility in processing the final DVI results, without having to write a different dvi2blah driver that has to support everything that dvi2pdf or dvips already supports.
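As a rough picture of what a "built-in data structure representing a DVI file" might look like, here is a small Python sketch. Every name in it (`SetChar`, `Special`, `Page`, `Document`, `transform`) is hypothetical, and it simplifies heavily: it stores absolute positions, whereas the real DVI format is a byte stream of relative moves with a position stack. The point is only that once the pages stay in memory as ordinary objects, post-processing is a function call rather than a new driver.

```python
from dataclasses import dataclass, field, replace

@dataclass
class SetChar:
    """Place one glyph at an absolute position (simplified)."""
    x: int
    y: int
    char: str

@dataclass
class Special:
    """Opaque instruction passed through to the output driver."""
    x: int
    y: int
    payload: str

@dataclass
class Page:
    number: int
    ops: list = field(default_factory=list)

@dataclass
class Document:
    pages: list = field(default_factory=list)

    def transform(self, fn):
        """Return a new Document with fn applied to every op on every page."""
        return Document(
            [Page(p.number, [fn(op) for op in p.ops]) for p in self.pages]
        )

doc = Document([
    Page(1, [SetChar(0, 0, "T"), SetChar(10, 2, "E"), Special(0, 0, "color red")]),
])

# Example post-processing pass: shift everything 72 units to the right.
shifted = doc.transform(lambda op: replace(op, x=op.x + 72))
print(shifted.pages[0].ops)
```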
What I want in general is a system where the entire input-tokens to output-DVI is accessible to user programs written in a high-level language, using real high-level data structures, and where the TeX control sequences can be defined in the *same* language (so that I could code a LaTeX-type package in a programmer-friendly language like Lisp, if I wanted.) I basically want the whole Common Lisp language available to do my typesetting, while maintaining the excellent aspects of TeX, including the usually-convenient mark-up syntax.
Sorry if this is not quite clear, but it is, after all, just a vague dream-like urge. Programming in Common Lisp, for me, is pretty much painless. Programming TeX/LaTeX macros is like doing my own dental work. I get the same feeling that I get when I use a 1980's-era line editor (ED) vs. using Emacs. I want TeX to be a 21st century environment, instead of a 1980's environment.
Sorry, but it's obvious you don't know much about TeX.
Do any other really complete and compatible math fonts even exist?
Yes. MathTime and Lucida Math, to name the two most complete sets. A consortium of publishers has a third set in the works. And there are numerous math implementations that exist but don't pass muster as "complete".
Making even slight adjustments or fixes is a real nightmare
Do you know about, e.g., ConTeXt?
As far as I know, this has only been done using web2c
Wrong again. Google Kasper Peeters for a complete recode of TeX in C or C++. And NTS, a Java TeX.
Laid out with great care, byte-for-byte.
Whatever, dude.
"Never bullshit a bullshitter" All That Jazz
Yeah, this seems to have attacked the modularity issue for TeX's internal structure.
Still, it looks like the TeX language and the implementation language (Java) are separate.
I would like a system where there are two syntaxes for the same language. One is the mark-up syntax, which would be the default syntax for writing your documents. Hopefully, compatible with or easily translated from existing TeX documents. This syntax is optimized for writing documents.
The other syntax would be oriented toward re-programming TeX. This would be optimized for describing custom layout algorithms or other processing of user input. Personally, I would want it to look like Lisp.
For simple substitution-type macros, you could use either syntax, like in TeX today with simple
\def\TeX{T\kern-.2em\lower.5ex\hbox{E}\kern-.06em X}
when things are simple, or a more program-like form
(define-control-sequence TeX ()
#\T
(kern (lower #\E (* 0.5 ex)) (* -0.2 em))
(kern #\X (* -0.06 em)))
Then use (TeX) (or \TeX in the markup syntax) to typeset this definition. Sure, this looks more complicated--ideally, you would be able to do
(define-control-sequence TeX ()
"T\kern-.2em\lower.5ex\hbox{E}\kern-.06em X")
as well. But with the Lisp-like syntax, I know that I can use all the powerful Lisp macro and programming capabilities to do typesetting. (E.g. trig functions for typesetting at angles, TeX control sequences that take Lisp data structures as arguments) Wow!
When you look at TeX's definition for \settabs with all its \sett@b and \s@tt@b special names, or LaTeX definitions, I think being able to do things in Lisp would be a real win. And, if it really is Lisp, it won't be hard to write translators that slurp up currently existing formats (like LaTeX) and spew out the new Lispified version automatically, so I don't lose any of today's formatting abilities.
I think you are missing my point. I've just now taken a look at the ConTeXt information, and I am quite impressed at the results. But it seems to me that ConTeXt is more analogous to LaTeX+pdfTeX than raw TeX.
My concerns are more with the internals of TeX; namely, how can it be made easier for people to develop things like LaTeX or pdfTeX in a way that can be easily customized.
Like many people, my experience in TeX was writing my thesis in LaTeX. But when I wanted to adjust the format that I borrowed from a friend, either to cope with changed thesis format requirements, or to typeset a certain kind of program documentation, I found the process of understanding and modifying LaTeX macros to require true wizardry, without the rewards that wizardry brings in modern, powerful programming languages.
OK, so TeX has been recoded by a few people in a few languages. But from what I've seen, these people haven't really attacked the complexity and inflexibility of things like LaTeX's implementation macros. I.e., they maintain the same distinction between TeX internals and the TeX language.
Why is it that some people have to write packages like LaTeX, other people have to write enhanced versions of the TeX processor, and yet other people have to write DVI drivers, each using different languages? I find that the modularity of TeX is too coarse---format files, TeX processor, DVI processor, none of which really communicate with each other, or can be programmed together.
I think the possibility of doing typesetting in a programmatic way (like TeX) is very powerful, especially when coupled with aesthetically good algorithms (like TeX). But I want to have an implementation where the *programmability* is given higher priority, and the programmer is given access to the entire typesetting flow in a modern way.
My point about the DVI file format is that it is really a machine language for a generic typesetter. It has lost much of the sense of the document. Sure, by liberal use of specials, you can include intelligence in your documents, but that requires you to have a new implementation of the TeX processor, and a new DVI processor, possibly integrated into the TeX processor. What if you could produce the same effect simply by changing or defining a new format file?