Could LaTeX Replace HTML?
Acheon asks: "I recently learned to use LaTeX and I wondered why it couldn't be turned into the next standard for online documents. After all, most features of LaTeX make it either easier or more powerful than HTML, such as pagination (pure HTML 4.0 is a nightmare to code by hand) and scientific notation. It is much more suited for scripting, much more standard and readable, as well as more versatile. Also, HTML to LaTeX transcription is already feasible, so the only big feature missing for LaTeX to be supported in browsers would be linking, perhaps object embedding. On the other hand I don't know of any project going in that direction, which surprises me given the huge interest in LaTeX and the omnipresence of such documents in many areas."
Eight years of insane growth have pushed HTML into what can only be called an "interface language." Websites aren't documents anymore. They are forms, banners, toolbars, indexes, and all sorts of non-HTML stuff taped together to create an "information interface." That doesn't map well to LaTeX as it is. LaTeX is overkill for some things (pagination, text flow layout) and is completely missing others (forms).
I like LaTeX, but it won't work for websites.
Uh, TeX isn't going to replace HTML and XML as a web standard. Ever. Apart from math and certain other scientific notation, it is not easier to work with or more readable than SGML-based languages. Nor is it in any meaningful way "more scriptable". Nor does it have a decent object model. Nor, now that we're finally moving into XML, is it especially "more" extensible. CSS and XSL stylesheets are more elegant than TeX macros. TeX isn't particularly display-independent, seeing as it's designed for typesetting. Many of its core commands are for precise layout, not semantic markup.
For another thing, most web pages are at least in part machine generated these days. Between imports from WYSIWYG text editors, templating systems with simplified HTML input, web publishing platforms, databases and so forth, the winning language is the one that programmers can write generators for more easily. HTML wins here, and XML pretty much wraps it up, with nice high-level APIs for generating them from every programming language from RPG and LotusScript to Perl, VB, C++ and Java. As for generating TeX, I think there are some Perl classes and maybe if you rip through the code for LyX you could patch something together for C.
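To make the "easy to write generators for" point concrete, here is a minimal sketch in Python using the standard library's ElementTree. The function name `render_list` and the sample data are made up for illustration; the point is only that the library builds the tree and handles escaping, so the generator never has to worry about stray `&` or `<` in the data.

```python
import xml.etree.ElementTree as ET

def render_list(title, items):
    """Build a tiny HTML page from plain data (hypothetical helper)."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = title
    ul = ET.SubElement(body, "ul")
    for item in items:
        # Special characters in item are escaped when the tree is serialized.
        ET.SubElement(ul, "li").text = item
    return ET.tostring(html, encoding="unicode")

page = render_list("Prices", ["TeX & friends", "a < b"])
print(page)
```

Doing the same for TeX means escaping `\`, `%`, `$`, `&`, `#`, `_`, `{`, `}` and friends by hand, with no tree API to lean on.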
I daresay, Microsoft's XML representations of Word documents have a better shot at supplanting HTML than TeX does, and that's not exactly likely.
Next, as for viewing TeX in a web browser: you already can, at least on certain platforms. IBM has a plugin for Win32 (at least) called TechXplorer or some such. It's been around for years. It renders TeX just fine for the several hundred scientists and mathematicians who want to do such things. If you're curious, sniff around their Alphaworks site.
Good grief.
I was about to suggest that LaTeX is much harder to parse, but given the length of time it takes to display some of the web pages out there, I don't know.
Parsing LaTeX is easier than parsing HTML; LaTeX keywords are escaped with a backslash, and keywords end with a space, or with bracket pairs to define parameters. HTML requires parsing of bra/ket (less than/greater than) pairs, and the attribute values within the first pair. Not much more difficult, but harder than parsing LaTeX.
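A rough sketch of the simple case described above, in Python: commands start with a backslash, a command name ends at a space or a non-letter, and braces delimit parameters. The `tokenize` function and the token names are invented for this example. It deliberately ignores the hard part of real TeX, which is that macros can change what each character means while the file is being read, so treat it as an illustration of the easy 90%, not a TeX parser.

```python
import re

# Alternatives, in order: control word (\name, swallowing one following space),
# control symbol (\%), a single brace, or a run of plain text.
TOKEN = re.compile(r"\\([A-Za-z]+) ?|\\(.)|([{}])|([^\\{}]+)", re.DOTALL)

def tokenize(src):
    """Split LaTeX-like source into (kind, value) pairs (simplified sketch)."""
    tokens = []
    for m in TOKEN.finditer(src):
        word, symbol, brace, text = m.groups()
        if word is not None:
            tokens.append(("command", word))
        elif symbol is not None:
            tokens.append(("command", symbol))
        elif brace is not None:
            tokens.append(("open" if brace == "{" else "close", brace))
        else:
            tokens.append(("text", text))
    return tokens

print(tokenize(r"\emph{hi} there"))
```

An HTML tokenizer at the same level of sloppiness also has to split tag names from quoted and unquoted attribute values, which is the extra step the comment is pointing at.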
LaTeX output looks orders of magnitude better than HTML output. It's designed to be rendered on very high DPI printed page output - the algorithm takes a lot of time with kerning, line breaking, placing of floats/diagrams, and the like. By comparison, HTML just spews text on the page. For web pages, this is a perfectly functional alternative, but make no mistake - (La)TeX does a lot more than HTML ever does.
There's the added bonus that TeX works. For years, Donald Knuth was offering monetary rewards for bugs. He recently declared that he didn't think there were any more bugs in TeX and was going to halt development to maintain compatibility.
`Monetary reward' makes it sound like it's a lot more than it really is - you get a cheque for $2.56 if you find a bug in his textbooks, or $3.14 if you find a bug in TeX. It's mostly a kudos factor of having a cheque from the man.
The last reported bug in TeX was about 10 years ago (IIRC), and Knuth has declared that at the time of his death, any bug still remaining will be declared an official feature.
Russ %-)
... and never, ever play leapfrog with a unicorn.
See the HyperTeX FAQ for details.
I like LaTeX's ability to separate semantic structure from layout logic, but any language that will allow style sheets can do the same thing, including HTML4. Also, TeX has a Turing-complete macro language, which I tend to dislike in a document description language. So while I like some aspects of this idea, I can't altogether support it.
--
Some keywords for the NSA in the Lord of the Rings universe: One Ring bind find Sauron quest Nazgul freedom
Math ML
But I still think it is a pity that people insist HTML should be a powerful page layout tool. TeX is pretty anal in its requirement that page layout be predictable, which is cool for paper documents but way uncool for web pages. When I resize my web browser to show a long, narrow page, I'd appreciate it if the text flowed in such a way that the page would still be legible. This is already broken in a lot of web pages that insist on specifying table widths in pixels or using images to enforce a certain size, but TeX is even more inflexible in allowing the user to determine what his screen should look like.
Putting the user in control was one of the advantages of HTML in the old days. These days, one is glad if windows full of ads aren't popping up left, right and center, and obviously there must be someone around who thinks that is somehow a good idea...
Bert Driehuis -- All I asked was a friggin' rotatin' chair. Throw me a bone here, people.
The TechExplorer mentioned has kept up with the times. The plug-in browses TeX, LaTeX, and MathML documents in Netscape and IE. Yet I seriously doubt that any of these three will triumph as the final answer... There is little overlap... or should I say mathematically, "LCD(LaTeX,MathML) << Need". ;-)
Many LaTeX conventions are great for typing up formula descriptions conversationally. Netscape 6 does a bit of (optional) automatic conversion, like smilies, carets-to-superscripts and underscores-to-subscripts, and this is but a step toward what is needed in places like sci.math and the web.
LaTeX PROs
- A few ASCII keystrokes can compose well-balanced formulae.
- A variety of fonts conventional to math are readily deployed.
- Formulae can be expressed inline with text or in their full glory 'equation mode.'
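For anyone who hasn't seen it, the points above look roughly like this in practice. This is a minimal illustrative document (the quadratic formula is just a convenient example, not something from the thread): the first formula is set inline with the text, the second in display mode.

```latex
\documentclass{article}
\begin{document}
Inline: the roots of $ax^2 + bx + c = 0$ are given below.

Display:
\[
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
\end{document}
```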
LaTeX CONs
MathML PROs
- It adds a LOT of missing pieces to HTML that are needed in Math.
- It provides some very abstracted content that could be cut-and-pasted into powerful (XML-based?) applications.
MathML CONs
Ideally we should be able to start with a lightweight compromise that is extensible by fonts and stylesheets and readable by all clients/browsers. Neither format offers this at present. Hopefully, programmers will turn to cultures like sci.math to see how they converse, and glean the best of LaTeX AND HTML.
The problem with LaTeX on the web is that there already is a nice platform for those who wish to completely control page layout: Acrobat. And it has the advantage of playing nicely with mainstream word processors.
This is a rather naive question: have you used LaTeX at all? I say this as a dedicated LaTeX user: LaTeX just isn't suited for web applications for a huge number of reasons.
First of all, you say HTML is a nightmare to code in. Perhaps if you are trying to go all the way with CSS, sophisticated visual layout, and so forth, but I can knock out a simple, standards-compliant web page in 15-20 minutes. Not a pretty one, but a functional one. I can do that with LaTeX, but only with a library of templates which I have built up over the years. You just can't do LaTeX quick-and-dirty. It's not designed for it.
Second, there is the issue of visual formatting. LaTeX and HTML both, in theory, are based on the principle of content-based markup: you specify the data in content terms, and the browser/LaTeX engine determines how best to format it for display. Anyone who has ever used either of these languages knows that this is a total lie, especially for HTML. All professional HTML work centers on various hacks to achieve direct visual formatting of the page, something which HTML is fortunately quite amenable to. LaTeX, on the other hand, is a huge pain in the ass if you're trying to control the look and layout of a document: the LaTeX engine knows what's best, and it's sure as hell not going to take advice from you! You can do visual formatting the proper way, by redefining commands and LaTeX variables to get LaTeX to understand the visual format you are looking for. However, this is an enormous time outlay, and is completely impractical for anything less than, say, a book.
More fundamentally, LaTeX and HTML, although they were originally conceived for similar purposes (content markup for visual display of academic papers), have evolved in radically different directions. While LaTeX has stuck pretty close to that original intent, HTML has become almost a GUI specification language, with all kinds of capabilities which LaTeX simply doesn't have. The proof is in the pudding: Show me a LaTeX version of the Amazon page. Or the Slashdot main page. Even ignoring the issues like linking that you mention, it is for all practical purposes impossible. It would require literally weeks of dedicated LaTeX hacking, and the result would be a horrific kludge. LaTeX is, and is likely to remain, a language for typesetting documents for the purpose of conventional, dead-tree publication. Any other application of it would be a gross violation of a fundamental principle of hacking: the right tools for the right job.
In short, LaTeX and HTML have only their theoretical conception in common. For all practical intents and purposes they are so vastly different that using LaTeX as a general web language is inconceivable. There is, however, a new language emerging which promises to clean up the blurred boundaries of content and visual formatting, and get rid of the most flagrant horrors of HTML. If you want to see an HTML alternative, go look into XML.
"Never let your sense of morals prevent you from doing what is right" -Salvor Hardin
For example, I have a LaTeX macro which will quote and cite from a source in the margin of my document. The Web has no concept of a margin. Sure, I could make Netscape 4.76 lay out a web page as if it were a technical paper, but why should I have to "flip pages" on the Web? And what if I want to read this super LaTeX-enabled web page in lynx? on my Visor? on my cell phone? with a screen reader?
Sure, you can simulate a lot of physical markup items with style sheets, but that's not the point. The point is, HTML is designed to embellish text with simple, logical markup; one of HTML's greatest strengths is that it can be rendered faithfully by a variety of different tools with myriad differences in capability. LaTeX, OTOH, is designed to target one medium: a DVI file which is tied to a particular page size. So you have some logical markup, but in general a lot of the "logic" is tied to physical realities of the page. (how many times have you typed \vspace{1.0cm}, for -- albeit a trivial -- example?)
In addition, LaTeX doesn't lend itself to interpreting -- the more powerful features, like indexing, citations, and TOCs, all require multiple passes. Add to this that it's a LOT harder to parse and (to be honest) to write than semi-valid HTML, and it's just not a viable standard. The final nail is inertia. The web is based on HTML, and it has been for a long time. People are OK with extending HTML in bizarre ways to give them an approximation of TeX-like control over their document's appearance, so there's no room for a better, cleaner language. :-)
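The multiple-pass point is easy to see with cross-references: a `\ref` near the top of a document can point at a `\label` that only appears further down, so the number isn't known on first read. Below is a toy Python sketch of the idea. `resolve_refs` is a made-up name, and real LaTeX does not work on lists of lines like this; it writes label data to an auxiliary file on one run and reads it back on the next. The shape of the problem is the same, though.

```python
import re

def resolve_refs(lines):
    """Toy two-pass resolver: collect label numbers, then fill in references."""
    labels = {}
    section = 0
    # Pass 1: number the sections and note which section each label sits in.
    for line in lines:
        if line.startswith(r"\section"):
            section += 1
        for name in re.findall(r"\\label\{(\w+)\}", line):
            labels[name] = section

    # Pass 2: replace each \ref{name} with the number found in pass 1,
    # or "??" if the label was never defined.
    def substitute(match):
        return str(labels.get(match.group(1), "??"))

    return [re.sub(r"\\ref\{(\w+)\}", substitute, line) for line in lines]

doc = [
    r"\section{Intro} See section \ref{end}.",
    r"\section{End}\label{end}",
]
print(resolve_refs(doc)[0])
```

A browser rendering a page as it streams in can't do this without either buffering the whole document or reflowing after the fact.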
~wog
LaTeX (based on TeX) is a fine typographic markup language. That is, it is specifically designed for describing pages of text in an elegant fashion.
SGML is a markup language designed to describe a document's contents, not layout. The layout of an SGML document is determined by a stylesheet.
HTML was based upon SGML because the idea behind HTML was to design not a page description language but a document description language: a language that describes the elements of a document and not how they are to be displayed on the screen or printed. Unfortunately, thanks to the commercial interests of Netscape and Microsoft, it failed to separate layout and content.
XML is an attempt to simplify SGML, eliminating the more esoteric features. XML documents do not describe layout, but rely upon stylesheets to determine how a page is laid out. This proves superior to LaTeX because a separation between content and layout can be made.
The idea is, you can mark up data with XML, and then using a stylesheet, change how it is presented to the user. Even more impressive, the content's presentation (or stylesheet) can be modified dynamically through scripting.
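A small Python sketch of that idea follows. The two "stylesheets" here are just dictionaries mapping element names to format strings; they are a stand-in invented for this example, not XSL or CSS, and the element names (`article`, `title`, `para`) are likewise made up. What it shows is that the data is written once and the presentation is chosen at render time.

```python
import xml.etree.ElementTree as ET

DOC = "<article><title>Could LaTeX Replace HTML?</title><para>Probably not.</para></article>"

# Two hypothetical "stylesheets": element name -> how to present its text.
PLAIN = {"title": "{}\n", "para": "  {}\n"}
SHOUT = {"title": "*** {} ***\n", "para": "{}\n"}

def render(xml_text, style):
    """Apply a style mapping to each child element of the document root."""
    root = ET.fromstring(xml_text)
    return "".join(style[child.tag].format(child.text) for child in root)

print(render(DOC, PLAIN))
print(render(DOC, SHOUT))
```

Swapping `PLAIN` for `SHOUT` changes the output without touching `DOC`, which is the separation being described.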
XHTML is HTML represented in terms of XML. It is the future, and as time progresses we can hope that XML and stylesheets will eventually replace HTML.
LaTeX is not the answer for HTML. The goal of LaTeX is for the final presentation to be printed pages. LaTeX does a splendid job of that. The goal of XML is data description. Add stylesheets and you have the means to present content in many ways.
XML is the replacement for HTML. XHTML is the gateway from HTML to XML.
Adding the new feature should take "only a few more weeks" according to the team, although there were suggestions that LaTeX support would also be added to the mail client, further delaying the browser's release. Another programmer noted that "we might also want to make this LaTeX thing skinnable".
Users waiting for Mozilla to release seemed surprisingly unsurprised by the announcement, although one Slashdot reader was heard to say "it's a pity - i might have even used mozilla if IE crashed."