Could LaTeX Replace HTML?
Acheon asks: "I recently learned to use LaTeX and I wondered why it couldn't be turned into the next standard for online documents. After all, most features of LaTeX make it either easier or more powerful than HTML, such as pagination (pure HTML 4.0 is a nightmare to code by hand) and scientific notation. It is much more suited for scripting, much more standard and readable, as well as more versatile. Also, HTML to LaTeX transcription is already feasible, so the only big feature missing for LaTeX to be supported in browsers would be linking, perhaps object embedding. On the other hand I don't know of any project going in that direction, which is quite a surprise to me given the huge interest in LaTeX and the omnipresence of such documents in many areas."
That is incorrect. You are confusing TeX (which is, to a large extent, a type-setting programming language), and LaTeX which is a document description language implemented on top of TeX. In LaTeX I only describe the logical structure of my content, and leave the presentation details to so-called "document classes" ("article", "thesis", "book", "slides", etc) which take care of margins, font sizes and weights, and other trivialities.
Actually... that's also how I write my HTML. A good mix of HTML+CSS2 is remarkably like using LaTeX. And as you point out, once we move to something as general as XSLT (which allows for pretty much arbitrary transformation of XML for presentation purposes), at least the spirit of the LaTeX experience, if not the syntax, will carry over onto the WWW.
--
On the contrary, PDF has solved a great many of the problems that used to plague portable document publishing. It provides precise control over layout and display of documents. It relies on PostScript, so the quality of its displayed and printed output is potentially very high. Need I mention it's portable? The only two problems with PDF, both minor, are its speed (PDFs do take a while to render, at least on my system) and the sort of fascist content/copyright-control mechanisms embedded into the format. The latter really isn't even a weakness, however; it's more me editorializing. Also, support for hypertext could be better. Most people find PDF an attractive and secure means for distributing their documents. I'm interested to know what makes you hate it.
I think there is a world market for maybe five personal web logs.
Have you used LaTeX before? Your description of it defines exactly what LaTeX is not.
For those who don't know, LaTeX lets you define structured documents, and then you apply a style across the entire document which then defines how it appears. Yes, this technology has been around for a lot longer than HTML, CSS, blah blah blah.
For example, by writing a document with sections, chapters, references, etc etc, depending on what style I choose, I might end up with a table of contents, or not, with an index, or not, with footnotes, or not, etc etc...
(For the record, you are perhaps confusing LaTeX with TeX...)
--
Ian Peters
This seems to me like it would be going the wrong way. With LaTeX, you are still worried about laying out your document. I was under the impression that the next big thing was to worry about describing your document, and using a translator (a la XSLT...)
of course, I could be wrong...
/ZL
Using TeX for typesetting is a pretty good idea; printable media can be easily broken down into sets of mathematical relations. Plain white paper has specific metrics and so does glossy paper and so on. Digital media however is subject to a fuckload of different potentialities. This is one of the reasons HTML came about in the first place. It originally didn't give a shit about the display of the information; it just provided the information for display, which is translated and rendered by the browser. This is exactly why HTML can be viewed with both Lynx and Netscape if you don't bother with styling shit. Markup languages are really good for this purpose; they are designed to convey information and let something else decide how it's going to look. LaTeX and PostScript and the like take the opposite approach and relate the information to display elements. This is a shitty document model and a pretty intensive way to display information. Besides the fact you'd lock content and style into the same code, you end up losing all of the functionality of the Hyper- prefix. Not only does HTML leave the displaying of content up to an external element, it VERY easily connects bits of information. People don't like typing/copying URLs often. It is much more time-efficient to type out the URLs one time (at page creation) than it is to type them out every single fucking time the page is accessed.
I'm a loner Dottie, a Rebel.
No.
And that's not what the person was talking about: ... I wondered why it couldn't be turned into the next standard for online documents
To me, this is looking for a replacement for .PDF files, text files, and technical papers. I have no doubt that my favorite browser will support this easily, but the plug in system for most browsers is seriously broken, and adding mime types to them is a major effort. Due to this, new file formats like fractally compressed images (yes, I know the licensing problems hurt them as well) can't "break in", and the "Browser" has become painfully a http & html only program, plus a few things like http & text and http & image (Plus Flash, PDF, and bad Java implementations). New protocols aren't easily added without the upgrade of the monolithic browser, and old ones (like ftp://, gopher://, telnet:// or esoteric ones like tv://) are not supported well, or at all.
So, yes: LaTeX with its general descriptive tags would make an excellent markup language for papers, and no, it's not likely to be adopted.
--
Evan
"$30 for the One True Ring. $10 each additional ring!" -- JRR "Bob" Tolkien
(pure HTML 4.0 is a nightmare to code by hand)
what planet are you from? planet FrontPage?
You mean like using XSL:FO to generate PDF?
XML with stylesheets gives you much of what you get from LaTeX: ability to define new elements, like, say, 'abstract', and how they should be presented. Defining the structure of the document and indexing are possible too.
Coding HTML 4 by hand is hard; XHTML is worse. That's in the past for the web. Use a decent editor - even if it only keeps the tags matched for you. Autogenerate it from LaTeX if you like.
Page layout is tricky: well yes, that's not what HTML is designed for. For now, use CSS. In the future XSL:FO. Mathematics: simple, use MathML.
My recommendation would be you go find a decent introductory book on XML technologies and read up about them. Or take a good look at the W3C site. There's a whole wealth of stuff out there.
<confession>Okay, I don't actually know LaTeX, so there's a very good chance I'm about to look stupid...</confession>
...but isn't LaTeX designed for print? Fixed page size, for visual reading only. Does it address scalability for different display sizes, from 21" monitors to PDAs? Does it support semantic tagging for accessibility support? If not, it can't and shouldn't replace xhtml.
Might be nice to replace PDF as a web-distributable print-quality format, though. I hate PDFs.
Reading back through the comments, I see most people don't understand what the real point of replacing HTML with LaTeX would be. Let me explain: by "Pure HTML 4.0", I mean being up to 4.0 standards. If you read the HTML 4.0 documentation carefully, you will undoubtedly notice that 99% of the tags and attributes we used to code in version 3.x are marked "deprecated". Pure 4.0 structure is very heavy to handle, as there are a lot of divisions and styles to define even before typing a single character. In comparison, all classes are already defined in LaTeX, and the coding style looks much more like HTML 3.x style, even simpler. Also, as many pointed out, LaTeX is mostly device-independent, unlike HTML, which looks very different from browser to browser due to the way it is interpreted. There is no way to publish a professional document online using HTML whenever you need strict pagination parameters, because the browser systematically alters them. Contrary to what many seem to believe, LaTeX parsing is very fast, in fact almost as fast as HTML since it is compiled. All browsers interpret HTML directly to screen and have to redraw content several times. Although this is supposed to give an impression of speed, it is rather annoying. To those who believe LaTeX is hard to learn, I answer that all tags used in HTML have a *simpler* equivalent in LaTeX. Also, many features of LaTeX are simply automatic, such as text justification, or are handled through standard classes one just has to load at the beginning of the document. Finally, since HTML to LaTeX conversion is already commonplace through a simple utility, it would not break compatibility. People could still code in HTML if they want. But LaTeX browsers would undoubtedly have something extra. Ultimately, it would turn out to be as attractive to support LaTeX as Shockwave.
In my humble opinion, HTML standards will shortly run out of control and become a language much less convenient than it is now, simply because it was not created to handle whatever tasks the engineers of the web intend to make it run (which explains the drastic changes in the 4.0 standards). At least LaTeX is a most solid and clean architecture, portable, simple to learn and to port to and from, and so on.
I have a fondness for TeX: its hyphenation and justification are still top notch. I'd like to see its algorithms incorporated into the back end of a CSS or FO processor. Here's how I'd relate some of the current markup languages:
Obviously, the lines blur. For example, the above diagram was written in Slashdot's subset of HTML, so it contains lots of &nbsp; and <br>--very presentational. Also, the diagram doesn't explicitly mention screen-based or interactive output media, which are very important.
H1 -- \chapter
H2 -- \section
etc
and list type thingies:
LI -- \item
UL -- \begin{itemize}
and everyone's favorite highlights:
I -- \emph{text}
B -- \textbf{text}
etc... etc...
In most cases the differences are trivial, seeming to indicate that, at first approximation, translating between these two systems should not prove too difficult.
There are translation programs that use these similarities, but in order to exploit the richness of the LaTeX language as compared to HTML (especially its early versions, which have no support for tables or mathematics), an ad hoc approach has to be adopted. To correctly handle LaTeX commands that have no equivalent in HTML, such elements can either be transformed into bitmaps or pictures (an approach taken by LaTeX2HTML), or the user can specify how the given element should be handled in the target language.
Accordingly, the migration would be simple if some steps were taken to convert those tables and equations... but good luck getting people to switch until the big gorillas get behind it
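To make the "first approximation" concrete, here is a minimal sketch of the HTML-to-LaTeX direction in Python. The TAG_MAP table and the html_to_latex name are my own illustration built from the correspondences listed above, not part of LaTeX2HTML or any other real converter, and the sketch does not escape LaTeX special characters.

```python
from html.parser import HTMLParser

# Hypothetical mapping based on the tag correspondences listed in the comment.
# Each HTML tag maps to (text emitted at the open tag, text emitted at the close tag).
TAG_MAP = {
    "h1": ("\\chapter{", "}\n"),
    "h2": ("\\section{", "}\n"),
    "ul": ("\\begin{itemize}\n", "\\end{itemize}\n"),
    "ol": ("\\begin{enumerate}\n", "\\end{enumerate}\n"),
    "li": ("\\item ", "\n"),
    "i": ("\\emph{", "}"),
    "b": ("\\textbf{", "}"),
}


class HtmlToLatex(HTMLParser):
    """Collects LaTeX output while the standard-library parser walks the HTML."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Tags without a known equivalent are silently skipped.
        if tag in TAG_MAP:
            self.out.append(TAG_MAP[tag][0])

    def handle_endtag(self, tag):
        if tag in TAG_MAP:
            self.out.append(TAG_MAP[tag][1])

    def handle_data(self, data):
        # Assumption: the text contains no LaTeX special characters (%, &, _ ...).
        self.out.append(data)


def html_to_latex(html):
    parser = HtmlToLatex()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

Anything outside the table (tables, forms, images) simply falls through, which is exactly where the ad hoc handling described above would have to take over.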
First, I'm beginning to get into TeX. I don't know what LaTeX offers that plain old TeX doesn't, or vice versa, but I feel I can discuss this with a bit of intelligence.
One of the biggest barriers to becoming a web standard is the complexity of the formatting language. I know HTML (I'm quite proficient in it--who isn't?), and I'm beginning to see TeX. TeX has more commands, and many of them are far less intuitive than HTML tags (<b> for bold is pretty obvious). Most webmasters, particularly busy commercial ones, won't want to take the time to learn TeX. Therefore, while it may be a published standard, it will never be the de facto standard.
The second problem is target applications. TeX is a formatting system. It gives the user fine-grained control of textual layout and appearance. HTML is a classification system. It gives the user the ability to group text according to form and function.
HTML assumes (correctly) that the user knows nothing about how documents appear at the viewing end. What looks excellent on letter-sized paper, for example, looks terrible on A4-sized paper--words run off the page, margins are too small, lines are pressed together. All HTML does is tell the viewer how text should be classified--letting the viewer decide how to display those words. After all, nobody better understands how this information is being viewed than the viewer himself (itself).
TeX, on the other hand, assumes (correctly) that the user knows exactly how documents appear at the viewing end. If you know that you are printing to letter-sized paper, it is very easy to tune the placement and appearance of text on your page to produce an optimal layout--one that is aesthetically and functionally pleasing. The problem with the World Wide Web is that we aren't all viewing things on letter-sized paper. My Netscape window dimensions are 845x960 pixels; I can't believe anybody else has exactly that size window. Even if they did, it is unlikely that their window widgets (borders, titles) are the same, so the viewing area is different. I can make things look great in my window, but in anybody else's window, the same document would not look optimal.
This is precisely why TeX will never make it as a web standard. Nobody likes to scroll in a weird fashion to read documents, or have small text which can't be enlarged (or which screws up formatting if enlarged). TeX is only good when the document producer controls how the viewers are presented the data. And that is impossible on the World Wide Web.
I do not belong in the spam.redirect.de domain.
Unfortunately, because of its TeX heritage, the way LaTeX describes styles and macros is pretty clunky. Underneath the covers, it's more like a machine language, with numbered registers, side-effects, and odd processing hooks. XML/XSL is probably a better choice: it's more formally defined than LaTeX. The XSL transformation model is easier to understand and more predictable to most people.
The biggest problem with XML in my view is cosmetic: it's a pain to type. XSL is somewhat more limited than LaTeX when it comes to specifying physical page layout in a device-independent way, but those limitations probably can be overcome in the long run, and they don't matter that much on the web: for physical layout on the web, XSL has most of what you need.
I think XML and LaTeX are slowly growing together anyway. LaTeX 3 will probably have some built-in XML support. There are already several packages that can go from XML to LaTeX and from LaTeX to XML. In the long run, we may see that LaTeX will become an alternative input syntax for XML and that TeX/LaTeX will be used more and more for producing actual printed representations of XML documents.
You can find lots of related links here.
Eight years of insane growth has pushed HTML into what can only be called an "interface language." Websites aren't documents anymore. They are forms, banners, toolbars, indexes, and all sorts of non-HTML stuff taped together to create an "information interface." That doesn't map well to LaTeX as it is. LaTeX is overkill for some things (pagination, text flow layout) and is completely missing other things (forms).
I like LaTeX, but it won't work for websites.
Uh, TeX isn't going to replace HTML and XML as a web standard. Ever. Apart from math and certain other scientific notation, it is not easier to work with or more readable than SGML-based languages. Nor is it in any meaningful way "more scriptable". Nor does it have a decent object model. Nor, now that we're finally moving into XML, is it especially "more" extensible. CSS and XSL stylesheets are more elegant than TeX macros. TeX isn't particularly display-independent, seeing as it's designed for typesetting. Many of its core commands are for precise layout, not semantic markup.
For another thing, most web pages are at least in part machine generated these days. Between imports from WYSIWYG text editors, templating systems with simplified HTML input, web publishing platforms, databases and so forth, the winning language is the one that programmers can write generators for more easily. HTML wins here, and XML pretty much wraps it up, with nice high-level APIs for generating them from every programming language from RPG and LotusScript to Perl, VB, C++ and Java. As for generating TeX, I think there are some Perl classes and maybe if you rip through the code for LyX you could patch something together for C.
I daresay, Microsoft's XML representations of Word documents have a better shot at supplanting HTML than TeX does, and that's not exactly likely.
Next, as for viewing TeX in a web browser: you already can, at least on certain platforms. IBM has a plugin for Win32 (at least) called TechXplorer or some such. It's been around for years. It renders TeX just fine for the several hundred scientists and mathematicians who want to do such things. If you're curious, sniff around their Alphaworks site.
Good grief.
I was about to suggest that LaTeX is much harder to parse, but given the length of time it takes to display some of the web pages out there, I don't know.
Parsing LaTeX is easier than parsing HTML; LaTeX keywords are escaped with a backslash, and keywords end with a space, or with bracket pairs to define parameters. HTML requires parsing of bra/ket (less than/greater than) pairs, and the attribute values within the first pair. Not much more difficult, but harder than parsing LaTeX.
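As a rough illustration of the lexical rule described here (backslash starts a command, braces delimit arguments), this is a small tokenizer sketch in Python. It assumes TeX's default category codes and ignores comments, math mode, and optional arguments; the token names are my own, and real TeX lets a document redefine which characters play these roles, so treat this as a first-pass picture only.

```python
import re

# Assumes default category codes: "\" starts a command, "{" and "}" delimit groups.
TOKEN_RE = re.compile(
    r"""
    (?P<command>\\[A-Za-z]+|\\.)   # control word, or control symbol such as \%
  | (?P<open>\{)                   # start of a group or argument
  | (?P<close>\})                  # end of a group or argument
  | (?P<text>[^\\{}]+)             # anything else is plain text
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return (kind, lexeme) pairs for a LaTeX fragment."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]
```

For instance, tokenize(r"\section{Intro}") yields a command token, an open brace, the text "Intro", and a close brace.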
LaTeX output looks orders of magnitude better than HTML output. It's designed to be rendered on very high DPI printed page output - the algorithm takes a lot of time with kerning, line breaking, placing of floats/diagrams, and the like. By comparison, HTML just spews text on the page. For web pages, this is a perfectly functional alternative, but make no mistake - (La)TeX does a lot more than HTML ever does.
There's the added bonus that TeX works. For years, Donald Knuth was offering monetary rewards for bugs. He recently declared that he didn't think there were any more bugs in TeX and was going to halt development to maintain compatibility.
`Monetary reward' makes it sound like it's a lot more than it really is - you get a cheque for $2.56 if you find a bug in his textbooks, or $3.14 if you find a bug in TeX. It's mostly a kudos factor of having a cheque from the man.
The last reported bug in TeX was about 10 years ago (IIRC), and Knuth has declared that at the time of his death, any bug still remaining will be declared an official feature.
Russ %-)
... and never, ever play leapfrog with a unicorn.
See the HyperTeX FAQ for details.
I like LaTeX's ability to separate semantic structure from layout logic, but any language that will allow style sheets can do the same thing, including HTML4. Also, TeX has a Turing-complete macro language, which I tend to dislike in a document description language. So while I like some aspects of this idea, I can't altogether support it.
--
Some keywords for the NSA in the Lord of the Rings universe: One Ring bind find Sauron quest Nazgul freedom
Math ML
But I still think it is a pity that people insist HTML should be a powerful page layout tool. TeX is pretty anal in its requirement that page layout be predictable, which is cool for paper documents but way uncool for web pages. When I resize my web browser to show a long, narrow page, I'd appreciate it if the text flowed in such a way that the page would still be legible. This is already broken in a lot of web pages that insist on specifying table widths in pixels or using images to enforce a certain size, but TeX is even more inflexible in allowing the user to determine what his screen should look like.
Putting the user in control was one of the advantages of HTML in the old days. These days, one is glad if windows full of ads aren't popping up left, right and center, and obviously there must be someone around who thinks that is somehow a good idea...
Bert Driehuis -- All I asked was a friggin' rotatin' chair. Throw me a bone here, people.
The TechExplorer mentioned has kept up with the times. The plug-in browses TeX, LaTeX, and MathML documents in Netscape and IE. Yet I seriously doubt that any of these three will triumph as the final answer... There is little overlap... or should I say mathematically, "LCD(LaTeX,MathML) << Need". ;-)
Many LaTeX conventions are great for typing up formula descriptions conversationally. Netscape 6 does a bit of (optional) automatic conversion, like smilies, carets-to-superscripts and underscores-to-subscripts, and this is but a step toward what is needed in places like sci.math and the web.
LaTeX PROs
- A few ASCII keystrokes can compose well-balanced formulae.
- A variety of fonts conventional to math are readily deployed.
- Formulae can be expressed inline with text or in their full glory 'equation mode.'
LaTeX CONs
MathML PROs
- It adds a LOT of missing pieces to HTML that are needed in Math.
- It provides some very abstracted content that could be cut-and-pasted into powerful (XML-based?) applications.
MathML CONs
Ideally we should be able to start with a lightweight compromise, but one extensible by fonts and stylesheets that are readable to all clients/browsers. Neither format offers this at present. Hopefully, programmers will turn to cultures like sci.math to see how they converse, and glean the best of LaTeX AND HTML.
The problem with latex on the web is that there already is a nice platform for those that wish to completely control page layout: Acrobat. And it has the advantage of playing nicely with mainstream word processors.
This is a rather naive question- have you used LaTeX at all? I say this as a dedicated LaTeX user: LaTeX just isn't suited for web applications for a huge number of reasons.
First of all, you say HTML is a nightmare to code in. Perhaps if you are trying to go all the way with CSS, sophisticated visual layout, and so forth, but I can knock out a simple, standards-compliant web page in 15-20 minutes. Not a pretty one, but a functional one. I can do that with LaTeX, but only with a library of templates which I have built up over the years. You just can't do LaTeX quick-and-dirty. It's not designed for it.
Second, there is the issue of visual formatting. LaTeX and HTML both, in theory, are based on the principle of content-based markup: you specify the data in content terms, and the browser/LaTeX engine determines how best to format it for display. Anyone who has ever used either of these languages knows that this is a total lie, especially for HTML. All professional HTML work centers on various hacks to achieve direct visual formatting of the page, something which HTML is fortunately quite amenable to. LaTeX, on the other hand, is a huge pain in the ass if you're trying to control the look and layout of a document: the LaTeX engine knows what's best, and it's sure as hell not going to take advice from you! You can do visual formatting the proper way, by redefining commands and LaTeX variables to get LaTeX to understand the visual format you are looking for. However, this is an enormous time outlay, and is completely impractical for anything less than, say, a book.
More fundamentally, LaTeX and HTML, although they were originally conceived for similar purposes (content markup for visual display of academic papers), have evolved in radically different directions. While LaTeX has stuck pretty close to that original intent, HTML has become almost a GUI specification language, with all kinds of capabilities which LaTeX simply doesn't have. The proof is in the pudding: Show me a LaTeX version of the Amazon page. Or the Slashdot main page. Even ignoring the issues like linking that you mention, it is for all practical purposes impossible. It would require literally weeks of dedicated LaTeX hacking, and the result would be a horrific kludge. LaTeX is, and is likely to remain, a language for typesetting documents for the purpose of conventional, dead-tree publication. Any other application of it would be a gross violation of a fundamental principle of hacking: the right tools for the right job.
In short, LaTeX and HTML have only their theoretical conception in common. For all practical intents and purposes they are so vastly different that using LaTeX as a general web language is inconceivable. There is, however, a new language emerging which promises to clean up the blurred boundaries of content and visual formatting, and get rid of the most flagrant horrors of HTML. If you want to see an HTML alternative, go look into XML.
"Never let your sense of morals prevent you from doing what is right" -Salvor Hardin
For example, I have a LaTeX macro which will quote and cite from a source in the margin of my document. The Web has no concept of a margin. Sure, I could make Netscape 4.76 lay out a web page as if it were a technical paper, but why should I have to "flip pages" on the Web? And what if I want to read this super LaTeX-enabled web page in lynx? on my Visor? on my cell phone? with a screen reader?
Sure, you can simulate a lot of physical markup items with style sheets, but that's not the point. The point is, HTML is designed to embellish text with simple, logical markup; one of HTML's greatest strengths is that it can be rendered faithfully by a variety of different tools with myriad differences in capability. LaTeX, OTOH, is designed to target one medium: a DVI file which is tied to a particular page size. So you have some logical markup, but in general a lot of the "logic" is tied to physical realities of the page. (how many times have you typed \vspace{1.0cm}, for -- albeit a trivial -- example?)
In addition, LaTeX doesn't lend itself to interpreting -- the more powerful features, like indexing, citations, and TOCs all require multiple passes. Add to this that it's a LOT harder to parse and (to be honest) to write than semi-valid HTML, and it's just not a viable standard. The final nail is inertia. The web is based on HTML, and it has been for a long time. People are OK with extending HTML in bizarre ways to give them an approximation of TeX-like control over their document's appearance, so there's no room for a better, cleaner language. :-)
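The multiple-pass point can be sketched in a few lines of Python. A table of contents at the top of a document depends on section titles that only appear later, so nothing can be emitted until the whole source has been read once. The \toc marker and the function names here are invented for the illustration; real LaTeX handles this by writing auxiliary files on one run and reading them back on the next.

```python
import re

SECTION_RE = re.compile(r"\\section\{([^}]*)\}")


def collect_sections(source):
    """Pass 1: record every section title in document order."""
    return SECTION_RE.findall(source)


def render(source, titles):
    """Pass 2: expand the hypothetical \\toc marker using the collected titles."""
    toc = "\n".join(f"{n}. {title}" for n, title in enumerate(titles, start=1))
    return source.replace("\\toc", toc)


def build(source):
    titles = collect_sections(source)  # requires the complete document up front
    return render(source, titles)
```

A browser that renders HTML incrementally as it arrives has no such dependency, which is the contrast being drawn here.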
~wog
LaTeX (based on TeX) is a fine typographic markup language. That is, it is specifically designed for describing pages of text in an elegant fashion.
SGML is a markup language designed to describe a document's contents, not layout. The layout of an SGML document is determined by a stylesheet.
HTML was based upon SGML because the idea behind HTML was not to design a page description language, but a document description language: a language that describes the elements of a document and not how they are to be displayed on the screen or be printed. Unfortunately, thanks to the commercial interests of Netscape and Microsoft, it failed to separate layout and content.
XML is an attempt to simplify SGML, eliminating the more esoteric features. XML documents do not describe layout, but rely upon stylesheets to determine how a page is laid out. This proves superior to LaTeX because a separation between content and layout can be made.
The idea is, you can mark up data with XML, and then using a stylesheet, change how it is presented to the user. Even more impressive, the content's presentation (or stylesheet) can be modified dynamically through scripting.
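A toy version of that idea in Python: one XML document, two interchangeable "stylesheets". The element names (article, title, para) and the dictionary-based style tables are made up for this sketch and stand in for what a real XSL or CSS stylesheet would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical content-only document: no layout information at all.
DOC = "<article><title>Hello</title><para>Some text.</para></article>"

# Two stand-in "stylesheets": each maps an element name to an output template.
HTML_STYLE = {"title": "<h1>{}</h1>", "para": "<p>{}</p>"}
LATEX_STYLE = {"title": "\\section{{{}}}", "para": "{}\n"}


def render(xml_text, style):
    """Apply a style table to each top-level element of the document."""
    root = ET.fromstring(xml_text)
    # An element missing from the style table raises KeyError in this sketch.
    return "\n".join(style[child.tag].format(child.text or "") for child in root)
```

Swapping HTML_STYLE for LATEX_STYLE changes the presentation without touching DOC, which is the separation being argued for.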
XHTML is HTML represented in terms of XML. It is the future, and as time progresses we can hope that XML and stylesheets will eventually replace HTML.
LaTeX is not the answer for HTML. The goal of LaTeX is for the final presentation to be printed pages. LaTeX does a splendid job of that. The goal of XML is data description. Add stylesheets and you have the means to present content in many ways.
XML is the replacement for HTML. XHTML is the gateway from HTML to XML.
Adding the new feature should take "only a few more weeks" according to the team, although there were suggestions that LaTeX support would also be added to the mail client, further delaying the browser's release. Another programmer noted that "we might also want to make this LaTeX thing skinnable".
Users waiting for Mozilla to release seemed surprisingly unsurprised by the announcement, although one Slashdot reader was heard to say "it's a pity - i might have even used mozilla if IE crashed."