GNU TeXmacs and Structured Text Editing
Joris van der Hoeven writes "It is a common belief that structured texts are best conceived using ASCII-based text editors like Emacs or VI. It is true that word processors like MS-Word have done a bad job on this issue. But does this mean that wysiwyg structured text editing would be impossible? We firmly believe the contrary and argue that such editors are both technically conceivable and desirable. Judge for yourself by taking a look at the GNU TeXmacs program, whose version 1.0 has just been released."
How does it compare with LyX?
It suffers from the same ailment (although far less so) as HTML: the layout is intimately linked with the content. As long as font size information, background color, text alignment, etc. are part of the document and mixed with section, paragraph, bibliography, etc. there will be trouble.
This is precisely why things like DocBook came into being. It contains absolutely no layout information. It is all about structured content. Layout is handled later by a separate processor. This does not necessarily mean that input must be so sterile. In fact, I believe that a WYSIWYG DocBook editor would be a godsend in providing a "way out" for all of those Word authors but still applying content structure. Perhaps a variation on the Mozilla Composer concept?
And for those who will let go of their emacs when you pry it from their cold, dead fingers, there is at least one XML editor that takes DTDs for input to aid in tag creation and allows for hooks into XSLT processors for "pretty" previews. It is called XAE (XML Authoring Environment for Emacs).
- I don't need to go outside, my CRT tan'll do me just fine.
But TeX enthusiasts seem to be stuck on the idea that TeX is also useful for structured documents. Sorry, it just isn't. If you want to impose structure on a document, you can't use a format designed around layout. Even if you add constructs that describe document structure (as LaTeX does) you can't prevent the user from using non-structure elements "because it looks right". So you end up with a convoluted mixture of structure and layout that's impossible to maintain. That's why HTML is such a mess. That's why maintaining large technical documents with traditional word processors is a nightmare.
If you need to maintain a large structured document, you need to use a format that makes no attempt at all to describe layout. So the writer is forced to think purely in terms of how the document is organized. You keep layout description in a separate thing, a "style sheet". Not only does that end your document maintenance nightmare, but it allows you to deliver the same document in different ways just by providing the appropriate style sheet. You have a single source that's accessible as a set of web pages, or as a printed document, or whatever.
What formats am I talking about? Since this is 2002, I'm talking about XML. Not XML in general (most XML apps are data-centric not doc-centric) but specific appropriate XML applications, such as DocBook or DITA. For the stylesheets there's Cascading Style Sheets and/or XSL. But these are just the best technologies that happen to be available now. The basic idea has been around for a long time: in structured documents you have to separate markup and layout.
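A rough sketch of the single-source idea, in Python for brevity. The element names and the two renderer functions below are made up for illustration and only loosely DocBook-flavoured; a real setup would use an actual schema with XSLT or CSS rather than hand-written functions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: structure only, no layout information.
SOURCE = """
<article>
  <title>Structured Editing</title>
  <section>
    <title>Why structure</title>
    <para>Layout is decided later.</para>
  </section>
</article>
""".strip()


def to_html(article):
    """One 'style sheet': render the structure as HTML."""
    out = ["<h1>%s</h1>" % escape(article.findtext("title"))]
    for section in article.findall("section"):
        out.append("<h2>%s</h2>" % escape(section.findtext("title")))
        for para in section.findall("para"):
            out.append("<p>%s</p>" % escape(para.text))
    return "\n".join(out)


def to_text(article):
    """Another 'style sheet': render the same structure as plain text."""
    title = article.findtext("title")
    out = [title.upper(), "=" * len(title)]
    for section in article.findall("section"):
        out.append("")
        out.append(section.findtext("title"))
        for para in section.findall("para"):
            out.append("  " + para.text)
    return "\n".join(out)


article = ET.fromstring(SOURCE)
html = to_html(article)
text = to_text(article)
print(html)
print()
print(text)
```

Neither renderer touches the source, so changing how the document looks never means editing the document itself.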
I found LyX an excellent way to start using LaTeX.
I'd be interested in people's comments who have used both.
If all this should have a reason, we would be the last to know.
I know they mention it converts to LaTeX, and according to the GNU site there is no real problem with that? The problems seem to be converting from LaTeX/TeX into their XML format? So, I'm not clear on this. Anyone have any experience using this stuff and how well it will convert TO TeX, etc.? That would seem to be the main concern..... By the same token, I don't like the idea of having to drop vim.
"Research is what I am doing when I don't know what I am doing." -- Wernher von Braun
Here my knowledge of LaTeX is admittedly lacking. Is LaTeX style data separate and distinct from its structural info? If memory serves, it was not. This leaves it no better than HTML once again. Yes, someone can write relatively clean HTML and have all layout info in a separate stylesheet file (CSS), but the language is intimately tied to how it is displayed and there is nothing but "good taste" keeping things neat.
For DocBook, there is *nothing* layout-related. It is all semantic markup. While you could conceivably add (for example) HTML to a DocBook file, it would have to be in a separate namespace and therefore clearly defined as a separate entity. The same thing for MathML. There is nothing keeping you from embedding MathML in a DocBook document. In fact, it's easy. But you clearly see where the two meet. There is never a blurring of the line.
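To make the "you clearly see where the two meet" point concrete, here is a small Python sketch. It assumes the namespaced form of DocBook (version 5, which postdates this discussion; the DocBook of 2002 was DTD-based) and the standard MathML namespace. The sample paragraph is invented.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Invented sample: a DocBook paragraph with an embedded MathML fraction.
SOURCE = """
<para xmlns="http://docbook.org/ns/docbook">
  The ratio is
  <inlineequation>
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML">
      <m:mfrac><m:mi>&#947;</m:mi><m:mn>2</m:mn></m:mfrac>
    </m:math>
  </inlineequation>.
</para>
""".strip()


def namespace_of(element):
    # ElementTree stores qualified names as "{namespace}local".
    if element.tag.startswith("{"):
        return element.tag[1:].split("}")[0]
    return ""


root = ET.fromstring(SOURCE)
by_namespace = {}
for element in root.iter():
    local_name = element.tag.split("}")[-1]
    by_namespace.setdefault(namespace_of(element), []).append(local_name)

print(by_namespace[DOCBOOK_NS])
print(by_namespace[MATHML_NS])
```

Every element lands in exactly one bucket, so a processor can always tell which vocabulary it is looking at.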
Please correct me if I'm wrong about LaTeX. It would not be my intention to slam it without just cause.
- I don't need to go outside, my CRT tan'll do me just fine.
I think this GNU thing suffers from the same problems as LyX, Scientific Workplace, and every other GUI front-end to (La)TeX -- it relies on menus and the mouse.
In the time it takes someone to remove his hand from the keyboard, search through a menu or click a button to make a fraction, and search through another menu or palette to find a gamma, I could have typed \frac{\gamma}{2} ten times.
My experience has been that people look to these things so they don't have to be bothered by knowing all the commands. I think that's a waste of time. After writing one paper using LaTeX, you will have memorized all of the symbols commonly used in your discipline, and you'll soon discover that LaTeX is so much faster than a GUI application.
Spending a couple of hours learning LaTeX is time well spent, and you will certainly be repaid many times over in the long run.
I did point out that TeX works well when you're doing small documents where structure and maintainability isn't important. A presentation certainly seems to fall in that category. Indeed, TeX strikes me as particularly well suited as a file format for a Powerpoint replacement.
This is the second "rebuttal" to my post that simply repeats what I said in the first paragraph. I said it myself -- structure isn't always important. But when you design a "structured" editor, you're obviously targeting documents where it is
Try typing \frac {\gamma}{2} in LyX (first hitting C-m to start math mode, equivalent to typing $ ... $ in LaTeX).
LyX gives you all the power of LaTeX plus the advantages of a GUI and WYSIWYM (not WYSIWYG) display. An extra nicety is that when you select something in the GUI (e.g. click the gamma icon in the math panel), it tells you the keyboard shortcut (\gamma) in the status bar...so it teaches you the relevant TeX as you go, but you always have the GUI as a backup if you forget.
Not that LyX is perfect (I'm required to hack in LaTeX more often than I think should be required), but it is really a step in the right direction. Note that it can output DocBook too.
If a thing is not diminished by being shared, it is not rightly owned if it is only owned & not shared. S. Augustine
As seen here, TeXmacs behaves well as a code documentation tool. I can imagine why it's being worked into a literate programming environment.
Why use LyX when there's TeXmacs
whenever I want to see what I just wrote I just look at the screen.
Why use LyX when there's TeXmacs
But, in the end, it is the layout that matters, is it not?
Don't worry, I was like you too once. I believed that it was possible to define a document format so that I could separate the "look" of the document from the actual information that I was trying to convey. But it turns out that this only works with simple documents. As your document increases in complexity, you shouldn't need to define new markup to make the document truly structured and portable. So what has to happen, and it does happen all the time, is that the document author goes beyond the markup and considers the presentation of the document.
You don't believe me? Take a simple example: I'm rewriting my homepage. On the first page, I want to put my email address and I want it to be set apart from the rest of the page. This is what I have in HTML:
<div><b>Email:</b> <a href="mailto:kholmes@sedona.net">kholmes@sedona.net</a></div>
Div is a generic tag used for block elements. Why did I use it? It's definitely against the structured document approach. Otherwise I could simply use div tags for the entire document. Well, the above is certainly not a paragraph. Even if I was defining my own markup language for XML, there is no word to describe it. I could create a specific tag <email-heading>, but again this goes against the structured document approach. Specific tags are not generally useful.
The problem is that for structured documents to work, you need to *write* in a structured fashion. You have to keep track of what each part of the document means, and what that meaning is called, and be sure to limit yourself to those semantics.
Any writer would know that this would be far too limiting. Writing, inevitably, is a right-brain activity and is fundamentally unstructured. Technical documentation has to be more structured than most since it needs to be easily referenced, but inevitably the technical writer is plagued by the same problem: what is natural to write does not map well onto markup.
Then, when you leave the arena of technical documentation, structured writing becomes almost pointless. Newspaper columns and magazine articles are really just a sequence of paragraphs written after each other. There is really no need for markup at all. Markup is barely useful for writing fiction since it comes in so many forms. And if the author chooses a new form you can't say "Wait now, we need to write markup for that," when in the end the author is writing to a reader and not to a computer. Presentation is important. And if you intend on using a structured editor for typesetting poetry, I think even the most stringent holdouts would agree that this is a hopeless cause.
When you define a markup with a DTD or XML schema, you are saying that these are the only things you may write. To write new semantics, the writer must in his head determine the presentation.
Presentation can not be separate from content.
It rocks--check it out.
S.
If you want a specific look and feel and don't care if your site is accessible to everybody, go ahead and do that weird clunky stuff. But if you want to do a professional web site that meets accessibility standards, is easy to internationalize, and doesn't break any widely-used browsers, then you have to put a lot of thought into structure.
Sorry, but you know jack about journalism IT. News stories have never been "streams of paragraphs". All publishers, but periodicals in particular, have always had complicated markup and layout conventions. When newspapers started computerizing back in the 70s, they invented very complex file formats to describe these conventions. Which formats turned out to be a real problem when newspapers started having online editions. A classic case of the problems created by not separating structure and layout!
When I see psychobabble like "right-brain activity" (which isn't valid neurology, btw) I'm tempted to think that somebody is confusing creativity with intellectual laziness. Well, that's not quite fair. But there is a certain laziness in sweeping statements about exactly what writing is and what goes on in your brain when you're doing it.
If you're brainstorming and just trying to get a lot of ideas down before you forget them -- sure it's stupid to worry about structure. Indeed many writers prefer to get as low-tech as they can. Even a simple word processor is too much of a distraction for some -- or even a typewriter. Which is why they still sell so many of those 13" yellow pads...
But for any bit of serious writing, be it a technical manual or a novel, there comes a time when you have to start thinking about the structure of the thing. I suppose there are "stream of consciousness" novels that went straight from tape recorder to the printers. But I'm sure not interested in reading them.
Now that's funny. Many years ago, I was a typist for a well-known poet. This guy wrote in a modern style, not a lot of rhyming and assonance. But even so, structure was a big issue. I had to be very careful that my typing did not contradict the subtle expressiveness of his verse. Even a carelessly placed page break could be an issue.
Poetry is almost the classical markup application. The first SGML tutorial I ever read (can't seem to find it now) used an old book of poetry as an example. If you want to bring this book online you have to do a lot of careful thinking. There are obvious tags that express the structure of the poem (<verse>, <line>, etc.) But if you don't know the poet's conventions (is that line break in the middle of a verse or not?) you might get something wrong. So you invent other tags and entities that describe the precise contents of the book.
Wait, am I mixing content and layout? No, because the original layout is not an absolute part of the document -- it's just information you want to preserve. Most users will view the document using a style sheet that ignores the tags that don't describe the basic structure of the poem. Some scholars will be interested in this old information, and will use a style sheet that reproduces the original book. With the right style sheets there's no end to the way you can present the poem. The scope for creativity is enhanced, not limited.
this has been bugging me for a while
and nobody answers me.
Why is it called TeXmacs if it's not based on TeX/LaTeX (as stated in the doc and FAQ)?
it's a bit misleading.
Yes... there's exporting (not perfect) to LaTeX.
But that's like saying that since MSWord exports imperfect HTML, it should be called MSHTMLword
One of the things I value most about TeX/LaTeX is its portability: if you stay away from the most esoteric LaTeX packages, you can compile your doc pretty much everywhere, and it gets better with TeX.
That's why the name kind of bugs me. But again, TeXmacs lovers always tell me it's named after TeX and Emacs, which doesn't really enlighten me.
Math is the weapon!!
Another milestone in scientific computing has
allied itself with EvilMACS >:-/
The Vi(m) community is boycotting all this nonsense;
remaining true to the spirit of Unix, nroff,tbl, eqn, and ed.
Death to the elitist bloat, long live crude, macho, computing.
--
Is it possible to run Macaulay2 in TeXmacs? (The site says it is under development, but I'm impressed with what they've done with Pari.)
In order to do something (anything) with structured text stored in ascii one has to parse it. If the structured information is somehow more than the syntactical structure parsed by your parser, you lose that information when you save (because your parser does not know how to parse it).
This is the main reason why there's a reverse engineering option in many case tools. You have a nice diagram, you generate code (ascii!!!), all the nice info you had in the diagram is lost because it is not part of the syntax of your programming language and after you change the generated code you have an outdated diagram and source code that simply does not contain all the necessary information to reverse engineer it (e.g. any constraints you defined in rose). Why throw away all this information? Because developers want emacs/vi/whatever to edit their code! Don't get me wrong, emacs is a wonderful text editor, so is vi. But they are text editors, not structured information editors. They don't enforce syntactical correctness, they don't allow for new syntactical concepts, they don't store meta information, etc.
That's why in addition to ascii, advanced development tools like rational rose, visual age, togetherJ, intelliJ, netbeans, etc have some sort of internal program database while running. Visual Age doesn't even bother to generate ascii anymore and uses the internal DB for storage and on the fly compilation.
The advantage of this is that you never really lose the design information (unless you export to ascii) and as long as you stay within the tool you're fine. That's also why it's easy to implement stuff like refactoring (essentially refactoring is a transformation of the structured data) since you already have a queryable structure. Of course these tools save their data in some format (maybe even ascii), however the data these tools store contains all the relevant data these tools generate so nothing is lost. Netbeans for instance uses program comments to specify non editable parts of code generated by the gui builder and Visual Age uses some binary format.
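For a feel of "refactoring as a transformation of the structured data", here is a toy sketch using Python's standard ast module (ast.unparse needs Python 3.9 or later). The RenameVariable class and the sample function are invented for illustration; this is not how any of the tools named above work internally.

```python
import ast

# Invented sample: the comment carries design information that is not
# part of the language syntax.
SOURCE = '''
# design note: total must never be negative
def add(total, amount):
    return total + amount
'''


class RenameVariable(ast.NodeTransformer):
    """Rename every use of one identifier in the parsed tree."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        # Uses of the variable inside expressions.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # The variable as a function parameter.
        if node.arg == self.old:
            node.arg = self.new
        return node


tree = ast.parse(SOURCE)
tree = RenameVariable("total", "balance").visit(tree)
refactored = ast.unparse(tree)  # requires Python 3.9+
print(refactored)
```

The rename is a few lines because the structure is already there to query. It also shows the flip side raised earlier: the design note lived in a comment, the parser does not keep comments, so it is gone after the round trip.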
Jilles
There's a very nice Doxygen-generated site with inheritance & composition graphs for the TeXmacs C++ classes and other goodies:
Take me to the TeXmacs source code.
Why use LyX when there's TeXmacs
Comparing the M$Word generated HTML to the TeXmacs generated LaTeX only shows that you have never exported to LaTeX from inside TeXmacs
Go give it a try, drini. You can!
Why use LyX when there's TeXmacs
Yes Framemaker is great, but:
Why use LyX when there's TeXmacs
Besides, the specific format is beside the point. You can't manage large structured documents with a format that embeds formatting information. This should be obvious to anybody who's compared the markup approach with the word-processor approach. Alas, nobody can be bothered to do this.
Since I know very little about XML, could someone who knows both XML and TeX/LaTeX clue me in?
FrameMaker has a _lot_ of problems.
1 - pathetic typographic controls / tools. It can't even baseline shift beyond super/sub script, and its H&J algorithm is basic, brain-dead, one line at a time; similarly, controlling where it adds space requires manual intervention at _every_ point (no equivalent to \vskip 36pt plus 1fil), fakes small caps, no automatic ligatures beyond fi and fl, etc.
2 - Really, painfully bad equation editor
3 - Bad pre-press, it prefers RGB for colour models, and getting spot colour out of it is a nightmare
4 - no Linux version, they did it in beta, then pulled it leaving testers high and dry.
There's lots more, but I've gotta hurry home to finish up a ~400 page, 2 colour book which is being done in FM 'cause the author insisted on getting .fm files back. :(
William
Sphinx of black quartz, judge my vow.
I've read the other replies here, and I agree with your point about small, naturally unstructured documents. However, I think it's perfectly possible to use (La)TeX to produce a large, highly structured document as well.
To give a concrete example, I have recently been helping my partner to typeset her master's thesis. The subject matter is a combination of English and Hindi literature. The thesis is arranged in sections as one might expect, requires standard extras such as a table of contents and bibliography, footnotes, the occasional picture, and plenty of quotations (some English, some Hindi). On my advice, she chose to use LaTeX to typeset her thesis.
Why did I advise that? Pure flexibility, if nothing else. Aside from generally professional-looking results, it is easy in LaTeX to introduce things like pictures, Hindi text (in a custom font we created ourselves using METAFONT), footnotes, bibliographic citations and cross-references in a structured way. There is always the chance with these things that the paper, or excerpts from it, will be republished in a journal, incorporated as an appendix to a PhD thesis, presented at a conference or otherwise reused. Structure and flexibility in the presentation are of vital importance.
Of course, generality is all very nice, but what really counts this time is how the finished thesis looks. Several experienced researchers have commented on the professional presentation of the finished product, so clearly that did not suffer as a result. The whole document was marked up structurally, but with LaTeX, we were also able to incorporate hints about where page breaks should go, etc. These are not concrete statements -- "Page break HERE!" -- but advice to the typesetting engine. As a result, the finished paper was more elegantly typeset than anything that would have been produced from an XML document with purely structural markup, however smart its stylesheet might have thought it was. Ultimately, good visual design is possible with stylesheets, but great visual design still requires the human touch, and a little effort to get it just right.
On this basis, I claim that LaTeX, at least, is eminently suitable for general structured documents, and not just for scientific papers.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
The writer has personal control over the document file, so she can enforce rules that ensure its maintainability. She has the services of an expert consultant (you) to help her set up those rules, and to tutor her in their proper use. Finally, she only has a single deliverable: a simple hard-copy printout in a standard thesis format.
Now if this fully describes her needs, your partner should certainly ignore my Markup Dogmas and do things the way she's doing them.
But if her document maintenance needs ever broaden, even a little, she's going to regret her initial design choices.
Suppose, for example, some academic publisher reads her thesis and says, "Hey, this is good work! If you expand it a little, it would make a good book!" So now she needs to start collaborating with some editor in another city. It'd be nice if they send revisions to each other electronically. Problem: the publisher doesn't use the same file format. Maybe it's something totally proprietary to the publisher; maybe it's TeX with some special macros they invented themselves. Maybe there's some Hindi word processor that's common in South Asia, but your partner has never heard of. Maybe it's an XML app with a special Hindi back end...
Getting the existing text into the publisher's format will be a nightmare. The cheapest solution might well be to re-enter the entire thesis from scratch.
Scenario 2: Some South Asian web site wants to put the thesis online. Delivering web pages in Hindi isn't hard, but you have to translate the thesis into HTML with the appropriate character sets. Another journey into format hell.
Scenario 3: Your partner learns that some of the quotations and statistics in her thesis are incorrect, and need to be updated in a hurry. No time to do it herself, so she hires a typist. You carefully explain the structure rules for editing the document, but the typist is stubborn and/or stupid and does things her own way, leaving you with a document that "looks right" but is full of manual page breaks, raw formatting instructions, and other garbage that renders the document unmaintainable.
Scenario 4: Some time-travelling troublemaker prevents her from even meeting you. She doesn't have the technical skill to do the fancy thesis on her own, and can't afford a consultant. She tries to put it together with the old scissors-and-glue method. She's so disappointed with the results she drops out of grad school and goes into real-estate.
The markup approach addresses each of these scenarios.
Electronic collaboration is much easier when both parties use an XML-based, content-oriented schema. This is true even if they're not the same schema, because XML is very easy to transform.
Speaking of transformation: HTML is hard to maintain and transform. Which is why more and more web sites maintain the content in XML and only deliver it in HTML.
XML is stupidity-proof. Your typist doesn't need to remember all your rules. They just need to use a validating XML editor.
And if you have a validating XML editor and know how to use it (not a common skill, but one that any serious writer should consider acquiring) you don't need a technical expert holding your hand. You just need an appropriate XML schema and back-end software. I don't know what there is for mixed Hindi-English documents, but given the number of programmers who come from that part of the world....
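What a validating editor buys you can be sketched in a few lines of Python. This is only a toy stand-in for real DTD or schema validation: the ALLOWED_CHILDREN table and the element names are hypothetical, and it checks nothing beyond which child elements may appear where.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "thesis": {"title", "chapter"},
    "chapter": {"title", "para", "quote"},
    "title": set(),
    "para": {"quote"},
    "quote": set(),
}


def validate(element, errors=None):
    """Collect a message for every element the content model does not allow."""
    if errors is None:
        errors = []
    if element.tag not in ALLOWED_CHILDREN:
        errors.append("unknown element <%s>" % element.tag)
        return errors
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(
                "<%s> not allowed inside <%s>" % (child.tag, element.tag)
            )
        else:
            validate(child, errors)
    return errors


good = ET.fromstring(
    "<thesis><title>T</title><chapter><para>Text</para></chapter></thesis>"
)
# A typist tries to force a page break by hand.
bad = ET.fromstring(
    "<thesis><chapter><pagebreak/><para>Text</para></chapter></thesis>"
)

print(validate(good))
print(validate(bad))
```

The typist's manual page break is rejected at the moment of entry, because the content model simply has no such element.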
Netbeans has an XML module (soon w/XSLT transformations) that includes a pretty nice editor; in my experience, it's almost as far as XML Spy (tho' admittedly my experience with both is limited).
Just make sure you have the RAM; validating a document under 256MB w/ JDK 1.3.1 kills my system, and I'm not sure why.