Converting TeX to Microsoft Word?
belmolis asks: "For many years I've done almost all of my writing in TeX. This has increasingly caused problems with publishing in journals. For a long time, many journals reset what you sent them, so they didn't care what program you used. More and more, I find, they do, and in most cases, what they want is MS Word. Is there any good way to convert TeX to Word?"
"I've seen some advertised. Some only work with LaTeX, which doesn't help. One claims to use a full-scale TeX interpreter, but my queries as to whether it can handle home-brew Metafont fonts, PIC graphics etc. have gone unanswered. These products also all seem to be plugins for MS Word. I don't use MS Windows or any other MS products, and hate WYSIWYG word processors (I hated Bravo before it was reincarnated as Word) so a Word plugin is not a great solution, even if it works.
Furthermore, I wonder what exactly these programs do. If they interpret the TeX and then generate very low level Word, that may result in a document that looks similar, but a journal editor probably won't be able to edit it the way he wants to. In some cases the editor can be persuaded to accept a camera-ready PDF, since it turns out that the publishers often want PDF and the reason the editor wants Word is so he can edit the text, but when the editor can't or won't budge, is there any alternative to reformatting the document entirely in Word or a clone?
The larger question this raises is, where are we going? Even if formats are open, translation is difficult if they are only commensurable at a very low level. Is the solution to write in something very abstract like DocBook? And if so, will the market go this way?"
"I don't use MS Windows or any other MS products, and hate WYSIWYG word processors (I hated Bravo before it was reincarnated as Word) so a Word plugin is not a great solution, even if it works."
You refuse to accept a solution that works, because you don't like it? Then write it yourself. Somebody might recommend a valid solution, but then you'll just complain that the background color needs to be blue, so it won't work for you.
Maybe you could just accept that the world doesn't revolve around you and if getting something published is meaningful to you, you'll adapt to the requests of the people doing the publishing.
The F/OSS LaTeX2rtf is probably your best bet. It converts cross-references, eps pictures to jpeg or png (pdflatex users will be happy to know rtf supports jpeg and png), and equations to either an EQ field or a bitmap picture, and it does tables right. It isn't perfect, but it is good.
Most journals I've worked with accept TeX/LaTeX or PDF files, given that you use the journal's .sty file (which they supply). I've never seen a scientific journal which doesn't accept LaTeX output. Some don't accept MS-Word.
If it's only a few journals, I guess no respectable researcher would submit to those, so just submit to better journals.
You're not going to get as good output from Word as from TeX, so just forget about keeping the document ready for print. The journals will change the layout anyway. You need only keep the basic structure: paragraphs, chapters, lists, figures, etc. And footnotes.
I would try converting to html instead of Word, (and maybe to Word from html). There are several command line tools that claim to do this. Since YMMV and all that, I can only suggest that you try it yourself. It shouldn't be too time consuming.
Try looking for a TeX to RTF converter that'll handle your documents. If you're as much of a TeX power-user as it sounds like you are though, probably nothing will convert cleanly. At least with RTF you can edit it by hand if worse comes to worst. Word can read/write RTF, so some of your hard-nosed editors may not even notice the difference...
It would be interesting to know the field in which you publish. I gather it isn't math or science, so why not just use GNU Texinfo? It would also help if you explained what you write that makes TeX more useful than MS Word. You mentioned DocBook, but have you tried it? I guess latex2rtf (http://latex2rtf.sourceforge.net/) doesn't work so well either, huh?
2. Eat printed code.
3. Wait 12-24 hours.
4. Collect the word docs at "the other end".
--
"we live in a post-ideological world..." - Billy Bragg.
Doesn't it make more sense to have content separate from presentation when preparing articles for journals? Surely they must do some editing and layout changes and stuff. latex does a pretty damn good job of being "generic" unless you use non-standard templates or whatnot. I sure as hell don't like spending hours laying things out only to have it all be "re-processed" later on. What confuses me more is that so many scientists and engineers use latex in the first place. I see authors use it lots too.
----
Go canucks, habs, and sens!
Write what? It's not that Word is a bad wysiwyg, it's that wysiwyg is bad per se. It's not a matter of taste. LaTeX is MUCH more productive, gives better results, and you concentrate on content, rather than fighting with Word about format details. Fighting, because Word keeps changing the breaks, formatting, and stuff.
Just use a good old PFY conversion filter. This is, after all, why Our Lord Jesus Christ invented the idea of assistants to handle the busywork. Don't you scientists have TAs and research assistants and whatnot?
--- php: perl hates people
- The document format is application specific.
- Although you can use styles, few people know this, leading to unstructured documents.
- Even if you use styles, the format is still a bastard between page layout and structured layout, leading to unstructured documents.
This leads to a lot of extra work for the designer. For instance, if you use Quark, all italics have a tendency to get lost when you import the text. If you use unicode, it often gets fubar'ed. All the habitual user errors (very few people know how to use Word properly) that Word hides because it's a bastard show up again when you do the page layout, and have to be fixed.
So why do journals insist on Word documents? Because InDesign and those other apps have to support Word in some way, and do. But don't expect that turtlenecked designer to know how to handle TeX. So yeah, we should all accept that the world revolves around Microsoft, not around sound technical decisions (or aesthetic ones, for that matter).
I have a large application written in Common Lisp. It makes heavy use of macros and is written in a functional paradigm. Also, it uses a sophisticated code-walker macro to optimize the code and convert it to CPS style, and includes a full Java JVM written in Lisp to ease training new hires, as well as a type inference engine. About 50% uses CLOS multimethods and "around" methods.
However, my new manager only knows Visual Basic on Windows 95. How can I translate? I'm pretty sure it's not a "1-to-1" port. For instance, how do I do continuations in VB? Thanks!
If your journal is telling you that they won't accept latex, tell them you won't submit your articles anymore, thank you very much.
In physics we have it good due to the existence of the arXiv, where we put our articles first. Therefore journals are already limited by the fact that your article is already published on the web, and they have to accept the consequences of that, e.g. they cannot have too draconian copyright terms. I know in many disciplines the situation with journals is much worse. But remember, journals are totally dependent on us, the scientists, and not the other way around. With the advent of the web and email we can disseminate our work to our colleagues and perform peer review all without the intervention of a journal.
The physics community accepts latex as the standard, and people are (rightfully) suspicious of articles which appear on the arxiv in only .doc or .pdf format.
So, I suggest you keep using latex, investigate adding a section to the arxiv for your specialty, and tell your journal that they will accept latex or be replaced.
-- Bob
1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
It's not a matter of taste. LaTeX is MUCH more productive,
hahahahahahahhahahahahahahahaahahahahahhahaha
Oh god that is probably the funniest thing I've read recently on /.
Why don't you give both MS Word + no book and a copy of a text editor + a book on Latex (your choice of book) to an administrative assistant making minimum wage and see which environment they are more "productive" in.
MS Word may suck for many reasons but don't use the "productive" argument.
Compromise a little, use LaTeX. :-)
You can probably live with the crushing limitations relative to using TeX.
And, if there's no other way then use MS Word, it's character building (bad pun intended). I'd say that it won't kill you but if you have a lot of equations it might. After about 15 pages of equation-intensive stuff you end up using the find function instead of scrolling because it gets so bogged down. It also regularly decides that your equation-laden document won't fit on the XX or so gigabytes of free space on your hard drive. It has a long-standing bug that causes it to miscalculate the size of some formulas so that no matter how much space you have left on your drive it won't save your document until you remove the offending equation segment. Hilarious, I know. I'd send a document with the problem in it to MS so that they could see the bug but then I can't save the document to send it to them. Chuckle chuckle. Those funny guys at MS have such a great sense of humor. They're worth every hundred dollar bill I send them for their fine products (sarcasm intended). What's really over the top is that people look me straight in the eye and tell me that they never have a problem using Word. Since all my friends are completely honest about anything regarding their computer use (oh dear, more sarcasm, must be past my bedtime) you can probably safely ignore my ranting.
I've started using Publicon by WRI. Interesting product. A little bit beta. If you feel like just saying f&$k the editors then this is something that you might like to dink around with even though you say you don't like WYSIWYG. Given your other proclivities I'd suggest taking Publicon for a spin around a document or two. It also claims to export TeX or LaTeX or both and it uses a bibliography database and a bunch of other nice stuff. It has a Mathematica front end so it's a nice outlining tool too. The cell thing takes a little getting used to but I've come to really like it.
Yah, the dude's on total crack. In TeX you often get stuck DEBUGGING YOUR DOCUMENT, for chris' sake! WYSIWYG may puck with you when you wanna insert here or there, but on the other hand you don't have to go add extra words to get it to shut up about "overfull hbox".
I know a bunch of programming languages. I've done some crap in TeX. When I need to get shit done, I sure as hell do not use TeX!
The wrong direction.
Pay attention. You're obviously not from Massachusetts.
You should not even be thinking of going to a proprietary format controlled by the darkside.
You are being MICROattacked, from various angles, in a SOFT manner.
You seem to have missed the fact that in parts of the world the journals want TeX and some won't even accept Word, so things aren't as simple as you make them out to be.
In any case, your analogy doesn't work. Rejecting a solution because you don't like the background color is (under most circumstances) silly. But that's not at all the situation here. I've given a number of good reasons for not having used or wanting to use MS Word, as have some other posters. Obviously using MS Word is a solution to the demand that you use MS Word, but it doesn't help much if you already have lots of stuff in TeX, and repeating this obvious point doesn't address all the other reasons I have for not using Word. If you're happy with Word, fine, but the entire world doesn't revolve around Microsoft or around you.
Unfortunately, most of the converters will do only a subset of the markup languages & so few (if any) will work well with custom macros.
The Chikrii TeX2Word MIGHT do it. TeX4ht may also be worth a try (->HTML/XML, which can easily become other formats). Can't comment on TeXPort. Those are really your only options. If worse comes to worst, you can also look for ps/pdf->word solutions, but those are just as bad as (La)TeX->Word.
The first key to productivity is that you are comfortable in the environment. Additional keys are that it is expressive & doesn't force you through tedium & allows you to script away as much tedium as possible. Certain people ARE more comfortable with LaTeX & know it well enough (and use the right tools) such that it isn't tedious. The most tedious parts about LaTeX are not knowing how to do something (which is combated by knowledge or good tools or good code to steal) and compilation errors (which are combated by knowing the syntax well, by using editors that prevent/fix/point out errors, and by compiling frequently (sometimes in the background)). LaTeX is CERTAINLY more scriptable than Word & automating references & formatting can be quite trivial. An example I recently used was a solution to placing a series of dozens of figures & captions. It is easy to generate the plain text code to do this. Less easy to write a VBA script in Word. LaTeX is also more reusable & versioning CAN be better. In short, people CAN BE PRODUCTIVE in LaTeX.
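To make the figures-and-captions example concrete, here is a minimal sketch of what "generating the plain text code" might look like. It is not the script I actually used; the function names, file paths, captions and the default width are all made up for illustration, and it assumes the document loads graphicx.

```python
from pathlib import Path


def figure_block(image_path, caption, width="0.8\\textwidth"):
    """Return the LaTeX source for one figure environment.

    The label is derived from the file name (an arbitrary convention
    chosen for this sketch), e.g. plots/run01.pdf -> fig:run01.
    """
    label = "fig:" + Path(image_path).stem
    return "\n".join([
        "\\begin{figure}[htbp]",
        "  \\centering",
        f"  \\includegraphics[width={width}]{{{image_path}}}",
        f"  \\caption{{{caption}}}",
        f"  \\label{{{label}}}",
        "\\end{figure}",
    ])


def figure_series(figures):
    """Join one figure block per (image_path, caption) pair."""
    return "\n\n".join(figure_block(path, cap) for path, cap in figures)


if __name__ == "__main__":
    # Hypothetical input: three plots with generated captions.
    demo = [(f"plots/run{i:02d}.pdf", f"Results of run {i}.") for i in range(1, 4)]
    print(figure_series(demo))
```

Redirect the output into a file and \input it from the main document; changing the layout of all the figures is then a one-line change in figure_block.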
Products with shallow learning curves have simple interfaces. It is true that Word has an easier-to-understand GUI than many of the LaTeX GUIs. More importantly, it is (whether we like it or not) omnipresent & most administrative assistants already have some experience with (or at least knowledge of) it. Shallow learning curves do mean increased productivity for the novice. They don't translate to increased productivity for ALL users or ALL applications.
Let's see, do I want to spend 20 hours writing out all my math formulas in Word or 5 minutes using tex?
Do I really care to fiddle around making sure the figure, table, and citations are all referenced correctly in Word, or have them automatically managed in tex?
But I guess you don't use Word for any sort of real document do you?
Hohoho, children these days.
"The document format is application specific."
And TeX isn't? Just because a document uses only ascii characters doesn't mean its format isn't application specific.
MS Word also has problems with all the metadata. I've seen some of our scientific staff get into lots of problems when submitting Word docs for publication only to find that the 'keep past revisions' and 'authorship data' has caused embarrassment, for various reasons. I wrote an article about it: "Why Microsoft Word may be bad for your health". http://www.sungate.co.uk/articles.html or http://www.sungate.co.uk/badword.pdf (written, appropriately enough, using LaTeX) ...
"If you think the problem is bad now, just wait until we've solved it." --- Arthur Kasspe
Other journals accept or even require PDF -- it cuts down on the MS virus problem and guarantees correct rendering, unlike what you get with the diverse MS Word formats.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
Nothing is stopping you from writing a perfect clone of TeX -- all the details are published. However, nobody knows what bit 3 in byte 7 of MS's .doc format does, so you can't clone it. Or make any other software be able to read all the data in the file.
That's a problem.
My other car is first.
Why don't you give... to an administrative assistant... and see?
This has been done. With LaTeX, 80% of the work can be done in 20% of the time. Unfortunately, the remaining 20% of the work takes the remaining 80% of the time... I wish I knew what kind of work that was (Cliparts maybe?). The impact on the sanity of the workers has not been assessed.
Anyway, AAs are not as stupid as you seem to imply, you insensitive clod.
I've been looking over your comments in this discussion, and also comparing this to what my girlfriend deals with (she's working on a linguistics PhD, and uses LaTeX for much of her work for similar reasons to you). I get the impression that you strongly prefer a "programmatic" approach to WYSIWYG, and ultimately you mostly produce plain-text-ish files with a wide range of characters, some limited formatting, and various custom diagrams. You also sound pretty technically competent generally. Is that about right?
If that's the case, then have you considered going the XML/XSLT route? I don't say this to be buzzwordy; I actually designed and maintain a fairly large web site that uses a custom XML schema to define the content (easily editable by our non-technical people so certainly possible for you) and then XSLT to do various clever tricks with it. We generate HTML output, but you could apply many of the same tools and techniques we use to generate a mostly-plain-text format that could be conveniently imported into any word processing package instead, Unicode glyphs and such included.
If you're willing to invest a few days of effort to develop the system, I can't see why you couldn't write a fairly simple customised mark-up language for yourself. You could use character entities or tags to access the Unicode glyphs for all your linguistic symbols, so instead of \phoneticsymbol, you now just need &phoneticsymbol; or <phoneticsymbol/>, depending on how clever/context-sensitive you need the interpretation to be. You can mark up document structure in much the same way as you would with TeX-based macros. Potentially, you could even define shorthand ways to represent common types of diagram as well: SVG plays nicely with XML, is rapidly becoming a viable graphics format in its own right, and might provide a convenient intermediate format to convert your diagrams into any common format required by the journal staff.
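As a rough illustration of the custom-markup idea, here is a minimal sketch that flattens such a document into plain Unicode text. It uses Python's standard xml.etree module rather than XSLT, purely to keep the example short, and the element names (doc, para, em, schwa, eng, esh) are invented for the example, not part of any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical empty-element tags mapped to the Unicode glyphs they stand for.
GLYPHS = {
    "schwa": "\u0259",  # LATIN SMALL LETTER SCHWA
    "eng": "\u014b",    # LATIN SMALL LETTER ENG
    "esh": "\u0283",    # LATIN SMALL LETTER ESH
}


def flatten(elem):
    """Return the text of elem, replacing glyph tags and dropping other markup."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in GLYPHS:
            parts.append(GLYPHS[child.tag])
        else:
            # Unknown inline markup (e.g. <em>) is reduced to its text content.
            parts.append(flatten(child))
        parts.append(child.tail or "")
    return "".join(parts)


def render(xml_source):
    """Render every <para> as one plain-text paragraph, separated by blank lines."""
    root = ET.fromstring(xml_source)
    return "\n\n".join(flatten(p).strip() for p in root.iter("para"))


if __name__ == "__main__":
    sample = (
        "<doc><para>The word <em>sofa</em> ends in <schwa/>.</para>"
        "<para>Sing ends in <eng/>.</para></doc>"
    )
    print(render(sample))
```

A real setup would want a different output rule per target format, which is where XSLT stylesheets earn their keep, but the shape of the job is the same: one structured source, several small renderers.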
There are apparently some quite decent editing tools available to work with XML-based documents, but it sounds like you'd have about as much time for them as me and would probably prefer to work directly with the underlying mark-up. Converting your existing TeX-based documents could probably be mostly automated if you wanted, and using a structured, text-based format to represent your document has the advantage that you can support different output formats relatively easily in the future, so you wouldn't have to do all this again in five or ten years' time.
The only non-trivial work to be done in any specific word processor would then be applying the WP's heading styles, footnotes, etc. as required by the particular journal you're contributing to. You could deal with this by including a little processed mark-up in the output from your XSLT, and writing some trivial macros in any modern word processor to search for that, and apply whatever functions needed doing to that bit of text.
Without knowing more about the kind of documents you produce, it's hard to know whether this idea would be useful to you, but there it is for whatever it's worth. Good luck.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
There are many legitimate gripes about Word compared to TeX, but in fairness, its font handling isn't one of them. TeX's font-handling is a poorly standardised mess. I can go and buy any number of professional quality OpenType fonts and download them in seconds, install them just as fast, and use them immediately in Word.
Sure, there are a few good fonts available for free on CTAN too, but for anything else, it requires a PhD and access to half a dozen HOWTOs just to get the thing installed and working. Even then, there are ludicrous (by today's standards) limitations on the number of glyphs, the kerning and ligature tools, etc. I managed to hit them with annoying frequency just designing a relatively simple font with METAFONT, so how someone's supposed to design a professional set with comprehensive ligature support, real small caps, numbering variations, etc. I have no idea. This isn't just the font format, either. TeX itself is poorly equipped to deal with some aspects of professional-standard typography these days. (Hanging punctuation springs to mind.) Meanwhile, the state of the WYSIWYG art is probably InDesign at present, which will do adaptive scripting using Zapfino and supports essentially the whole range of OpenType goodies.
Bottom line: Word on Windows blows any font technology in TeX away already, and when the more advanced OpenType stuff filters down, the gap will be even wider. Given that nearly all the serious professional font companies are now moving to OpenType, this is going to be a fatal flaw for TeX before too long. You can have the best paragraph justification algorithm in the world and allow the insertion of quarter-spaces, but if your fonts are ugly, no-one's going to notice.
I played around with a variety of converters a couple of weeks ago. The best luck I had was:
1) convert (la)TeX to html (there are a number of tools)
2) read that html into Word
3) save as Word .doc file (or rtf)
I imagine that OpenOffice would do step 3 fine as well.
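For anyone who wants to script the route above instead of clicking through it, here is a minimal sketch. It assumes tex4ht's htlatex for the HTML step and a soffice binary (OpenOffice/LibreOffice) on the PATH for the Word step; neither choice comes from my own tests, the export filter name may differ between versions, and the function names are made up.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_commands(tex_file):
    """Build the two command lines: (La)TeX -> HTML, then HTML -> .doc.

    Assumptions: htlatex writes <name>.html next to the source, and
    soffice accepts the "MS Word 97" export filter for HTML input.
    """
    tex = Path(tex_file)
    html = tex.with_suffix(".html")
    return [
        ["htlatex", tex.name],
        ["soffice", "--headless", "--convert-to", "doc:MS Word 97", html.name],
    ]


def run_pipeline(tex_file):
    """Run both steps in the directory of tex_file; return the expected .doc path."""
    workdir = Path(tex_file).resolve().parent
    for cmd in conversion_commands(tex_file):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, cwd=workdir, check=True)
    return workdir / Path(tex_file).with_suffix(".doc").name
```

Calling run_pipeline("paper.tex") should leave paper.doc beside the source if both tools behave as assumed. Expect to clean up the result by hand, especially equations and anything built from custom macros.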
I use LaTeX2e on a daily basis for a great variety of documents. Along the way, I have also had to work professionally with people who seem to think that the one and only way to do rich text is with MS Word, so I had to see what I could do to preserve interoperability.
In the Free Software realm, the two best options seem to be latex2rtf and tex4ht.
The first one, latex2rtf, is the one I use. It works decently, does its job, and does it well. The only glitches I saw are that the resulting document has a user-defined page size and margins that are too big (i.e., 3.5-4.2cm), but both of these are easily surmountable. For your needs, the trouble with latex2rtf is that it only does LaTeX, AFAIK.
The second one, tex4ht, is said to be an excellent tool. It is aimed mainly at the production of hypertext documents for the Web, and it could be used as a (La)TeX-to-HTML tool. But it can also generate XML/CSS, and OpenOffice.org .sxw documents, too. From there, converting to a Word format would be trivial.
For me, tex4ht looked interesting, and really worth checking out. Additionally, it is not bound to the LaTeX format; it can do plain TeX. However, and this is the sad part, its installation instructions for Unix systems are incredibly hard to understand, especially for those of us that do not use the C shell. And, it does not integrate well with the de facto TeX distribution for Unix, teTeX.
My take: if you can manage to get it going, perhaps tex4ht might be the way to go for you.
-- Look to the Rose that blows about us--"Lo, Laughing," she says, "into the World I blow..."
I don't even use latex or tex and I can answer this.
Both concentrate more on content than on formatting, and allow correct mathematical symbols to be used.
Word sucks, and for that matter so does Open Office writer. Both are good for short letters and documents, but when it comes to accurately reproducing symbols, mathematics and physics concepts and numbers, there is nothing better than TeX and latex for that kind of formatting.
Also, TeX and latex will print exactly as shown. You know exactly how things will look when you're done. Unlike with Word and even OO.o Writer, where what you spend hours formatting may not print how you think it will. I have seen this happen on nearly a third of the small amount of stuff that I print. Whenever possible I switch to pdf and then print, as I know the PDF will print as desired, and I have a chance to test how it will look before wasting ink.
i thought once I was found, but it was only a dream.
Graduate mathematics students can be given a text editor, 45 minutes of verbal instruction, and a LaTeX book and be very productive. The LaTeX book is used for reference; 95% of what they need to know can be communicated in the 45 minutes of instruction. I'm speaking from experience here.
Find free books.
It's not that Word is a bad wysiwyg, it's that wysiwyg is bad per se. It's not a matter of taste.
Rubbish. Applications like Adobe FrameMaker show that wysiwyg can be done well. I certainly don't spend my time fighting Frame over format details. Generally, I spend no more than a day (out of a 120-day budget to write a 600-page manual) on formatting, and that includes creating the formatting from scratch. That's pretty productive.
Even better, we rarely encounter problems with Frame that we don't understand and can't solve easily. Word, on the other hand, has us going "WTF" on a regular basis.
Which means that yes, wysiwyg versus TeX IS a matter of taste, and that Word IS a bad example of wysiwyg.
this is getting off topic, but i'll react nonetheless.
the reason this phenomenon occurs often in wysiwyg word processors is that people do not make consistent use of the built-in style functions, and format page layouts etc during editing.
i'm only experienced in publishing in biology journals, and the amount of formatting you need for that is minimal. text is supplied as a word document in a single font (plus of course a font for symbols). images are supplied separately, and one could do the same with the occasional equation.
eps is an often used format that covers just about anything one would like to do in an image. most people i know make their figures in adobe illustrator and save as eps.
this might have changed slightly since i am not in science and publishing anymore (thank god!) but not drastically. word is a perfectly suitable tool for this kind of simple situation. hell, wordpad+rtf could do the job already.
"Nothing is stopping you from writing a perfect clone of TeX -- all the details are published."
Why on earth would I want to do that?
"However, nobody knows what bit 3 in byte 7 of MS's .doc format does, so you can't clone it. Or make any other software be able to read all the data in the file."
So Open Office's claim of MS Word file-format compatibility is a lie?
Mind Booster Noori