Tools for Publishing in Multiple Formats?
Truist asks: "What are the best tools (windows or *nix) to use to publish a single source document in multiple formats, specifically plain text, multi-page HTML, and PDF? I'm trying to publish a (60-page+) NetBSD installation guide/documentary online, and I want plain text for easy download and 'less'-ability, HTML for easy browsing and search engine indexing, and PDF or Postscript for easy printing. It's currently a Word document (I know, I know - I'm happy to manually convert it to something else) with multiple styles, including regular text, lists, internal links, external (web url) links, code, and notes, and I'd like to preserve as much as possible of each in the final output. Some additional notes: there are no graphics, and I expect to update this document periodically, or to split it into parts and maintain the parts (think master document / subdocuments). It won't be updated too often, but if re-publishing could be scriptable, that would be fantastic."
I currently publish several technical manuals, each several hundred pages long, using the following workflow:
All documentation is edited using an ordinary plaintext editor.
The documents are marked up using reStructuredText (ReST) conventions. This has satisfied 99% of my needs, and I've decided the convenience of ReST outweighs the remaining 1% of frills I'd like to have.
I use CVS for revision control. There may be an RCS involved in the backend; I don't operate the server that hosts my repository.
The ReST documents are converted to XML using DocUtils. The project coordinator, by the way, has proven himself a superlative programmer. DocUtils rocks, and will also transform ReST to HTML or LaTeX.
The XML is converted using XSL templates that I've created. Saxon applies them to transform the DocUtils XML into XSL-FO, and FOP renders that as PDF.
Pretty fucking spiffy, if I do say so myself.
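Since the original question asks for something scriptable, here is a rough sketch of how the ReST to XML to XSL-FO to PDF chain could be driven from Python. Treat it as an illustration under assumptions, not a build script anyone actually runs: the file names (`manual.txt`, `manual.xsl`), the `saxon.jar` location, and the `rst2xml.py` front-end name are all placeholders, and the Saxon arguments use the older `-o output source stylesheet` form, so check them against whatever versions you have installed.

```python
import subprocess
from pathlib import Path


def pdf_pipeline(source, stylesheet, saxon_jar="saxon.jar"):
    """Return the three commands that turn one ReST file into a PDF.

    All tool names and paths here are assumptions for illustration.
    """
    src = Path(source)
    xml = src.with_suffix(".xml")  # DocUtils XML
    fo = src.with_suffix(".fo")    # XSL-FO produced by the stylesheet
    pdf = src.with_suffix(".pdf")  # final output from FOP
    return [
        # Stage 1: ReST -> DocUtils XML (front-end name varies by release)
        ["rst2xml.py", str(src), str(xml)],
        # Stage 2: XML -> XSL-FO (old-style Saxon command line;
        # newer releases use -s:, -xsl: and -o: instead)
        ["java", "-jar", saxon_jar, "-o", str(fo), str(xml), str(stylesheet)],
        # Stage 3: XSL-FO -> PDF
        ["fop", "-fo", str(fo), "-pdf", str(pdf)],
    ]


def build(source, stylesheet, dry_run=True):
    """Print each stage; run it too unless dry_run is set."""
    for cmd in pdf_pipeline(source, stylesheet):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the build at the first failing stage
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    build("manual.txt", "manual.xsl")
```

Run as-is, it only prints the three commands; pass `dry_run=False` once the tool names match your installation. Keeping the stages as separate commands means the intermediate XML and FO files stay on disk, which helps when a stylesheet misbehaves.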
I also currently use HT2HTML to transform ReST to HTML. I use it in preference to DocUtils' native HTML transformation because it allows me to do a few nice tricks. In the future I plan to migrate entirely to another set of custom XSL transformations.
This system has proven extremely productive. At any time I could pop a few bucks for a commercial XSL-FO to PDF engine and stomp the few gripes I've had with FOP (my number one issue is the lack of keep-with-next functionality; however, FOP is undergoing a complete refactoring, and will emerge with full functionality). Saxon has been superb, DocUtils has been wonderful (and I've been able to contribute to the overall design), and ReST is quite pleasant to read and write.
Overall, I highly recommend this workflow.
Your source material becomes extremely reusable, eminently accessible, and free from commercial encumbrances.
(footnote: if you do go this route, please don't flood the DocUtils developers with suggestions and ideas. Work out your idea in detail, consult the developers' mailing list archives, and make full consideration of side-effects. Only then suggest it. They've been at this so long, and had so many discussions, that they've become a little short of patience with loud-mouthed newbies. I suspect most popular open-source projects get that way...)
--
Don't like it? Respond with words, not karma.
I couldn't agree more. DocBook is pretty slick, but turning your DocBook source into a usable format is ridiculously hard. What's more, chances are fairly good that when you are done it won't look good. PostgreSQL, to cite an example, actually dumps their DocBook to RTF and then edits the RTF before creating their PostScript and PDF files. And despite the fact that they released 7.4 yesterday, they don't have 7.4 documentation ready to go because they can't get the docs to build.
Texinfo, on the other hand, is relatively easy to use, and it has well-supported tools for the creation of HTML, PDF (with hyperlinks), RTF, Info (Emacs rocks), and plain text.
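For the three formats the question asks about (plain text, multi-page HTML, PDF), a Texinfo build could be scripted along these lines. This is a sketch under assumptions: `guide.texi` and the `out` directory are made-up names, and it relies on `makeinfo --plaintext`, `makeinfo --html` (which splits the HTML into one page per node by default) and `texi2pdf -o` being available in your Texinfo installation.

```python
import subprocess
from pathlib import Path


def texinfo_commands(source, outdir="out"):
    """Map each target format to the command that would produce it.

    File and directory names are hypothetical placeholders.
    """
    src = Path(source)
    out = Path(outdir)
    stem = src.stem
    return {
        # Single plain-text file, suitable for 'less'
        "text": ["makeinfo", "--plaintext",
                 "-o", str(out / f"{stem}.txt"), str(src)],
        # Split HTML: -o names the directory that receives the pages
        "html": ["makeinfo", "--html", "-o", str(out / "html"), str(src)],
        # PDF via TeX
        "pdf": ["texi2pdf", "-o", str(out / f"{stem}.pdf"), str(src)],
    }


def build_all(source, outdir="out", dry_run=True):
    """Print every command; run them too unless dry_run is set."""
    commands = texinfo_commands(source, outdir)
    if not dry_run:
        Path(outdir).mkdir(parents=True, exist_ok=True)
    for fmt, cmd in commands.items():
        print(f"[{fmt}] " + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    build_all("guide.texi")
```

By default this only prints what it would run; call `build_all("guide.texi", dry_run=False)` to actually produce the files.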