Tools for Publishing in Multiple Formats?
Truist asks: "What are the best tools (windows or *nix) to use to publish a single source document in multiple formats, specifically plain text, multi-page HTML, and PDF? I'm trying to publish a (60-page+) NetBSD installation guide/documentary online, and I want plain text for easy download and 'less'-ability, HTML for easy browsing and search engine indexing, and PDF or Postscript for easy printing. It's currently a Word document (I know, I know - I'm happy to manually convert it to something else) with multiple styles, including regular text, lists, internal links, external (web url) links, code, and notes, and I'd like to preserve as much as possible of each in the final output. Some additional notes: there are no graphics, and I expect to update this document periodically, or to split it into parts and maintain the parts (think master document / subdocuments). It won't be updated too often, but if re-publishing could be scriptable, that would be fantastic."
OpenOffice can save to all of those formats. It isn't scriptable, but all those outputs are easy enough to produce by hand. Has that been taken into consideration, or am I off base on what you are looking for?
That's scary.
I actually worked on a ~500-page documentation project with a couple of other developers a couple of years back, and after about six months of debate, they finally agreed to let me recode the thing from TeX to DocBook XML.
The conversion was a PITA, but once that was finished, we had about 40 source XML files which were independently version-controlled, some minor customizations to the standard DocBook XSLT stylesheets, and slick, easily-updated HTML, plain text, and PDF versions of the document being produced straight out of CVS by a cron job.
A nice benefit of the conversion was that we were actually able to add another few hundred pages of documentation that was automatically generated from grammar definitions and source code to the batch build, and they could be integrated into the style and distribution methods we worked out for the hand-generated docs.
LyX is how I do what you're talking about. It's a WYSIWYM (What You See Is What You Mean) document processor, and it's great. I use it to write term papers, HOWTO documents, and lots of other stuff. You can export your document to many different formats, including HTML, PDF, plain text, and PostScript. You should try it out; I really like it.
I've written some extensive docs in texinfo and moved them rather easily to PDF, HTML, and plain text.
I've tried doing the same with DocBook and it plain sucked. While the DocBook format itself is nice, the tools for transforming it are too complex (for me?), especially if you want to customize the conversion to HTML or PDF. This definitely goes for DocBook/SGML, and from what I've seen so far, for DocBook/XML too to some extent.
Thus I'd rather say "texinfo", at least unless someone comes up with a foolproof suite of tools for DocBook->PDF+HTML.
My $0.02.
- Hubert
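For anyone wanting to script the texinfo route described above, here is a minimal sketch in Python. It assumes the stock `makeinfo` and `texi2pdf` programs from the Texinfo distribution are on the PATH; the function names and output file names are made up for illustration and are not from the post.

```python
import subprocess


def build_commands(source, stem):
    """Return one command line per output format for a texinfo source file."""
    return [
        # Plain text, for download and reading in 'less'.
        ["makeinfo", "--plaintext", "-o", stem + ".txt", source],
        # HTML; by default makeinfo splits the output into one page per node,
        # so -o names a directory here.
        ["makeinfo", "--html", "-o", stem + "_html", source],
        # PDF for printing (needs a working TeX installation).
        ["texi2pdf", "-o", stem + ".pdf", source],
    ]


def publish(source, stem):
    """Run every build command, stopping at the first one that fails."""
    for command in build_commands(source, stem):
        subprocess.run(command, check=True)
```

Calling `publish("guide.texi", "guide")` from a cron job or a Makefile rule would then regenerate all three outputs in one go; the file names are placeholders.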
I've seen some posts here on XML, but most seem misleading. I've found that the most expressive and most flexible format is manual XML -- as in, your own dialect.
That is, you define your own tags, and define what they mean. Then you create stylesheets to convert them to other things. Because the original XML contains your intention, not the eventual formatting, it makes it easy to convert, or to make broad, sweeping changes to presentation (as presentation is detached from content).
The simplest example is, suppose you started out just using some HTML-like <pre> tag for code. You could easily search and replace that with something else, like say <div class="code"> for an HTML-plus-CSS implementation. That would actually be a step towards XML. The problem is, if you start out this way, and then you've got your hundreds of pages, you may have occasionally used <pre> for other things, like ASCII art, and a blanket search and replace would catch those too.
The right way to do this is create an XML tag <code> and stylesheets that convert it to, say, a little padded space for text, a <pre> tag for HTML, or something literally to do with fonts for postscript. That way you can change any detail of presentation in any format in one place (the stylesheet).
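To make that concrete, here is a minimal sketch of the idea in Python. Plain Python functions stand in for the stylesheets (a real setup would more likely use XSLT), and the <doc>, <para>, and <code> tags are a hypothetical dialect invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical home-grown dialect: the tags say what the text *is*,
# not how it should look.
SOURCE = """<doc>
<para>Boot the installer, then run:</para>
<code>disklabel -i wd0</code>
</doc>"""


def to_html(root):
    """HTML 'stylesheet': <code> becomes <pre>, <para> becomes <p>."""
    rules = {
        "para": lambda text: "<p>" + escape(text) + "</p>",
        "code": lambda text: "<pre>" + escape(text) + "</pre>",
    }
    return "\n".join(rules[el.tag](el.text or "") for el in root)


def to_text(root):
    """Plain-text 'stylesheet': <code> becomes a four-space indented block."""
    rules = {
        "para": lambda text: text,
        "code": lambda text: "\n".join("    " + line for line in text.splitlines()),
    }
    return "\n\n".join(rules[el.tag](el.text or "") for el in root)


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(to_html(root))
    print(to_text(root))
```

Changing how code looks in either format means editing one entry in one `rules` table, and ASCII art would simply get its own tag with its own rule.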
There's also the fact that XML, especially encoded in UTF-8, is immortal, whereas say a Word upgrade (or any software upgrade, for that matter) could make your document obsolete. And even if there is never an XML parser left in the world, it is human-readable.
Don't thank God, thank a doctor!
Too bad they are such a large investment just to publish. AppleScript allows for doing very complicated workflows using multiple applications. You can even 'compile' it as a droplet for drag-and-drop functionality.
If I were doing this on my Mac, I would create a script to do the following, in order: save my Word file as plain text, save it as HTML, print it as PDF (OS X can print to PDF from any and all applications), use the ColorSync Utility to regenerate the PDFs with your desired compression settings, then use an HTML cleaner such as HTML Tidy to eliminate all the crappy MS HTML markup. With AppleScript it's a point-and-click operation to create the script: just hit record, go through the motions described above, hit stop, and save as a droplet. You can drag and drop any number of Word docs onto it whenever you need to 'publish'. You could add an FTP action or save to an iDisk as part of the workflow just as easily.
The only thing you have to worry about is some of Word's [table] markup, as it seriously blows when you try to convert it to normal HTML.
There are plenty of tools for XML/XSLT transforms that could be scripted as well but it could be overkill... or maybe not.
If you had a Mac it would be easy.
A fool throws a stone into a well and a thousand sages can not remove it.
If you write your own major mode for Emacs, all these issues just go away.
However, if you are lacking lisp-fu and absolutely must have a GUI-based
WYSIWYG editor, OpenOffice may be a possible solution. You'll have to avoid
workflows that result in creating styles with meaningless names. (For example,
you can't just highlight some random text and start formatting it in various
arbitrary ways. Instead, define your styles properly with names using the
style catalog, and then apply your named styles to blocks of text. The styles
you define should represent things that have meaning independent of the format,
so that you can sensibly convert them into the various target formats.)
Getting the XML out of an OO document is a simple matter of renaming it from
foo.sxw to foo.zip and unzipping it. You can then use any XML parser you
like (e.g., one of the XML modules off of CPAN) to transform it into DocBook
or whatever. All of this can be automated.
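As a rough illustration of the unzip-and-parse step, here is a minimal sketch in
Python rather than Perl. It assumes the document text lives in a member called
content.xml and that paragraphs are "p" elements carrying a "style-name"
attribute; the function name is made up, and it matches on local tag names so
that no namespace URIs have to be hard-coded.

```python
import xml.etree.ElementTree as ET
import zipfile


def local_name(qualified):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return qualified.rsplit("}", 1)[-1]


def paragraphs_by_style(sxw_file):
    """Return (style name, text) pairs for every paragraph in content.xml."""
    # The .sxw file is an ordinary zip archive, so no renaming is needed here.
    with zipfile.ZipFile(sxw_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    pairs = []
    for el in root.iter():
        if local_name(el.tag) != "p":
            continue
        style = next(
            (value for key, value in el.attrib.items()
             if local_name(key) == "style-name"),
            None,
        )
        # itertext() also picks up text nested in inline child elements.
        pairs.append((style, "".join(el.itertext())))
    return pairs
```

Paragraphs with properly named styles come back with names a converter can map
to DocBook elements; paragraphs formatted ad hoc come back with whatever name
the editor generated for them, which is why the naming discipline above matters.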
Cut that out, or I will ship you to Norilsk in a box.