Slashdot Mirror


Tools for Publishing in Multiple Formats?

Truist asks: "What are the best tools (Windows or *nix) for publishing a single source document in multiple formats, specifically plain text, multi-page HTML, and PDF? I'm trying to publish a 60+ page NetBSD installation guide/documentary online, and I want plain text for easy download and 'less'-ability, HTML for easy browsing and search engine indexing, and PDF or PostScript for easy printing. It's currently a Word document (I know, I know; I'm happy to convert it to something else by hand) with multiple styles, including regular text, lists, internal links, external (web URL) links, code, and notes, and I'd like to preserve as much of each as possible in the final output. Some additional notes: there are no graphics, and I expect to update this document periodically, or to split it into parts and maintain the parts separately (think master document/subdocuments). It won't be updated too often, but if republishing could be scripted, that would be fantastic."
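The "scriptable republishing" and "master document / subdocuments" requirements can be handled independently of the markup format chosen. The following is a minimal Python sketch, assuming the parts are plain source files concatenated in a fixed order; `assemble_master` and `is_stale` are hypothetical helper names, and the staleness check mimics what `make` does with file timestamps.

```python
from pathlib import Path

def assemble_master(parts, master):
    """Concatenate subdocument files, in the given order, into one master source file."""
    master = Path(master)
    with master.open("w", encoding="utf-8") as out:
        for part in parts:
            text = Path(part).read_text(encoding="utf-8")
            out.write(text.rstrip("\n") + "\n\n")  # exactly one blank line between parts
    return master

def is_stale(target, sources):
    """Make-style check: True if target is missing or older than any source file."""
    target = Path(target)
    if not target.exists():
        return True
    built = target.stat().st_mtime
    return any(Path(src).stat().st_mtime > built for src in sources)
```

A build script would call `is_stale("guide.html", parts)` and rerun the converters only when it returns True; a Makefile gives you the same timestamp logic for free.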

7 of 63 comments

  1. DocBook (again) by camilita · · Score: 4, Informative

    I have seen a variation of this question posted here at least twice. The unanimous answer is usually DocBook, and it is even more relevant in this case, since the document is technical in nature.

    A good pick is DocBook: The Definitive Guide, written by Norman Walsh (who chairs the OASIS DocBook Technical Committee) and published by O'Reilly. Of course, the book is also available in HTML, PDF and plain text.
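    For a sense of what a scripted DocBook build involves, here is a sketch that only assembles command lines: `xsltproc` with the DocBook XSL stylesheets (`html/chunk.xsl` for multi-page HTML, `html/docbook.xsl` for a single page, `fo/docbook.xsl` for XSL-FO), FOP for PDF, and `lynx -dump` for plain text. The stylesheet location is an assumption that varies by system, the output directories are assumed to exist, and `docbook_steps`/`run` are hypothetical names.

```python
import subprocess
from pathlib import Path

# ASSUMPTION: location of the DocBook XSL stylesheets; it differs between systems.
XSL = Path("/usr/share/xml/docbook/stylesheet/docbook-xsl")

def docbook_steps(source, out="out", xsl=XSL):
    """Return (argv, stdout_path) pairs that publish one DocBook XML file three ways."""
    src, out = Path(source), Path(out)
    one_page = out / f"{src.stem}.html"
    fo = out / f"{src.stem}.fo"
    return [
        # Multi-page HTML: chunk.xsl writes one file per chapter/section into base.dir.
        (["xsltproc", "--stringparam", "base.dir", f"{out / 'html'}/",
          str(xsl / "html" / "chunk.xsl"), str(src)], None),
        # Single-page HTML, used here only as input for the plain-text dump.
        (["xsltproc", "-o", str(one_page), str(xsl / "html" / "docbook.xsl"), str(src)], None),
        # Plain text: render the single page with a text-mode browser.
        (["lynx", "-dump", str(one_page)], out / f"{src.stem}.txt"),
        # PDF: DocBook -> XSL-FO with xsltproc, then XSL-FO -> PDF with FOP.
        (["xsltproc", "-o", str(fo), str(xsl / "fo" / "docbook.xsl"), str(src)], None),
        (["fop", "-fo", str(fo), "-pdf", str(out / f"{src.stem}.pdf")], None),
    ]

def run(steps, dry_run=True):
    """Print each step; also execute it unless dry_run is set."""
    for argv, stdout_path in steps:
        print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))
        if dry_run:
            continue
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w", encoding="utf-8") as fh:
                subprocess.run(argv, stdout=fh, check=True)
```

    With `dry_run=True` (the default) it only prints the commands, which is a cheap way to check paths before running anything.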

  2. LaTeX? by conantoniou · · Score: 3, Informative

    Come on... this has to be a planted question ;)

  3. OpenOffice? by HolyCoitus · · Score: 3, Interesting

    OpenOffice can save to all of those formats. It isn't scriptable, but each of those exports is easy enough to do by hand. Have you considered it, or am I off base about what you're looking for?

    --
    That's scary.
  4. I like TeX by jrstewart · · Score: 4, Informative

    Specifically LaTeX, and more specifically pdflatex for PDF output and tex2page for HTML. With some hacking, you should be able to script tex2page into producing plain text as well.
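    The "some hacking" isn't spelled out. A different route from modifying tex2page itself is to post-process the HTML it already produces. Below is a rough, standard-library-only Python sketch of that idea; `TextDump` and `html_to_text` are made-up names, and a text-mode browser dump (e.g. `lynx -dump`) would do a more thorough job.

```python
import re
from html.parser import HTMLParser

class TextDump(HTMLParser):
    """Collects the text of an HTML page, breaking lines at block-level tags."""
    BLOCK = {"p", "div", "br", "pre", "tr", "ul", "ol", "h1", "h2", "h3", "h4"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # output text pieces, in document order (never empty strings)
        self.in_pre = False    # inside <pre>: keep whitespace exactly
        self.in_skip = False   # inside <script>/<style>: drop text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_skip = True
        if tag == "pre":
            self.in_pre = True
        if tag == "li":
            self.chunks.append("\n* ")   # list items become "* " bullets
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_skip = False
        if tag == "pre":
            self.in_pre = False
        if tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.in_skip:
            return
        if not self.in_pre:
            # Outside <pre>, HTML treats any run of whitespace as a single space.
            data = re.sub(r"\s+", " ", data)
            if not self.chunks or self.chunks[-1].endswith(("\n", " ")):
                data = data.lstrip()
        if data:
            self.chunks.append(data)

def html_to_text(html):
    parser = TextDump()
    parser.feed(html)
    parser.close()
    # Trim trailing spaces and squeeze runs of blank lines down to one.
    squeezed = []
    for line in "".join(parser.chunks).splitlines():
        line = line.rstrip()
        if line or (squeezed and squeezed[-1]):
            squeezed.append(line)
    return "\n".join(squeezed).strip("\n") + "\n"
```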

    To some extent the Texinfo folks have solved this problem as well. The DocBook tools mentioned elsewhere might be very nice, but I have no experience with them.

  5. The Near-Definitive Solution by FFFish · · Score: 3, Insightful

    I am currently publishing several technical manuals, each several hundred pages long, using the following workflow:

    All documentation is edited using an ordinary plaintext editor.

    The documents are marked up using reStructuredText (ReST) conventions. This has satisfied 99% of my needs; I've decided that the convenience of ReST outweighs the remaining 1% of frills I'd like.

    I use CVS for revision control. There may be RCS involved on the back end; I don't operate the server that hosts my repository.

    The ReST documents are converted to XML using Docutils. The project coordinator, by the way, has proven himself a superlative programmer. Docutils rocks, and it will also transform ReST to HTML or LaTeX.

    Saxon then applies XSL templates that I've written to transform the Docutils XML into XSL-FO, and FOP renders that into PDF.

    Pretty fucking spiffy, if I do say so myself.
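    The pipeline described above can be scripted as a short list of commands. This is a sketch, not FFFish's actual build: the stylesheet and jar names are hypothetical, the Saxon options follow the `-s:`/`-xsl:`/`-o:` syntax of recent Saxon releases (older releases used a different, positional syntax), Docutils script names vary by release, and the HTML step uses Docutils' own writer rather than HT2HTML.

```python
import shlex
from pathlib import Path

def rest_pipeline(source, stylesheet="manual-fo.xsl", saxon_jar="saxon-he.jar"):
    """Commands for ReST -> Docutils XML -> XSL-FO -> PDF, plus ReST -> HTML.

    ASSUMPTIONS: stylesheet and saxon_jar are hypothetical file names; Docutils
    installs these front ends as rst2xml.py/rst2html.py in older releases and
    as rst2xml/rst2html in newer ones.
    """
    src = Path(source)
    xml, fo, pdf, html = (src.with_suffix(ext) for ext in (".xml", ".fo", ".pdf", ".html"))
    return [
        ["rst2xml", str(src), str(xml)],                   # Docutils: ReST -> Docutils XML
        ["java", "-jar", saxon_jar,                         # Saxon XSLT: XML -> XSL-FO
         f"-s:{xml}", f"-xsl:{stylesheet}", f"-o:{fo}"],
        ["fop", "-fo", str(fo), "-pdf", str(pdf)],          # FOP: XSL-FO -> PDF
        ["rst2html", str(src), str(html)],                  # Docutils' built-in HTML writer
    ]

for argv in rest_pipeline("manual.rst"):
    print(shlex.join(argv))  # replace with subprocess.run(argv, check=True) to execute
```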

    I also currently use HT2HTML to transform ReST to HTML. I prefer it to Docutils' native HTML writer because it lets me do a few nice tricks. In the future I plan to migrate entirely to another set of custom XSL transformations.

    This system has proven extremely productive. At any time I could pay a few bucks for a commercial XSL-FO-to-PDF engine and stomp the few gripes I've had with FOP (my number one issue is the lack of keep-with-next functionality; however, FOP is undergoing a complete refactoring and will emerge with full functionality). Saxon has been superb, Docutils has been wonderful (and I've been able to contribute to its overall design), and ReST is quite pleasant to read and write.

    Overall, I highly recommend this workflow.

    Your source material becomes extremely reusable, eminently accessible, and free from commercial encumbrances.

    (footnote: if you do go this route, please don't flood the Docutils developers with suggestions and ideas. Work out your idea in detail, consult the developers' mailing list archives, and fully consider the side effects. Only then suggest it. They've been at this so long, and had so many discussions, that they've become a little short of patience with loud-mouthed newbies. I suspect most popular open-source projects get that way...)

    --
    Don't like it? Respond with words, not karma.
  6. texinfo 0wns docbook by hubertf · · Score: 4, Interesting

    I've written some extensive docs in Texinfo and converted them rather easily to PDF, HTML and plain text.
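    A minimal sketch of such a Texinfo build, assuming the GNU Texinfo tools and TeX are installed: `makeinfo --html` writes split HTML (one file per node) by default, `makeinfo --plaintext` produces plain text, and `texi2pdf` runs TeX to produce a PDF. The function names are hypothetical.

```python
import shutil
import subprocess

def texinfo_commands(texi):
    """Map each output format to the command that builds it from one .texi file."""
    stem = texi.rsplit(".", 1)[0]
    return {
        "html": ["makeinfo", "--html", texi],                         # split HTML, one file per node
        "text": ["makeinfo", "--plaintext", "-o", f"{stem}.txt", texi],
        "pdf": ["texi2pdf", texi],                                    # TeX-based; writes <stem>.pdf
    }

def publish(texi, dry_run=True):
    """Run every step whose tool is on PATH; with dry_run set, only print the steps."""
    for fmt, argv in texinfo_commands(texi).items():
        if shutil.which(argv[0]) is None:
            print(f"skipping {fmt}: {argv[0]} not found on PATH")
            continue
        print("would run:" if dry_run else "running:", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```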

    I've tried doing the same with DocBook and it plain sucked. While the DocBook format itself is nice, the transformation tools are too complex (for me?), especially if you want to customize the conversion to HTML or PDF. This definitely goes for DocBook/SGML and, from what I've seen so far, to some extent for DocBook/XML too.

    Thus I'd rather say "Texinfo", at least unless someone comes up with a foolproof suite of tools for DocBook->PDF+HTML.

    My $0.02.

    - Hubert

  7. On a Mac it's easy... by foniksonik · · Score: 3, Interesting

    Too bad Macs are such a large investment just for publishing. AppleScript allows for very complicated workflows spanning multiple applications. You can even 'compile' a script as a droplet for drag-and-drop functionality.

    If I were doing this on my Mac, I would create a script that, in order: saves my Word file as plain text, saves it as HTML, prints it to PDF (OS X can print to PDF from any application), uses the ColorSync Utility to regenerate the PDFs with the desired compression settings, and then runs an HTML cleaner such as HTML Tidy to eliminate all the crappy MS HTML markup. With AppleScript it's a point-and-click operation to create the script: just hit record, go through the motions described above, hit stop, and save it as a droplet. You can drag and drop any number of Word docs onto it whenever you need to 'publish'. You could just as easily add an FTP action or save to an iDisk as part of the workflow.
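    The HTML Tidy step can be approximated in a few lines if Tidy isn't available. The following is a rough, regex-based Python sketch (not AppleScript, and not a replacement for Tidy) that removes the most common Word cruft: conditional comments, Office-namespaced tags such as `<o:p>`, `Mso*` class attributes, and style attributes containing `mso-` properties. Regular expressions can't parse arbitrary HTML, so treat it as a cleanup pass rather than a parser; `clean_word_html` is a hypothetical name.

```python
import re

def clean_word_html(html):
    """Strip common Microsoft Word HTML cruft (a rough stand-in for HTML Tidy)."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=re.S)
    # Office-namespaced tags such as <o:p>, </o:p>, <w:WordDocument>, <st1:place>
    html = re.sub(r"</?[a-z][\w.-]*:[^>]*>", "", html, flags=re.I)
    # class=MsoNormal / class="MsoListParagraph" and friends (Word often omits the quotes)
    html = re.sub(r'\s+class="?Mso\w*"?', "", html)
    # style="..." attributes that carry Word-specific mso-* properties
    html = re.sub(r'\s+style="[^"]*mso-[^"]*"', "", html)
    return html
```

    It leaves Word's table markup alone, and ordinary `style` attributes without `mso-` properties are kept.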

    The only thing you have to worry about is Word's table markup, which seriously blows when you try to convert it to normal HTML.

    There are plenty of tools for XML/XSLT transforms that could be scripted as well, but that might be overkill... or maybe not.

    If you had a Mac it would be easy.

    --
    A fool throws a stone into a well and a thousand sages can not remove it.