Slashdot Mirror


Tools for Publishing in Multiple Formats?

Truist asks: "What are the best tools (windows or *nix) to use to publish a single source document in multiple formats, specifically plain text, multi-page HTML, and PDF? I'm trying to publish a (60-page+) NetBSD installation guide/documentary online, and I want plain text for easy download and 'less'-ability, HTML for easy browsing and search engine indexing, and PDF or Postscript for easy printing. It's currently a Word document (I know, I know - I'm happy to manually convert it to something else) with multiple styles, including regular text, lists, internal links, external (web url) links, code, and notes, and I'd like to preserve as much as possible of each in the final output. Some additional notes: there are no graphics, and I expect to update this document periodically, or to split it into parts and maintain the parts (think master document / subdocuments). It won't be updated too often, but if re-publishing could be scriptable, that would be fantastic."

63 comments

  1. Docbook docbook docbook by T-Ranger · · Score: 1
    Docbook.org

    Since most of the docs are out of date and talk about stupid and near-impossible to configure tools, I also mention xlmto to do the actual conversion.

    1. Re:Docbook docbook docbook by Chexsum · · Score: 0

      xmlto :)

      Docbook/XML is nice but it needs more formatters - GNU info isn't supported AFAIK. :(

      --
      Pixels keep you awake!
    2. Re:Docbook docbook docbook by You're+All+Wrong · · Score: 2, Informative

      DocBook SUCKS. However, it's probably the best thing out there for the job.

      The problem with DocBook might also be considered its strength - basically it was designed by a committee, and evolved several humps. Each influential party behind it pushed the features that they wanted to see into it. Each individual feature set is a pretty good coherent package which will let you create documents just like [insert-project-name-here]'s own documentation - pretty neat! However, the different feature sets clash _horribly_, and if you pick and choose between them you'll end up with an inside-out baboon.
      (And to be honest I've not discovered _any_ feature that the various admonitions don't look out-of-place near with! Most of the list types are pretty, erm, special too.)

      I've just taken on the role of producing documentation for a small OpenSource project, and I came _very_ close to regretting my choice of DocBook. However, once you've decided what coherent subset of the features you actually need, you'll probably end up with something that looks OK in all formats.

      (I was using the default Debian Jade configuration, perhaps I could tweak some of the stylesheets to look less quirky.)

      YAW.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
    3. Re:Docbook docbook docbook by T-Ranger · · Score: 1
      There is a push for a docbook lite (or whatever) version.

      But as for it being a compromise between different output formats... Well, ya. That's kinda the whole point.

      And it's not Docbook that sucks, it's all the old documentation for it that points users to a stupid, hard-to-configure, or downright broken and non-functional tool chain. The near volumes of (shitty) instructions on DSSLCrap this, XSLTblargh that, SGMLSuperKalaFragalisticBroken other thing can all be summarized with:

      xmlto. If you don't have xmlto, get it. If you can't, give up.
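
For the scriptable re-publishing the question asks about, the xmlto advice can be wrapped in a few lines. This is only a minimal sketch: the file name guide.xml and the out/ directory layout are made up for illustration, and it assumes xmlto is installed and accepts the txt, html and pdf format names. By default it only builds the command lines and does not run them.

```python
import shutil
import subprocess

# Output formats requested in the original question. With xmlto, "html"
# produces chunked (multi-page) HTML.
FORMATS = ("txt", "html", "pdf")


def xmlto_command(source, fmt, outdir="out"):
    """Build the xmlto command line for one output format."""
    return ["xmlto", "-o", f"{outdir}/{fmt}", fmt, source]


def publish(source, outdir="out", dry_run=True):
    """Build (and, unless dry_run is set, execute) one command per format."""
    commands = [xmlto_command(source, fmt, outdir) for fmt in FORMATS]
    if not dry_run:
        if shutil.which("xmlto") is None:
            raise RuntimeError("xmlto not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # guide.xml is a hypothetical DocBook source file.
    for cmd in publish("guide.xml"):
        print(" ".join(cmd))
```

Calling publish("guide.xml", dry_run=False) from a cron job or a Makefile target would be one way to get the periodic rebuild.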

    4. Re:Docbook docbook docbook by You're+All+Wrong · · Score: 1

      """
      But as for it being a compromise between different output formats... Well, ya. That's kinda the whole point.
      """

      You misunderstand - I mean it's a hotch-potch of different styles, and half of the list types clash with half of the other tags as they were injected into the standard by some big OS project that wanted lists just so. Someone else wanted a family of admonitions, and they have a different look to them. Someone else wanted to have BNF grammars just-so, and that style clashes with the other text.
      All of this is of course just the default rendering options, and in theory should be customisable. As you say - it's not DocBook that's really to blame, it's the stuff that's been stuck on the side and its documentation.

      I need to change my disty to testing, but I'll pull down xmlto, and see if I can throw off some of my anti-DocBook feelings.

      Thanks for the recommendation of it.

      YAW.

      --
      Your head of state is a corrupt weasel, I hope you're happy.
  2. Docbook.. (again) by camilita · · Score: 4, Informative

    I have seen a variation of this question posted here at least twice. The unanimous answer is usually DocBook, and in this case it is even more relevant, since the document is technical in nature.

    A good pick is DocBook: The Definitive Guide, written by Norman Walsh (who chairs the OASIS DocBook Technical Committee) and published by O'Reilly. Of course the book is also available in HTML, PDF and plain text.

    1. Re:Docbook.. (again) by dpp · · Score: 1

      What about when you want to apply your own styles, layout, or formatting to the output? I've always found that rather tricky in docbook. Do you have any recommendations for where to start? (I've been using the simple docbook2pdf and docbook2html)

      --
      This post is strictly my own opinion and not necessarily that of my employer.
    2. Re:Docbook.. (again) by T-Ranger · · Score: 2, Informative
      Docbook defines a standard for a document markup language.

      The XML style sheets (XSLT) that invariably happen to come with 'a docbook distribution' are not a component of that standard. You're free to change them at will.

      How you do that, I haven't a clue beyond 'edit the .xsl'. I'm sure ORA has a book or 10 on the subject.

    3. Re:Docbook.. (again) by __past__ · · Score: 2, Informative

      Read DocBook XSL: The Complete Guide, a pretty good (and free, unless you want dead trees) book on how to use and customize the DocBook XSL stylesheets for web and print. Knowing both DocBook and a little XSLT before you start doesn't harm, though.

    4. Re:Docbook.. (again) by dpp · · Score: 1

      Thanks for the link. I'll check it out.

      --
      This post is strictly my own opinion and not necessarily that of my employer.
  3. Latex? by conantoniou · · Score: 3, Informative

    Come on... this has to be a planted question ;)

  4. You're in luck! by Dr.+Photo · · Score: 2, Informative
  5. Docbook+FOP by notfancy · · Score: 2, Informative

    The subject says it all. Apparently, it's the standards-based, open-source-conforming way to do it. I've heard paeans sung to FOP but I haven't used it, yet.

    1. Re:Docbook+FOP by adamshelley · · Score: 0

      Good Advice. Fop works great but you may find that it doesn't support the other formats as well as it does PDF. What you should try to do is create a source XML document for your starting point and come up with some XSL + 3rd party apps to do what you need. This will give you full control over each output document. You may not find your solution with just one "document processor" but as long as you are starting with XML pretty much anything is possible.

    2. Re:Docbook+FOP by Anonymous Coward · · Score: 0

      I don't want FOP, goddammit. I'm a Dapper Dan man.

  6. 5 illegal Indian Senior IT consultants by Anonymous Coward · · Score: 1, Funny

    One knows how to convert to PDF, another one knows where the Save As document is in Word, the third one can tell him that you can also save in RTF from Word, the fourth one to save to OpenOffice, and fifth one to convert it to HTML. Sorry, no DocBook, Senior Consultants are not familiar yet with this just-announced technology, and MS Word apparently doesn't support it.

  7. OpenOffice? by HolyCoitus · · Score: 3, Interesting

    Open Office can save to all of those formats. It isn't scriptable, but all those outputs are done easily enough. Has that been taken into consideration or am I off base on what you are looking for?

    --
    That's scary.
    1. Re:OpenOffice? by oo_waratah · · Score: 1

      OpenOffice is scriptable and there are web backends being produced using OpenOffice.org that convert documents on the fly. Look up the mailing lists for more information.

  8. Not Open Source... by Johnathon_Dough · · Score: 1
    But
    Aside from the scriptable part, InDesign seems to be able to do all that you are asking for.

    My portfolio is held in an InDesign document, which i have routinely saved out to HTML, PDF, printed etc.
    It supports basic HTML code, CSS, and all the links you could want. It carries the links into the PDF as well if that is what you chose in your output.

    Just don't expect anything fancy from your HTML, but if it is only text then no problem.

    Also, i am pretty sure it imports word docs, but i am not sure as i don't have word.

    --
    If you are one in a million, then there are six thousand people who are just like you.
    1. Re:Not Open Source... by Johnathon_Dough · · Score: 1

      Oh, also I believe Quark 5 and 6 will do it as well, but from all i have used it, its PDF output is crap.

      --
      If you are one in a million, then there are six thousand people who are just like you.
    2. Re:Not Open Source... by Anonymous Coward · · Score: 0

      Adobe Framemaker + WebWorks Publisher is a much better choice than InDesign for this sort of task.

    3. Re:Not Open Source... by Anonymous Coward · · Score: 0

      Except that Framemaker is only available on legacy platforms. InDesign, by contrast, just recently got a full point upgrade. Really good one, too.

    4. Re:Not Open Source... by anotherGuy214 · · Score: 1

      InDesign will export tagged XML, PDF(very easily), HTML, etc. Additionally, if you are running on a mac, it is very scriptable using AppleScript. It has a large detailed scripting manual available online at adobe developer site. And yes, it imports word docs including styles, tables, etc.

  9. DocBook, and some reasons... by baka_boy · · Score: 2, Interesting

    I actually worked on a ~500pg. documentation project with a couple of other developers a couple of years back, and after about six months of debate, they finally agreed to let me recode the thing from TeX to DocBook XML.

    The conversion was a PITA, but once that was finished, we had about 40 source XML files which were independently version-controlled, some minor customizations to the standard DocBook XSLT stylesheets, and slick, easily-updated HTML, plain text, and PDF versions of the document being produced straight out of CVS by a cron job.

    A nice benefit of the conversion was that we were actually able to add another few hundred pages of documentation that was automatically generated from grammar definitions and source code to the batch build, and they could be integrated into the style and distribution methods we worked out for the hand-generated docs.

  10. Indexing by nomadx · · Score: 0

    if you want it searchable, and in sections, why not have some sort of database that divides it up into its respective sections and make the display a perl program? the output would of course be html which is universal.

  11. XML by MBCook · · Score: 1
    Isn't this what XML is for, among other things? Write it in some format then convert it to XML (or use OpenOffice.org, which already IS XML). Then you just use an XSLT or some program you (or someone else) writes to convert the XML into as many different formats as you choose.

    Might not be easy to set up at first, but it should work fantastically.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    1. Re:XML by nomadx · · Score: 1

      xml would be a perfect example of a database to display method like how i mentioned. it would take a while to set it up though. i suppose the author wrote themself into a corner anyway by just writing it all out raw in word.

    2. Re:XML by rjshields · · Score: 1

      XML is to Docbook as SGML is to HTML. You wouldn't write web pages in SGML, so why write documentation in XML?

      If you were to write your documentation in XML, then you would need to define a meaningful DTD/schema and all the tools that go with it to make it useful.

      But why bother when someone's already done the hard work for you? eg. Docbook.

      --
      In this world nothing is certain but death, taxes and flawed car analogies.
  12. Lyx by jmt9581 · · Score: 2, Interesting

    Lyx is how I do what you're talking about. It's a WYSIWYM (What You See Is What You Mean) document processor, and it's great. I use it to write term papers, HOWTO documents, and lots of other stuff. You can export your document to many different formats, including HTML, PDF, plain text and Postscript. You should try it out, I really like it.

    --

    My blog

    1. Re:Lyx by Zach+Garner · · Score: 1

      Right. I use LyX a lot. It's very nice. But one thing to remember, LyX currently is really only good at generating LaTeX code.

      You can use LyX without ever knowing anything about latex, but for conversion, you've got to deal with a few issues.

      PDF output is nice. Postscript/PDF format is what latex is all about. But, HTML output via latex2html isn't very great. It's functional (for the most part), but is a pain to customize, and in my opinion, not professional enough.

      LyX does have some nice features, and I certainly recommend it, but it's not an ideal solution (still the best I've found though)

  13. I like tex by jrstewart · · Score: 4, Informative

    Specifically latex, and more specifically pdflatex for pdf output and tex2page for html. With some hacking you should be able to script tex2page into outputting text as well.

    To some extent the texinfo folks have solved this problem as well. The DocBook stuff mentioned elsewhere might be very nice but I have no experience with that.

  14. The Near-Definitive Solution by FFFish · · Score: 3, Insightful

    I am currently publishing several several-hundred-page technical manuals using the following workflow:

    All documentation is edited using an ordinary plaintext editor.

    The documents are marked-up using ReStructured Text conventions. This has satisfied 99% of my needs. I've decided the convenience of ReST outweighs the need for the remaining 1% of the frills I want.

    I use CVS for revision control. There may be an RCS involved in the backend; I don't operate the server that hosts my repository.

    The ReST documents are converted to XML using DocUtils. The project coordinator, by the way, has proven himself a superlative programmer. DocUtils rocks, and will also transform ReST to HTML or Latex.

    The XML is converted using XSL templates that I've created. Saxon then transforms the DocUtils XML to XSL-FO, and FOP transforms that into PDF.

    Pretty fucking spiffy, if I do say so myself.

    I also currently use HT2HTML to transform ReST to HTML. I use it in preference to DocUtils' native HTML transformation because it allows me to do a few nice tricks. In the future I plan to migrate entirely to another set of custom XSL transformations.

    This system has proven extremely productive. At any time I could pop a few bucks for a commercial XSL-FO->PDF engine and stomp the few gripes I've had with FOP (my number one issue is lack of keep-with-next functionality; however, FOP is under a complete refactoring, and will emerge with full functionality). Saxon has been superb, DocUtils has been wonderful (and I've been able to contribute to the overall design), and ReST is quite pleasant to read and write.

    Overall, I highly recommend this workflow.

    Your source material becomes extremely reusable, eminently accessible, and free from commercial encumbrances.

    (footnote: if you do go this route, please don't flood the DocUtils developers with suggestions and ideas. Work out your idea in detail, consult the developers' mailing list archives, and make full consideration of side-effects. Only then suggest it. They've been at this so long, and had so many discussions, that they've become a little short of patience with loud-mouthed newbies. I suspect most popular open-source projects get that way...)
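
The ReST to PDF chain described above boils down to three command-line steps. The sketch below only assembles them, and it is not the poster's actual build script. All file names are hypothetical, rst2xml.py is assumed to be the name of the Docutils front end on the local install (it varies), and the Saxon options use the newer -s:/-xsl:/-o: syntax, which older Saxon releases do not accept.

```python
from pathlib import Path


def pipeline_commands(rst_file, stylesheet, outdir="build"):
    """Return the three commands that turn one ReST file into a PDF."""
    stem = Path(rst_file).stem
    xml = f"{outdir}/{stem}.xml"
    fo = f"{outdir}/{stem}.fo"
    pdf = f"{outdir}/{stem}.pdf"
    return [
        # 1. ReST -> Docutils XML (front-end script name is an assumption)
        ["rst2xml.py", rst_file, xml],
        # 2. Docutils XML -> XSL-FO, using Saxon and a custom stylesheet
        ["java", "-jar", "saxon.jar", f"-s:{xml}", f"-xsl:{stylesheet}", f"-o:{fo}"],
        # 3. XSL-FO -> PDF, using FOP
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


if __name__ == "__main__":
    # manual.rst and to-fo.xsl are placeholder names.
    for cmd in pipeline_commands("manual.rst", "to-fo.xsl"):
        print(" ".join(cmd))
```

Keeping the intermediate .xml and .fo files on disk makes it easier to see which stage is responsible when the PDF looks wrong.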

    --
    Don't like it? Respond with words, not karma.
    1. Re:The Near-Definitive Solution by abulafia · · Score: 1

      Are you using images with Fop?

      How's that going for you?

      We've had no luck, and Apache's image examples are currently broken, which is not giving me a good feeling...

      --
      I forget what 8 was for.
    2. Re:The Near-Definitive Solution by FFFish · · Score: 1

      I am using illustrations. My ReST text defines image classes for both PDF and HTML output, and the XSL selects the appropriate-resolution PNG and sizes it accordingly. For PDF I use 600dpi images; for HTML 72dpi.

      It has been necessary to significantly increase the memory allocations for FOP. The current command is

      (WinXP) java -Xms64m -Xmx256m -Xss64m -cp "%LOCALCLASSPATH%" org.apache.fop.apps.Fop -c "%LOCAL_FOP_HOME%conf\userconfig.xml" %1 %2 %3

      (Bash) "$JAVACMD" -Xms64m -Xmx256m -Xss64m -classpath "$LOCALCLASSPATH" org.apache.fop.apps.Fop -c $FOP_HOME/conf/userconfig.xml $FOP_OPTS "$@"

      I'm using the standard JIMI jar, not JAI.

      Note that embedded EPS does not display on Acrobat, though it will print fine on PS printers. This is one reason I've chosen PNG.

      --
      Don't like it? Respond with words, not karma.
  15. Check out the FreeBSD Documentation Project by Foozy · · Score: 1

    http://www.FreeBSD.org/doc/en_US.ISO8859-1/books/fdp-primer/index.html

  16. What I tell you three times is true... by jefu · · Score: 1
    Docbook.

    Docbook is a flexible, configurable, way to do just what you're asking. You can change output formats - produce PDF or RTF or HTML or Latex or text. You can parameterize it, and script it pretty easily. There are already Docbook to filters available and you can adapt them to other uses with a bit of poking around.

  17. texinfo 0wns docbook by hubertf · · Score: 4, Interesting

    I've written some extensive docs in texinfo and moved it rather easily to pdf, html and plain text.

    I've tried doing the same for docbook and it plain sucked. While the DocBook format itself is nice, the tools for transforming are too complex (for me?), esp. if you want to customize conversion to HTML or PDF. This definitely goes for DocBook/SGML, and by what I've seen so far DocBook/XML too to some extent.

    Thus I'd rather say "texinfo", at least unless someone comes up with a foolproofed suite of tools for DocBook->PDF+HTML.

    My $0.02.

    - Hubert

    1. Re:texinfo 0wns docbook by neves · · Score: 2, Informative

      Red Hat linux comes with a simple set of tools: docbook2pdf, docbook2html, docbook2rtf to help converting from docbook to other formats. It's a lot easier than directly using jade et al.

    2. Re:texinfo 0wns docbook by Jason+Earl · · Score: 2, Insightful

      I couldn't agree more. Docbook is pretty slick, but turning your Docbook source into a usable format is ridiculously hard. And what's more, chances are fairly good that when you are done it won't look good. PostgreSQL, to cite an example, actually dumps their Docbook to RTF and then edits it in RTF before creating their PostScript and PDF files. What's more, despite the fact that they released 7.4 yesterday they don't have 7.4 documentation ready to go because they can't get the docs to build.

      Texinfo, on the other hand, is relatively easy to use, and it has well-supported tools for the creation of html, pdf (with hyperlinks), rtf, info (emacs rocks), and plain text.

  18. it's all about latex baby! by hoyosa · · Score: 1

    use latex.

    after creating your document in latex, you can use a simple makefile to create all your formats.
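
The point of the makefile is the rebuild rule: regenerate an output only when it is missing or older than its source. Here is that rule sketched in Python rather than make syntax. The file names are placeholders, and the actual latex/pdflatex invocations are deliberately left out.

```python
import os
import tempfile


def needs_rebuild(source, target):
    """Return True if target is missing or older than source (the make rule)."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)


def stale_targets(source, targets):
    """List the output files that would have to be regenerated."""
    return [t for t in targets if needs_rebuild(source, t)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "guide.tex")    # hypothetical source file
        pdf = os.path.join(d, "guide.pdf")
        html = os.path.join(d, "guide.html")  # never created in this demo
        for path in (src, pdf):
            open(path, "w").close()
        os.utime(src, (100, 100))  # source last modified at t=100
        os.utime(pdf, (200, 200))  # PDF built later, at t=200
        # Only the missing HTML output is reported as stale.
        print(stale_targets(src, [pdf, html]))
```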

  19. OO and scripts: by Futurepower(R) · · Score: 2, Informative

    It's highly likely that OO is scriptable.

    1. Re:OO and scripts: by Chexsum · · Score: 0

      For the conversions you might find that OO uses xsltproc externally?

      If it does [like a few other programs do] then it'd be easier to just get the stylesheets to change the XML you've written.

      --
      Pixels keep you awake!
    2. Re:OO and scripts: by HolyCoitus · · Score: 1

      Oh, wow... I was never even aware that stuff existed... OpenOffice keeps on surprising me with all the power that it has along with all the refinement. It feels like OpenOffice has passed Microsoft office in some ways, which is saying a lot. I guess I should check before I speak next time, and that OpenOffice can do exactly what he is looking for.

      --
      That's scary.
  20. Re:Umm...hello by Anonymous Coward · · Score: 0

    Ah well, PG, it looks like they've got you posting at 0 these days. It may be time to make a new account...I think you're just too well known. Sorry, man.

  21. XML-Cocoon. by Anonymous Coward · · Score: 0

    Since this is going online. I recommend Cocoon. It can do text, PDF, Doc, and I've even seen Excel.

  22. Conglomerate by Chexsum · · Score: 0

    Conglomerate handles Docbook/XML.

    --
    Pixels keep you awake!
  23. xml by SanityInAnarchy · · Score: 2, Interesting

    I've seen some posts here on XML, but most seem misleading. I've found that the most expressive and most flexible format is manual XML -- as in, your own dialect.

    That is, you define your own tags, and define what they mean. Then you create stylesheets to convert them to other things. Because the original XML contains your intention, not the eventual formatting, it makes it easy to convert, or to make broad, sweeping changes to presentation (as presentation is detached from content).

    The simplest example is, suppose you started out just using some HTML-like <pre> tag for code. You could easily search and replace that with something else, like say <div id="code"> for an HTML-style CSS implementation. That would actually be a step towards XML. The problem is, if you start out this way, and then you've got your hundreds of pages, you may have occasionally used <pre> for other things, like ASCII art.

    The right way to do this is create an XML tag <code> and stylesheets that convert it to, say, a little padded space for text, a <pre> tag for HTML, or something literally to do with fonts for postscript. That way you can change any detail of presentation in any format in one place (the stylesheet).

    There's also the fact that XML, especially encoded in UTF-8, is immortal, whereas say a Word upgrade (or any software upgrade, for that matter) could make your document obsolete. And even if there is never an XML parser left in the world, it is human-readable.
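
To make the custom <code> tag idea concrete, here is a minimal sketch in which two Python lookup tables stand in for the XSLT stylesheets (a real setup would more likely use XSLT). The tag names guide, para and note, and the sample content, are invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny document in a made-up dialect: tags say what the text *is*.
SAMPLE = """<guide>
<para>Mount the install CD:</para>
<code>mount /dev/cd0a /mnt</code>
<note>Device names differ between machines.</note>
</guide>"""

# One "stylesheet" per output format; presentation lives only here.
HTML_STYLE = {
    "para": "<p>{}</p>",
    "code": "<pre>{}</pre>",
    "note": '<p class="note">{}</p>',
}
TEXT_STYLE = {"para": "{}", "code": "    {}", "note": "NOTE: {}"}


def to_html(root):
    """Render each child element with the HTML markup chosen for its tag."""
    return "\n".join(HTML_STYLE[el.tag].format(escape(el.text)) for el in root)


def to_text(root):
    """Render each child element as plain text; code is indented four spaces."""
    return "\n".join(TEXT_STYLE[el.tag].format(el.text) for el in root)


if __name__ == "__main__":
    doc = ET.fromstring(SAMPLE)
    print(to_html(doc))
    print(to_text(doc))
```

Changing how code looks in HTML means editing one entry in HTML_STYLE, and ASCII art would get its own tag instead of borrowing the one for code.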

    --
    Don't thank God, thank a doctor!
  24. On a Mac it's easy... by foniksonik · · Score: 3, Interesting

    Too bad they are such a large investment just to publish. Applescript allows for doing very complicated workflows using multiple applications. You can even 'compile' it as a droplet for drag and drop functionality.

    If I was doing this on my Mac I would create a script to, in order: Save my Word file as Plaintext, Save it as HTML, Print it as PDF (OS X can print to PDF from any and all applications), use the ColorSync Utility to regenerate the PDFs with your desired compression settings, then use an HTML cleaner such as HTML Tidy to eliminate all the crappy MS HTML markup. With Applescript it's a point and click operation to create the script, just hit record and go through the motions described above, hit stop and save as a droplet. You can drag and drop any number of Word docs onto it when ever you need to 'publish'. You could add an FTP action or save to an iDisk as part of the workflow just as easily.

    The only thing you have to worry about is some of Word's [table] markup as it seriously blows when you try to convert to normal html.

    There are plenty of tools for XML/XSLT transforms that could be scripted as well but it could be overkill... or maybe not.

    If you had a Mac it would be easy.

    --
    A fool throws a stone into a well and a thousand sages can not remove it.
    1. Re:On a Mac it's easy... by TheSunborn · · Score: 1

      Have you ever seen the "html" that Word creates? It is so ugly and non-standard that even Internet Explorer 5.5 has problems showing some of it.

  25. Straight XML, seriously? by jtheory · · Score: 1

    I think we need a solution that is workable with an editor that's designed for writing formatted text. There are just too many benefits to a proper editor (various tools, shortcuts, but especially simplicity of presentation).

    Say you're writing a bulleted list with 100 items. In pure XML that's a minimum of 2 tags per item, plus tags for the list. Now put the first word of each item in bold and the rest of the line in italics. This is pretty basic formatting... but you'll have to edit that list while wading among an extra 2800+ chars (assuming very short XML tags). You won't even be able to find your text after a while.

    Also don't forget that the writer already has 60 pages, fully formatted in Word. That would be unpleasant to reformat by hand.

    I'm skirting your real point so far though, for a reason -- I'm guessing you'd argue that the bulleted list above should just have a tag around the whole thing, like chapterEndSummary, and the stylesheet would control whether it was formatted as a bulleted list, a table with alternating colored rows, or whatever. This doesn't work out well in practice, though. I've spent a lot of time working with XML and XSL, and if you're going to be really flexible about the presentation, you need your data to be very organized (which means lots of tags making your text harder to read!). Think about it. How will you know when each item begins and ends? How will you know which is the word (or words) that will end up bolded in your stylesheet?

    Plus you will simply have to make a lot of formatting decisions as you write, because the content would differ depending on the format. If you want more than 20 words or so per item, that bulleted list is going to start looking really ugly.

    I would argue for saving the document in XHTML (possibly from OpenOffice - you can import the Word doc). Most of the tags will hold formatting information, but you can toss class or id attributes into higher-level tags (maybe just divs) so that a stylesheet could modify the formatting below it as desired for specific kinds of content. Your document won't be obsolete even if OpenOffice disappears, and you can still run it through XSL with a lot of flexibility.

    Note: I haven't worked with DocBook at all, so I don't know how that might address these issues.

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
    1. Re:Straight XML, seriously? by Anonymous Coward · · Score: 0

      I personally use LaTeX to do any reports or whathaveyou, but they're very small scale. If I had to do something very large scale, there's a possibility I'd use manual XML simply because it's at the top of the food chain, i.e. even though I've never used DocBook or some of these other tools mentioned, I know I can convert XML to anything else.

      That was my disclaimer. I know if I had a huge bulleted list to do, I'd probably do it with a straight text editor, using tabs to denote the bullets, then use a Perl script in a makefile somewhere to twiddle that into XML. Muhahahahaaaa.

      This thread has really only made me more sure of my decision to never use Word again. After a few months of using LaTeX, it's slower to use Word, no matter how large or small a document is.
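
The tab-indented list trick is easy to sketch. The version below uses Python as a stand-in for the Perl script mentioned above, and the element names doc, para, list and item are invented. Lines that start with a tab become list items, and consecutive items share one list.

```python
import xml.etree.ElementTree as ET


def text_to_xml(text):
    """Convert plain text to XML: tab-prefixed lines become <item> elements."""
    root = ET.Element("doc")
    current_list = None
    for line in text.splitlines():
        if not line.strip():
            current_list = None  # a blank line ends any open list
            continue
        if line.startswith("\t"):
            if current_list is None:
                current_list = ET.SubElement(root, "list")
            ET.SubElement(current_list, "item").text = line.strip()
        else:
            current_list = None  # ordinary text also ends the list
            ET.SubElement(root, "para").text = line.strip()
    return root


if __name__ == "__main__":
    sample = "Steps:\n\tBoot the CD\n\tRun sysinst\nDone."
    print(ET.tostring(text_to_xml(sample), encoding="unicode"))
```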

    2. Re:Straight XML, seriously? by SanityInAnarchy · · Score: 1

      The writer has already offered to manually convert it.

      What we really need are better tools for editing XML. For me, I'm fine already with my 100 wpm + typing speed and pretty XML color scheme for vim, making it very easy to read.

      Have you ever used CuteHTML? It's proprietary, stupid, and yet has one genius feature -- as you start to type an html tag, it provides a drop-down list of all tags it knows of to match yours. What we need is something like that which can match evolving XML code -- learn as you type.

      Also, as you may or may not have seen from various online half-assed WYSIWYG html editors, it's easy enough to map CTRL-I to wrapping <i></i> tags around the highlighted text, though I still say <emphasis> or something similar would work better.

      Remember, too, that while OOo, AbiWord, etc. do provide XML documents which can be edited, it's all not much better than XHTML in that it's presentation-based. You may be deciding you want something to look like a bulleted list, but if you only have an XHTML-style bulleted list, it's harder to change your mind and make it numbered, lettered, or ASCII-art.

      With straight XHTML or OOo/AbiWord plus div tags with id/class attributes, what is the advantage? You've got your presentation positively locked into your content, and the id/class attributes aren't any easier to add/manage/read than straight XML (perhaps harder).

      --
      Don't thank God, thank a doctor!
  26. OpenOffice could be the answer by sonamchauhan · · Score: 2, Informative
    I've been impressed by Openoffice. Even though it's slow and resource-hungry, here are some of OpenOffice's abilities that may apply here:

    1. Ability to save as non-cluttered HTML or XML
    2. Ability to publish to PDF
    3. Scriptability - to automate everything
    4. Ease of document maintenance (see the two caveats below)


    The other solutions presented so far suffer w.r.t # 4 - document maintenance. After all, if someone created their document in a visually rich editor like Word, it was probably because of ease of use and they will find it easiest to maintain it there. However, two conditions must be met for the system we're discussing:

    (A) It should be possible to constrain all aspects of the document, so it has a defined, machine-comprehensible structure

    (B) The maintenance application must integrate with a version-control and access-control system (a nice to have - so that document-maintenance is transparent)

    Regarding (A) - it seems useful to emulate an ability from the new "Pro" version of Word 2003 - custom XML schemas that constrain document content. From what I've read about it, it's like the old document field macros and templates, but more powerful and using XML Schemas for validation. I'm *guessing* (not sure - can someone with OpenOffice expertise chip in?) OpenOffice could be made to do the same thing. For example - could a 'document schema' be defined so that the document is *forced* to have its title in the middle of the first page, its index auto-generated on the second, text in a particular style, chapter and subchapter headings in other pre-selected styles, a list of figures and list of tables auto-generated as Appendices 1 and 2... ?

    Regarding (B) - the scriptability of OpenOffice should support invocation of CVS/Subversion clients.

    1. Re:OpenOffice could be the answer by oo_waratah · · Score: 1

      There is a project in Canberra, Australia that uses a variation of the XML format to do the archiving. The OpenOffice.org file is a zipped file with xml files within it, and there is a flat version of the xml file that can be used specifically for archiving. I don't know specifics but this will work with CVS. You might want to ask for more information on users@openoffice.org

      The XML format for MS Word is not very useful because who will be able to use it is highly restricted; it will cost too much money for the average company to upgrade.

      For the rest of it there are always new features being added, I have not specifically seen these features on the list. It really depends on whether these really are that useful to the people using the software. My personal opinion is that most will continue to rely on templates and forcing a format is too restrictive for most document developers.

      I am really curious whether people see restrictive templates as positive or negative.

  27. OpenOffice. Really. by jonadab · · Score: 2, Interesting

    If you write your own major mode for Emacs, all these issues just go away.

    However, if you are lacking lisp-fu and absolutely must have a GUI-based
    WYSIWYG editor, OpenOffice may be a possible solution. You'll have to avoid
    workflows that result in creating styles with meaningless names. (For example,
    you can't just highlight some random text and start formatting it in various
    arbitrary ways. Instead, define your styles properly with names using the
    style catalog, and then apply your named styles to blocks of text. The styles
    you define should represent things that have meaning independent of the format,
    so that you can sensibly convert them into the various target formats.)

    Getting the XML out of an OO document is a simple matter of renaming it from
    foo.sxw to foo.zip and unzipping it. You can then use any XML parser you
    like (e.g., one of the XML modules off of CPAN) to transform it into DocBook
    or whatever. All of this can be automated.
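
    The unzip-and-parse step above can be sketched in a few lines of Python
    instead of Perl. This is only an illustration: the function names are made
    up for this example, and the namespace URI is assumed to be the one used by
    OpenOffice.org 1.x (.sxw) files, so check it against your own content.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the text namespace used by OpenOffice.org 1.x documents.
# Newer OpenDocument files declare a different URI.
TEXT_NS = "http://openoffice.org/2000/text"


def read_content_xml(path_or_file):
    """Return the raw bytes of content.xml from a zipped OOo document."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.read("content.xml")


def paragraphs_by_style(content_xml, text_ns=TEXT_NS):
    """Yield (style_name, text) for every text:p element, in document order."""
    root = ET.fromstring(content_xml)
    for para in root.iter("{%s}p" % text_ns):
        style = para.get("{%s}style-name" % text_ns, "")
        # itertext() also picks up text inside nested spans and links.
        yield style, "".join(para.itertext())
```

    With meaningful style names, a converter can then branch on the style of
    each paragraph (hypothetical names such as "Code" or "Note") when it emits
    DocBook or any other target format.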

    --
    Cut that out, or I will ship you to Norilsk in a box.
  28. Help&Manual by Anonymous Coward · · Score: 0

    Since you are not averse to using Windows programs, and commercial ones at that, Help&Manual is a possible solution. It will output to WinHelp, HTML Help, straight HTML and PDF with no problems, plus several other formats. Not sure about text, but a scripted HTML->Text transformation is easy.

    Doesn't help much for NetBSD stuff, but if you ever do Windows stuff too, the WinHelp and HTML Help can be very useful.
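
    For the scripted HTML->Text step mentioned above, here is a rough sketch using only the Python standard library. It is not part of Help&Manual; the class and function names are invented for this example, and it deliberately collapses whitespace, so preformatted code blocks would need extra handling.

```python
from html.parser import HTMLParser

# Tags that should start and end on their own line in the text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect character data, inserting newlines around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than zero while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```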

  29. plain roff by mzs · · Score: 2, Informative

    I use roff. It is a very simple document formatter. The plus is that you automatically get unix style man pages for free. Use it with make to simplify your life as well.

    Here is a concrete example. I create a roff file rwlock.man as the source. Say I want a postscript doc, then I add the following to a Makefile.

    rwlock.ps : rwlock.man
    groff -man rwlock.man > rwlock.ps

    This uses GNU troff, on other systems you might use the troff included with your system and pipe through dpost.

    If I need a pdf file, that is easy from the postscript file.

    rwlock.pdf : rwlock.ps
    ps2pdf -dCompatibilityLevel=1.1 rwlock.ps rwlock.pdf

    You can use all sorts of other options to ps2pdf; just do a 'man ps2pdf' to learn more. You can install ps2pdf in the usual ways for your system, it is a common package. If you are on MacOS X 10.3 you already have pstopdf which is similar in functionality.

    Say I want a plain text file of the documentation, then I add something like this to the Makefile.

    rwlock.txt : rwlock.man
    nroff -man rwlock.man | col -bx > rwlock.txt

    If your system includes GNU nroff, then you can use something like gtroff -man rwlock.man | grotty -buo in the command instead. Some nroff's and their 'an' macro files are a bit old fashioned and do 66 characters per line and 66 lines per page so you may have to experiment a bit on your system. Solaris nroff is a pain with respect to this while FreeBSD gets it right.

    Then if you want html, GNU troff is the best way to go.

    rwlock.html : rwlock.man
    groff -man -Thtml rwlock.man > rwlock.html

    It generates fairly lean html with some comments that is easy to follow if you ever need to look at the source.

    If you have a whole bunch of files to process, then you can use suffix rules in make to simplify that job. There are numerous troff, nroff, and man page HOWTOs on the web that you can read that make it a breeze to get started with roff. There should be standards if you want to conform to say FreeBSD or Linux man pages.
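
    If you would rather drive the same conversions from a script than from make, the rules in this comment can be written as a small table. The sketch below is a Python stand-in, not make's suffix-rule syntax; the function names are invented for this example, the commands are copied from the rules above, and it assumes groff, nroff, col and ps2pdf are on your PATH.

```python
import os
import shlex
import subprocess

# Output suffix -> (input suffix, shell command template). The commands are
# the ones from the make rules above; {src} and {dst} are filled in per file.
RULES = {
    ".ps": (".man", "groff -man {src} > {dst}"),
    ".pdf": (".ps", "ps2pdf -dCompatibilityLevel=1.1 {src} {dst}"),
    ".txt": (".man", "nroff -man {src} | col -bx > {dst}"),
    ".html": (".man", "groff -man -Thtml {src} > {dst}"),
}


def command_for(stem, out_suffix):
    """Build the shell command that produces stem + out_suffix."""
    in_suffix, template = RULES[out_suffix]
    return template.format(src=shlex.quote(stem + in_suffix),
                           dst=shlex.quote(stem + out_suffix))


def is_stale(src, dst):
    """True when dst is missing or older than src, roughly as make decides."""
    return not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src)


def rebuild(stem, suffixes=(".ps", ".pdf", ".txt", ".html")):
    """Run only the out-of-date conversions; .ps is listed before .pdf."""
    for out_suffix in suffixes:
        in_suffix, _ = RULES[out_suffix]
        if is_stale(stem + in_suffix, stem + out_suffix):
            subprocess.run(command_for(stem, out_suffix), shell=True, check=True)
```

    Calling rebuild("rwlock") in the directory holding rwlock.man would then regenerate whichever of the four outputs are older than their inputs.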

    You can go here to see the results from the 'rwlock' examples from this comment:

    http://www-bd.fnal.gov/controls/micro_p/rwlock.html

    (Slashcode may break-up that url, I did this post in text because I did not want to deal with the lameness filter for all of the make rules.)

    You can do a man on any of the commands above to learn more about them. Also roff can do a whole lot more because you can have it run various other processors as it formats. In this way you can get tables, figures, pictures, references, and even primitive equations.

    Once you start doing that though, man pages cannot really look right anymore on a text console, and xman is a kludge so it often gets this wrong. If you start getting into more sophisticated equations for example, I would recommend latex. It is straightforward to get ps and pdf out of latex and you can try an add-on such as latex2html to get html output.

    Hope this helps.

  30. Frame by Anonymous Coward · · Score: 0

    FrameMaker is the Gold Standard for this kind of work.

    http://www.adobe.com/products/framemaker/main.html

  31. A doc's end destination is formatted text by jtheory · · Score: 1

    The writer has already offered to manually convert it.

    Sure, if he has to, he will. Any solution that doesn't require that work has at least one good thing going for it, though.

    And I agree that better XML editing tools would help the pure-XML solution, but there are serious limits to what's possible especially when you're using your own invented tag set. The tool can't check to see if your DDL is well-designed. It can't represent the tag data in any other way (like showing the bolded text instead of the <strong> tags), so you still have to wade through that tag data when trying to edit your content. And you just can't compare auto-completion or even highlight-and-key-shortcut-for-each-tag-pair to just highlighting a whole page and clicking "bulleted list".

    I'm not a point-and-click kind of guy either. Almost all of the XML I work with is custom-designed DDL, usually populated from a database. All of my XSL and HTML is hand-tweaked and sometimes completely hand-written. When I'm writing documentation, though, I always develop the content in a WYSIWYG editor, just because I can't follow a complicated explanation with all those tags in the way, and I don't want to have to constantly think about formatting (and you will have to think about formatting when you structure your XML, just because in the end every structural decision impacts the options you will have when building the presentation).

    After all, we're not generating an XML doc here that will be parsed by a computer and pushed into a database or object graph. It's human-targeted documentation. The main thing we're going to use the XML structure for is, in the end, formatting the content.

    So why design a DDL for formatting when we already have a standardized one? It has weak points, but it has been thought out, plus now some other writer will be able to work on the docs without learning your DDL. Besides, XHTML has a great advantage over plain HTML in that you can easily automate your tweaks to the formatting with XSL.

    With straight XHTML or OOo/AbiWord plus div tags with id/class attributes, what is the advantage? You've got your presentation positively locked into your content, and the id/class attributes aren't any easier to add/manage/read than straight XML (perhaps harder).

    No, you're missing out the XSL step.

    The id attributes are there so that you can alter the presentation if you want to -- just push the thing through a stylesheet that (for example) converts the unordered list to a table with pretty images on the corners if your id attrib = 'chapterSummary'. It's pretty straightforward. You could also tweak it (to a lesser degree) with CSS.
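
    To make the idea concrete without writing a full stylesheet, here is the same kind of rewrite sketched with Python's ElementTree rather than XSL. It is only an illustration: the function name is invented, the input is assumed to be well-formed XHTML without a default namespace, and the corner images are left out.

```python
import xml.etree.ElementTree as ET


def summary_list_to_table(xhtml, target_id="chapterSummary"):
    """Replace <ul id="chapterSummary"> with a one-column table.

    Other lists are left alone, which is the point of keying on the id.
    """
    root = ET.fromstring(xhtml)
    # Snapshot the elements first so the tree can be edited while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "ul" and child.get("id") == target_id:
                table = ET.Element("table", {"id": target_id})
                for item in child.findall("li"):
                    row = ET.SubElement(table, "tr")
                    cell = ET.SubElement(row, "td")
                    cell.text = item.text      # leading text of the list item
                    cell.extend(list(item))    # nested markup such as <code>
                table.tail = child.tail
                parent[index] = table
    return ET.tostring(root, encoding="unicode")
```

    An XSL template matching ul[@id='chapterSummary'] would express the same rule declaratively; the content file itself never changes.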

    None of these are ideal solutions, but this seems like the most logical for writing documentation.

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
  32. I like Lyx by the_womble · · Score: 1

    same functionality and a nice simple to use GUI.

    I think it can import Word documents.