Slashdot Mirror


Using the DocBook DTD for Internal Documents?

Saqib Ali asks: "These days, most of the Linux Documentation is created using DocBook DTD. I was wondering if it will be useful for a large Enterprise to create Internal IT documents using DocBook DTD. Any success stories where a large enterprise converted all of its internal IT documentation to DocBook, with management's support? Any other things/issues to keep in mind before embarking on such a mission?"

37 of 58 comments

  1. I end up having a lot of the same questions by quinto2000 · · Score: 3, Interesting

    I was looking into doing this for a while with a number of the formatted documents my school needs to deal with. It turned out that the DTD was much more complex than warranted for the kind of stuff we were doing, but of course YMMV.

    --
    Ceci n'est pas un post
    1. Re:I end up having a lot of the same questions by ichimunki · · Score: 3, Interesting

      I was recently looking heavily at DocBook/XML and comparing it to (La)TeX. I found all the tools for docbook completely lacking, and the XML format to be completely unfriendly to actually writing. LaTeX on the other hand, seems to kick ass for writing, since the markup is short, sweet and easy to learn/use. Not to mention that the algorithms used to perform the layout were designed by a damn genius instead of mere mortals. I've now used LaTeX in conjunction with pdflatex and latex2html to use a single set of source docs to generate both a web site and a PDF file (not to mention that you could also crank out postscript or just about anything you might need to do with documentation... TeX was designed so Knuth could write computer books after all).

      DocBook, on the other hand, has a lot of complicated markup-- I mean who enjoys using the PARA tag to open and close each and every paragraph? It would drive me insane. Then, after you finally find an editor that suits your needs, you still have to monkey around trying to convert the documents. I was able to get a DB file into HTML without too much pain, but PDF? Never managed it. I spent too many hours on what essentially would have translated the DB XML into TeX source anyway! Why not just write in TeX and be done with it.

      Finally, there is LyX for LaTeX which looks to be a WYSIaboutWYG editor, although I find it very convenient to just use emacs. I think the only problem I've had so far is getting figures to lay out within the text how I want, whereas TeX is pretty happy shoving them later, so that the body of the text can remain as fluid as possible. You can see the results on my site (where I suppose I ought to include a tarball of the actual LaTeX source files and the simple shell script that drives all the processing).

      --
      I do not have a signature
    2. Re:I end up having a lot of the same questions by ttfkam · · Score: 4, Informative

      There is also a Simplified DocBook DTD. We used it at my last job. It is a small but useful subset of DocBook that can get you started.

      All Simplified DocBook files are also completely valid DocBook documents. But there are far fewer elements and constructs to keep in your head. It's also geared toward smaller items such as articles instead of complete books. At my company, we made a couple of template documents and then just had people fill in the blanks. People ended up working faster once we got them to stop worrying about formatting and styling (non-trivial).

      Start writing in SD and as the collection of documents grows, you can look into combining them into a cohesive DocBook collection as time permits and your experience level grows.

      --

      - I don't need to go outside, my CRT tan'll do me just fine.
  2. one open source approach ... by Boiotos · · Score: 4, Informative

    uses Cocoon2 as a web-publication engine. The Norm Walsh xslt sheets are your best general-purpose transformation, but they sometimes choke on Xalan. This Wiki Page should clear up that problem.

  3. It may work... by AndyElf · · Score: 2

    ...or not. YMMV to a very great extent. I have tried to do it, and I liked what was coming as a result (almost) except being the only one in the group doing that was not much of a help. The greatest problem was interchanging docs with others. RTF stylesheets are ok and can be used, but...

    Check out NTSGML pages (though they have not been updated for some time) if you end up doing this all under Windows. Also, I'd recommend sticking with generic SGML, not XML -- RTF converters for XSLT are not that good (I was not able to produce a single readable doc).

    --

    --AP
  4. I try using XML to structure my docs... by scrytch · · Score: 3, Insightful

    But the structure navigator in every single bloody XML editor I have ever tried, free or commercial, tends to look like this:


    book
    |
    +--chapter
    +--chapter
    | |
    | +--section
    | +--section
    |
    |--chapter


    ad nauseam. Not chapter titles, not section titles, the literal words chapter and section. Multiply this by hundreds of sections.

    How. Completely. Useless.

    Until I can find an XML editor with some bloody sense to its structure navigator, I would rather use Word. And no, I don't really want to use a WYSIWYG editor, because I want to know what XML it generates for my custom xslt snippets (which I might add I also have similar problems navigating with these brain dead editors)

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
    1. Re:I try using XML to structure my docs... by aridhol · · Score: 2

      The problem is with the way XML works. Unless your XML editor only handles a limited set of document types (eg DocBook and HTML only), it doesn't know where to find chapter titles, section titles, etc. Is it ? Or is it foo? Or something completely different? Unless there's a standard way of marking up the titles, your editor has no way to extract the titles from the document for you.

      --
      I can't say that I don't give a fuck. I've just run out of fuck to give.
    2. Re:I try using XML to structure my docs... by Fweeky · · Score: 3, Interesting

      Um, then you tell it, using XPath, or, even better, generate the list view using XSLT, as IE and Mozilla do.
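
      A minimal sketch of the "tell it where the titles are" idea, in Python. The sample document and the set of structural element names are assumptions made up for illustration, and ElementTree only supports a small XPath subset, so treat this as a rough outline builder rather than how any particular editor works:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names follow DocBook conventions.
SAMPLE = """\
<book>
  <title>Internal IT Handbook</title>
  <chapter>
    <title>Backups</title>
    <section><title>Nightly jobs</title></section>
    <section><title>Restores</title></section>
  </chapter>
  <chapter>
    <title>Mail</title>
  </chapter>
</book>
"""

# Elements treated as nodes in the navigator. This set is an assumption,
# not a complete list of DocBook's sectioning elements.
STRUCTURAL = {"book", "chapter", "section", "article", "appendix"}


def outline(element, depth=0):
    """Return indented 'tag: title' lines for each structural element."""
    lines = []
    if element.tag in STRUCTURAL:
        # "title" is a path expression selecting the direct <title> child.
        title = element.findtext("title", default="(untitled)")
        lines.append("  " * depth + f"{element.tag}: {title.strip()}")
        depth += 1
    for child in element:
        lines.extend(outline(child, depth))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SAMPLE))))
```

      For another document type you would swap the element names and the title path, which is exactly the per-DTD knowledge the parent says a generic editor lacks.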

    3. Re:I try using XML to structure my docs... by Enry · · Score: 3, Interesting

      Look at how the Linux Documentation Project handles SGML/XML files. There are ways of handling this a lot better.

  5. We did it. by Some+guy+named+Chris · · Score: 5, Insightful

    It was a nightmare.

    Anyone who was not a programmer balked at the idea of having to write documentation in a (Gasp!) markup language. "Just give me Word!" they would whine.

    There is a lot of overhead associated with DocBook that most non-technical people don't want to deal with. They want a WYSIWYG editor, and will cry, kick, scream, and intentionally be completely unproductive until they get it.

    1. Re:We did it. by aridhol · · Score: 2
      intentionally be completely unproductive until they get it.
      There is no place for someone who is deliberately not doing their job. Discipline them. They should use legitimate channels to handle their problems, not act like spoiled two-year-olds.
      --
      I can't say that I don't give a fuck. I've just run out of fuck to give.
    2. Re:We did it. by Hard_Code · · Score: 3, Funny

      "There is no place for someone who is deliberately not doing their job."

      Except a union apparently.

      --

      It's 10 PM. Do you know if you're un-American?
    3. Re:We did it. by pete-classic · · Score: 3, Interesting

      How about WYSIWYM (What You See Is What You Mean)?

      Try LyX.

      Just click "title" and type the title. Click a button to turn italics on/off, etc.

      See http://bgu.chez.tiscali.fr/doc/db4lyx/ and http://www.lyx.org/help/xml/xml.php

      -Peter

    4. Re:We did it. by Twirlip+of+the+Mists · · Score: 2, Insightful

      There is no place for someone who is deliberately not doing their job. Discipline them. They should use legitimate channels to handle their problems, not act like spoiled two-year-olds.

      You know, I normally find your posts pretty thoughtful, and I often agree with them. But this time I think you're way off the mark. "Discipline them?" If you treat people like children, you shouldn't really be surprised if they act like children in return, should you?

      General-purpose computers are great things because they allow people to use the tools they find most effective to get the job done. In this example, what's the job? Producing documentation. (The submitter was talking about internal documentation, but the OP was talking about docs in general, evidently.) To produce documentation, you should use the tool that's best suited for producing documentation, not the one that looks coolest on paper or that has the neatest feature set or whatever.

      Writing structured documents in something like LaTeX (with which I have some experience) or XML (with which I have less) works well up to a point... but only up to a point. If your document is going to be basically prose-- unformatted paragraphs organized into sections, chapters, and books-- then writing with a markup language will probably work well. The ratio of content to markup will be small, so you can just concentrate on your words.

      But if you want to create even something as simple as a bulleted list, suddenly you have to deal with markup. Creating a bulleted list in Word is trivial; you click the "bulleted list" button and go to town. Creating a bulleted list in LaTeX or XML is more work, and it scatters markup throughout your document in an unappealing and unpleasant way.

      So markup works in some situations, but in others it's not a good solution. This is what we should be talking about here. Not talking about disciplining coworkers who "act like spoiled two-year-olds."

      I just think you're forgetting what the purpose of computers and IT is: to give people the tools they need to do their jobs. Any system that requires its users to work in a way that they're not happy with is flawed, and could be improved somehow.

      (Sorry about the rant.)

      --

      I write in my journal
    5. Re:We did it. by aridhol · · Score: 2
      I agree with you. However, if management has dictated that you must do something in a given way, you have the following options:
      • Talk to management about it. Follow your company's procedures for bringing up issues.
      • Propose another method (works well in conjunction with the above)
      • Deal with it.
      • Get a different job
      • Deliberately avoid doing your assigned tasks
      Do whichever of the above you want. However, if you choose not to work, don't be surprised when your employer chooses not to pay you.
      --
      I can't say that I don't give a fuck. I've just run out of fuck to give.
    6. Re:We did it. by Christopher+Cashell · · Score: 2

      It's unreasonable like carving a roast beast-- er, sorry, too much Dr. Seuss-- carving a roast beef with a screwdriver is unreasonable. If the person doing the job finds the tool inappropriate, maybe the mandate should be reconsidered.

      Or perhaps the person doing the job should realize that no job is perfect, and at some point they're going to have to accept some restrictions from their employer on how they do their job. At least, if they want to get paid. ;-)

      Your analogy of carving a roast with a screwdriver doesn't really hold up, because most of the things we're discussing here, LaTeX, XML, etc, were specifically designed for authors. A better analogy would be that you are carving a roast, and need to pick a knife. LaTeX would be one type of knife, while XML/DocBook would be another type.

      Just because someone doesn't like the knife they were given doesn't mean that it's the wrong knife. They may just be ignorant of it. Or it may be that the company is standardizing on a single type of knife so that it can more easily share the knives among employees.

      Ah, but that's the thing. Mandating the use of XML for technical writing gets in the way of the job. If you're spending time tweaking document structure in an obscure language, you're not writing.

      Have you ever used XML (Assuming that we're specifically talking about DocBook, as that was designed specifically for use by authors, particularly technical writers)?

      DocBook/XML was specifically designed for creating documents and books. Additionally, XML is not an obscure language, nor very difficult to work with. Especially in this age of the Web, everyone is familiar with HTML, making DocBook fairly easy to pick up. As if that wasn't easy enough, there are numerous XML editors available that can make it even easier to work with.

      Unless all writing is done in plain text, you will have to deal with some work to make it presentable. Whether that be in a word processor, in LaTeX, in DocBook, whatever, it will have to be done. The question that has to be asked is which format will provide the greatest benefits with the fewest detriments. Depending on the goals of the company, the individual authors may very well not be the best person to make these decisions.

      All I'm saying is this: you will almost certainly gain more efficiency and productivity by letting your people do their jobs with the tools they prefer than by requiring the use of any one tool, no matter what its technical or political merits might be.

      Ah, but you're looking at this in a very limited way. Yes, you may gain more efficiency in the short run, by individual authors, by letting each person use whatever they want. But in the long run, you could end up spending literally 10 times as long making the end product meet the company's needs.

      It's easy for an individual person to look at the situation and say, "I could write this document in only three hours if I could do it in 'foo', but doing it in DocBook/XML will take me four hours", and think that it would be much more efficient to write it in 'foo'. But if this individual is writing a single article that will be combined with four other articles into a single work, and it will take six hours for someone to combine the five differently formatted articles into that single work, then collectively, you've just lost an hour's worth of work.

      And no, this isn't a purely theoretical example. At a previous employer, we had a situation like this occur. Eventually, we standardized on a single framework for all technical writing and documentation. At first, it did slow people down a little bit, as they were forced to learn the new system. Once everyone became used to it, though, it worked *much* better than before. Being able to easily share and merge documents allowed us to create a single, central, information repository, easily accessible and usable by everyone.

      Lastly, while you throw out technical merits with a single statement, it's not something to be overlooked. Depending on what your end goals are, you may *need* to consider technical merits in order to get the job done. For example, if your end result needs to be available as a PDF file, then you better be using tools that support PDF generation. If you're not, then no matter how productive you might think you are, you're never going to get your job done. Sometimes it's more important to fit your tools to your job, than to fit them to a specific person.

      --
      Topher
  6. No Suitable Editors by GOD_ALMIGHTY · · Score: 4, Interesting

    Essentially your choices are Adobe Framemaker (~$800), Lyx (Open Source) and XMLmind (Freeware). There may be some others, but these are the ones I've looked at. These are the ones you can use like a WYSIWYG, but are more WYSIWYM (What you see is what you mean). For more info on WYSIWYM, look at Lyx's site.

    DocBook is a great spec, but the editors suck for the most part. Lyx can't import DocBook reliably, and your Docbook is stored as a lyx file (latex I think). Lyx's Docbook stuff can be a bear to set up, even on a system like RedHat where most of the software comes installed. I only recommend Lyx to people who have experience with Lyx; for someone who just wants to write docs, it tends to be more trouble than it's worth.

    Framemaker will probably do everything you want and be a godsend with lots of nice features, but you'll pay for it, $800 for Win/Mac and ~$1300 for Unix.

    XMLmind is pretty cool, it does Docbook well but is a little slow, it has a little bit of a learning curve, but is prolly the best Docbook editor I've found for free. It's not Open Source though. It is written in Java, so you might have some speed issues, depending on the platform you run it on. I've been recommending XMLmind to everyone I know that asks about Docbook, it has a tree view of the DOM as well as a WYSIWYM view with stylesheets applied on the fly. It has property editors and a pretty smart insert tool that follows the DTD, only allowing you to insert allowed tags into other tags. It feels like more of a programmer's tool than Framemaker, but it should be fairly easy for most WYSIWYG users to adjust.

    <rant>
    I don't understand why on God's green earth OpenOffice or Abiword or KOffice, or anyone else in the OpenSource world has neglected this area. It's been three years since the LDP went to DocBook, GNOME uses DocBook as their doc format. Why in the hell don't we have decent document writing tools when everyone is always screaming about the lack of documentation in the OpenSource world?

    If we want more docs written, it needs to be easier to write them and shouldn't involve learning all about SGML or XML engines as well as a markup language to do it. DocBook is too big to keep in my head and I shouldn't have to think hard about how to write docs when my focus is the content I want to write for. Organizing technical info on a difficult subject is hard enough, stopping every five minutes to look up a DocBook tag or trying to better understand the structure is a huge barrier to getting the work done.
    </rant>

    But that's just my $.02

    --
    Arrogance is Confidence which lacks integrity. -- me
    1. Re:No Suitable Editors by KnightStalker · · Score: 2

      Wow, I've been looking for a tool that does that for a long time.

      What is really unfortunate is that, even if you somehow convince people to use this tool, once they discover that <citation> produces essentially the same formatting as <image_caption> (or whatever two tags), then they'll either use the two interchangeably for whatever, or they'll use one or the other exclusively for things that are unrelated to citations or captions. Nobody except programmers cares at all about document structure, and you can't force them to. All people want, and all they'll think about, is pretty layout.

      (rant mode off)

      --
      * And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
    2. Re:No Suitable Editors by booch · · Score: 2
      What is really unfortunate is that, even if you somehow convince people to use this tool, once they discover that <citation> produces essentially the same formatting as <image_caption> (or whatever two tags), then they'll either use the two interchangeably for whatever, or they'll use one or the other exclusively for things that are unrelated to citations or captions.

      I think that most documentation people can understand such distinctions. To drive the point home better, use different styles for each -- at least while they are editing. You can do this with the WYSIWYG editors such as Morphon -- just use a different color for each. Or you could create preview stylesheets out of the standard Norm Walsh templates.

      --
      Software sucks. Open Source sucks less.
    3. Re:No Suitable Editors by KnightStalker · · Score: 2
      I think that most documentation people can understand such distinctions. To drive the point home better, use different styles for each -- at least while they are editing. You can do this with the WYSIWYG editors such as Morphon -- just use a different color for each.

      No doubt the best documentation people understand this, but in my experience, most either don't understand it or don't care. And if you enforce the difference between types like this, then what they see is ugly, and it'll be nothing like what they eventually get. This, reasonably, makes them resistant to using the software.

      Actually, in my experience, most people working on documentation were dragged there from something else they'd rather be working on, and often even have to be shown such advanced concepts as copy and paste. Therefore creating documentation should be really easy, but worrying about structure just isn't easy. Making this bold, and that italic is easy, though. This problem won't be solved until we can create heuristics that just figure out what you mean when you make a block of text such-and-such a style, or at the very least can separate the "styled text" part of a document from the "containing layout" part, and can reliably extract the important styles from the ones that change between presentations. Either that, or every company hires expensive professional documentors.

      Part of the problem, I think, is that many people who work on documentation were trained on typewriters or desktop publishing software. And though those have justifiably gone out of fashion, nobody except programmers is interested in learning what they see as the paradigm of the week.

      --
      * And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
    4. Re:No Suitable Editors by ttfkam · · Score: 2
      Why in the hell don't we have decent document writing tools when everyone is always screaming about the lack of documentation in the OpenSource world?

      Because good editors are hard to write and a vast majority of the sufficiently talented coders who could do it still don't grasp the concept of content being separate from layout. You can't code what you don't understand.

      That coupled with -- what others have touched on -- users who can't accept that what they edit is not necessarily what it will ultimately look like.

      "I want to put this in italics."
      "Why?"
      "Because the image captions should all be in italics here."
      "So put the text in a <caption> tag."
      "But it's not in italics in my editor."
      "It will be in italics when it's published."
      "But it's not in italics in my editor."
      *sigh*

      You're right, we need better editors.
      --

      - I don't need to go outside, my CRT tan'll do me just fine.
  7. Your site.. by Fweeky · · Score: 3, Interesting
    http://www.ichimunki.com/: (pretend I have <cite> around that ;)
    ``Deep'' linking discouraged because the page names are dynamic

    Ignoring the utterly braindead ``foo'' quotes, those filenames are ultra lame.

    DocBook lets you specify a section ID which ends up being mapped to a filename when generating HTML; doesn't LaTeX have something like that?
    1. Re:Your site.. by ichimunki · · Score: 2, Interesting

      The quotes are an artifact of LaTeX, which I'm sure could be easily removed by tweaking the latex2html script (it may even be an option)... however, as you see them, they are being strictly translated from the correct inputs to get left/right quotes in TeX, which then ensures they look right in printed copy. I'll have to look into it. While it doesn't bother me much, obviously it bugs someone. :)

      And yes, there is an option to have the resulting .html files have better names, but latex2html does not have any provision to prevent name collision-- so I opted out of it in this case (not that I needed to worry about that, so your point is valid and I will change that). LaTeX (like DocBook) has a facility for both regular names for chapters/sections/whatevers and a place to put abbreviated names (for use in places like tables of contents, references, and headers). The filenaming in latex2html does not use this, but rather a set number of words from the title (IIRC).

      In a perfect world, I'd like to see a system that combined the best of Wiki, TeX, and DocBook (I have nothing against XML, I just don't know if I'm in love with DB's DTD yet), so that you could have the pages be fairly interactive for online references (especially useful in a corporate setting), but still generate standalone documents from the entire work. All with complete revision control, of course.

      I settled on what seemed to be the best compromise available so that I could have a single set of source files produce both printed matter and a website. Ultimately the possibilities with XML seem greater (via stylesheets and xsltproc and custom document parsers written in languages like Perl or Ruby), but getting from XML to PostScript or PDF is the part I had problems with. I like to think if I had problems with it, so would others. But then I limited myself to Free Software, whereas someone willing to use non-Free software might easily find an off-the-shelf package to get around the PS/PDF hurdle.

      --
      I do not have a signature
    2. Re:Your site.. by ttfkam · · Score: 3, Informative
      But then I limited myself to Free Software, whereas someone willing to use non-Free software might easily find an off-the-shelf package to get around the PS/PDF hurdle.

      Check out Apache Cocoon and Norman Walsh's DocBook stylesheets at Sourceforge. It sounds very much like what you are looking for both for batch processing of documents (using command-line mode) and for online dynamic presentation. There is even a serializer to PCL5 in case you ever wanted to send directly to HP-compatible printers.
      --

      - I don't need to go outside, my CRT tan'll do me just fine.
    3. Re:Your site.. by dublin · · Score: 2

      If you don't really care about sticking with the arcane tags and XML syntax of DocBook, you might want to consider another option that will get the job done easier and quicker: HTMLDOC.

      This is a really clever program that allows you to take a regular web page and produce very nice PDFs (or PostScript) from it. It supports a few new tags that let you do things like page breaks, headers/footers and such that always should have been in HTML (even if only as a hint for printing) but weren't. It automatically builds tables of contents (fully clickable in the PDF), cover pages, and the like, too.

      I've started using this tool more and more often over the last few months. It's just too handy for words. You can find it at Easy Software. (And yes, it's open source.)

      --
      "The future's good and the present is nothing to sneeze at." - Roblimo's last ./ post
    4. Re:Your site.. by ttfkam · · Score: 2
      As a matter of fact, I don't have a love affair with XML syntax per se. As for arcane tags, I mostly use Simplified DocBook which has such arcane tags as , , , , and . It's different from HTML, yes, but no more arcane (and I think rather less) than
        ,
      1. , , , and . If you want to talk about arcane, at least call a spade a spade.

        You have a solution and you seem to like it. My problem with it is the mixture of content and layout. Bold, italics, strikeout, and underline have no intrinsic meaning: they are visual cues for underlying themes. When they are the only model, you by definition lose the semantic background to the document. "What's the problem," you ask? For static HTML and PDF presentation, there is no problem for human readers. But it removes the possibility of automated, intelligent indexing and categorization.

        ...another option that will get the job done easier and quicker: HTMLDOC.

        I retort with Yoda. "Is the dark side stronger?" "No! Quicker. Easier. More seductive." :)
      --

      - I don't need to go outside, my CRT tan'll do me just fine.
    5. Re:Your site.. by Fweeky · · Score: 2
      The quotes are an artifact of LaTeX, which I'm sure could be easily removed by tweaking the latex2html script (it may even be an option)... however, as you see them, they are being strictly translated from the correct inputs to get left/ right quotes in TeX, which then ensures they look right in printed copy

      Well, you can specify left and right quotes in HTML:
      <!ENTITY ldquo CDATA "&#8220;" -- left double quotation mark,
      U+201C ISOnum -->
      <!ENTITY rdquo CDATA "&#8221;" -- right double quotation mark,
      U+201D ISOnum -->
      latex2html should either use them, <q> tags, or normal double quotes. Not abusing backticks (note they don't look anything like the mirror of ' in an awful lot of fonts, including ones very popular online, such as Verdana).

      Doing it TWICE to emulate double quotes means the author of latex2html is going to hell for sure (along with 1001 online newspaper editors) :)

      And yes, there is an option to have the resulting .html files have better names, but latex2html does not have any provision to prevent name collision-- so I opted out of it in this case (not that I needed to worry about that, so your point is valid and I will change that)

      Ah, yes, that's better. Better to be dependent on title and have some meaning than be dependent on order in the table of contents and have little meaning :)

      In a perfect world, I'd like to see a system that combined the best of Wiki, TeX, and DocBook (I have nothing against XML, I just don't know if I'm in love with DB's DTD yet), so that you could have the pages be fairly interactive for online references (especially useful in a corporate setting), but still generate standalone documents from the entire work. All with complete revision control, of course.

      Yes, that would be nice. WebDAV with a versioning backend like Subversion has some potential for document management - better imo than the million-and-one forms approach.

      Document formats are a little more hairy. Sometimes I feel like using something like AFT, which is pretty close to plain text. Other times I want to use XHTML, or DocBook, or my own schema. Some front-end which handles all of them would be nice :)

      I'm not really bothered by print, but I do want my documents to be stand-alone from the website. I want navigation elements to grow dynamically from the metadata in my documents, or from some external metadata file. I also want to be able to generate documents from databases etc and have them plug in nicely with the filesystem and keep nice abstract and stable URI's.

      Unfortunately I'm pretty sure I'm gonna have to write this myself. Being a professional slacker, this will likely take a while :)

      But then I limited myself to Free Software, whereas someone willing to use non-Free software might easily find an off-the-shelf package to get around the PS/PDF hurdle.

      DocBook to PS/PDF isn't too hard. If you can find a generalised XSL:FO engine you would be able to use an arbitrary XML document provided you have a stylesheet for it. Failing that, a browser, CSS print media rules and an option to print to a PS file would probably be ok; converting PS to PDF shouldn't be a problem, and CSS can style any XML document you like.
  8. Sounds like you already made up your mind by ttfkam · · Score: 2, Insightful

    You've obviously used LaTeX quite a bit already. That's hardly a fair comparison. You compare something with which you are already comfortable with something you haven't used at all before.

    As far as markup goes, one of the reasons for using the open/close tag pair in XML was because so many people have written HTML and are used to that model.

    As for complicated markup, there is a Simplified DocBook that reduces the number of elements you have to know and keep track of while still remaining 100% DocBook compatible. Write a little now, and as your experience and comfort grows, so can your markup choice. Simplified DocBook now, full DocBook when the volume of documentation requires it later (By that time, more editors will have come out hopefully).

    DocBook to PDF is handled by converting to XSL:FO (not to be confused with XSLT) syntax and serializing with something like FOP. LaTeX is actually closer to XSL:FO than to DocBook. If you're trying to convert to PDF by hand, you're expending more effort than you need to. You can find premade stylesheets for HTML and FO, and documentation about how to use them, without reinventing the wheel. The advantage of going to XSL:FO instead of a direct DocBook-to-PDF conversion is that there are serializers out there to output FO syntax to PDF, PostScript, PCL5, and RTF. It would be a shame to just make a one-trick pony.
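
    As a rough sketch of that two-step pipeline, the Python below only builds the command lines; it does not run them. It assumes xsltproc and a "fop" launcher script are on the PATH, and the stylesheet location is a made-up install path, so adjust both for your system.

```python
import shlex
from pathlib import Path

# Assumed install location of the DocBook XSL stylesheets (hypothetical path).
FO_STYLESHEET = "/usr/local/share/docbook-xsl/fo/docbook.xsl"


def pdf_pipeline(source, stylesheet=FO_STYLESHEET):
    """Return the two commands that turn a DocBook file into a PDF."""
    src = Path(source)
    fo_file = src.with_suffix(".fo")
    pdf_file = src.with_suffix(".pdf")
    return [
        # Step 1: XSLT transforms the DocBook markup into XSL:FO.
        ["xsltproc", "-o", str(fo_file), stylesheet, str(src)],
        # Step 2: FOP serializes the FO file as PDF.
        ["fop", "-fo", str(fo_file), "-pdf", str(pdf_file)],
    ]


if __name__ == "__main__":
    for command in pdf_pipeline("manual.xml"):
        print(shlex.join(command))
```

    Swapping the second command for a different serializer is what gets you PostScript or RTF from the same FO file.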

    As for emacs, there are emacs extensions written for DocBook that help you with tag choices and automatically close the tags for you. Isn't that one of the main complaints you had about the syntax? And you're comfortable with emacs, right?

    Note that you are using LaTeX to drive the layout. This is not how to use DocBook. In fact, DocBook goes out of its way to avoid any layout information in the file. Say you want to search for all documents with a section title that contains "apple". Anyone with a document parser can implement this, no matter who wrote the DocBook file or at which organization. With LaTeX you could do this only as long as everyone agreed upon the element identifiers -- which doesn't happen at every company. DocBook is content, HTML and PDF are layout, and never the twain shall meet...except during the transformation step.
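
    To make the "apple" search concrete, here is a minimal sketch using Python's standard XML parser. The sample document and the function name are invented for illustration, and it assumes DocBook 4.x markup (no XML namespace) with no DOCTYPE to resolve.

```python
import xml.etree.ElementTree as ET

# Invented sample document; DocBook 4.x elements carry no namespace.
SAMPLE = """\
<book>
  <title>Orchard Handbook</title>
  <chapter>
    <title>Planting</title>
    <section>
      <title>Choosing an apple variety</title>
      <para>Pick a variety suited to your climate.</para>
    </section>
    <section>
      <title>Watering</title>
      <para>Young apple trees need regular water.</para>
    </section>
  </chapter>
</book>
"""

# DocBook allows both recursive <section> and numbered <sect1>..<sect5>.
SECTION_TAGS = {"section", "sect1", "sect2", "sect3", "sect4", "sect5"}


def section_titles_containing(xml_text, word):
    """Return the titles of all sections whose title mentions `word`."""
    root = ET.fromstring(xml_text)
    needle = word.lower()
    matches = []
    for element in root.iter():
        if element.tag not in SECTION_TAGS:
            continue
        title = element.find("title")  # direct child only
        if title is None:
            continue
        text = "".join(title.itertext())  # include inline markup text
        if needle in text.lower():
            matches.append(text.strip())
    return matches


if __name__ == "__main__":
    print(section_titles_containing(SAMPLE, "apple"))
```

    The second section mentions apples only in its paragraph, so it is not reported; the query depends on the element names fixed by the DTD, not on any one author's habits.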

    If you prefer LaTeX, peace be with you. But the two cannot really be compared, because LaTeX -- while it makes the separation possible -- does not enforce a distinction between semantic content and layout presentation. DocBook does. This sometimes adds some complexity at startup, but it pays off when you actually have to organize and index those documents in an archive. You should talk to the folks at the Linux Documentation Project for more insight on this.

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
  9. Oh for heaven's sake by ttfkam · · Score: 2

    don't tell him to use the SGML version. New development around DocBook is definitely centered around the XML variant of DocBook. As for RTF, I recommend using stylesheets that convert to XSL:FO and serializing them to RTF with something like jfor.

    In my opinion, XSLT should not be used to generate something like RTF directly. XSLT was made to transform one XML schema to another. Period. Anything else is like trying to put the square peg in the round hole.

    --

    - I don't need to go outside, my CRT tan'll do me just fine.
    1. Re:Oh for heaven's sake by AndyElf · · Score: 2
      In my opinion, XSLT should not be used to generate something like RTF directly. XSLT was made to transform one XML schema to another. Period. Anything else is like trying to put the square peg in the round hole.

      That is what I used. The problem, I guess, is that I was trying to do it under WinNT and there may have been a few quirks that just would not let it work fully. For one, jfor would never produce anything remotely resembling what was expected.


      Another annoying thing was that I actually had to run a web server on my laptop to be able to generate anything: all the tools (except, I think, xsltproc) were very insistent on going to the OASIS website to read the latest & greatest DTD! Maybe, again, I have missed something, but I could not persuade either saxon or xerces/xalan to use a local copy of the DTD...

      --

      --AP
    2. Re:Oh for heaven's sake by ttfkam · · Score: 3, Informative
      Yes, this isn't documented well enough. (I'm not being sarcastic -- it actually took a bit of hunting to find this)

      From http://xslt-process.sourceforge.net/docbook.php

      A better solution is to create a local copy of the DocBook DTD files. To do this go to http://www.oasis-open.org/docbook/xml/4.1.2/ and download the ZIP file containing the DocBook DTD. Put it in an accessible place on your file system, for example in /usr/local/share/docbook-4.1.2. Then modify the DOCTYPE of your DocBook documents to be:
      <!DOCTYPE book
      PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
      "file:/usr/local/share/docbook-4.1.2/docbookx.dtd" >

      I also know that there's a way to specify it as a general resource and to have a catalog that keeps from having to hardcode each file to a path, but I don't remember the syntax or the steps offhand.
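
      For what it's worth, the catalog approach looks roughly like the sketch below, which writes an OASIS XML catalog mapping the DocBook public identifier to a local DTD. The function name and the install path are assumptions for illustration; the catalog namespace and the "public" entry are from the OASIS XML Catalogs format as I understand it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
DOCBOOK_PUBLIC_ID = "-//OASIS//DTD DocBook XML V4.1.2//EN"


def build_catalog(dtd_path):
    """Return an XML catalog mapping the DocBook public ID to a local DTD."""
    # Serialize the catalog namespace as the default (unprefixed) namespace.
    ET.register_namespace("", CATALOG_NS)
    catalog = ET.Element(f"{{{CATALOG_NS}}}catalog")
    ET.SubElement(
        catalog,
        f"{{{CATALOG_NS}}}public",
        # as_uri() needs an absolute path and yields a file:/// URI.
        {"publicId": DOCBOOK_PUBLIC_ID, "uri": Path(dtd_path).as_uri()},
    )
    return ET.tostring(catalog, encoding="unicode")


if __name__ == "__main__":
    # Assumed local copy of the DTD, matching the path suggested above.
    print(build_catalog("/usr/local/share/docbook-4.1.2/docbookx.dtd"))
```

      With a catalog in place the DOCTYPE can keep its canonical system identifier. As far as I know, xsltproc picks the catalog up from the XML_CATALOG_FILES environment variable, while Java processors such as saxon or xalan need a catalog-aware entity resolver configured separately.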

      Hope this helps with your laptop problem.
      --

      - I don't need to go outside, my CRT tan'll do me just fine.
    3. Re:Oh for heaven's sake by AndyElf · · Score: 2

      This was what I logically concluded myself as well, and tried. I had a 'proper' URL, i.e. 'file:///path/to/DTD' -- maybe that's what was wrong, because still it did not work.

      --

      --AP
  10. What about OpenOffice.org by stonebeat.org · · Score: 3, Interesting

    At one point in time I was very involved in OpenOffice.org. Now I have lost track of the development. There was some talk of including the DocBook DTD in the distribution. Does anyone know if any progress has been made on that?

  11. As it should be by ttfkam · · Score: 2
    A title is a part of an article or chapter just as the content is.
    book
    |
    +--title
    +--chapter
    |  |
    |  +--title
    |  +--section
    |     |
    |     +--title
    |     +--para
    |     +--para
    |
    +--chapter
       |
       +--title
    etc. This is because the editor is strictly following the DocBook schema. Chances are that the editor authors wanted their editor to be schema-agnostic. If your comment is saying that editors should be better, I wholeheartedly agree.

    As for wanting to know what the underlying XML is, "why!?!" For something like Word, where only formatting information is saved, I could see your concern. This is like the HTML output of FrontPage and Dreamweaver. But DocBook is a semantic construct with no formatting information. The data underneath what you see in a GUI should vary far less.

    With DocBook, you already know what code snippets it is generating without even looking at your editor; it's rigidly defined in the DTD. Your XSLT should be written to the DTD, not to a document.
    --

    - I don't need to go outside, my CRT tan'll do me just fine.
  12. In progress of converting by Leknor · · Score: 2

    So far we've completed converting 3 of our "books" from Script to DocBook. The largest book has over 175 chapters and about 600 pages. The most time-consuming problem was the project requirement that the DocBook version look very similar to the Script version. We used the XSL stylesheets from docbook.sf.net and FOP.

    Script is a formatting language (think RTF) and DocBook is a markup language. There was a lot of inconsistent formatting in the Script versions, which decreased readability. The consistent formatting of correctly marked up DocBook is a very good thing.

    I spent a lot of time customizing the XSLT stylesheets. XSLT has a nice mechanism that allows you to import stylesheets and then override parts of them. This is real nice because we can upgrade the upstream stylesheets without modifying our customizations. This isn't completely true if there are big structural changes to the upstream stylesheets, but since our changes are in separate files it's rather easy to refit our customizations.
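
    For anyone who hasn't seen the import-and-override mechanism, here is a small sketch (not our actual build) that generates such a customization layer with Python. The upstream path and the CSS file name are hypothetical; section.autolabel and html.stylesheet are, as far as I know, parameters defined by the DocBook XSL stylesheets.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def customization_layer(upstream_href, params):
    """Return a stylesheet that imports `upstream_href` and overrides `params`."""
    ET.register_namespace("xsl", XSL_NS)
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    # xsl:import must come first; declarations in the importing
    # stylesheet take precedence over the imported ones.
    ET.SubElement(sheet, f"{{{XSL_NS}}}import", {"href": upstream_href})
    for name, value in params.items():
        param = ET.SubElement(sheet, f"{{{XSL_NS}}}param", {"name": name})
        param.text = str(value)
    return ET.tostring(sheet, encoding="unicode")


if __name__ == "__main__":
    print(
        customization_layer(
            "/usr/local/share/docbook-xsl/html/docbook.xsl",  # hypothetical path
            {"section.autolabel": 1, "html.stylesheet": "house-style.css"},
        )
    )
```

    Upgrading the upstream stylesheets then only means changing the imported path; the overrides stay in their own file.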

    We had two people working on this project: one (me) customizing the stylesheets, and another who took the Script source and added DocBook tags. This worked quite well. We were committed to the project and were able to stick with it until completion.

    I encouraged another department to give DocBook a try and this didn't work so well. They currently only publish their internal docs to HTML, and their documentation source was written in HTML. For them, the overhead of DocBook and their lack of desire for paper output made it not worthwhile.

    Previously we could only print to paper. Now we have a single source to generate HTML, PDF, paper (from PDF), and Windows Compiled HTML Help files (basically HTML with extra meta info).

    Some people seem to just not understand the advantages of marking up the structure of the document instead of the formatting. If you want to use DocBook because of the hype then odds are you'll piss people off in the short term, maybe long term too, by forcing it on them. If you and management understand the long-term advantages of structured documentation then I really recommend DocBook.

  13. don't use markup, generate it by madmaxx · · Score: 2, Insightful

    A common mistake in the WYSIWYG paradigm is premature markup. People get slowed down making sure their masterpiece looks right (or worse, fighting with the fsking tool), when really writing isn't related to how it looks - it's communication. Talk to any real writer, and you will probably find they use a plain format (paper, typewritten, textfiles, plain word docs).

    Markup should always happen /after/ the writing itself. My personal approach is to use a text editor, and then some simple custom scripts to convert its obvious format into pdf, html/css, xml, troff, etc. The biggest win is I never fight with my editor, and I can concentrate on writing. And, I can export to any format I choose - though I do have to write the filter.

    At work, when doing professional documentation, our layout people extract the raw text and apply it to their own FrameMaker setups - so all the formatting our developers do is really in vain. The doc dept. has no trouble with my plain text stuff ;-) I've even extracted some of it using filters to simplify their life more.

    DocBook itself is fine - but make life simple for the writers, don't make them think about markup (as much as possible anyway). My vote is on the plain-text editors + filters ... but word docs and the same can work, though the tool tends to get in the way of thinking about communicating.

    My CDN$.02.

    --
    mx
  14. hunh? by ttfkam · · Score: 2

    DocBook -> XSL:FO -> PDF

    XML processed with XSLT and serialized through FOP. Where is LaTeX used? XSLT doesn't have anything to do with LaTeX and FOP has nothing to do with LaTeX. Where do they rely on LaTeX?

    Oh! You were talking about the LaTeX converters that Norman Walsh made available. Sorry. There's the confusion. If you use the FO stylesheets and FOP or iText for the PDF serialization, things are much much simpler. LaTeX shouldn't come into play unless you really want to use LaTeX.

    And you are right that it is quite possible to make layout-free LaTeX. My statement was only that it does not enforce the separation of content and layout. This is the same as saying that there is nothing stopping a programming team from making clean, readable C with uniform indentation of code blocks, but Python doesn't allow the choice: clean, uniform indentation is an intrinsic part of the language.

    It was not my intention to say that LaTeX made it impossible or even unduly difficult. Sorry for the confusion.

    --

    - I don't need to go outside, my CRT tan'll do me just fine.