Using the DocBook DTD for Internal Documents?
Saqib Ali asks: "These days, most of the Linux Documentation is created using DocBook DTD. I was wondering if it will be useful for a large Enterprise to create Internal IT documents using DocBook DTD. Any success stories where a large enterprise converted all of its internal IT documentation to DocBook, with management's support? Any other things/issues to keep in mind before embarking on such a mission?"
I was looking into doing this for a while with a number of the formatted documents my school needs to deal with. It turned out that the DTD was much more complex than warranted for the kind of stuff we were doing, but of course YMMV.
Ceci n'est pas un post
uses Cocoon2 as a web-publication engine. The Norm Walsh xslt sheets are your best general-purpose transformation, but they sometimes choke on Xalan. This Wiki Page should clear up that problem.
...or not. YMMV to a very great extent. I have tried to do it, and I liked what was coming as a result (almost) except being the only one in the group doing that was not much of a help. The greatest problem was interchanging docs with others. RTF stylesheets are ok and can be used, but...
Check out NTSGML pages (though they have not been updated for some time) if you end up doing this all under Windows. Also, I'd recommend sticking with generic SGML, not XML -- RTF converters for XSLT are not that good (I was not able to produce a single readable doc).
--AP
But the structure navigator in every single bloody XML editor I have ever tried, free or commercial, tends to look like this:
book
|
+--chapter
+--chapter
| |
| +--section
| +--section
|
|--chapter
ad nauseam. Not chapter titles, not section titles, the literal words chapter and section. Multiply this by hundreds of sections.
How. Completely. Useless.
Until I can find an XML editor with some bloody sense to its structure navigator, I would rather use Word. And no, I don't really want to use a WYSIWYG editor, because I want to know what XML it generates for my custom XSLT snippets (which, I might add, I have similar problems navigating with these brain-dead editors).
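For what it's worth, a navigator that shows titles instead of element names is not much code. Here is a minimal sketch in Python, assuming un-namespaced DocBook 4 style markup; the sample document, the STRUCTURAL set and the outline() helper are all invented for illustration and not taken from any editor:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the titles are invented for illustration.
SAMPLE = """\
<book>
  <title>Operations Handbook</title>
  <chapter>
    <title>Backups</title>
    <section><title>Nightly jobs</title><para>...</para></section>
    <section><title>Restores</title><para>...</para></section>
  </chapter>
  <chapter>
    <title>Monitoring</title>
  </chapter>
</book>
"""

# Elements treated as outline nodes (an assumed subset of DocBook).
STRUCTURAL = {"book", "part", "chapter", "appendix", "section", "sect1", "sect2"}


def outline(element, depth=0):
    """Return one line per structural element, labelled with its title."""
    lines = []
    if element.tag in STRUCTURAL:
        title_el = element.find("title")  # direct child <title> only
        if title_el is not None:
            title = "".join(title_el.itertext()).strip()
        else:
            title = "(untitled)"
        lines.append("  " * depth + f"{element.tag}: {title}")
        depth += 1
    for child in element:
        lines.extend(outline(child, depth))
    return lines


print("\n".join(outline(ET.fromstring(SAMPLE))))
```

Each structural element contributes one indented line carrying its own title, which is the view the navigators above fail to give.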
I've finally had it: until slashdot gets article moderation, I am not coming back.
It was a nightmare.
Anyone who was not a programmer balked at the idea of having to write documentation in a (Gasp!) markup language. "Just give me Word!" they would whine.
There is a lot of overhead associated with DocBook that most non-technical people don't want to deal with. They want a WYSIWYG editor, and will cry, kick, scream, and intentionally be completely unproductive until they get it.
Essentially your choices are Adobe Framemaker (~$800), Lyx (Open Source) and XMLmind (Freeware). There may be some others, but these are the ones I've looked at. These are the ones you can use like a WYSIWYG, but are more WYSIWYM (What you see is what you mean). For more info on WYSIWYM, look at Lyx's site.
DocBook is a great spec, but the editors suck for the most part. Lyx can't import DocBook reliably, and your DocBook is stored as a Lyx file (LaTeX, I think). Lyx's DocBook stuff can be a bear to set up, even on a system like RedHat where most of the software comes installed. I only recommend Lyx to people who have experience with Lyx; for someone who just wants to write docs, it tends to be more trouble than it's worth.
Framemaker will probably do everything you want and be a godsend with lots of nice features, but you'll pay for it, $800 for Win/Mac and ~$1300 for Unix.
XMLmind is pretty cool, it does Docbook well but is a little slow, it has a little bit of a learning curve, but is prolly the best Docbook editor I've found for free. It's not Open Source though. It is written in Java, so you might have some speed issues, depending on the platform you run it on. I've been recommending XMLmind to everyone I know that asks about Docbook, it has a tree view of the DOM as well as a WYSIWYM view with stylesheets applied on the fly. It has property editors and a pretty smart insert tool that follows the DTD, only allowing you to insert allowed tags into other tags. It feels like more of a programmer's tool than Framemaker, but it should be fairly easy for most WYSIWYG users to adjust.
<rant>
I don't understand why on God's green earth OpenOffice or Abiword or KOffice, or anyone else in the OpenSource world has neglected this area. It's been three years since the LDP went to DocBook, GNOME uses DocBook as their doc format. Why in the hell don't we have decent document writing tools when everyone is always screaming about the lack of documentation in the OpenSource world?
If we want more docs written, it needs to be easier to write them and shouldn't involve learning all about SGML or XML engines as well as a markup language to do it. DocBook is too big to keep in my head and I shouldn't have to think hard about how to write docs when my focus is the content I want to write for. Organizing technical info on a difficult subject is hard enough, stopping every five minutes to look up a DocBook tag or trying to better understand the structure is a huge barrier to getting the work done.
</rant>
But that's just my $.02
Arrogance is Confidence which lacks integrity. -- me
Ignoring the utterly braindead ``foo'' quotes, those filenames are ultra lame.
DocBook lets you specify a section ID which ends up being mapped to a filename when generating HTML; doesn't LaTeX have something like that?
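To make the ID-to-filename idea concrete, here is a rough sketch in Python. It only mimics the naming rule (id plus ".html"), which as far as I know is what the DocBook XSL chunking stylesheets do when their use.id.as.filename parameter is turned on. The sample document and the chunk_filenames() helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical article; ids and titles are invented for illustration.
SAMPLE = """\
<article>
  <title>Install Guide</title>
  <section id="prerequisites"><title>Prerequisites</title></section>
  <section id="first-boot"><title>First boot</title></section>
  <section><title>No id here</title></section>
</article>
"""


def chunk_filenames(xml_text, extension=".html"):
    """Map each section id to the output filename a chunker could use."""
    root = ET.fromstring(xml_text)
    names = {}
    for section in root.iter("section"):
        section_id = section.get("id")
        if section_id:  # sections without an id would need a generated name
            names[section_id] = section_id + extension
    return names


print(chunk_filenames(SAMPLE))
```

Sections without an id are skipped here; a real chunker has to invent names for those, which is where the ugly filenames tend to come from.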
help.unc.edu, UNC-Chapel Hill's technical support website, uses Docbook (and XML) extensively. The publication framework is Cocoon 2 under Tomcat, but I'm sure if you like Perl you could use Axkit too =)
You've obviously used LaTeX quite a bit already. That's hardly a fair comparison: you are comparing something you are already comfortable with to something you haven't used at all before.
As far as markup goes, one of the reasons for using the open/close tag pair in XML was because so many people have written HTML and are used to that model.
As for complicated markup, there is a Simplified DocBook that reduces the number of elements you have to know and keep track of while still remaining 100% DocBook compatible. Write a little now, and as your experience and comfort grow, so can your markup choice. Simplified DocBook now, full DocBook later when the volume of documentation requires it (by that time, more editors will hopefully have come out).
DocBook to PDF is handled by converting to XSL:FO (not to be confused with XSLT) syntax and serializing with something like FOP. LaTeX is actually closer to XSL:FO than to DocBook. If you're trying to convert to PDF by hand, you're expending more effort than you needed to. You can find premade stylesheets for HTML and FO and documentation about how to use them without reinventing the wheel. The advantage of going to XSL:FO instead of a direct DocBook-to-PDF is that there are serializers out there to output FO syntax to PDF, PostScript, PCL5, and RTF. It would be a shame to just make a one trick pony.
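The two-step pipeline can be sketched as a small Python driver. This is only an illustration under assumptions: it uses xsltproc as the XSLT processor (any other would do), assumes the fop launcher script is on the PATH, and FO_STYLESHEET is a placeholder path you would point at your own copy of the FO stylesheets:

```python
import subprocess
from pathlib import Path

# Assumed local path to the FO stylesheet; adjust to your installation.
FO_STYLESHEET = "docbook-xsl/fo/docbook.xsl"


def build_commands(source, stylesheet=FO_STYLESHEET):
    """Return the two command lines: DocBook -> FO, then FO -> PDF."""
    src = Path(source)
    fo_file = src.with_suffix(".fo")
    pdf_file = src.with_suffix(".pdf")
    # Step 1: the XSLT transformation produces formatting objects.
    to_fo = ["xsltproc", "-o", str(fo_file), stylesheet, str(src)]
    # Step 2: FOP serializes the formatting objects to PDF.
    to_pdf = ["fop", "-fo", str(fo_file), "-pdf", str(pdf_file)]
    return [to_fo, to_pdf]


def run_pipeline(source):
    """Run both steps in order; raises CalledProcessError if one fails."""
    for command in build_commands(source):
        subprocess.run(command, check=True)


for command in build_commands("handbook.xml"):
    print(" ".join(command))
```

Keeping the FO file as an explicit intermediate is the point: the first command stays the same and only the second changes when the target is something other than PDF.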
As for emacs, there are emacs extensions written for DocBook that help you with tag choices and automatically close the tags for you. Isn't that one of the main complaints you had about the syntax? And you're comfortable with emacs, right?
Note that you are using LaTeX to drive the layout. This is not how to use DocBook. In fact, DocBook goes out of its way to avoid any layout information in the file. Say you want to search for all documents with a section title that contains "apple". Anyone with a document parser can implement this no matter who wrote the DocBook file at any organization. With LaTeX you could do this only as long as everyone agreed upon the element identifiers -- which doesn't happen at every company. DocBook is content, HTML and PDF are layout, and never the twain shall meet...except during the transformation step.
If you prefer LaTeX, peace be with you. But they cannot really be compared, because LaTeX -- while it makes the separation possible -- does not enforce a distinction between semantic content and layout presentation. DocBook does. This sometimes adds some complexity at startup, but it pays off when you actually have to organize and index those documents in an archive. You should talk to the folks at the Linux Documentation Project for more insight on this.
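The "apple" search is short enough to show. Here is a minimal sketch in Python using only the standard library, assuming un-namespaced DocBook 4 style markup; the sample document, the SECTION_TAGS tuple and the sections_matching() helper are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the titles are invented for illustration.
SAMPLE = """\
<article>
  <title>Orchard notes</title>
  <section><title>Apple varieties</title>
    <section><title>Storing apples</title></section>
  </section>
  <section><title>Pears</title></section>
</article>
"""

# Section-like elements to inspect (an assumed subset of DocBook).
SECTION_TAGS = ("section", "sect1", "sect2", "sect3")


def sections_matching(xml_text, needle):
    """Return the titles of all sections whose title contains needle."""
    root = ET.fromstring(xml_text)
    hits = []
    for tag in SECTION_TAGS:
        for section in root.iter(tag):
            title = section.find("title")  # the section's own title
            if title is None:
                continue
            text = "".join(title.itertext()).strip()
            if needle.lower() in text.lower():  # case-insensitive match
                hits.append(text)
    return hits


print(sections_matching(SAMPLE, "apple"))
```

Nothing in the function depends on who wrote the file or how it will be rendered, only on the element names the DTD fixes.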
- I don't need to go outside, my CRT tan'll do me just fine.
don't tell him to use the SGML version. New development around DocBook is definitely centered around the XML variant of DocBook. As for RTF, I recommend using stylesheets that convert to XSL:FO and serializing them to RTF with something like jfor.
In my opinion, XSLT should not be used to generate something like RTF directly. XSLT was made to transform one XML schema to another. Period. Anything else is like trying to put the square peg in the round hole.
- I don't need to go outside, my CRT tan'll do me just fine.
Are there any good classes/schools/online courses where document writers can learn to develop DocBook-based content? I have been writing for the Linux Documentation Project for a while now, and I learned by looking at other XML/SGML content created by other people or machines. Is that the best way? Convert your existing non-XML document to XML and go through it? I found that very useful in the beginning. Any comments?
Consensus is good, but informed dictatorship is better
At one point in time I was very involved in OpenOffice.org. Now I have lost track of the development. There was some talk of including the DocBook DTD in the distribution. Does anyone know if any progress has been made on that?
Consensus is good, but informed dictatorship is better
As for wanting to know what the underlying XML is, "why!?!" For something like Word, where only formatting information is saved, I could see your concern. This is like the HTML output of Frontpage and Dreamweaver. But DocBook is a semantic construct with no formatting information. What you see in a GUI leaves far less room for variation in the data underneath.
With DocBook, you already know what code snippets it is generating without even looking at your editor; it's rigidly defined in the DTD. Your XSLT should be written to the DTD, not to a document.
- I don't need to go outside, my CRT tan'll do me just fine.
--
Simon
So far we've completed converting 3 of our "books" from Script to DocBook, the largest being over 175 chapters with about 600 pages. The most time-consuming problem was the project requirement that the DocBook version must look very similar to the Script version. We used the XSL stylesheets from docbook.sf.net and FOP.
Script is a formatting language (think RTF) and DocBook is a markup language. There was a lot of inconsistent formatting in the Script versions, which decreased readability. The consistent formatting of correctly marked up DocBook is a very good thing.
I spent a lot of time customizing the XSLT stylesheets. XSLT has a nice mechanism that allows you to import a stylesheet and then override parts of it. This is really nice because we can upgrade the upstream stylesheets without modifying our customizations. This isn't completely true if there are big structural changes to the upstream stylesheets, but since our changes are in separate files it's rather easy to refit our customizations.
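For readers who have not seen the import-and-override mechanism, here is a rough sketch that builds such a customization layer from Python. The UPSTREAM path and the customization_layer() helper are assumptions for illustration, not our actual files. section.autolabel is, as far as I know, one of the parameters the DocBook XSL stylesheets expose, but check the parameter reference for your version:

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Assumed location of the upstream stylesheet; point it at your own copy.
UPSTREAM = "docbook-xsl/html/docbook.xsl"


def customization_layer(upstream_href, params):
    """Build a stylesheet that imports upstream and overrides parameters."""
    ET.register_namespace("xsl", XSL_NS)
    root = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})
    # xsl:import has to come first; anything defined after it in this
    # file takes precedence over the imported definitions.
    ET.SubElement(root, f"{{{XSL_NS}}}import", {"href": upstream_href})
    for name, value in params.items():
        param = ET.SubElement(root, f"{{{XSL_NS}}}param", {"name": name})
        param.text = value
    return ET.tostring(root, encoding="unicode")


print(customization_layer(UPSTREAM, {"section.autolabel": "1"}))
```

The generated file never touches the upstream stylesheet, so a new upstream release only requires checking that the overridden names still exist.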
We had two people working on this project: one customizing the stylesheets (me), and another who took the Script source and added DocBook tags. We were committed to the project and were able to stick with it until completion. This worked very well.
I encouraged another department to give DocBook a try and this didn't work so well. They currently only publish their internal docs to HTML, and their documentation source was written in HTML. For them, the overhead of DocBook and their lack of desire for paper output made it not worth it.
Previously we could only print to paper. Now we have a single source to generate HTML, PDF, paper (from PDF), and Windows Compiled HTML Help files (basically HTML with extra meta info).
Some people seem to just not understand the advantages of marking up the structure of the document instead of the formatting. If you want to use DocBook because of the hype, then odds are you'll piss people off in the short term, maybe long term too, by forcing it on them. If you and management understand the long-term advantages of structured documentation, then I really recommend DocBook.
for all the reasons stated above and...
i was unable to produce a simple Howto document (bulleted list) because the docbook.xsl file had error(s).
when i reported these to the author (?) i was ignored.
now over a year later i'm kicking myself for not finishing my version of what docbook should be: doc-this!
i have been asked recently to finish this so i guess maybe it's worth the effort.
My father was hired by a publisher to translate some chapters of a physics book. They provided him with a few copies of the book along with these instructions:
Don't use any formatting when writing your text, no bold, no italics, nothing. When there's a figure, place [FIGURE ##] where ## is the number of the figure. I repeat, do not do any formatting, we won't accept your document if it's formatted.
I'm pretty sure that they were taking this unformatted text and transforming it into DocBook.
So you may want to do this: ask your non-technical people to write unformatted text, and hire a technical person (programmer) to do the markup.
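A first pass at that markup step can even be scripted. This is only a guess at what such a filter might look like, not the publisher's actual process: it assumes paragraphs are separated by blank lines and that a figure placeholder sits alone in its own paragraph. The text_to_docbook() helper is invented for illustration:

```python
import re
from xml.sax.saxutils import escape

# A figure placeholder standing alone in its own paragraph, e.g. [FIGURE 12].
FIGURE_RE = re.compile(r"^\[FIGURE (\d+)\]$")


def text_to_docbook(text, title):
    """Wrap blank-line-separated paragraphs in a minimal DocBook article."""
    parts = ["<article>", f"  <title>{escape(title)}</title>"]
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())  # undo line wraps inside a paragraph
        match = FIGURE_RE.match(block)
        if match:
            # Leave a marker only; a person adds the real figure markup later.
            parts.append(f"  <!-- FIGURE {match.group(1)} -->")
        elif block:
            parts.append(f"  <para>{escape(block)}</para>")
    parts.append("</article>")
    return "\n".join(parts)


print(text_to_docbook("First line\nwrapped.\n\n[FIGURE 3]\n\nA < B holds.", "Chapter 1"))
```

The writer never sees a tag, and the technical person starts from well-formed XML instead of a blank file.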
A common mistake in the WYSIWYG paradigm is premature markup. People get slowed down making sure their masterpiece looks right (or worse, fighting with the fsking tool), when really writing isn't related to how it looks - it's communication. Talk to any real writer, and you will probably find they use a plain format (paper, typewritten, textfiles, plain word docs).
Markup should always happen /after/ the writing itself. My personal approach is to use a text editor, and then some simple custom scripts to convert its obvious format into pdf, html/css, xml, troff, etc. The biggest win is I never fight with my editor, and I can concentrate on writing. And, I can export to any format I choose - though I do have to write the filter.
... but word docs and the same can work, though the tool tends to get in the way of thinking about communicating.
At work when doing professional documentation, our layout people extract the raw text and apply it to their own Framemaker setups - so all the formatting our developers do is really in vain. The doc dept. has no trouble with my plain text stuff ;-) I've even extracted some of it using filters to simplify their life more.
Docbook itself is fine - but make life simple for the writers, don't make them think about markup (as much as possible anyway). My vote is on the plain-text editors + filters
My CDN$.02.
mx
DocBook -> XSL:FO -> PDF
XML processed with XSLT and serialized through FOP. Where is LaTeX used? XSLT doesn't have anything to do with LaTeX and FOP has nothing to do with LaTeX. Where do they rely on LaTeX?
Oh! You were talking about the LaTeX converters that Norman Walsh made available. Sorry. There's the confusion. If you use the FO stylesheets and FOP or iText for the PDF serialization, things are much much simpler. LaTeX shouldn't come into play unless you really want to use LaTeX.
And you are right that it is quite possible to make layout-free LaTeX. My statement was only that it does not enforce the separation of content and layout. This is the same as saying that there is nothing stopping a programming team from making clean, readable C with uniform indentation of code blocks, but Python doesn't allow the choice: clean, uniform indentation is an intrinsic piece.
It was not my intention to say that LaTeX made it impossible or even unduly difficult. Sorry for the confusion.
- I don't need to go outside, my CRT tan'll do me just fine.