DocBook 5
frisket writes "Definitive guides by the authors or maintainers of software systems tend to have the edge over other documentation because of the insight they provide. DocBook 5 — The Definitive Guide comes well up to scratch. DocBook has long been the de facto standard for computer system documentation in XML (and SGML before that), and Norm Walsh has revised and updated both the language and the documentation in a concise and valuable form, usable both by beginners and by tech doc experts." Read on for the rest of frisket's review.
DocBook 5: The Definitive Guide
author: Norman Walsh
pages: 560
publisher: O'Reilly in conjunction with XML Press
rating: 9/10
reviewer: frisket
ISBN: 9780596805029
summary: Examines and catalogs the entirety of the DocBook specification.
DocBook is a rich XML vocabulary, primarily for the documentation of software systems. It provides markup both for the structure of your documents and for the descriptive detail of your writing, to an extent that few other XML systems match. Like XML itself, DocBook's popularity rests on its robustness, scope, and extensibility; and Walsh makes it clear that the Technical Committee has tried hard to balance stability and adaptability in releasing a new major version which does have a few backward-incompatible changes.
This is a reference book, so the initial chapters (1-5) are short (70 pages) but full of clear explanations of how DocBook works, what it does, and how to use it. Part II is 400 pages, covering every element type in the language, with a detailed description of what it is for, how and where to use it, and how it interacts with everything else. Both for the beginner and the expert, these descriptions are the key to effective use, and Walsh's explanations are clear and comprehensive.
For those of you who have been using DocBook in earlier incarnations, the changes are not deal-breakers, and many of them are welcome rationalizations of the way things have grown organically over the years. It still walks like a duck and quacks like a duck (and the book still has a duck on the cover), so it immediately feels like the same format that you're used to — the changes to element types are relatively few. Chapter 1 (Getting Started) has a brief history, a summary of the changes, and an explanation of the namespace and availability.
If you've never used DocBook before, its structure will still be familiar: in Chapter 2 (Creating DocBook Documents) Walsh explains the division of reference material like books, articles, and manuals into chapters, sections, and subsections, with all the conventional features like lists, figures, tables, and references, as well as the technically-oriented features like equations, programming constructs, interface descriptions, and code samples.
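To make that hierarchy concrete, here is a minimal sketch of a DocBook 5 document and a few lines of Python that print its outline. The sample content (titles, the `make install` command) is made up for illustration and is not taken from the book; the only DocBook-specific facts relied on are the `http://docbook.org/ns/docbook` namespace and the `version="5.0"` attribute on the root element.

```python
import xml.etree.ElementTree as ET

# DocBook 5 elements live in this namespace.
DB_NS = "http://docbook.org/ns/docbook"

# A made-up document, for illustration only.
SAMPLE = f"""<book xmlns="{DB_NS}" version="5.0">
  <title>Example Manual</title>
  <chapter>
    <title>Getting Started</title>
    <section>
      <title>Installation</title>
      <para>Run <command>make install</command>.</para>
      <itemizedlist>
        <listitem><para>First item</para></listitem>
      </itemizedlist>
    </section>
  </chapter>
</book>"""


def outline(xml_text):
    """Return (depth, element name, title) for every element with a title child."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, depth):
        title = elem.find(f"{{{DB_NS}}}title")
        if title is not None:
            local_name = elem.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
            rows.append((depth, local_name, title.text))
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return rows


for depth, name, title in outline(SAMPLE):
    print("  " * depth + f"{name}: {title}")
```

The nesting of book, chapter, and section is carried by the markup itself, so an outline or table of contents can be derived from it rather than maintained by hand.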
There is help in Chapter 3 (Validation) for those who construct or generate DocBook documents without the use of an XML editor (or even with them: more on editors below). The most common problems with misplaced markup (and the error messages they create) are clearly explained with examples.
Chapter 4 (Publishing) very briefly explains the role of stylesheets (CSS, XSL, and XQuery) in displaying and transforming your documents to other formats, but as these all have their own books and manuals, this book doesn't go into them in any detail.
Customizing DocBook is fairly commonplace, either to avoid the need to commit tag abuse, or to extend its structure into other fields (I added a new element type for typographical examples for my book on LaTeX, and it only took a few minutes). Chapter 5 provides some rules and explanation of customization layers and modularity for those who design schemas and DTDs.
The five Appendixes cover Installation, Variants, Resources, Interchange, and the GNU Free Documentation License — yes, you can read the whole thing online at docbook.org, for which Tim, Norm, and many others are to be thanked. It is a rare publisher who groks the need to be able to point someone at a reference, or quote it in email or a tweet, where a paper copy doesn't cut the mustard.
There isn't anything here about actually using an XML editor or about how to choose one. Editors do of course all come with their own documentation (much of it written using DocBook) and editor selection can be a complex business. However, there is a list of some common tools in Appendix C (Resources). Editors are a minefield, as my own research into the usability of editing software for structured documents is showing, so I can understand the omission, but some pointers to editor resources would have been useful.
The chapter on Publishing is useful for those who haven't been in the publication process before, but it could have emphasized more the need for accuracy and consistency. Experienced technical authors know this, but many other writers don't see the need for it, assuming that the publisher (or some elf) will automagically heal everything before publication. DocBook 5 and this book will help enormously, but author-edited documents sometimes unwittingly misuse or abuse the markup, no matter how exhaustive the manuals.
If you write computer documentation, or anything related to it, from a conference paper to a thesis to a book, DocBook 5 is probably what you should use if you want the document to survive and to be usable and reusable; and this is the book to help you do it.
You can purchase DocBook 5: The Definitive Guide from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
"I added a new element type for typographical examples for my book on LaTeX, and it only took a few minutes."
Why are you using DocBook for a book on LaTeX?
Not only that, it sounds like a horrible format if you need documentation to write in the documentation language. Just looking at their What is DocBook page leaves me wondering what the hell it really is...
DocBook is being used for what HTML was originally intended - technical publications. Why not just use HTML? It even supports pictures!
HTML was originally meant as a subset or replacement of SGML. The primary goal was to be able to share documents; technical or not. Tim Berners-Lee's main goal in creating HTML was to have a way to share information easily.
"Maybe this world is another planet's hell"
Aldous Huxley
one word: chapters.
If everyone would just use the sensible choice: EMACS Vi Notepad Pico!
Best Slashdot Co
It's been so long since I visited the bookstore, I forget where it's at. Is DocBook 5 online?
Oh here it is: www.isohunt.com
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
One of the big reasons is that HTML lacks semantic meaning beyond simple paragraph constructs. Documentation-oriented markup languages (of which I'm more familiar with DITA) and schemas can seem arbitrarily complicated to a casual observer, granted; but having an identifier that clarifies "this" paragraph being an instruction that should be executed by the user, and "that" paragraph being merely an example can allow for some rules-based (automated) processing to exist between authorship and production that wouldn't be possible lacking some notion of the semantic purposes of a random collection of raw paragraphs.
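As a rough sketch of that kind of rules-based processing, the Python below pulls a checklist out of a document by looking only at elements marked as procedure steps, and ignores a paragraph marked as an example. It uses DocBook element names (`procedure`, `step`, `informalexample`, `command`) rather than DITA ones, and the sample text and function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
NS = {"db": DB_NS}

# Made-up content: two instructions and one illustrative example.
SAMPLE = f"""<section xmlns="{DB_NS}" version="5.0">
  <title>Backing up</title>
  <procedure>
    <step><para>Stop the server.</para></step>
    <step><para>Run <command>backup --all</command>.</para></step>
  </procedure>
  <informalexample>
    <para>A typical archive is named <filename>backup.tar</filename>.</para>
  </informalexample>
</section>"""


def checklist(xml_text):
    """Collect the text of every procedure step, skipping examples."""
    root = ET.fromstring(xml_text)
    items = []
    for step in root.iterfind(".//db:procedure/db:step", NS):
        text = "".join(step.itertext())
        items.append(" ".join(text.split()))  # normalise whitespace
    return items


def commands(xml_text):
    """Collect every string marked up as a command, wherever it appears."""
    root = ET.fromstring(xml_text)
    return [c.text for c in root.iter(f"{{{DB_NS}}}command")]


print(checklist(SAMPLE))
print(commands(SAMPLE))
```

With plain HTML paragraphs, the same extraction would have to guess from wording or class names which paragraphs are instructions.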
XML (and SGML before it) is a meta language. From that you derive a description language for the specific use. HTML meets the needs for an on-line presentation of information. HTML is not designed and does not work well for printed materials. DocBook is designed to be used for multiple ways of presenting information and has the features for books and other printed media.
To use a bad analogy, think of XML and C. You can write the "hello world" example in C, but it doesn't replace a database application written in C. C can be used for big or small applications. XML can be used for relatively simple description languages (such as HTML) or very rich description languages for large, complex documents (such as DocBook).
The subtitle "...The Definitive Guide" means this book is for specialists that work with the DocBook publication tag language.
The information in this book isn't for the user of the word processor or editor program.
DocBook is a syntax and tag language and this is a book for people who work with the tag language.
DocBook is probably the absolute worst document writing format I have ever had the displeasure of working with. It seems to have been born in some deranged XML lover's wet dream in which "documents" are "self-documenting," semantic structure is more important than content, and structure is kept separate from presentation. You know, all those generally good ideas that become very dangerous when taken too far, which DocBook exemplifies. "The more XML the better" seems to have been their guiding principle. In HTML, P is the tag for paragraphs. Not so in DocBook: I guess P wasn't descriptive enough, so it had to be PARA instead. In HTML, to create a preformatted block you often use PRE. Well, obviously that was too simple for DocBook, so you have to nest two tags: <INFORMALEXAMPLE><PROGRAMLISTING>source code</PROGRAMLISTING></INFORMALEXAMPLE>.
Maybe you are asking, who the hell came up with the INFORMALEXAMPLE tag? Well, in DocBook you cannot just say "give me a block with a fixed-width font"; you have to be "semantic" because you must separate presentation from structure. This is the reason why the maintainers of the DocBook standard have to continuously invent new tags for use cases they didn't think of. For example, there are all these tags for describing different programming language identifiers: KEYWORD, FUNCTION, CLASSNAME, STRUCTNAME, TOKEN, PROPERTY, TYPE, etc. They all cause the words within the tags to be formatted in italic text. But what if the programming language you are writing about has a concept not covered by DocBook's standardized tags? Then you're out of luck. You either cheat and use a different tag that happens to produce the same italicized text you want, or you submit an enhancement proposal to DocBook and wait for them to standardize your new tag. If you choose the former, you quickly realize that your carefully marked-up DocBook text is nothing more than glorified HTML with ridiculously verbose tag names; in the latter case you will never complete your documentation, because there will always be tags you need that you can't have.
A short look at the Docbook element reference (about halfway down the page at http://www.docbook.org/tdg5/en/html/docbook.html ) will show some of the elements that are relevant when publishing a *book*; elements for citations, bibliographies, indexing, callouts, glossaries, etc. HTML does not provide these elements.
Not only that, it sounds like a horrible format if you need documentation to write in the documentation language. Just looking at their What is DocBook page leaves me wondering what the hell it really is...
Even how to write English is documented in English, so why do you argue that any language which can use itself to document how to make more of itself is bad?
DocBook is being used for what HTML was originally intended - technical publications.
True, DocBook is used mainly for technical publications. Not true, HTML was intended for implementing the hypertext (that's why HT is part of the name) and not specifically for technical publications.
Why not just use HTML? It even supports pictures!
Because DocBook provides much more meaningful elements for technical publications than HTML. Because DocBook is intended mainly for documents published on paper, while HTML is intended for Web pages displayed in a browser. There is a reason why nobody uses HTML for technical publications.
The real question must be, why use DocBook when we already have DITA? While both formats are designed specifically for technical publications, DITA is superior.
"HTML+CSS+CMS > My woefully inadequate understanding of DocBook"
FTFY.
How does reStructuredText stack up against DocBook? It's on my "look into later" list for technical documentation. My first impression of it was pretty good, especially combined with the Sphinx document generator.
"already have DITA"
That implies the reverse order of invention as actually occurred. DITA might be superior, I have no idea ... haven't really used either. DocBook seems a bit more actively developed though, no official RelaxNG schema for DITA for instance.
There's a reason why GNOME docs are moving away from DocBook...
I've used both DocBook and DITA. While you can do the same jobs with both of them, DocBook is better, in my experience, for linear documents, while DITA seems to work well for non-linear stuff. DITA also uses topic maps, which can be hard for people to understand.
They are switching to something more domain specific though, not to some general alternative.
I have some experience with Docbook, although probably not enough to qualify as an expert. From what I've seen so far:
Pro:
1. Generating pdf, html and (sometimes) man pages from a single source document. This is probably the biggest single win for Docbook.
2. Combining parts of documents with xinclude. If you have four documents of different types which need to contain the same introductory description of a tool (say) or a synopsis of command arguments (book, man page, short article, comprehensive encyclopedia, etc...) you can write the description once in one document and xinclude that specific piece of the document in other documents.
Cons:
1. Toolchain. TeX distributions get this right - install texlive with all the packages and you're done - you can handle any LaTeX document. For Docbook, it's a struggle to figure out what you NEED, never mind how to install it. Once you get it worked out you can integrate it into your build system and forget it, but it takes a while to get there.
2. You need to learn a lot of languages to customize the look of your output documents, and it's not exactly for the faint of heart. I suppose this is kind of a wash between TeX and Docbook, since both don't invite casual tinkering with the look of output, but it's a bit scary. I believe the Firebird RDBMS manual is an example.
3. Finding the "right" tags for what you're trying to do. Price of doing business of course, but there are a LOT of tags to sort through.
LaTeX of course mops the floor with Docbook when it comes to things like mathematics or pstricks, but to be fair about it that's not what Docbook was intended for.
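For anyone who hasn't seen the xinclude reuse from Pro 2 in action, here is a minimal sketch using Python's standard `xml.etree.ElementInclude` module. The file names (`intro.xml`, `manual.xml`), their contents, and the tool name are hypothetical, and the "files" are held in a dictionary so the example runs without touching disk. Note that this stdlib module pulls in a whole file; as far as I know it does not handle selecting a specific fragment of another document, which a fuller XInclude processor would be needed for.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DB_NS = "http://docbook.org/ns/docbook"
XI_NS = "http://www.w3.org/2001/XInclude"

# Hypothetical file names and contents, kept in memory instead of on disk.
FILES = {
    "intro.xml": f'<para xmlns="{DB_NS}">frob is a hypothetical tool.</para>',
    "manual.xml": f"""<article xmlns="{DB_NS}" xmlns:xi="{XI_NS}" version="5.0">
  <title>frob manual</title>
  <xi:include href="intro.xml"/>
</article>""",
}


def load_from_memory(href, parse, encoding=None):
    """Loader for ElementInclude: look the href up in FILES."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]  # parse="text": return the raw string


def resolve(name):
    """Parse a document and replace its xi:include elements in place."""
    root = ET.fromstring(FILES[name])
    ElementInclude.include(root, loader=load_from_memory)
    return root


doc = resolve("manual.xml")
paras = [p.text for p in doc.iter(f"{{{DB_NS}}}para")]
print(paras)
```

The same `intro.xml` could be included from a man page source or a book chapter, so the description is written once.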
"I object to doing things that computers can do." -- Olin Shivers, lispers.org
When writing documentation in Word or OpenOffice I don't need to read an entire book on how to do it. That's why this bloated, design-by-committee XML language is a complete waste of time. Your analogy fails because it's not even slightly related and doesn't translate to real life at all. When typing this comment I wasn't constantly referring to a dictionary.
Wait a second... you're writing a book on LaTeX using DocBook?
Does not compute...
Shouldn't you be using LaTeX to write a book on LaTeX?
If you are one person writing a 50 page document Word may very well be perfect. However, imagine you have 20 people who need to collaborate on keeping a 1000 page documentation set updated.
Now imagine doing this in Word.
Google Docs, you fail.
Of course you can abuse HTML for anything you want...just look at the web :-)
if you want rules-based automated processing, you can do that with html
You can, the same way that you can write a book in PostScript.
But if you want all the internal checks on consistency and effectivity that make a piece of documentation robust and persistent, you probably don't want to do it in HTML and CSS.
...some of the elements that are relevant when publishing a *book*; elements for citations, bibliographies, indexing, callouts, glossaries, etc. HTML does not provide these elements.
Earlier versions of HTML did provide a lot of these, but the W3C took 'em all out (deprecated them) because it was clear that no-one in their right mind would author a complex technical book in HTML.
The real question must be, why use DocBook when we already have DITA
DITA is descended from DocBook in many ways. DITA is an architecture with which a willing participant can, if she tries long and hard enough, come up with a system that can be used for authoring large series of technical documents.
DocBook works right out of the box.
It's actually because good XML editors are expensive. The free ones are designed for XML experts who know all about markup: there are no usable XML editors (free or non-free) for non-experts in markup, as I showed last year.
It's an interesting bit of research you are plugging ... but GNOME is switching to Mallard, which is XML+RelaxNG, not switching away from XML.