Slashdot Mirror


Creating and Using XML-Based Internal Documents?

Richard Emberson asks: "Once again into the breach...or at least the ground floor in a new startup. This time around, I would like to have all of the Engineering documentation internally online: a unified, internal, CVS-ed, web-based, development organization document tree covering the engineering process, methodology, coding standards, nightly build/test reports, FAQs, new hire information and help pages and the documentation for each project. Recently I've written documentation (on Linux of course) using the Apache XML-stylebook tags, stylesheets, and Ant-based publishing - and I like it. So my questions are: Has anyone done this and, if so, how were the links between documents managed?" Does your workplace use XML in its internal documentation? If so, how well does your system work, and what advice would you pass on to anyone else attempting something similar?

"If you start out with only one project (product), how do you structure it so that when new projects come into existence they can easily be integrated? Are there documentation templates out there upon which I can base the various development documents (like requirements, product development plan, design, coding walk-thru standards, etc.) and not have any of this swell too be so large that no one will be able to produce, maintain or read it?"

21 of 176 comments

  1. InfoImaging and Dig35 meta data uses XML by purduephotog · · Score: 3, Informative

    Don't know if you'd consider it engineering texts, but XML is used in moving metadata from pictures around. There's an open source and binaries downloadable.... might help your implementation.

    Good luck- it's quite impressive once you get the trees set up correctly :)

  2. Doc Book by jjr · · Score: 5, Informative

    Why not use DocBook? It is XML-based and extensible; what more could you ask?
    Have Fun

    1. Re:Doc Book by Anonymous Coward · · Score: 3, Informative

      Another vote for DocBook. We use it and it works great. You can process DocBook SGML (it also comes in an XML flavor, but we haven't used it) into pdf, rtf, and html. Books can come with multiple chapters so that many folks can work from a cvs repository at once. Then, you build the docs every night the same way you build the code. It takes a smart person a few hours to set up the build process, but it's no big deal. O'Reilly has a book on it, check Amazon.

      DocBook is excellent, no question about it.

  3. Zope by DOsinga · · Score: 5, Informative

    Have you considered Zope? It is perfect for storing a document tree, it has strong support for XML, including an extension for DocBook, and supposedly it integrates with Apache. It also gives you lots of options for formatting the XML as HTML.

    1. Re:Zope by dgroskind · · Score: 2, Informative

      Three relevant links to read in considering Zope for XML are:

      Creating XML Applications With Zope

      Create a XML Based Document Repository

      Cant Handle Humongous XML

      In some data management scenarios, using Zope obviates the need for XML markup. In practice, content management issues like security, revision control, and online access through a browser are bigger issues than markup. Zope provides solutions to all these problems.

      My main caveat in using Zope is that finding all the relevant documentation for XML or anything else is a veritable Easter egg hunt. The Zope API doesn't seem to be documented in one place. More than once a Zope tutorial seriously proposed that the reader read the Python source code for further information.

  4. Stylebook is dead by Lt · · Score: 3, Informative

    I use stylebook for internal documents at planetu.com, but stylebook seems to be dead. Docbook is a much better choice. I just have not got around to converting to it.

    The really nice thing about using XML is that I can automate some of the documents, such as the list of valid form fields for an HTML/JSP page.

    For the nightly builds, change logs and javadocs, I use Alexandria.

  5. Linking between documents by kampi · · Score: 3, Informative

    In your case I'd probably choose docbook, but if you're just looking for a way to automatically setup the links between several XML documents you might take a look at w3make (which I wrote) or a similar project called XWeb which was written by Peter Becker.

    --
    -- a blessed +42 regexp of confusion (weapon in hand) You hit. The format string crumbles and turns to dust
  6. Our company is using XML for all documentation. by MemRaven · · Score: 5, Informative
    I asked our docs infrastructure person to pipe up, I'll see if she does before this gets to like 400 comments and nobody sees.


    We standardized all our early documentation on XML, and it's been working great. Admittedly we're using Perforce, not CVS, but we're doing something very similar to what you want to do.


    All our documentation is in XML format, in a DTD that we defined. We then have XSLT transformation scripts that convert that documentation to HTML format, and scripts that automatically update our development intranet whenever changes get checked in, along with scripts that invoke javadoc and doxygen on all the code to convert that to HTML format. We're in the process of being able to convert the same documents to PDF format to be able to publish those same documents, in the same formatting, to pretty-formatted documents for printing.


    This, aside from the simplicity of not having to worry about formatting documentation when you write it, is pretty cool. It's easier for me (as an engineer) to write a very sparse, structured XML document that will end up looking very good on screen, than to learn enough HTML to make my documents look good. And it's easier for us to enforce a standard look and feel across all documentation this way, because only the XSL transforms have to change if we change our formatting.


    But the real advantage is coming out with more advanced uses. For example, when we have configuration files, we have a special DTD that we define the documentation for the configuration files in, and then any documents that describe the configuration files are automatically converted both to the HTML documentation, AND to an example configuration file for users. We can also mark some things as only visible internally, so that the same document can have data that's visible to end-users, and data that's visible to employees (so if we have advanced configuration options that we don't want customers mucking with because they're for debugging the system, we can document them in the same place, and just hide them from customers but let our support and professional services people in on the secret).
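
    A rough sketch of that single-source idea, in Python with the standard ElementTree module rather than the XSLT the poster describes. The element and attribute names (config-doc, option, audience) are invented for illustration and are not the poster's DTD.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical config-documentation format; the element and attribute
# names are invented for this sketch, not taken from the poster's DTD.
CONFIG_DOC = """
<config-doc name="server.conf">
  <option name="port" default="8080">
    <desc>TCP port the server listens on.</desc>
  </option>
  <option name="debug_trace" default="off" audience="internal">
    <desc>Dumps internal state; for support staff only.</desc>
  </option>
</config-doc>
"""

def visible_options(root, internal):
    """Yield options, skipping internal-only ones for customer builds."""
    for opt in root.findall("option"):
        if opt.get("audience") == "internal" and not internal:
            continue
        yield opt

def to_html(root, internal=False):
    """Render the option reference as an HTML definition list."""
    rows = [
        f"<dt>{escape(opt.get('name'))}</dt><dd>{escape(opt.findtext('desc', ''))}</dd>"
        for opt in visible_options(root, internal)
    ]
    return f"<h1>{escape(root.get('name'))}</h1>\n<dl>\n" + "\n".join(rows) + "\n</dl>"

def to_example_config(root, internal=False):
    """Render a commented example configuration file from the same source."""
    lines = []
    for opt in visible_options(root, internal):
        lines.append(f"# {opt.findtext('desc', '')}")
        lines.append(f"{opt.get('name')} = {opt.get('default')}")
    return "\n".join(lines)

root = ET.fromstring(CONFIG_DOC)
customer_html = to_html(root)
customer_conf = to_example_config(root)
internal_conf = to_example_config(root, internal=True)
```

    Both outputs come from the same document, so the customer build and the internal build cannot drift apart; only the internal flag differs.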


    The best part is that because our XML DTD is very structured, someone like me (an engineer), will actually use it because it ends up being easier for me than writing in plain text, whereas I wouldn't do it in HTML (or if I did, it would just look like crap). It also makes it much easier to do integrations across branches of code: because we know the DTD for our XML documents very well, it's more likely that integrations will go smoothly, which helps keep multiple branches of code and docs in sync automatically. If you go with a binary format, you're not going to be able to do that, and every time you make a change, you're going to have to manually change the documentation for each and every branch. With ASCII or HTML, everybody's going to produce documents that look a little different, so you're not going to have as easy a time integrating between branches.


    Our docs infrastructure person can pipe up in terms of the particular technology that we're using, but it's turned out to be one of the best infrastructure decisions that we (actually, she) made, and it's saved uncountable hours and actually made it more likely that people will write documentation themselves, because they don't have to pull out some crazy windows tool: just edit the document in emacs, and it'll still look pretty for the customers.

  7. Introducing XML documentation in the company by thbzcrt · · Score: 3, Informative

    Unless you are the CTO or the only developer in your company, you may not have complete control over the documentation format. Other people may, and probably will, prefer to write documentation in Word rather than in XML. And I won't condemn them, because Word is a good editor for documentation in plain English.

    One way to introduce XML-based documentation in a company is to prove what it can do (and not just talk about it). In a previous company, I expressed the desire to have documentation generated from the source code, but nobody seemed really interested. So I did it myself, and when they saw it, they loved it.

    The idea was to parse our source files (which were in various languages, more or less easy to parse) and generate XML documentation for the APIs. In a second step, other programs read the XML documentation and transformed it into RTF (Word) and HTML, using SAX and XSLT (I tried both and preferred SAX).

    The HTML version was installed on the Intranet and the developers used it as a reference documentation in their everyday work. They knew they could trust it more than any other documentation, because it was regenerated every night. They also liked it because, unlike Javadoc, the source code parser worked very hard to gather information from the code without forcing the programmer to use constraining comment conventions.
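
    A minimal sketch of that first step, assuming Python source and the standard ast module. The poster's parser handled several languages and its output format is not given, so the module/function/param/doc element names here are hypothetical.

```python
import ast
import xml.etree.ElementTree as ET

# Stand-in source file; the poster's parser handled several languages,
# but this sketch only handles Python via the standard ast module.
SOURCE = '''
def connect(host, port=80):
    """Open a connection."""
    return (host, port)

def close(conn):
    pass
'''

def api_to_xml(source, module_name):
    """Build an XML description of the top-level functions in source."""
    root = ET.Element("module", name=module_name)
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        func = ET.SubElement(root, "function", name=node.name)
        for arg in node.args.args:
            ET.SubElement(func, "param", name=arg.arg)
        # Docstrings are used when present but never required.
        doc = ast.get_docstring(node)
        if doc:
            ET.SubElement(func, "doc").text = doc
    return root

api = api_to_xml(SOURCE, "net")
xml_text = ET.tostring(api, encoding="unicode")
```

    The intermediate XML can then feed any number of later steps (HTML, RTF, code generation) without re-parsing the source.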

    The Word version was delivered to the clients as an API documentation.

    Other documentation was written directly in Word. The system worked very well, and ensured good-quality, up-to-date API documentation without too much work.

    I also used the intermediary XML documentation for other purposes, including some code generation, which proved the versatility of XML.

  8. Try Conglomerate by Colin+Smith · · Score: 4, Informative

    A full document management system, and a more worthwhile effort than the work being put into word processors.

    Conglomerate

    --
    Deleted
  9. XML Document Architectures by under_score · · Score: 5, Informative
    Hi. I have a little experience with this. I'm not going to bore you with the story, rather just get to a simple description of possible architectures for what you want and why you might want them. Finally, I'll conclude by saying that what you are doing is extremely ambitious: don't falter when it gets hard and overwhelming.
    1. Plain XML, without schemas
      XML is a markup language that is supposed to be human readable. Thus anyone can whip up an XML document that describes some data (e.g. documentation on software). It helps if you have standards to make the XML consistent.
    2. Plain XML, with schema (or DTD)
      Creating schemas for all your different types of documentation is probably the first big pain in the butt you will deal with, but it is pretty essential to get a project like you describe to work. It helps by setting common standards which all participants in your org can use to understand the docs they are looking at. Now you also get some tool support for creating and validating your XML documents.
    3. Database -> XML
      Store all your documentation data in a database and use common db tools to extract it and format in XML. Why bother? Tool support! Lots of software development project tools support using a db as a repository for the various work products (documentation and code and stuff). This also allows you to have somewhat easier methods for serving your content to interested parties with appropriate security constraints.
    4. Repository -> XML -(XSLT)-> HTML
      Here we add the ability to transform the human-readable-but-cumbersome syntax of XML into html for viewing on a browser. The big effort for this sort of architecture is that you have to create the XSLT for all your different document types and you need some way of linking-to/searching your documents from the html into the repository. Some application and web servers help with this. I'm most familiar with the Java space, and Tomcat with various xml libraries can be made to do this.
    5. Repository -> XML -(XSLT)-> XML -(XSLT)-> HTML
      This is the most flexible architecture, in which pure data XML is transformed into an intermediate form which represents an abstract presentation of the XML and which is then transformed into HTML (or WML or PDF or whatever). In the first stage of transformation you need one XSLT stylesheet for each document type to convert it into the presentation XML. Then for the second stage you need one stylesheet for each display format. The big advantage here is that if you need to publish to a new document format, you don't need to re-write _all_ of your first stage transformations, you only need to add one new second stage transformation.
    There are of course variations. Check out IS Architectures - Organizing the Web Server for more details when one of your outputs is HTML.
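
    The two-stage pipeline in item 5 can be sketched roughly as follows. This uses plain Python functions over ElementTree in place of XSLT stylesheets, and the document types (faq, standard) and the presentation vocabulary (page, title, para) are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Stage 1: one converter per document type, each producing the same
# hypothetical presentation vocabulary (<page>, <title>, <para>).
def faq_to_presentation(doc):
    page = ET.Element("page")
    ET.SubElement(page, "title").text = doc.get("topic")
    for entry in doc.findall("entry"):
        ET.SubElement(page, "para").text = (
            f"Q: {entry.findtext('q')} A: {entry.findtext('a')}"
        )
    return page

def standard_to_presentation(doc):
    page = ET.Element("page")
    ET.SubElement(page, "title").text = doc.get("name")
    for rule in doc.findall("rule"):
        ET.SubElement(page, "para").text = rule.text
    return page

STAGE_ONE = {"faq": faq_to_presentation, "standard": standard_to_presentation}

# Stage 2: one renderer per output format, written once for all types.
def render_html(page):
    body = "".join(f"<p>{escape(p.text)}</p>" for p in page.findall("para"))
    return f"<h1>{escape(page.findtext('title'))}</h1>{body}"

def render_text(page):
    lines = [page.findtext("title").upper()]
    lines += [p.text for p in page.findall("para")]
    return "\n".join(lines)

STAGE_TWO = {"html": render_html, "text": render_text}

def publish(xml_source, fmt):
    doc = ET.fromstring(xml_source)
    page = STAGE_ONE[doc.tag](doc)   # data XML -> presentation XML
    return STAGE_TWO[fmt](page)      # presentation XML -> output format

faq = '<faq topic="Builds"><entry><q>When?</q><a>Nightly.</a></entry></faq>'
std = '<standard name="Coding"><rule>Use tabs.</rule></standard>'
```

    Adding a new output format means adding one entry to STAGE_TWO; none of the per-document-type converters change.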
  10. Re:I love how MS is dealing with XML by mfarver · · Score: 2, Informative

    Really? My copy of Word 2k doesn't seem to save natively in XML, nor can I find any options concerning it. We would love to see this feature, since most of our technical documents are in Word and the inability to search the documents is killing us. In fact, XML is the primary reason we are examining OpenOffice.

  11. Why XML? by dgroskind · · Score: 3, Informative

    A good place to start is Open Source XML Database Toolkit by Liam Quin.

    The key point is that the best approach depends upon how the data will be accessed, used and updated. There does not appear to be an off-the-shelf, one-size-fits-all solution, even if you go to a commercial platform.

    The advantage of XML is that you can start with a simple approach and migrate to a more complex approach without having to do an expensive data conversion.

    The disadvantage is that XML can be quite expensive to set up on legacy documents and expensive to maintain as well. For documents that change frequently, have multiple uses, or require precise retrieval strategies, XML is the way to go. It's particularly useful when version control must be tracked at the paragraph level.

    If version control takes place at the level of the whole document, retrieval is done by keyword, and documents are displayed in one form only, XML may not add anything but trouble.

  12. Forget CVS; start a Wiki! by Sunlighter · · Score: 3, Informative

    The Wiki Wiki Web is a set of editable, cross-referenced web pages. Anybody can view them and anybody can edit them, and they are searchable. Wikis are pretty useful for internal documentation projects. It should be possible to extend the concept to add the security that is typically required and to add support for XML. Of course, all that means I am practically suggesting you write your own custom Wiki, which may take too long for you. But you could probably start with an existing Wiki and get good results. I have set up UseModWiki (which is a CGI script written in Perl) and gotten good results.

    Hope this helps!

    --
    Sunlit World Scheme. Weird and different.
  13. we use XML for our knowledge base by valmont · · Score: 3, Informative

    I work at an ISP ... (not AOL, not MSN). We have a whole department in charge of writing up procedural documentation, walkthroughs, how-to's, and FAQs to solve just about any problem you could ever encounter on almost any platform and operating system under the sun on your way to getting connected to the internet.

    As soon as XML standards and derived technologies and languages (XSLT, DOM, and more) started to be strongly established nearly 2 years ago, we moved whatever existing documentation we had into XML, conforming to internally developed DTDs and specs. We did so after a couple of guys and I built a handy HTTP-based authoring tool that leverages technologies built into Internet Explorer 5.0 (which I've previously described right here), allowing writers to not have to know anything about XML and simply click their way through easy interactive forms in a fairly compelling user interface ...

    With all of our information stored in XML, we can easily present it to various audiences, whether it be our members, who can search it by keywords to help themselves in our online support area, or our technical support reps, who can browse directory trees to specific XML documents and have access to more detailed information about hardware and platform configurations, document revision information and more.

    The bottom line is this system works really well. And we have the amazing peace of mind of having GREAT information in a format that can never become completely obsolete, and that is always a couple XSLT stylesheets away from fitting just about *any* need.

    Whether you make up your own DTD's or follow existing standard DTD's like DocBook mentioned in other posts, as long as you put some thought into structuring your XML data at the beginning, you can only win in the long run: XML documents can easily be processed into other XML DTDs/formats to represent the information in a way that better fits another application, and/or transformed into other documents made of a markup language meant for presentation like HTML or WAP.

    yea. XML is nifty. :)

  14. Re:Use a wiki by Khalid · · Score: 3, Informative

    Yes, I can confirm this! At work we use TWiki (twiki.org), one of the best wikis I know of, and really a very nice collaboration tool. It can be used as a knowledge management repository too, and it is very easy to use and to get people started with.

  15. Similar problem here... by Eminence · · Score: 2, Informative

    At my company (in fact it's a local branch of a US-based corporation) we have a similar problem. There is a team here developing a system designed specifically for a customer. As one can expect, along with such a system goes all the documentation - everything you could expect, starting from the analysis, through functional specification and coding guidelines, to end user and administrator's manuals. To make things more complicated, part of the development - and the documentation - is being done by a subcontractor (which happens to be in another hemisphere) - and it is being prepared in English, but some parts of it (especially the manuals) have to be translated into the local language.

    Up until now it has been a growing mess with documentation being written in Word (with all the usual problems Word has with large files, with lots of graphics - screens, no versioning etc.), with no standards, with people getting into one another's way while trying to update the numerous documents.

    Recently, together with a friend, we came up with the idea to switch all that into neat XML/SGML files, with CVS-based versioning and everything based on open standards and free software as much as possible. To our surprise the management liked the idea and we got a green light to do some research. And then the problems began.

    First, the editor. Coding XML files with vi or the like might be nice for a hacker - and is great for creating and testing XML formats then used for data storage etc. - but it is out of the question for documentation authors. And it is pretty understandable - to be able to concentrate on the content, on the text itself, the author needs to see only the contents, as nicely rendered as possible - no tags getting in the way in each sentence, no learning for years how to use the editor (thus Emacs with its psgml mode is not an option - don't flame me, it's just a fact). After a long search I have to say that there is no working, finished GNU/free editor that would match our requirement of almost-WYSIWYG presentation of an XML/SGML file. As to commercial ones, the only two that look good are XML Spy 4.0 - but it is just a poorly working beta for now - and Arbortext's Epic - which is almost exactly what we need, but is a bit expensive at around $700 a license.

    Nevertheless, with no other options left, we decided to go for Epic when it comes to the editing side. We got an evaluation package and began testing.

    Now, we were from the start convinced that the DocBook DTD and the tools that go along with it are the best choice for the kind of problem we faced. Epic supports DocBook but comes with its own version, which in turn doesn't work well with the Linux SGML tools that we use for translating the XML/SGML files to useful end formats. On the other hand, not all of Epic's features can be used when one just tries to edit a document based on an "external" DTD. To enable things like being able to see the graphics files inserted into the document, one has to, hm... "customize" Epic by creating some additional configuration files (like .FOS files) using yet another expensive tool Arbortext sells - the Epic Architect.

    But that is not the end of the problem, because the stylesheets currently available for translating the Docbook based XML/SGML files into useful formats are not well documented and partially don't work (for example tags related to inserting pictures in the document are ignored when trying to generate a printable document). There is for example a project on Sourceforge that develops XSLTs and DSSSLs for translating Docbook based XML into various formats, but so far I was not able to make them work - and there is no documentation. Also the DSSSL based machinery for translating SGML files that comes with various Linux distros is far from perfect - HTMLs are generated mostly OK, but printed documents (.tex and .pdf) leave much to be desired.

    So, from our point of view it looks like we will have to buy an expensive editor and then someone would have to spend a month or so tweaking the editor, modifying the stylesheets for our needs, developing procedures and so on. And that someone would have to be quite a competent person (with deep knowledge of the subject), someone, who could be probably better used directly in the development project.

    For now, the future of our little plan of switching from a mess to a neat XML-based solution is uncertain, mainly because we would have to build that neat solution ourselves, as what we can get from outside at the moment are some bits and pieces that - although nice by themselves - just don't fit together.

    (And, BTW, I haven't even touched the nice catch with CVS - to be really useful in the kind of environment that we envisioned it would have to be integrated with the editor - and that doesn't seem likely).

  16. Re:Word to XML? by Anonymous Coward · · Score: 1, Informative

    Office 2000 includes lossless XML support. That is, if you save a Word document into XML and read it back into Word, no formatting should be lost.

  17. Different problems by coyote-san · · Score: 3, Informative

    You're dealing with different problems here.

    DocBook, and any decent document XML DTD, gives you the ability to tag your text with some description of what it means. It might be "chapter" or "list," or it might be domain specific like "files," "bugs" and "see also" (for man pages). The presentation details are left to the processing software to handle.

    MS-Word, in contrast, is nothing but a paint tool for words. You can certainly give your styles names that have some domain meaning to you, but it's still ultimately nothing but a set of style instructions.

    For a single document, this isn't a big issue. But if you have a lot of documents and you want to reuse content, it's impossible with the MS-Word approach. With DocBook, in contrast, it's easy to set up your documents so that the same file can be reused in multiple places, but only selected content will be reused.

    IMHO, if your technical writers can't make the shift to meaningful tags, you're better off without them. (The writers, not the tags.) If they can't handle this level of structure, their writing is undoubtedly muddled and confused no matter how pretty it looks.

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  18. coaxing structure from Word by Anonymous Coward · · Score: 1, Informative

    Here's a way to get and enforce structure in Word documents.

    Word allows named styles, and with View>Normal you can even show the stylenames on the screen (Tools>Options>Style area width). You can create a template with paragraph and character styles that correspond to the structure you want. The template can hack the toolbars: put your styles on a toolbar and in the pop-up menu, if you want. Have the template change the Save command to save docs in rtf.

    Users write in Word. Save into the CMS. In the CMS gui, choose validate or finish.

    The CMS gui then launches a series of processes. First, convert the rtf to xml-like tags, based on the styles used in the document. Second, run some clean-up script to make a well-formed xml document. Third, run a script to do any validations that xml can't handle. Fourth, run a standard xml validator. If the validator finds a problem, you fix it back in Word. You only edit Word files. To preview, translate down to xhtml.

    Users will have to cooperate, but then they're usually paid to cooperate. RTF is nasty, but this is as straightforward a conversion as you can get. The biggest problem is Word's lists. You could either guess at lists based on formatting, or require (hidden) begin/end list paragraph styles.
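
    A small sketch of the style-to-tag step with explicit begin/end list markers, followed by a well-formedness check. It assumes the RTF has already been reduced to (style name, text) pairs, which is the hard part and is not shown; the style names and tag names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical style names; a real template would define its own.
STYLE_TO_TAG = {"Heading": "title", "Body": "para", "ListItem": "item"}

def paragraphs_to_xml(paragraphs):
    """Convert (style_name, text) pairs into an XML string.

    ListBegin/ListEnd are the hidden marker styles that open and
    close a <list>, so no guessing from formatting is needed.
    """
    out = ["<doc>"]
    for style, text in paragraphs:
        if style == "ListBegin":
            out.append("<list>")
        elif style == "ListEnd":
            out.append("</list>")
        elif style in STYLE_TO_TAG:
            tag = STYLE_TO_TAG[style]
            out.append(f"<{tag}>{escape(text)}</{tag}>")
        else:
            raise ValueError(f"style not allowed by template: {style}")
    out.append("</doc>")
    return "".join(out)

def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

good = [("Heading", "Setup"), ("ListBegin", ""), ("ListItem", "Install & run"),
        ("ListEnd", ""), ("Body", "Done.")]
bad = [("ListBegin", ""), ("ListItem", "Unclosed list")]
good_xml = paragraphs_to_xml(good)
```

    A forgotten ListEnd marker shows up as a well-formedness failure, which is the cue to go back and fix the document in Word.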

    I think this approach could let you escape Word. Next upgrade, you could switch to Star Office or KWord.

    A little more detail.

  19. The docs infrastructure person pipes up by shepk1 · · Score: 2, Informative

    I didn't realize this thread was already going when I posted, so I'm pasting it in again here:

    This is what we do. It works, and it solves many of the issues associated with traditional documentation "systems".

    1. Define a DTD of the elements you will need in your documents. You can go with a predefined DTD like DocBook, or you can roll your own. Because my goal was to get engineers to write as many documents as possible (thereby making my life easier), I chose to create my own DTD - There's about 60 elements. Special documents (like projects, or resumes, for example) have their own DTD (and XSL) in order to keep the main documentation DTD as simple as possible.

    2. Create a general XSL sheet that transforms all documents which conform to the DTD into HTML for your internal web-site - all of the presentation logic is in this one XSL sheet (although it can use xsl:include if you'd like to break it up by header, table of contents, content, footer, etc).

    3. Set up documentation sub-branch(es) in the source control system. (In the code base and QA/infrastructure/whatever else you plan to document and publish). The closer it is to the code, the more likely that engineers will add review comments, fix errors, update it, etc.

    4. Anyone and everyone writing documents does so according to DTD and checks them into the designated directory structure in the source control repository. (As far as they are concerned, this is the end of their work -- it just magically appears on the site)

    5. Set up Ant to use one of the Trax processors (I use Xalan2).

    6. Write general purpose Ant targets to convert documentation with general XSL sheet, build indexes, etc. Write build.xml files that call these tasks and convert/index/etc all the documentation in each sub-branch.

    7. A monkey or cron job converts the documentation and pushes it to the website.
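
    Steps 5 to 7 can be approximated in a few lines. This is a sketch in Python rather than the Ant and Xalan setup described above, and it assumes a hypothetical document format (a doc element with a title attribute and para children) in place of the real DTD.

```python
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path

def build_site(src_dir, out_dir):
    """Convert every *.xml under src_dir to HTML and write index.html.

    Assumes a hypothetical format: <doc title="..."> containing <para>
    elements. Returns the list of (relative_href, title) index entries.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    entries = []
    for xml_path in sorted(src_dir.rglob("*.xml")):
        doc = ET.parse(xml_path).getroot()
        title = doc.get("title", xml_path.stem)
        body = "".join(f"<p>{escape(p.text or '')}</p>" for p in doc.iter("para"))
        rel = xml_path.relative_to(src_dir).with_suffix(".html")
        target = out_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(f"<h1>{escape(title)}</h1>{body}", encoding="utf-8")
        entries.append((rel.as_posix(), title))
    # The index is generated from the same pass, so its links cannot
    # point at pages that were not built.
    links = "".join(
        f'<li><a href="{escape(href)}">{escape(title)}</a></li>'
        for href, title in entries
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "index.html").write_text(f"<ul>{links}</ul>", encoding="utf-8")
    return entries
```

    A nightly cron job could call build_site on a fresh checkout and copy the output directory to the web server.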

    Benefits:

    Tables of Contents, Index pages, indexing, image-links, etc. are all generated. There is a much lower chance of links being broken.

    Once you get the XSL right, you don't have to worry about consistency with the look and feel. You really can just concentrate on content.

    In my experience, engineers are much more likely to use VI/emacs to edit/review a document than Word or Framemaker. The GUI XML editors are getting better...

    Drawbacks:

    Someone has to own the DTD, beat their head against the wonderful syntax of xslt, and be willing to decipher the Ant stack traces when things inexplicably bail.

    It's not yet commonly done (or at least they don't post on many newsgroups), so often you are charging ahead without much feedback.

    Technical writers are difficult to hire. We haven't tried to find another one yet, but I imagine asking people to write in XML will limit our talent pool. Of course... in theory I could make all this work with Framemaker+SGML...