Creating and Using XML-Based Internal Documents?
Richard Emberson asks: "Once again into the breach...or at least the ground floor in a new startup. This time around, I would like to have all of the Engineering documentation internally online: a unified, internal, CVS-ed, web-based, development organization document tree covering the engineering process, methodology, coding standards, nightly build/test reports, FAQs, new hire information and help pages, and the documentation for each project.
Recently I've written documentation (on Linux of course) using the Apache XML-stylebook tags, stylesheets, and Ant-based publishing - and I like it.
So my questions are: Has anyone done this and, if so, how were the links between documents managed?" Does your workplace use XML in its internal documentation? If so, how well does your system work, and what advice would you pass on to anyone else attempting something similar?
"If you start out with only one project (product), how do you structure it so that when new projects come into existence they can easily be integrated? Are there documentation templates out there upon which I can base the various development documents (like requirements, product development plan, design, coding walk-thru standards, etc.) and not have any of this swell to be so large that no one will be able to produce, maintain or read it?"
Why not use DocBook? It is XML-based and extensible; what more could you ask?
Have Fun
It sounds to me like you're already a step ahead of the rest of the world, for the most part.
My workplace uses a hodge-podge of formats including "special" ASCII text files, FrameMaker, HTML and Microsoft Word. Needless to say, it's a mess. There are no open, standard, consistent tools to examine all of our documentation. Yeah, you can grep HTML, but the others are a pain. And don't even think about automatic, script-based conversion among these formats.
I suspect you're more advanced in your thinking than 90% of the places out there. Why not continue with your thinking and let the rest of us know what you decide?
"Provided by the management for your protection."
Have you considered Zope? It is perfect for storing a document tree, it has strong support for XML, including an extension for DocBook, and supposedly it integrates with Apache. It also gives you lots of options for formatting the XML as HTML.
We standardized all our early documentation on XML, and it's been working great. Admittedly we're using Perforce, not CVS, but we're doing something very similar to what you want to do.
All our documentation is in XML format, in a DTD that we defined. We then have XSLT transformation scripts that convert that documentation to HTML format, and scripts that automatically update our development intranet whenever changes get checked in, along with scripts that invoke javadoc and doxygen on all the code to convert that to HTML format. We're also working on converting the same documents to PDF, so that we can publish them, with the same formatting, as nicely laid-out documents for printing.
This, aside from the simplicity of not having to worry about formatting documentation when you write it, is pretty cool. It's easier for me (as an engineer) to write a very sparse, structured XML document that will end up looking very good on screen, than to learn enough HTML to make my documents look good. And it's easier for us to enforce a standard look and feel across all documentation this way, because only the XSL transforms have to change if we change our formatting.
But the real advantage comes with more advanced uses. For example, for configuration files we have a special DTD in which we write their documentation, and any document that describes a configuration file is automatically converted both to the HTML documentation AND to an example configuration file for users. We can also mark some things as visible only internally, so that the same document can hold data that's visible to end-users and data that's visible only to employees (so if we have advanced configuration options that we don't want customers mucking with because they're for debugging the system, we can document them in the same place, and just hide them from customers but let our support and professional services people in on the secret).
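For anyone who wants to see the shape of that idea, here is a minimal sketch in Python. It is not the poster's system: the element names (config-doc, option, desc), the audience attribute, and the sample options are all invented for illustration, and plain Python functions stand in for the XSLT transforms. One source document produces both an HTML reference and an example config file, and options marked internal are dropped unless the internal view is requested.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical documentation format: every element and attribute name
# here is invented for this sketch, not taken from the poster's DTD.
CONFIG_DOC = """
<config-doc name="server.conf">
  <option name="port" default="8080" audience="public">
    <desc>TCP port the server listens on.</desc>
  </option>
  <option name="trace_level" default="0" audience="internal">
    <desc>Debug tracing verbosity; for support staff only.</desc>
  </option>
</config-doc>
"""


def visible_options(root, internal):
    """Return the options that the requested audience is allowed to see."""
    return [
        opt for opt in root.findall("option")
        if internal or opt.get("audience") != "internal"
    ]


def to_html(root, internal=False):
    """Render the option reference as a small HTML definition list."""
    lines = [f"<h1>{escape(root.get('name'))}</h1>", "<dl>"]
    for opt in visible_options(root, internal):
        lines.append(f"  <dt>{escape(opt.get('name'))}</dt>")
        lines.append(f"  <dd>{escape(opt.findtext('desc', ''))}</dd>")
    lines.append("</dl>")
    return "\n".join(lines)


def to_example_config(root, internal=False):
    """Render the same options as a commented example config file."""
    lines = []
    for opt in visible_options(root, internal):
        lines.append(f"# {opt.findtext('desc', '')}")
        lines.append(f"{opt.get('name')} = {opt.get('default')}")
    return "\n".join(lines)


root = ET.fromstring(CONFIG_DOC)
customer_html = to_html(root)                            # hides internal options
customer_conf = to_example_config(root)                  # hides internal options
internal_conf = to_example_config(root, internal=True)   # shows everything
```

The point of the design is that the audience filter lives in one place (visible_options), so every output format hides the same things.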
The best part is that because our XML DTD is very structured, someone like me (an engineer) will actually use it, because it ends up being easier for me than writing in plain text, whereas I wouldn't do it in HTML (or if I did, it would just look like crap). It also makes it much easier to do integrations across branches of code: because we know the DTD for our XML documents very well, it's more likely that integrations will go smoothly, which helps keep multiple branches of code and docs in sync automatically. If you go with a binary format, you're not going to be able to do that, and every time you make a change, you're going to have to manually change the documentation for each and every branch. With ASCII or HTML, everybody's going to produce documents that look a little different, so you're not going to have as easy a time integrating between branches.
Our docs infrastructure person can pipe up about the particular technology that we're using, but it's turned out to be one of the best infrastructure decisions that we (actually, she) made. It's saved countless hours and actually made it more likely that people will write documentation themselves, because they don't have to pull out some crazy Windows tool: just edit the document in emacs, and it'll still look pretty for the customers.
More worthy, full document management system than the efforts being put into word processing.
Conglomerate
- Plain XML, without schemas
- Plain XML, with schema (or DTD)
- Database -> XML
- Repository -> XML -(XSLT)-> HTML
- Repository -> XML -(XSLT)-> XML -(XSLT)-> HTML
There are of course variations. Check out IS Architectures - Organizing the Web Server for more details when one of your outputs is HTML.
XML is a markup language that is supposed to be human readable. Thus anyone can whip up an XML document that describes some data (e.g. documentation on software). It helps if you have standards to make the XML consistent.
Creating schemas for all your different types of documentation is probably the first big pain in the butt you will deal with, but it is pretty essential to get a project like you describe to work. It helps by setting common standards which all participants in your org can use to understand the docs they are looking at. Now you also get some tool support for creating and validating your XML documents.
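To give a feel for what validation buys you, here is a rough sketch in Python. Note that it is NOT real DTD or schema validation: the Python standard library does not do that, so this only checks well-formedness plus a hand-written table of required child elements. The document types and element names in the table are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a real schema: the required child
# elements for each document type. All names are invented for this sketch.
REQUIRED_CHILDREN = {
    "design-doc": ["title", "owner", "overview"],
    "faq": ["title", "entry"],
}


def check_document(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    if root.tag not in REQUIRED_CHILDREN:
        return [f"unknown document type: {root.tag}"]
    return [
        f"<{root.tag}> is missing required <{child}>"
        for child in REQUIRED_CHILDREN[root.tag]
        if root.find(child) is None
    ]


good = "<faq><title>Build FAQ</title><entry>How do I build?</entry></faq>"
bad = "<design-doc><title>Parser</title></design-doc>"
good_problems = check_document(good)   # no problems expected
bad_problems = check_document(bad)     # owner and overview are missing
```

A check like this could run at check-in time so that malformed docs never reach the published tree; a real setup would more likely hand the job to a validating parser and the actual DTD or schema.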
Store all your documentation data in a database and use common db tools to extract it and format it as XML. Why bother? Tool support! Lots of software development project tools support using a db as a repository for the various work products (documentation and code and stuff). This also gives you somewhat easier methods for serving your content to interested parties with appropriate security constraints.
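The extraction step might look something like this in Python, using an in-memory SQLite database so the sketch runs on its own. The table layout (docs with project, title, body) and the output element names are assumptions for the example; a real repository would have its own schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_docs(conn):
    """Read every row of the (hypothetical) docs table and return one XML string."""
    root = ET.Element("documents")
    rows = conn.execute(
        "SELECT project, title, body FROM docs ORDER BY project, title"
    )
    for project, title, body in rows:
        doc = ET.SubElement(root, "document", project=project)
        ET.SubElement(doc, "title").text = title
        ET.SubElement(doc, "body").text = body
    # ElementTree escapes characters such as & and < when serializing.
    return ET.tostring(root, encoding="unicode")


# Build a throwaway database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (project TEXT, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        ("build", "Nightly build FAQ", "Builds start at 02:00 & run about an hour."),
        ("process", "Coding standards", "Use four-space indentation."),
    ],
)
xml_text = export_docs(conn)
```

Building the tree with ElementTree rather than pasting strings together means the escaping is handled for you, which matters as soon as someone writes an ampersand in a document body.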
With the repository-to-HTML pipeline, we add the ability to transform the human-readable-but-cumbersome syntax of XML into HTML for viewing in a browser. The big effort for this sort of architecture is that you have to create the XSLT for all your different document types, and you need some way of linking to and searching your documents in the repository from the HTML. Some application and web servers help with this. I'm most familiar with the Java space, and Tomcat with various XML libraries can be made to do this.
The two-stage pipeline is the most flexible architecture: pure data XML is transformed into an intermediate form that represents an abstract presentation of the XML, which is then transformed into HTML (or WML or PDF or whatever). For the first stage of transformation you need one XSLT stylesheet for each document type to convert it into the presentation XML. Then for the second stage you need one stylesheet for each display format. The big advantage here is that if you need to publish to a new document format, you don't need to rewrite _all_ of your first-stage transformations; you only need to add one new second-stage transformation.
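Here is a small Python sketch of the two-stage idea, with plain functions standing in for the XSLT stylesheets (so it shows the architecture, not XSLT itself). The document types (faq, coding-standard), the presentation vocabulary (page, heading, subheading, para), and the two output formats are all invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape


# Stage 1: one converter per document type. Each returns "presentation XML":
# a <page> holding only <heading>, <subheading>, and <para> elements.
def faq_to_presentation(root):
    page = ET.Element("page")
    ET.SubElement(page, "heading").text = root.findtext("title", "")
    for entry in root.findall("entry"):
        ET.SubElement(page, "subheading").text = entry.findtext("q", "")
        ET.SubElement(page, "para").text = entry.findtext("a", "")
    return page


def standard_to_presentation(root):
    page = ET.Element("page")
    ET.SubElement(page, "heading").text = root.findtext("title", "")
    for rule in root.findall("rule"):
        ET.SubElement(page, "para").text = rule.text or ""
    return page


STAGE_ONE = {"faq": faq_to_presentation, "coding-standard": standard_to_presentation}


# Stage 2: one renderer per output format. These only know the
# presentation vocabulary, never the original document types.
def render_html(page):
    tags = {"heading": "h1", "subheading": "h2", "para": "p"}
    return "\n".join(
        f"<{tags[b.tag]}>{escape(b.text or '')}</{tags[b.tag]}>" for b in page
    )


def render_text(page):
    out = []
    for block in page:
        text = block.text or ""
        if block.tag == "heading":
            out.append(text.upper())
        elif block.tag == "subheading":
            out.append("* " + text)
        else:
            out.append("  " + text)
    return "\n".join(out)


STAGE_TWO = {"html": render_html, "text": render_text}


def publish(xml_text, fmt):
    """Run a source document through both stages."""
    root = ET.fromstring(xml_text)
    return STAGE_TWO[fmt](STAGE_ONE[root.tag](root))


faq = (
    "<faq><title>Build FAQ</title>"
    "<entry><q>Where are the logs?</q><a>On the build host.</a></entry></faq>"
)
faq_html = publish(faq, "html")
faq_text = publish(faq, "text")
```

Adding a third output format means adding one entry to STAGE_TWO; adding a new document type means adding one entry to STAGE_ONE. Neither change touches the other table, which is the advantage described above.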
Helping with organizational effectiveness is our job.
Forget about how you build the repository -- that's easy. (Well, okay, non-trivial, but with databases, CVS, and even just simple shared folders, storing the docs is the least of your worries.)
I still maintain that the biggest hurdle in any standardized document system (especially if you include multiple concurrent authors) is the front-end editor. I wrote a simple (and highly buggy, I'll admit, so you who know me keep your traps shut!) VB application that provided a multi-user front end to a database. The back-end (PHP) pulled all the appropriate rows for any given doc together and mashed them into a nice, navigable HTML document. I even had PDF support at one time (but it was even flakier than the GUI).
However, it was not XML, so it was REALLY limited in how easy it was to create new views on the data. The biggest problem I ran into was trying to find a good GUI editor -- this thing was written for security engineers, not HTML experts, and I wanted them to concentrate on content, not tags. I eventually settled (and settled is the right word) on the Microsoft DHTML control. Worked well enough for the time (two years ago at this point), but I still think half my problems stemmed from that widget, or bad interface programming to it. The advantage? WYSIWYG (more or less) editing. Seamless multi-user editing of the same document (well, okay, we had some record locking issues. :) ) But again, the long pole of the tent was the editor widget.
Since then, I've wanted very much to rewrite the thing to handle full XML, and I understand there's an effort underway to do just that (I've since moved to different pastures), but it's slow going. I've looked at current technology (AbiWord, for example), and I'm just not convinced that it's going to be easy to get a good semi-WYSIWYG XML editor going. At least not on the cheap.
Some time ago an app called Conglomerate was posted here, which I still think has about the best approach to visually representing an XML document. But it hasn't been updated in forever, and was slow/buggy the one time I played with it. More recently, the XMLmind XML Editor (XXE) has shown a lot of promise, even including CSS files for editing DocBook XML. They even have source available. Again, it goes a long way toward letting you edit diverse XML files in a logical way -- not by forcing you to look at ugly tree-views of an XML file, like so many first-generation editors. Finally, the latest XML Spy editor beta goes a bit farther even than XXE, using a full XSLT transform to provide a WYSIWYG format for XML files. Theoretically, with this, you should be able to display any of your documents in whichever approach you like -- full WYSIWYG, tables, trees, block labels, whatever.
Of course, neither of these latter tools works in a concurrent editing fashion. But that's a "minor" enhancement -- put together a robust DB back end, allow for good record locking, editor-to-editor communications for lock management, a transaction log to allow back-out of changes, etc. Lots of possibilities. Take XXE, put this kind of capability on the back-end, add an integrated login and document management system, and you've got a kick-ass document solution. Work the backend to allow for multi-stage review and publishing, provide output engines for HTML, PDF, WAP, whatever, serve different subtrees of the system to, say, internal project web servers, external web servers, sales and marketing (for glossies), etc., and everyone can manage everything, real-time, GUI, with one tool.
But I dream.
(Seriously -- if anyone's really working on this, I'd love to help. I just wanna use it at home for my own web pages.)
More exactly, one of the best incarnations I know of: TWiki (twiki.org). Absolutely terrific! It can be used as a collaboration medium, a knowledge base repository and much, much more; you will find new ways of using it every day. I have installed it where I work, and people have been ecstatic!