Version Control for Documentation?
CodeNation asks: "I'm a coder in smallish (~50 staff) company with a ~20 strong development team. We, the development team, have been using CVS, and CVSWeb to manage our source base for a couple of years. In the meantime, our corporate documentation has become a complete mess. By 'corporate documentation' I mean content such as Word documents, Powerpoint presenations, and Excel spreadsheets. Anyway, I was recently asked by the one of the bosses to put a document version control system in place for this corporate documentation. All this, and the system has to be usable by the non-technical." Ask Slashdot has touched on a similar topic but it's been about 2 years since that article. Has there been any headway in this area?
"Now, this would be a trivial task if:
- The documents were text-based (i.e. the file formats weren't binary)
- The entire company understood how to use CVS
However, neither of the above are true.
I took a look at CVSWebEdit, but unfortunately it's not quite there yet in terms of stability and usability.
Does anyone have any suggestions for a possible solution? What are you currently using for document control (remember these are Microsoft Office documents). Also note that although the developement team works on Linux boxen, the non-technical staff works in a Windows environment.
Thanks for your help."
Xanadu was made for this.
Joseph Elwell.
Corporate documents have much different needs than programs. Documents need to be searchable, categorized, approved, etc. You need something that will tag a document with
* Current Version
* Related Categories
* Awaiting Approvals
* Approvals Received
* Place in the system
* What time to re-review
* Notes for a document
This way, anyone can easily find a document, and find any past versions. New documents will have to go through a formal approval process, and people will be automatically notified when documents need to be re-reviewed. Notes can be attached for clarification and questions.
CVS can do versioning, but not the rest. And, CVS's versioning is MUCH too complex for what you need. You don't need branches/tags/everything else with corporate documents. You are never going to merge between branches. You just have the document and its version.
Engineering and the Ultimate
I've been in your shoes, and my opinion is that it's better to keep it simple, but organized.
I don't like using CVS with nontechnical people who haven't been trrained for it. They are as likely to accidentally overwrite the new version of the doc with the old as they are to save themselves any trouble. So, if you go with CVS, definitely factor in training for everyone.
But, if that won't fly, maybe you should just consider using a plain old network directory, with some careful setup.
Basically, use grouping and network access privileges to give each workgroup their own directory noone else can edit. Each software project has their own subdirectory, and each set of docs for a version of the software a deeper subdirectory. Keep the different versions of the docs totally separate (don't try to be clever and have a shared graphics directory), and archive the old docs in zip files so that nothing gets lost or written over.
That's a start. There is a whole lot more involved in doing this job right: you have to build standardized templates, get the users to enter information for each doc (this can be done easily using a macro that makes them fill out a form when the doc is created), collect that information and set up search tools and indexes.
It's a complicated subject, and that's the short version. If you want to read more, check out my my 16-page Guide to Standardizing and Organizing Documentation. It's not quite where I want it to be yet (second draft but not finished), but I think it might be helpful.
Jon Acheson
All opinions expressed herein are my own, and not those of my employers, who are appalled.
There are several possible solutions to this problem, as I see it. If you want to go the commercial route, Visual Source Safe is the normal standby, or any other commercial source code repository systems would work. VSS has poor merging capabilities, but when you're dealing with Word files or PowerPoint slides, that's not so much a problem (as Office can "version" the files for you -- though make sure to strip the file before releasing it to customers). Also, the upcoming Office XP has Sharepoint capabilities, and is very easy to use (easy enough for your pointy-haired boss to figure out, even).
On the other hand, you could setup CVS anyway, even though you said you'd rather not. There are a few nice win32 GUIs for CVS, so the end-user experience shouldn't be too bad. On top of that, if you use CVS for your documentation, you can keep that in your source tree. One less tree to manage, and you'll always know where the documentation is. And CVS can handle binary files just fine. Just don't expect to be able to merge changes.
Of course, another (less likely) option would be to move all your documents to HTML, or XML with a set of defined XSLs to transform them. That way, you could do merging just fine. However, that's most likely not a workable solution if you deal with anything but Word documents (since word can save to HTML instead of .doc).
There are a number of good tools out there for version control - however there are very few that can handle binary files in a "good" manner.
Some issues:
One - Binary file formats - such as MS Word - cannot have diffs run on them that are meaningful - a binary diff will result in a file larger than the two original files. So most version control programs will store binary files as seperate versions - but will NOT show the differences between the versions.
Two - Microsoft HAS some built in support for versions within Word - however this will quickly result in VERY large files - which get increasingly less useable. Also note, that this will be ONE file containing all versions - if you "version" this file, you will have TWO files with different sets of the underlying versions.. sounds confusing? It would be.
Three - There ARE version control programs that have worked with Microsoft to learn how to understand the underlying MS file formats (I believe Clearcase may, possibly MS's own Sourcesafe (which is otherwise a dangerous version control software to use since it can have data integrity problems) and possibly a few others.
So - what would I recommend or suggest?
First - Look carefully at WHAT you are intending to version. Is it a collection of documents (i.e. a full manual)? Individual documents that change over time? A whole project structure (say a website for online help?) Or something else?
Can you seperate out the FORMATING (which might be in MS Word) from the content? For example by using a Master document format - importing TEXT documents into MS Word? This would allow great flexibility in versioning the underlying text documents, keep a smaller MS Word file, and that file could be "versioned" storing copies of each successive version?
Second - For simple document management systems, (which run on Linux but can be accessed by any browser) look at a system like InfoPlace - simple, open source (I think) and easy to use. It is however not a rigorous version control system, but a partial version control system.
Hope this is helpful - I spent 2+ years teaching and managing version control for a very large development operation (1000+ developers worldwide).
Shannon
-- Join us in Chicago May 1-4th for MeshForum -- writer, historian, tech geek, entrepreneur, internet junky since '91 --
You guys should use the document management system that my last company used- total chaos. Every user creates a directory called "Bob's stuff" or "BSR" or something similar. Then all of the files they work on, no matter who else will need them, should go in the "Bob's Stuff" directory. Each file should be given completely incomprehesible names like "ClientData.xls", "JimQuote.doc", or "CSIQ12001rpg.ppt". That way when you need something, all you need to do is find the person who worked on it first and have them remember what they called the file. The set up and maintnence costs of this system are nearly nill. All you need for it to work are employees with photographic memories who never quit or die.
-B
If you'd love to check it out, but the links on cvshome.org point to wincvs.org for TortoiseCVS, and that domain, according to NSI whois: Record expires on 08-Mar-2001. So try this link instead, it works!
John 17:20
Sorry for the shameless plug (I work for Open Text Corporation), but Livelink is a great document management system product. It may be out of your price range, but it does allow you to have versioning on your documents, and it has a whole whack of other features that many enterprises demand.
It seems that they're marketing it now as "a highly scalable and comprehensive collaborative environment for the development of Web-based intranets, extranets and e-business applications." Oh dear.
www.timcoleman.com is a total waste of your time. Never go there.
As a developer, I usually prefer CVS, but StarTeam works quite well for a whole office, Word docs and all. For the Windows-based world you mention, it seems quite appropriate. They have many different clients, and I've seen it used in mixed Windows & Solaris & Linux environments.
In general, if a shop can't use CVS, and especially if they're using SourceSafe, I can in good conscience recommend it. And remember, friends don't let friends use SourceSafe :-)
IANAL, YMMV, etc. I'm not sure if it will work for you, but it's definitely worth investigating
True... and the problem with the files being binary can be solved by UUencoding them, first.
"Alcohol, Tobacco, Firearms, and Explosives" should be a convenience store, not a government agency.
CVS done right.
It's 10 PM. Do you know if you're un-American?
Check out the products available from Hummingbird, Documentum, and Eastman. A long list of document management vendors lives here.
---
The Big News Page
Maybe I'm cynical but your stated goals of
implementing version control and
making it usable by nontechnical people
You face one major uphill battle.
Many nontechnical people have a hard time understanding a hierarchy, or of file types; this is expressly why Windows 95 defaulted to hiding file extensions and the subdirectory trees.
Add to that the complexity of "where in the hierarchy does this file permanently belong," and the question "at what point in time was the file in a condition you liked?," you get into a major learning curve. Describing a sandbox is a task unto itself. Undisciplined developers often grok CVS but still don't use the delta comments in any meaningful way.
That said, VMS is probably your ideal here for simplifying version management. Too bad it was an integration into the filesystem itself, and didn't expressly deal with multiple writers or delta comments.
For those who haven't used VMS, the filename included a version number: name.extension;version . If you neglected to mention the version number in a system call, it assumed the newest. Every file opened for writing got the next version number and left the old versions untouched; every file opened for read-write cloned the newest old version first and bumped its version number. This builds into a large list of ;1 ;2 ;3 ;4 ;5 ... ;632 version for each file. You could easily back them all up, or prune to the newest version.
[
Microsoft doesn't even use Visual SourceSafe internally. It can't handle large projects. They actually use Rational ClearCase for the majority of their Source Control.
Doesn't say alot for the product.
The other thing is SourceSafe has problems with more than 200 projects in its repository. It starts Corrupting the data. Not what you want to see from a source Control system.
----
Just remove the spaces and do the intelligent thing to email me.
----
Just remove the spaces and do the intelligent thing to email me.
Great idea, except that good developers know not to leap into re-inventing a wheel that might already exist, especially at the cost of "a week or two". Don't you think they already have something to occupy their time?
Although I can't say it's the greatest system I've ever used, PVCS offers a moderately simple web interface that should work across all OSes. Our non-developers use it to store their documents without too much trouble.
-- What is this Earth thing you call "slow"?
As a systems engineer for a gov contractor, I needed to make a survey for documentation revision control software. Due to my CS background, I immediately thought about CVS, sourcesafe, and other source code revision control packages. The source code revision control packages were not up to the task of documentation revision control. Since systems engineers deal with requirements, and requirments are stored via documentation, every requirements analysis package comes with documentation revision control software. These packages also help to create documents from templates and databases. Depending upon your needs, these documents keep track of customer requirements from concept through delivery/installation, into maintenance.
There are many systems engineering tools that handle requirements analysis in various ways. Many of them work with MS Office (prolly what your writers currently use), and have built in versioning control. My best source for information on these tools was at the INCOSE tools website. This sites lists tools and checks them for the following features...
Store standard document outlines - used as starting points? User definable templates or modifiable?
Produce architecture views from functional and object oriented (OO) perspectives? Examples: WBS, functional , physical, data flow, state diagrams
Support various physical architectures? (View from a number of levels, Black box, Rack, circuit board, chip)
Enable tailoring to specific standards and requirements, IEEE, ISO, MIL-STD?
User friendly & menu driven (drag and drop capabilities)?
Support a single user or multiple concurrent users?
Input document change / comparison analysis
Visibility into existing links from source to implementation--i.e. follow the links.
History of requirement changes, who, what, when, where, why, how.
Baseline/Version control
Access control (modification, viewing, etc.)
. Support of concurrent review, markup, and comment
Multi-level assignment/access control
Plus many more features.
It is my opinion, that the following packages are up to the task:
Cradle
Doors
rtm
rdt
Caliber rm
If you go the page that I linked, it provides an in depth review of each of these tools plus many more.
Keeping
Microsoft Visual Source Safe.
It stores versions, has a nice, friendly, Explorer-like interface, and runs on windows. Sounds like that's all the management wants. As long as they don't want to branch documents (which I recall being a bit of a bitch), they should be fine.
(All of this with the note that I'm *pretty* sure that VSS handles binaries alright, even though it may not be able to do such things as diffs, even on files in a proprietary format from it's own company.)
There is a product called DocsOPEN made by Hummingbird I beleive. We use it at my office. It is WELL worth working into, and it integrates into almost everything windows. There is also a web interface for it called CyberDOCS.
Basically, instead of saving your files to a drive, Docs comes up and asks you to fill out a form. The file gets saved somewhere mysterious, the data goes into a serachable database, and you get on with your life. Later you call it up by searching for it, and you can add a new version when you save it. It's pretty neat. However, it is hard to get people to use unless you beat them over the head.
-- [ta]
Why not use HTML? I've been using HTML for years for all of my documentation. It works perfectly for a cross-platform manual accessable from anywhere.
If you need powerpoint-type presentations, Flash is easy to use, fast, and readable on nearly all modern browsers. You can even generate it with PHP or PERL.
If you need an easy to use UI, take a look at Xerox Docushare or perhaps if you want to lean toward groupware look at Amphora.
"A microprocessor... is a terrible thing to waste." --
"A microprocessor... is a terrible thing to waste." --
GeneralEmergency
Check out TortoiseCVS from the CVSHome website. It's an add-on to Windows Explorer that adds status dependent color shading to CVS controled directories and context sensitive commands to the Explorer file menu. Comes with a bundled SSH client for secure tunneling.
Easy to install and VERY easy to use, and no, I don't have anything to do with the project. I just use it.
It can be very embarrassing to have all of the private comments revealed to the other party when you didn't realize there were there. Increasingly firms are checking for these things as well.
Word 2002, from the Office XP suite, includes a Security Tab on the Options settings. In there you'll find a Privacy section which gives you checkboxes for things like deleting personal information on save and "Warn before sending, saving or printing" a document that has file revision tracking turned on.
-Coach-
Perhaps the world's greatest tragedy is that ignorance is not impotence.
Two weeks, one office employee: $600.
Five weeks, 3 part-time developers: $59,000.
Q4 earnings time, the look on the boss's face: priceless.
There are some things money can't buy. For everything else there's Corporate Mastercard.
--
"Fuck your mama."