Slashdot Mirror


Document Management and Version Control?

Tom wonders: "I am working in a medium-sized software development company. The functional analysts use Microsoft Word to document the specifications, and Sharepoint to publish the documents. However we'd like to improve our process to have better revision control and traceability. We have looked at alternatives like using Wikis, or static HTML documents with CVS. The functional analysts want ease of use, while we developers would like to see high-quality end products, revision control (i.e. tagging & branching of the document base), and traceability features. What tools and document formats do you use and would recommend?"

16 of 326 comments (clear)

  1. The simple answer by Ckwop · · Score: 5, Informative

    Latex with CVS. This is what I use for my documents. It's simple (yes it is simple.. markup languages are not hard to understand) and with CVS it's far more feature complete than Word in version control.

    There's plenty of WYSIWYG tools for Latex. Let Google be your guide.

    Simon.

    1. Re:The simple answer by CRCulver · · Score: 5, Informative

      There's plenty of WYSIWYG tools for Latex.

      I'm always happy to see fellow TeX evangelists here. If you don't know about LaTeX yet, check out the TeX Frequently Asked Questions and discover the joys of a typesetting system that is not only high-quality, but free as in freedom and immensely extendable.

      LaTeX's markup makes so much sense that a WYSIWYG tool isn't necessary, for even the man on the street can be just a productive with doing it up in a text editor. A good and free as in beer guide to the system is The Not So Short Introduction to LaTeX2e , though if you are going to be markup up lots of math (LaTeX's specialty) you'll probably want Graetzer's Math Into LaTeX since LShort doesn't cover it so much.

    2. Re:The simple answer by dsandler · · Score: 4, Insightful
      LaTeX's markup makes so much sense that a WYSIWYG tool isn't necessary, for even the man on the street can be just a productive with doing it up in a text editor.

      I love LaTeX (I have TeXShop open in the background right now!), but I have to argue with the assertion that the uninitiated "man [or woman! -ed] on the street" can be just as productive as s/he was in Word. Compare Word's graphical table builder and tab ruler, the result of about 20 years of noodling around with the best user experience for creating such things, with \begin{tabular}{|r|r@{.}l|}. OW MY WORD PROCESSOR.

      Even if you give everyone a pocket syntax reference, unless you have a TeX ninja working overtime on your templates, you'll still end up with a lot of documents that look like academic research papers. This is fine if what you're writing are academic research papers [hi!], but for most corporate communication people are accustomed to more effortless (read: WYSIWYG) control over the output. (This is, of course, usually a terrible idea, resulting in official RIF memos from HR written in Comic Sans; I think there's a happy medium somewhere in between.)

    3. Re:The simple answer by Ithika · · Score: 4, Informative

      LaTex may be terminally cool for creating fancy-looking documents. But it doesn't solve any problems that this guy cares about. For his purposes, it's just another word processing format.

      Not true. The OP asks about version control and branching... and the best way to store something for version control is as plain text. Yes, modern version control software allows fairly sophisticated binary deltas (for example, I believe SVN has this capability) but there are still features that can't be done without text.

      For example, can your version control software tell you what text changed between two revisions of Word documents? It can if they're LaTeX documents.

    4. Re:The simple answer by flooey · · Score: 4, Informative

      LaTex may be terminally cool for creating fancy-looking documents. But it doesn't solve any problems that this guy cares about. For his purposes, it's just another word processing format.

      Actually, switching to LaTeX changes the scope of the problem from tracking changes to arbitrary files to tracking changes to text files. There are a lot more tools that are good for the latter problem than the former, so the available options get bigger. You're right that it doesn't suggest a good tool to go with, though.

    5. Re:The simple answer by Zontar+The+Mindless · · Score: 4, Interesting

      DocBook rocks.

      We use DocBook XML for the source format of the MySQL Manuals, and SVN for version control. We maintain 3 distinct versions of a >1600-page software manual this way, in numerous translations. We produce end-user docs in HTML, PDF, TexInfo, plaintext, CHM, and a couple of other formats. These include the online manuals at dev.mysql.com which get updated 4 times a day from our SVN repositories. We also maintain the Internals Manual using this system. It's also relatively easy for us to produce documentation that's either standalone or integrated with larger documents, such as the Connectors manuals.

      We are very happy with this system. Our users seem to be also.

      I use oXygenXML as my principal XML editor. So do some of my teammates. It's thoroughly DocBook-aware and does nice transformations of shorter DocBook documents into HTML and PDF. It also validates, pretty-prints, and does a good job performing diffs and merges between different versions of large (100+ K) documents. (It also provides for editing and debugging XSLT stylesheets, although I don't personally use it for that at present.) It's available on *nix, Windows, and Mac (yes, it's a Java GUI app, but it's remarkably fast and stable one). It's neither libre nor gratis, but it's well worth the money, and much cheaper than the (other) commercial alternatives. If you are working hands-on with large amounts of XML in a production setting, I strongly recommend that you check it out.

      --
      Il n'y a pas de Planet B.
  2. Subversion... by nweaver · · Score: 4, Informative

    Subversion is your friend...

    It handles binaries right (unlike CVS)

    It works over a variety of transport layers (HTTP/HTTPS/SSH) with some decent authentication models.

    It treast revisions as an archive-wide property.

    You can't check in an inconsistant state.

    It runs under *NIX, Mac, Windows, etc.

    Its free software.

    Try it. I switched a few months back from CVS and have been very happy.

    --
    Test your net with Netalyzr
    1. Re:Subversion... by dpaton.net · · Score: 4, Informative

      After several years of working with SourceSafe and it's truly braindead way of dealing with atomic commits and binary files (not to mention the massive data loss problems we had with it) my office switched to SVN for EVRYTHING. All employees use it for everything, from notes on ideas in Notepad or BBEdit or pico, to massive software projects with hundreds of files and over a million lines of code. Using TortiseSVN to put it into the windows desktop shell, it's nearly transparent, and it allows atomic commits to work intelligently, making the engineers who work with programs that have multiple files (hardware in myb case, a half dozen files for each PCB design, it works even better for revision control for the software guys), which has allowed us to recover a really insane amount of time we'd been handing over to M$SS for maintainance and babysitting. Additionally, the (automagically compressed) repository size for 11 hardware guys and 13 software guys with a year's worth of code and binaries (2M lines and hundreds of MBs, local, respectiively) is a paltry 180MB. That's with upwards of 20 commits on everyone's data every day. I admit, I sound like I drank the SVN kool-aid, and I'm OK with that. The next step is to install it at home and use it to back up /home, /documents and /music on my various and sundry computers (seperate server, weekly media swap, etc).

      I love it that much, and you can too!

      --
      This is not a sig. this is a duck. quack.
  3. Baby step #1: source control + existing docs by dsandler · · Score: 5, Interesting

    It's pretty shocking to change everything (document format, writing environment, collaboration tools) all at once. Start with reasonable source control, the best bacon-saving device you can get. Have everyone check existing docs (Word, HTML, whatever) into source control; Even though diffs are meaningless for the binary formats, the other benefits (versioning, collaboration, remote storage, tags, platform independence) are huge. It's the quickest way to put an end to the madness of emailed .doc files and accidental deletions.

    If you've got a lot of Windows users, go with Subversion and get everyone to install the TortoiseSVN shell extension, which offers the most natural GUI for new (and experienced!) users of version control.

    Once everyone's comfortable with SVN, you can then start migrating to text-based document formats in which the source control diffs mean something (LaTeX, XML, reStructured, etc.)

  4. Meta-answer by Jerf · · Score: 4, Informative

    I doubt I'll have much to add to the long list of people describing their experiences with various systems, but I'll pop out this meta-thought: Your developers and "functional analysts" probably have wildly varying needs, especially if the "functional analysts" use word-processing documents like Word. There's no crime in given each group of people a separate system.

    Your devs probably ought to get subversion because the continuing cost of using a sub-optimal source management system adds up to staggering amounts pretty fast. Your other writers probably aren't continuously branching and merging and doing all the other things subversion allows (if nothing else that's really confusing for most documents), so they can use a simpler, easier-to-use system that doesn't incur continuous costs due to confusion and documents getting mangled or destroyed due to incorrect use of the system.

    The right tool for the right job.

    (Note: I'm not saying you should use multiple systems; I'm just saying it's not a crime, if they solve different problems. If you can get your writers to use SVN, especially if they use something with a decent plaintext representation that stands a chance in Hell of merging, hey, great, more power to you.)

  5. Depends very much on your analysts by Circuit+Breaker · · Score: 4, Interesting

    Where I work, developers use CVS exclusively. It has its quirks, and we've considered alternatives, but the combination of CVS+TortoiseCVS+Jalindi Igloo (Visual Studio integration)+Jira is unbelievebly hard to beat (Subversion doesn't have a reasonable SCC connector, and nothing else has anything that comes close to TortoiseCVS -- even TortoiseSVN is clunky and awkward by comparison. Oh, and CVS mergepoints work perfectly, unlike the nonexistent merge tracking capabilities of Subversion).

    When our "functional analysts alike" guys wanted version control, we naturally gave them the tried-and-tested CVS, and gave them instruction on its use. It was a horrible failure. The update-edit-change-commit cycle which is so trivial to developers just didn't work. The people are not dumb - just have a different mindset. We had also tried a Wiki (MoinMoin, works well for devs), but its inability to search or version Excel files make it irrelevant.

    Eventually, much to my dismay, we settled on Sharepoint. And while it's clunky, horrible, keeps only the 7 latest versions of any file, has no branching and is often inconsistent with error messages, them users are able to work with it without requiring assistance.

    Do not confuse a feature list with applicability of a tool to the situation at hand which, it appears, might depend more on the people involved than anything else.

  6. SVN + WebDAV + Autoversioning by HFShadow · · Score: 5, Informative

    http://svnbook.red-bean.com/nightly/en/svn.webdav. autoversioning.html

    From the SVN Handbook:
    "Because so many operating systems already have integrated WebDAV clients, the use case for this feature borders on fantastical: imagine an office of ordinary users running Microsoft Windows or Mac OS. Each user "mounts" the Subversion repository, which appears to be an ordinary network folder. They use the shared folder as they always do: open files, edit them, save them. Meanwhile, the server is automatically versioning everything. Any administrator (or knowledgeable user) can still use a Subversion client to search history and retrieve older versions of data."

  7. Re:Subversion...[*Does* Call Binary Diff Tools] by malloc · · Score: 5, Informative

    Of course, Subversion is no more your friend than CVS in this case since neither can do proper diffs! It's binary data for f*ck sake! Subversion handles binaries better than CVS, but not for the reason you state.

    Actually, GUI Subversion clients like TortoiseSVN can show diffs for binary files like Word or OpenOffice, using the built-in diff capability of these programs. The end result is you can double-click your binary document and get a window showing you the differences.

    The latest nightly TortoiseSVN builds even include an image diff viewer.

    -Malloc
    --
    ___________________ I want to be free()!
  8. Svnwiki, of course by Azul · · Score: 5, Interesting

    I would suggest using svnwiki, a wiki system that stores its whole contents in a Subversion repository (Disclaimer: I am the main author of svnwiki). That allows you to use the usual svn commands (svn diff, svn log, svn update, etc.) to work with your wiki as well as using the web interface.

    You can see an example wiki (in spanish) and its associated svn repository (login as anonymous, password is the empty string; Slashdot seems to strip out this auth information from my URL) to get an idea of what the repository looks like.

    These are examples of some of its features:

  9. Word has this built in by boatboy · · Score: 4, Informative

    Alot of people don't realize it, but there is document versioning built into Word. If it's turned on, it will track changes, etc. by user. There is also pretty rich editing capabilities. Reviewers can mark up the doc with comments, etc... Adding sharepoint lets you distribute that process pretty well. Get an in-depth Word book and figure out how to do it in sharepoint/word.

  10. Re:Reasons _NOT_ to use subversion: by PeeCee · · Score: 4, Interesting
    Your post is incredibly misleading, if not plain wrong. I will assume this is because you don't actually have any experience with Subversion, so let me address your points one by one.

    1. branching is woefully inefficient on the storage side
    No idea what you mean here; on the contrary, branching is incredibly efficient. The fact that creating a branch means creating a copy of the whole trunk does not mean every file in the repo is physically copied; all copies are "virtual", and files are only copied when they are actually modified. What this means is you can branch a directory with 10 or 10000 files in exactly the same time and using exactly the same storage space (both close to zero). See the "Cheap Copies" sidebar.

    2. merging is (practically) unsupported
    Merging is supported just fine. Granted, merge tracking is not supported (yet; it's planned for a future version), which means you have to keep track of your merges yourself. However, with a bit of discipline (for instance, writing in your commit log messages which version you are merging in) this is not too hard at all for the vast majority of cases.

    3. there is no rollback from a commit, because....
    This is more of a philosophical decision; the designers chose to make it so that every transaction is recorded. You want to go back to transaction X? Fine, copy that version as the current version; nothing is lost. There are also (complex) ways to physically do it for the very, very few cases in which it might actually be a good idea; but many of us have found that version control systems which allow the undoing of transactions are a recipe for disaster (I can vouch for this).

    4. it uses BerkeleyDB as the management system.. and that is bad because...
    As I said before, the "no rollback" thing has nothing to do with the storage system since it was a design decision. Also, BerkeleyDB is only one of the available data stores. I'd argue that FSFS is a lot more popular these days. More will come in the future.

    5. Oracle now own Sleepycat software (the makers of BDB)
    As I just said, this is close to irrelevant with the existence of the more popular FSFS. Besides, AFAIK Oracle has not said they will discontinue BDB, and I believe it was open source, so if it really mattered I guess it could be forked.


    - PeeCee