Open Source Document Management and Revision Control?
Ramon M. Felciano asks: "I'm trying to scrape together our development intranet and want to be able to post specs, change reports, etc. that could be in Word, HTML, PDF, etc. Are there any intranet "shells" out there that include at least rudimentary document management facilities (like the ability to post files to a directory structure, receive notifications of new postings, and allow single-click-to-open-inline functionality, rather than download and then open)? I've played with the idea of converting everything to PDF but there doesn't seem to be a way to automate this from the source format. Also, integration with CVS for version controlling these docs would be a nice bonus. I checked Freshmeat but didn't see any good matches. The closest was Phorum and the other thread / discussion servers, but we don't really need discussion support. Any suggestions?"
Hi,
I've been thinking about this for a long time now, and have found some things, but no real integrated solution as of yet.
I will list the ideas I had here, in the hope someone will take them and build an open source application out of them, or maybe I'll get some feedback and will start myself:
-Posting documents:
-- Can be done using a WebDAV-enabled server, but I don't know whether it has version control and update messages.
-- CVS could serve as a tool, but from things like MS Office it just doesn't seem too handy.
-- Have someone update/maintain a database and have the files on a fileserver, with their file locations in the database. This is up to now the best idea, if you can afford the person to handle the document archiving. This is also limited to finished documents.
- Document formats:
-- PDF seems good, as it has a lot of open source tools to create them and to search them (see searching later on). There are tools to convert everything to PDF, and it has things like copyright protection. IMHO the document format.
-- XML could be cool: an open and easy format, just not many WYSIWYG tools to create it. Office 2000 is said to create XML docs, but I haven't checked the format out.
- Searching (2 options as I see it):
-- In the cataloguing plan one can easily search the database for the documents.
-- In other cases, look at UdmSearch or ht://Dig, which both search files and can be enhanced by using parsers for PDF and Word documents.
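For what it's worth, the cataloguing plan (files on the fileserver, locations in a database, search through the database) could look roughly like the sketch below. It assumes SQLite through Python's standard sqlite3 module; the table layout, function names, and sample path are made up for illustration and don't come from any existing tool.

```python
import sqlite3


def open_catalogue(path=":memory:"):
    """Open the catalogue database, creating the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id       INTEGER PRIMARY KEY,
               title    TEXT NOT NULL,
               format   TEXT NOT NULL,
               location TEXT NOT NULL UNIQUE
           )"""
    )
    return conn


def register(conn, title, fmt, location):
    """Record a finished document and where it lives on the fileserver."""
    conn.execute(
        "INSERT INTO documents (title, format, location) VALUES (?, ?, ?)",
        (title, fmt, location),
    )
    conn.commit()


def search(conn, term):
    """Return (title, location) pairs whose title contains the term."""
    rows = conn.execute(
        "SELECT title, location FROM documents WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

The person doing the archiving would call something like register(conn, "Login spec", "pdf", "/srv/docs/specs/login.pdf") for each finished document, and everyone else would only use search().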
comments at the above email address
-- signed for your pleasure --
I have similar needs and have been thinking about this problem for quite some time. I even submitted it to 'Ask Slashdot' but it wasn't lucky enough to make it.
OK. I have no intranet shells yet, but here is my take on the overall infrastructure.
Version control: It can be CVS, ClearCase, or any other decent version control software. I prefer CVS due to its free nature.
Content: All our legacy documents are in M$Word format and switching to something new is out of the question. That said, I would like to phase in an XML/XSL-based solution. Take DocBook for example and store (almost) all new documents in XML/DocBook.
Presentation: A small CGI-based (or Zope) frontend can easily serve files from CVS through the web server. If the doc is in a binary format (Word, Excel, ...) there isn't much the CGI program can do. But if it's in XML it can be presented in HTML or served in PDF, etc.
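A rough sketch of what such a frontend might do for a single file, in Python. It assumes the cvs client is installed on the web server and relies on "cvs checkout -p", which prints a file to standard output instead of creating a working copy. The function names and the example repository path are hypothetical, and the XML-to-HTML step is left out entirely.

```python
import mimetypes
import subprocess


def cvs_export_command(cvsroot, path, revision=None):
    """Build the cvs command line that prints one file to stdout."""
    cmd = ["cvs", "-Q", "-d", cvsroot, "checkout", "-p"]
    if revision:
        cmd += ["-r", revision]  # a specific revision or tag
    cmd.append(path)
    return cmd


def content_type(path):
    """Guess a Content-Type from the file name, with a binary fallback."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


def cgi_response(cvsroot, path, revision=None):
    """Fetch the file from CVS and return a complete CGI response as bytes."""
    result = subprocess.run(
        cvs_export_command(cvsroot, path, revision),
        capture_output=True,
        check=True,  # raise if cvs reports an error
    )
    header = f"Content-Type: {content_type(path)}\r\n\r\n".encode("ascii")
    return header + result.stdout
```

With the right Content-Type the browser can open PDF or HTML inline; for Word files it can only hand the bytes over to whatever the client has installed.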
Searching: The free indexing tools can help here, although I am not aware of tools capable of indexing Word docs. But Word can be converted to HTML (search Freshmeat for it) and while the output is quite ugly, it does not matter for indexing purposes.
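To show why ugly HTML doesn't matter for indexing, here is a minimal sketch using only Python's standard library. It assumes the Word documents have already been converted to HTML by some other tool; the class and function names are invented for this example. The markup is thrown away and only the words are kept.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, ignoring tags and <style>/<script> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Strip the markup from an HTML string and return the plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, html in docs.items():
        for word in re.findall(r"[a-z0-9]+", html_to_text(html).lower()):
            index[word].add(name)
    return index
```

A real indexer would add stemming, ranking, and so on, but the point stands: the converter's formatting junk never reaches the index.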
Problems: On the server side we need a rock-solid XSLT engine and an XSL formatter. The latter is a problem at the moment.
On the client side a nifty, preferably cross-platform XML editor is needed for the users to create and edit XML docs. This editor should be buzzword-compatible, validating, maybe WYSIWYG.
These are the missing links. I decided to wait for a while with the implementation until XML and its friends (XSL, XSLT) mature a bit. But even now one can start coding the front end. I am thinking about creating a Zope product for this.
See http://bscw.gmd.de/
The formats are probably whack, but the idea is incredible.
Java interface (last time I checked) and I think the source is downloadable.
Probably not what you want, but cool none the less.
willis.
there is no thing
what else could you want?
Perhaps everybody has seen The Brain, which has an interesting visualization system, to say the least. I would love to see a more integrated solution under Linux, with Web browsing and searching, plus link management/browsing using something like DaVinci, which has an excellent (soon to be improved) API that could create/browse highly structured document links.
This would make an excellent concept like Everything useful for everyone...
I will have to solve a problem similar to this in the near future. I plan to write all of the documents in LaTeX, and write Makefiles in the various directories that will use latex2html to generate a web readable version, or by a different make command, generate pdf or ps. I will have everything under cvs. I intend that access for posting will be through ssh and cvs -- you can set the CVS_RSH environment variable to tell cvs to use ssh to access a remote repository.
On the server I will occasionally do a "cvs up; make html" in the web directory. I could also have a CGI that would do the same, but anyone who can post will have an ssh-reachable account and can do it themselves, so it would be a possibly insecure convenience.
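If I ever wrapped the "cvs up; make html" step in a script, it might look something like this Python sketch. It assumes the web directory is already a cvs working copy and that the Makefile has an "html" target as described; the function names are made up, and the run parameter exists only so the command sequence can be checked without touching a real repository.

```python
import os
import subprocess


def update_env(base_env=None):
    """Copy the environment and tell cvs to reach the repository over ssh."""
    env = dict(os.environ if base_env is None else base_env)
    env["CVS_RSH"] = "ssh"
    return env


def refresh(web_dir, run=subprocess.run):
    """Update the working copy, then regenerate the web-readable version."""
    env = update_env()
    for cmd in (["cvs", "-q", "up"], ["make", "html"]):
        # check=True stops the sequence if either step fails
        run(cmd, cwd=web_dir, env=env, check=True)
```

Running it by hand from an ssh login keeps the same security properties as typing the two commands; hooking it up to a CGI is exactly the possibly insecure convenience mentioned above.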
People who use Word will be SOL; there will be helpful links to Mac and Windows versions of LaTeX and emacs. It won't make a difference in the long run; they won't ever write anything in LaTeX, and they'll start using Word and at first email, later other servers, to exchange documents. But I believe that you have to fight the good fight; also, given the amount of stuff I have to write, I think that the only way to keep my sanity and any social life at all is to have a good LaTeX template doc and use it for everything.
Let's review how this would meet your needs:
--ability to post files to a directory structure: if you are thinking of something point-and-clicky like FrontPage, then you can look at some of the front ends or wrappers to cvs. I use pcl-cvs.el. There are some girly java and tcl/tk things also.
--receive notifications of new postings: in cvs you can set a "cvs watch". Or the makefiles can blast a message if something is remade when they run. I don't intend to have any sort of automatic notification, because it seems like spam.
--single-click-to-open rather than download and open: Well, I would browse the document library by checking out a copy from cvs, but I suppose others could look at the web and click on the html versions
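For anyone who does want notifications without relying on cvs watches, one possible approach is to compare a snapshot of the document tree before and after each update. This is only a sketch with invented function names; it looks at file modification times and leaves the actual mailing to the reader.

```python
import os


def snapshot(root):
    """Map each file under root (as a relative path) to its modification time."""
    seen = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip cvs's own bookkeeping directories.
        dirnames[:] = [d for d in dirnames if d != "CVS"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            seen[os.path.relpath(full, root)] = os.path.getmtime(full)
    return seen


def changes(old, new):
    """Return (added, updated) lists of paths between two snapshots."""
    added = sorted(set(new) - set(old))
    updated = sorted(p for p in new if p in old and new[p] > old[p])
    return added, updated
```

Take one snapshot before the update and one after, then send the two lists to whoever asked for them.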
The main problem is that cvs doesn't handle binary format files very well, so Word is kind of out. Oh well. I think that the other people on this project will never use LaTeX and eventually they will put the frontpage extensions on the apache web server, but I'll always be able to say I tried to do the right thing from the start.