Data Migration Between CMS Repositories?
StyleChief asks: "My employer has decided to begin migrating all of the company's documentation-oriented objects and files to a new content management system. The new system seems to have good functionality, robustness, and better usability than our current systems. However, the task of migrating all of the data from 2 or 3 other repositories to the new system seems to be a daunting chore. Automating the process as much as possible is of course my first goal. There are APIs that one can use to do this, but the details quickly become eye-opening. Questions of objects versus files, handling their attributes, authorizations, file type identification, shadowing, build integration, versioning, etc., are just a few of the plethora of issues at hand. Moving perhaps hundreds of thousands of objects from one proprietary repository to another while preserving everything related to each object is the name of the game. I would like to know how others from Slashdot have dealt with similar scenarios. I am particularly interested in the 'lessons learned,' and the problems that you didn't see coming beforehand."
Comma-delimited files. 'nuff said. :)
The swiftest way to migrate data between repositories.
Drop the firewall.
This question sucks. Not because of the post or the person who asked it, but because it points at one of the gaping holes of "Enterprise" class software. I have some first-hand Documentum experience, and I can only say thank heaven for Jython. It has been a serious time saver in learning the internals of Documentum's API.
As far as CMS software goes, Documentum is one of the better ones that I have had to deal with (compared to Interwoven, Vignette, Stellent), and the API really is simple to use once you dig through the layer of Java / OO Design cruft that some developer types like to throw in the way of getting things done.
A few questions need to be asked and answered before you can ever migrate content from other systems. First is a survey of what kind of content you are trying to pull: is it structured and tagged or categorized well, like XML in DocBook format? Or is it old 1996 HTML crap that years of users and FrontPage or Dreamweaver have thrown out to the corporate intranet? If it is the latter, I suggest looking at a nice emerging company that handles this well called Nahava. (I am not affiliated, but I think their tech is well done after working with them a bit.)
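For the survey step, even a crude count of file types tells you a lot about what you are facing. Here is a minimal Python sketch, assuming each old repository can first be dumped to a directory tree on disk (that export step is an assumption and depends entirely on the source system; the function name `survey` is made up for illustration):

```python
from collections import Counter
from pathlib import Path


def survey(root):
    """Count the files under `root`, grouped by lower-cased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under one label.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts
```

Calling `survey("/path/to/export").most_common()` lists the extensions from most to least frequent. It says nothing about how well the content is tagged, but a large pile of `.htm` files next to a handful of `.xml` files is a strong hint about which of the two cases above you are in.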
After this is done, you have to decide what you want the new CMS to store. Are you going to fit all the old stuff to some fancy new taxonomy that a big brain strategerian has come up with, or is it a straight-over migration with 3 root folders, one for each of the old systems? Is it possible to do both by putting some of the Documentum features to use?
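One way to picture "doing both", independent of any particular CMS feature: map the paths you have a taxonomy home for, and let everything else fall through to a per-system legacy root. A rough Python sketch follows; the repository names, folder names, and the `target_path` helper are all hypothetical, and nothing here uses a Documentum API.

```python
from pathlib import PurePosixPath

# Hypothetical legacy roots, one per old system (the straight-over option).
LEGACY_ROOTS = {
    "repo_a": "/Legacy/RepoA",
    "repo_b": "/Legacy/RepoB",
    "repo_c": "/Legacy/RepoC",
}

# Hypothetical taxonomy overrides: (old system, old folder) -> new folder.
TAXONOMY = {
    ("repo_a", "/eng/specs"): "/Engineering/Specifications",
    ("repo_b", "/hr"): "/People/Policies",
}


def target_path(system, old_path):
    """Return the new-CMS path for an object from an old repository."""
    old = PurePosixPath(old_path)
    best = None  # (matched old folder, new folder), longest match wins
    for (sys_name, prefix), new_root in TAXONOMY.items():
        folder = PurePosixPath(prefix)
        if sys_name == system and (old == folder or folder in old.parents):
            if best is None or len(folder.parts) > len(best[0].parts):
                best = (folder, new_root)
    if best is not None:
        return str(PurePosixPath(best[1]) / old.relative_to(best[0]))
    # No taxonomy entry: keep the old layout under that system's root.
    return str(PurePosixPath(LEGACY_ROOTS[system]) / old_path.lstrip("/"))


print(target_path("repo_a", "/eng/specs/widget/v1.doc"))
# /Engineering/Specifications/widget/v1.doc
print(target_path("repo_c", "/misc/notes.txt"))
# /Legacy/RepoC/misc/notes.txt
```

Matching is done on whole path components, so `/eng/specsheet.doc` does not get swept into `/eng/specs`. Keeping the mapping in a plain table also means the taxonomy people can argue about it without touching the migration code.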
Anyway, there are a million things to answer in this process, good luck!
Interestingly, none of these "migration articles" on web sites that are explicitly devoted to CMS matters (e.g., CMSwatch.com, cmsReview.com) seem to characterize this problem as relating to Extraction, Transformation, and Loading (ETL), raising the possibility that their authors are ignorant of the many ETL tools that are available. In the open source world, these tools include Octopus and Jetstream. Of course, Perl programmers do not call this process "ETL," but, rather, simply "data munging."
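To make the ETL framing concrete, here is a minimal Python sketch of the three stages. It assumes the old repository can export object metadata as comma-delimited text; the column names, the `FIELD_MAP` table, and the plain list standing in for the new CMS are all invented for illustration and do not correspond to any real product's API.

```python
import csv
import io

# Hypothetical mapping from old attribute names to new ones.
FIELD_MAP = {"doc_title": "title", "owner_name": "owner", "rev": "version"}


def extract(csv_text):
    """Extract: yield one dict per row of the exported metadata."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(record):
    """Transform: rename attributes, trim whitespace, type the version."""
    out = {new: record[old].strip() for old, new in FIELD_MAP.items()}
    out["version"] = int(out["version"])
    return out


def run(csv_text, target):
    """Load: append clean records to `target`; return the rejected rows."""
    rejected = []
    for raw in extract(csv_text):
        try:
            target.append(transform(raw))
        except (KeyError, ValueError, AttributeError) as exc:
            # Keep bad rows for review instead of aborting the whole run.
            rejected.append((raw, repr(exc)))
    return rejected


sample = "doc_title,owner_name,rev\n Spec A ,alice,3\nSpec B,bob,draft\n"
loaded = []
problems = run(sample, loaded)
print(loaded)         # [{'title': 'Spec A', 'owner': 'alice', 'version': 3}]
print(len(problems))  # 1
```

The reject list is the part that matters most at scale: with hundreds of thousands of objects, some rows will always be malformed, and you want them collected for a second pass rather than stopping the run.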
A prior Slashdot story on "Transferring data 'tween databases" (posted 14 April 2003) might interest you. I cannot post a link to it, however, because Slashdot's search engine is currently down.
Finally, EMC just bought Documentum, the CMS that you are considering. EMC is primarily a storage company, and I cannot help but wonder how CMS fits into their storage strategy.
A lawyer & digital forensics examiner. Also an expert on open source software (OSS).
Data Junction specializes in such work (among other stuff).
...it's interesting in this context. The UK Government's Public Record Office issued a standard a while back that all their records management systems are supposed to adhere to (RMS systems sit on the same spectrum as document- and content-management systems). Every supplier has to get their product certified against this spec. The main thing the spec mandates is an import and export format for all the document metadata.
It takes someone as big as a government to demand "no lock in please, we're British" to get things like this sorted out. Hopefully the JCR API will sort out the content management space as well. All a bit too late for you.
With that said, the 2002 product integrates nicely with .NET and is actually pretty slick.
perfect portal