Slashdot Mirror


Migration from PDF, MS Word and Frontpage?

l337hx0r asks: "I've just begun work at a government department that, with 24 branches and five years at its disposal, has managed to create 85,000 proprietary documents. The current IT manager doesn't see a problem with this, and the government recommendations of open formats came as quite a surprise. I want to move this intranet away from WYSIWYG to a logical document structure (DocBook, TeX, XHTML), though the migration tools have been quite disappointing. Can it be done?"

2 of 10 comments

  1. Yes. by RGRistroph · · Score: 3, Insightful

    Yes, it can be done.

    The easy part is converting and indexing all the docs. Not that that is easy. What I would do is something along the lines of scripts to convert them to HTML and put them in a database with a web-browsable front end, building indexes of keywords, accompanied by A LOT of manual labour inserting meta-information about each document.
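
    As a rough illustration of the "convert, store, index" idea above, here is a minimal Python sketch using only the standard library. It assumes the documents have already been exported to HTML by some other tool; the table name `keyword_index`, the helper names, and the four-letter minimum word length are all made up for the example, and a real deployment would need stop words, stemming and the manually entered metadata on top of this.

```python
import re
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of an HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def keywords(text, min_len=4):
    """Crude keyword extraction: lowercase alphabetic words of min_len or more."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_len}


def build_index(docs):
    """Build an in-memory keyword index from a {name: html} mapping (hypothetical layout)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE keyword_index (word TEXT, doc TEXT)")
    for name, html in docs.items():
        words = keywords(html_to_text(html))
        db.executemany(
            "INSERT INTO keyword_index VALUES (?, ?)",
            [(w, name) for w in sorted(words)],
        )
    db.commit()
    return db


def lookup(db, word):
    """Return the names of documents containing the keyword, sorted by name."""
    rows = db.execute(
        "SELECT doc FROM keyword_index WHERE word = ? ORDER BY doc",
        (word.lower(),),
    )
    return [r[0] for r in rows]
```

    A web front end would then only need to call something like `lookup(db, "policy")` and render the resulting document names as links.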

    Almost any document editor these days can import and export HTML.

    The hard part is getting people to start using it. They won't insert their new documents, they won't use it to efficiently look up stuff instead of poking around in their hard drives and email archives, they will just keep doing what got them in trouble in the first place.

    And the only thing you can do about it is get a new job.

    Hope that helps.

  2. minimize conversion by candot · · Score: 2, Insightful

    Format migration is required by almost every commercial content- or knowledge-management system, and the more structured and metadata-laden your content, the better as far as these products are concerned. But as others have pointed out, getting content into new formats is only part of the battle; then you have to put an interface on top of it, as well as reorient users to create content in the new system.

    It's a fight, but I haven't given up. I do a lot of consulting around structuring new approaches to information management and migrating content from old formats to new formats. Rather than look for another job, I've been looking for ways to change this one. One approach we've hit on lately is to stop beating ourselves silly migrating format A into highly structured format B. Instead, we leave things the way they are (if open format isn't important) or migrate to a minimally structured format, such as HTML or very loose XML. Then, instead of relying on a CMS to dictate a production system that produces well-formed documents, we develop systems that provide high-precision navigation across highly unstructured document sets. After all, for most companies, the point of all this work is to improve document access and navigation.
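
    To make the "navigation on top of unconverted documents" idea a little more concrete, here is a small Python sketch of one possible shape for it: a catalog of metadata records that point at files left in their original formats, plus a search that ranks entries by how many query terms they match. The `CatalogEntry` fields, the example paths and the scoring rule are all assumptions for illustration, not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one document; the file itself is left in its original format."""
    path: str
    fmt: str
    title: str
    tags: set = field(default_factory=set)


def search(catalog, query):
    """Return entries matching any query term, best matches first.

    The score is simply the number of query terms found in the entry's
    tags or title words; ties are broken by path for a stable order.
    """
    terms = {t.lower() for t in query.split()}
    scored = []
    for entry in catalog:
        haystack = {t.lower() for t in entry.tags} | set(entry.title.lower().split())
        score = len(terms & haystack)
        if score:
            scored.append((score, entry))
    scored.sort(key=lambda pair: (-pair[0], pair[1].path))
    return [entry for _, entry in scored]
```

    The point of the design is that old Word or PDF files never have to be touched: only the catalog needs maintaining, and new documents in an open format get an entry in the same catalog.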

    If you've got a lot of content (and it sounds like you do), it's often better to put less work into fixing past content and more into developing systems to handle future content. It's a lot cheaper, at least. The minimal conversion approach, combined with a good navigational overlay, can save a lot of time and money without compromising document access. Done properly, you can start creating new docs in the open format of choice, leave the old stuff alone, and actually improve document access in the process.