Slashdot Mirror


Converting Word Files to Text for Archiving?

Unknown Relic asks: "Our company has large quantities of old MS Word documents which we are looking to permanently archive. One of the requirements of our archiving process is that the documents be stored in plain text format. Unfortunately we also have another, conflicting requirement: the text files must retain basic formatting information from the original documents, including bullets, indentation and basic table layout. While all of this formatting is possible using plain text, I have not been able to find any tools which do a decent job of retaining the above-mentioned formatting during conversion. Even Word's 'Save As' option does a horrible job, though I suppose that's not overly surprising. Has anyone undertaken a project similar to this before? If so, what tools did you find or create to make the job feasible?"
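The asker notes that bullets, indentation and simple tables can all be expressed in plain text. As a rough illustration only, the Python sketch below renders an already-extracted document into that kind of text. It assumes a hypothetical in-memory model (the `Paragraph`, `Bullet` and `Table` classes are invented for this example) and does not read Word files itself; getting the structure out of the old .doc files is the hard part the question is about and is not shown here.

```python
import textwrap
from dataclasses import dataclass
from typing import List, Union

# Hypothetical document model: assumes the Word content has already been
# extracted into these three block types by some other tool.
@dataclass
class Paragraph:
    text: str
    indent: int = 0  # indentation level

@dataclass
class Bullet:
    text: str
    level: int = 0  # nesting depth of the bullet

@dataclass
class Table:
    rows: List[List[str]]  # first row is treated as the header

Block = Union[Paragraph, Bullet, Table]

INDENT = "    "  # four spaces per level (an arbitrary choice)
WIDTH = 72       # wrap width for prose (also arbitrary)

def render_table(rows: List[List[str]]) -> List[str]:
    """Draw a table as fixed-width ASCII with +---+ borders."""
    if not rows:
        return []
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [r + [""] * (ncols - len(r)) for r in rows]
    widths = [max(len(r[c]) for r in padded) for c in range(ncols)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [border]
    for i, row in enumerate(padded):
        cells = " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("| " + cells + " |")
        if i == 0 and len(padded) > 1:
            lines.append(border)  # rule under the header row
    lines.append(border)
    return lines

def render(blocks: List[Block]) -> str:
    """Render the hypothetical block list to plain text."""
    out: List[str] = []
    for block in blocks:
        if isinstance(block, Paragraph):
            pad = INDENT * block.indent
            out.append(textwrap.fill(block.text, WIDTH,
                                     initial_indent=pad, subsequent_indent=pad))
        elif isinstance(block, Bullet):
            pad = INDENT * block.level
            # Continuation lines line up under the bullet text, not the marker.
            out.append(textwrap.fill(block.text, WIDTH,
                                     initial_indent=pad + "* ",
                                     subsequent_indent=pad + "  "))
        elif isinstance(block, Table):
            out.extend(render_table(block.rows))
    return "\n".join(out)

# Example input (made-up content).
doc = [
    Paragraph("Quarterly report"),
    Bullet("Revenue up"),
    Bullet("Mostly from services", level=1),
    Table([["Region", "Sales"], ["North", "120"], ["South", "95"]]),
]
print(render(doc))
```

Padding every cell to its column's widest entry keeps the table aligned in any monospaced viewer, which is about the most a plain-text archive can promise; the four-space indent and 72-column wrap are arbitrary choices.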

1 of 81 comments

  1. My complaint about Microsoft Word by Anonymous Coward · Score: -1, Flamebait
    I don't know what to make of Microsoft Word's demands. On the one hand, Microsoft Word should slither back under whatever rock it crawled out from. But on the other hand, this screams of the old belief that stentorian, headstrong egotists are merely virulent, rancorous gutter-dwellers. Let's get down to business: Someone has to be willing to take steps toward creating an inclusive society free of attitudinal barriers. Even if it's not polite to do so. Even if it hurts a lot of people's feelings. Even if everyone else is pretending that Microsoft Word's "compromises" epitomize wholesome family entertainment. Microsoft Word's bons mots may sound comfortable and simple, but it must not be forgotten that Microsoft Word claims to have turned over a new leaf shortly after getting caught trying to condemn children to a life of drugs, gangs, drinking, rape, incest, verbal abuse, physical abuse, and a number of other horrors. This claim is an outright lie that is still being circulated by Microsoft Word's legates. The truth is that Microsoft Word can't, for the life of it, understand why anyone would prefer so much as one minute of solitude to the company of a conniving gang of flippant, irrational provocateurs. And let me tell you, Microsoft Word is too insincere to read the writing on the wall. This writing warns that there are those who are informed and educated about the evils of militarism, and there are those who are not. Microsoft Word is one of the uninformed, naturally, and that's why it exhibits an air of superiority. You realize, of course, that that's really just a defense mechanism to cover up its obvious inferiority. If Microsoft Word had done its homework, it'd know that its attitudes are not an abstract problem. They have very concrete, immediate, and unpleasant consequences. 

    For instance, if I said that human beings should be appraised by the number of things and the amount of money they possess instead of by their internal value and achievements, I'd be a liar. But I'd be completely honest if I told you what we need to do about all the craziness Microsoft Word is mongering: we need to make this world a kinder, gentler place. Note that Microsoft Word commonly appoints ineffective people to important positions. It then ensures that these people stay in those positions, because that makes it easy for Microsoft Word to con us into believing that it is the best thing to come along since the invention of sliced bread.

    Microsoft Word can't relate to anyone other than disgusting, obdurate cutthroats. (Read as: Microsoft Word's motto is "never forgive and never forget".) Stick your nose into anything Microsoft Word has written recently, and you'll get a good whiff of demonic irreligionism. Similarly, Microsoft Word dreams of a time when it'll be free to manipulate everything and everybody. That's the way it has planned it, and that's the way it'll happen -- not may happen, but will happen -- if we don't interfere, if we don't ensure that we survive and emerge triumphant out of the coming chaos and destruction. Microsoft Word offers two reasons as to why laws are meant to be broken. It argues that (1) some people deserve to feel safe while others do not, and (2) it is not only acceptable, but indeed desirable, to wage an odd sort of warfare upon a largely unprepared and unrecognizing public. These arguments are invalid for the following reasons: First, it's our responsibility to denounce its publicity stunts. That's the first step in trying to subject its snow jobs to the rigorous scrutiny they warrant, and it's the only way to compare, contrast, and identify the connections among different sorts of voluble, pugnacious separatism. We need to settle our disputes with rational discussion -- not by moral huffing and puffing. May we never forget this if we are to deny Microsoft Word and its surrogates a chance to ignore compromise and focus solely on Microsoft Word's personal agenda.