Ask Slashdot: What Is the Best Open Document Format?
kramer2718 writes: I am working on a project that requires uploading and storing documents. Although the application will need to allow uploading of .docx, .doc, .pdf, etc., I'd like to store the documents in a standard open format that will allow easy search, compression, rendering, etc. Which open document format is the best?
Since "best" can be highly driven by circumstances, please explain your reasoning, too.
.txt. If you need pretty formatting, fill it with LaTeX tags.
Or store both the original, and a standardized format. The place I work stores everything from engineering drawings, meeting minutes, purchase records, to manuals of old equipment in a central document library. It retains the original file, and makes a pdf of every file, and a link to both is listed in each entry. We've already had some older CAD formats no longer supported by current software we have easy access to, but the old pdfs are still readable and it is cheap enough to find some intern to re-create the document from the pdf if need be.
The "X" in all of the "X" variants of MS Office formats stands for "XML": the documents are stored as a series of XML files inside a ZIP file that is renamed to formatX (docx, xlsx, etc.). There is no real need to even have Windows or Office installed to index these documents. Just write a basic script to extract the ZIP file and parse the relevant XML documents. Note: this isn't as trivial as it sounds at first, though. Being trivial would assume that Microsoft's XML structures (yes, plural) had an easy-to-comprehend standard that was logical to work with. It'll take a little digging, but it's totally doable.
TLDR: not by choice, my company heavily relies on Excel documents, and this is how I ended up managing them, importing their contents into a SQL database for indexing and other purposes.
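For anyone who wants to see what "extract the ZIP and parse the XML" looks like in practice, here is a minimal sketch in Python using only the standard library. It assumes the main body of a .docx lives in word/document.xml and that text sits in w:t elements inside w:p paragraphs, which is the usual WordprocessingML layout. The names docx_paragraphs and make_sample_docx are made up for this illustration, and the sample archive is a stripped-down stand-in, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text runs are stored in <w:t> elements inside each <w:p> paragraph
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx(lines):
    """Hypothetical helper: build a bare-bones in-memory archive for the demo.

    It contains only word/document.xml, so it is not a complete .docx.
    """
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    sample = make_sample_docx(["Meeting minutes", "Purchase record"])
    for line in docx_paragraphs(sample):
        print(line)
```

This only covers the easy case. Headers, footnotes, and comments live in other XML parts of the archive, text boxes can nest paragraphs inside paragraphs (which would duplicate text here), and xlsx keeps cell strings in a separate shared-strings part, so that is where the "little digging" comes in.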
Or store both the original, and a standardized format. The place I work stores everything from engineering drawings, meeting minutes, purchase records, to manuals of old equipment in a central document library. It retains the original file, and makes a pdf of every file, and a link to both is listed in each entry.
THIS.
PDFs (or some similar standard) will ensure that the original documents can be read by everyone and viewed with the original formatting intended by the person creating them. Any differences in the version of Word or whatever are going to tweak the formatting in unpredictable ways.
But the originals should always be retained, since it may make future editing easier. And people also won't be stuck trying to undo whatever unpredictable reformatting or editing (e.g., loss of certain features moving between formats) might go on in your conversion process.
No, just no....
Store the documents in their original format.
There are many possible reasons why you shouldn't mess with the originals, such as formatting, legal implications, loss of content because one format supports stuff that the other doesn't, etc.
The only way that I could see this working is if you converted everything to an open format but kept copies of the originals and linked to them. But if the plan is to dump the original documents, then it just isn't worth it....
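As a rough illustration of the "keep the original, link a converted copy" arrangement discussed in this thread, here is a minimal Python sketch. Everything in it is an assumption made for the example: the LibraryEntry record, the store_original and attach_converted functions, and the originals/ and converted/ directory layout are all hypothetical, and the actual conversion to PDF is left out entirely (the caller supplies the converted bytes).

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class LibraryEntry:
    """One library record linking the untouched original to its converted copy."""
    name: str
    sha256: str
    original: Path
    converted: Optional[Path] = None


def store_original(data: bytes, name: str, root: Path) -> LibraryEntry:
    """Write the uploaded bytes unchanged, filed under their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = root / "originals" / digest / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return LibraryEntry(name=name, sha256=digest, original=target)


def attach_converted(entry: LibraryEntry, pdf_bytes: bytes, root: Path) -> LibraryEntry:
    """Store an already-converted PDF next to the entry and record the link."""
    target = root / "converted" / entry.sha256 / (Path(entry.name).stem + ".pdf")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(pdf_bytes)
    entry.converted = target
    return entry
```

Filing the original under its digest means the stored bytes can be checked against the record later, which may help with the legal concerns raised above; whether that is enough depends on the project.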