Ask Slashdot: What Is the Best Open Document Format?
kramer2718 writes: I am working on a project that requires uploading and storing of documents. Although the application will need to allow uploading of .docx, doc, .pdf, etc, I'd like to store the documents in a standard open format that will allow easy search, compression, rendering, etc. Which open document format is the best?
Since "best" can be highly driven by circumstances, please explain your reasoning, too.
http://www.pdfa.org/2011/08/pd...
Let's make like a bird... and get the flock outta here.
I would suggest, unless you have a pressing need to convert them, that you should store the documents in the formats they are uploaded in.
Whenever you convert a document you run the risk of completely messing up the layout, style, etc.
.txt. If you need pretty formatting, fill it with LaTeX tags.
1) Forget the Universal Format approach - your users will kill you for messing up their formatting, and you'll never get complete feature parity
2) Store the docs in their original format
3) Get Apache Solr to search your content
4) You'll be spending a lot of time on #3, so leave time to tinker
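For what it's worth, steps 2 and 3 can be sketched in a few lines of Python. This is only a rough illustration, not how Solr works: the `DocumentStore` class, its method names, and the in-memory word index are all made up for the example, and it assumes the text has already been extracted from each upload by some other tool. The point is just that the original bytes are never touched, while search runs against a separate index.

```python
from __future__ import annotations

import hashlib
import re
import shutil
from collections import defaultdict
from pathlib import Path


class DocumentStore:
    """Hypothetical store: keeps uploads byte-for-byte, indexes text separately."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        # word -> set of document ids; a toy stand-in for a real search server
        self.index = defaultdict(set)

    def add(self, upload: Path, extracted_text: str) -> str:
        """Copy the original file unchanged and index its extracted text."""
        digest = hashlib.sha256(upload.read_bytes()).hexdigest()
        # Keep the original extension so the file still opens in its native app.
        target = self.root / f"{digest}{upload.suffix.lower()}"
        if not target.exists():
            shutil.copyfile(upload, target)
        for token in re.findall(r"\w+", extracted_text.lower()):
            self.index[token].add(digest)
        return digest

    def search(self, query: str) -> set[str]:
        """Return ids of documents that contain every word in the query."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        hits = [self.index.get(token, set()) for token in tokens]
        return set.intersection(*hits)
```

Naming the stored file after its SHA-256 hash means the same upload is only kept once, and a later re-hash can confirm the stored copy still matches what the user sent.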
WordPerfect Document, because it's been consistent for nearly 20 years, it has a simple underlying format, it's more finely granular than HTML, and I just like obsolete things.
As an IT person, I hate questions like this. There's not enough information to give a solid answer. For example:
* What kinds of documents are you talking about? Text? Photos? Spreadsheets?
* What is the source of the documents? Are these currently printed-out documents that need to be scanned back in? Are they currently digital, and in a particular file format?
* What will people need to do with them when these documents are retrieved? Do they need to be able to edit the documents?
* How much does formatting matter? If someone retrieves the document in 5 years, will it be important that all the line breaks and page breaks are in the same place? Does it need to have all of the correct fonts? Or are you more interested in being able to have access to the information itself?
* When you say that the application will need to allow ".docx, doc, .pdf, etc", what formats are in "etc"?
There may be many other relevant questions; my point is that there just isn't enough detail here. In general, if the most important thing is a document that can be printed from any machine with the formatting preserved as much as possible, then PDF is a pretty good choice (be sure to embed the fonts and include searchable text!). If you already have a bunch of Word documents, want the formatting unchanged, and would like to be able to edit a document after it's retrieved, then I'd typically just recommend keeping it as a .docx. That keeps things simple, will be widely supported, and avoids the risk of something going wrong during conversion to another format. If you like the idea of using .docx for those reasons but want something more "open", then ODF is probably worth looking into.
Really, there are only so many choices, and each has advantages depending on your specific needs.
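On the question of which formats actually arrive, one rough way to find out is to look at the file contents rather than trusting the extension. The Python sketch below is only an illustration: `sniff_format` and its return labels are names made up here, and it only recognizes the container types mentioned in this thread (PDF, legacy Office files such as .doc, the ZIP-based Office formats such as .docx, and ODF).

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"
# Legacy Office compound file header (.doc, .xls, .ppt)
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(data: bytes) -> str:
    """Guess a document's container format from its leading bytes."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
                # ODF packages carry a "mimetype" entry naming the document type.
                if "mimetype" in names:
                    mimetype = archive.read("mimetype").decode("ascii", "replace")
                    if mimetype.strip().startswith("application/vnd.oasis.opendocument."):
                        return "odf"
                # .docx/.xlsx/.pptx packages carry a content-types manifest.
                if "[Content_Types].xml" in names:
                    return "ooxml"
        except zipfile.BadZipFile:
            return "unknown"
        return "zip"
    return "unknown"
```

This tells you the container, not the exact application format: "ooxml" could be a .docx or an .xlsx, and "ole2" could be a .doc or an .xls, so a real uploader would need further checks.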
...you can't beat bamboo strips. The oldest original versions of Lao Tzu's Tao Te Ching are written on rolls of bamboo strips. Not sure how they scan electronically, and you will have to keep your pet pandas away from them, but for document durability, you can't beat that format...
Development is programmable; Discovery is not programmable. (Fuller)
Chisel it into stone tablets, then find an ignorant local. Set up a natural gas line to a nearby bush and hide behind a rock. Cup your hands to add a slight reverb effect and tell him to preach the chiselled word, then break the tablets and hide them in a box and trick Nazis into looking at them.