Slashdot Mirror


Ask Slashdot: What Is the Best Open Document Format?

kramer2718 writes: I am working on a project that requires uploading and storing documents. Although the application will need to accept uploads of .docx, .doc, .pdf, etc., I'd like to store the documents in a standard open format that allows easy search, compression, rendering, etc. Which open document format is the best? Since "best" can be highly driven by circumstances, please explain your reasoning, too.

7 of 200 comments

  1. can't you search the current doc types? by alen · · Score: 3, Informative

    if you use the APIs supplied by their creators?

  2. PDF/A by thechemic · · Score: 5, Informative
    --
    Let's make like a bird... and get the flock outta here.
  3. Forget the Universal Format crap by xxxJonBoyxxx · · Score: 5, Informative

    1) Forget the Universal Format approach - your users will kill you for messing up their formatting, and you'll never get complete feature parity
    2) Store the docs in their original format
    3) Get Apache Solr to search your content
    4) You'll be spending a lot of time on #3, so leave time to tinker
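A minimal sketch of points 2 and 3 above, assuming a Solr core with the extracting request handler (Solr Cell) enabled at `/update/extract`. The function names, the core name `docs`, and the hash-based file naming are illustrative assumptions, not something the comment specifies. It keeps the upload byte-for-byte and only builds the URL you would POST the file to; the HTTP call itself is left out.

```python
import hashlib
import shutil
from pathlib import Path
from urllib.parse import urlencode


def store_original(src: Path, store_dir: Path) -> Path:
    """Copy src into store_dir unchanged, named by its SHA-256 digest."""
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    store_dir.mkdir(parents=True, exist_ok=True)
    # Keep the original extension so the file still opens in its native app.
    dest = store_dir / (digest + src.suffix.lower())
    if not dest.exists():
        shutil.copy2(src, dest)
    return dest


def solr_extract_url(base_url: str, core: str, doc_id: str) -> str:
    """Build the URL for posting one file to Solr's extracting handler.

    Assumes the handler is mounted at /update/extract on the given core.
    """
    query = urlencode({"literal.id": doc_id, "commit": "true"})
    return f"{base_url.rstrip('/')}/{core}/update/extract?{query}"
```

Using the content hash as both the stored file name and the Solr document id means a search hit maps straight back to the untouched original, and duplicate uploads collapse into one file.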

    1. Re:Forget the Universal Format crap by Anonymous Coward · · Score: 2, Informative

      I work at a print shop, and I get a lot of documents from a lot of different people. Those "documents" come as MS Word files with missing fonts, PDFs made with some shoddy software, strange ODTs, many other kinds of doc files, the mysterious .lnk files that work perfectly fine for them but not for anyone else, and my personal favorites, JPG files (not PNG or some other lossless format, because that would imply actual thinking).
      Strangely enough, I've yet to receive any plain text files.

      To index everything, I use calibre, just something simple like changing the file name to "Project name - tag 1, tag 2, tag 3 - Client name.pdf" and it imports it automatically from a folder I drop it in, adding that info to the metadata.
      It doesn't actually index or search inside the files, but they are easier to find and handle.
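The naming convention described above is easy to parse outside calibre too. This is only a sketch of that idea: `parse_filename` is a hypothetical helper, not part of calibre, and it assumes the three fields are always separated by " - " and the tags by commas.

```python
from pathlib import Path


def parse_filename(name: str) -> dict:
    """Split 'Project - tag 1, tag 2 - Client.ext' into metadata fields."""
    stem = Path(name).stem  # drop the extension
    parts = [p.strip() for p in stem.split(" - ")]
    if len(parts) != 3:
        raise ValueError(f"expected 'project - tags - client', got {name!r}")
    project, tags, client = parts
    return {
        "project": project,
        "tags": [t.strip() for t in tags.split(",") if t.strip()],
        "client": client,
    }
```

A project or client name that itself contains " - " would break the split, so a real setup would need a stricter separator or a sidecar metadata file.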

  4. Re:For Two-Millennia Durability... by gstoddart · · Score: 5, Informative

    Nonsense, bamboo can't touch papyrus for longevity, and you don't need to worry about pandas.

    Damned bamboo shills.

    And don't anybody go suggesting cave paintings, it's a completely dead platform.

    --
    Lost at C:>. Found at C.
  5. Re:.txt by Desler · · Score: 4, Informative

    "Then you end up with Microsoft inserting garbage characters at the start of each text file to make their job easier, breaking scripts and confusing both users and other editors alike."

    It's not a garbage character. It's a BOM and it's part of the Unicode standard. If your scripts and text editors can't read the BOM in 2015 then they are the things that are horribly broken.
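For scripts that do trip over the BOM, handling it is short. A minimal sketch for UTF-8 input, with hypothetical helper names: Python's `utf-8-sig` codec drops a leading BOM when decoding and reads BOM-less files normally, and `codecs.BOM_UTF8` is the three-byte sequence EF BB BF.

```python
import codecs


def read_text_without_bom(path: str) -> str:
    """Read a UTF-8 text file, dropping a leading BOM if one is present."""
    with open(path, encoding="utf-8-sig") as f:
        return f.read()


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other bytes are untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Files saved as UTF-16 carry a different BOM and need a different codec, so this covers only the UTF-8 case.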

  6. Re:And a pony too? by gstoddart · · Score: 3, Informative

    English idiom connoting yet another impossible thing in a child's unrealistic wishlist ... typically placed at the end of a series of outrageous demands: " ... and a pony".

    Now, please, don't make me pedantic you again to explain the cromulency of phrases. ;-)

    --
    Lost at C:>. Found at C.