Slashdot Mirror


Why OpenOffice.org? Open Document Formats

Jem Berkes writes "In this current article about OpenOffice.org (also covered at Linux Today), I try to make a point about OpenOffice's commitment to open document formats and interchange as the strongest selling point - never mind cost. The OOo developers are putting a lot of effort into their XML format; will this pay off, and will users notice the significance of OpenDocument/OASIS document formats?" This can't be said enough: file formats are what determine whether and how easily data is portable, or whether the user is just stuck.

40 of 478 comments

  1. Not to be negative but... by skids · · Score: 3, Insightful

    Why no SVG support, then?

  2. Who cares if its XML? by PincheGab · · Score: 1, Insightful
    The fact that the format is XML is rather meaningless... XML is nothing more than a human-readable data file format... For many things XML is unsuitable/non-optimal (e.g., databases, binary data, etc.).

    The fact that the data format is documented (and the commitment to keep it so) is what's important.

    1. Re:Who cares if its XML? by Cecil · · Score: 4, Insightful

      Not necessarily true. Reverse-engineering XML (at least, XML that is not purposely obfuscated) is orders of magnitude easier than reverse engineering binary formats, because it is a self-descriptive format. Each piece of data has a name associated with it automatically -- the name of the tag -- as well as a rough structure (clearly this 'size' is for font size, not page size, since it's within a font tag). And just as importantly, XML tells you exactly where an 'array' of items ends because it has a /tag. With a binary format, the count for the array will typically precede the array, but does not have to... in a particularly complex format the length of the array can be implied by other parameters, and you have to use multiple samples to find out how exactly it is implied where it ends, and even when you think it's figured out it probably isn't, and the files that don't fit your assumptions will crash or produce garbage when read in.

      A proprietary XML file is not at all proprietary compared to a binary file. They're easy for even a novice programmer to figure out how to read.
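
      A minimal Python sketch of that contrast. The tag names and the binary layout (a little-endian 16-bit count followed by 16-bit items) are invented for illustration; they do not come from any real office format. The XML list ends at its closing tag, while the binary list can only be read once the count field and byte order are already known.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical XML fragment: names and nesting say what each value means,
# and the closing </sizes> tag marks where the list ends.
xml_doc = "<font><name>Times</name><sizes><size>10</size><size>12</size></sizes></font>"
root = ET.fromstring(xml_doc)
xml_sizes = [int(e.text) for e in root.find("sizes")]

# Hypothetical binary layout for the same list: uint16 count, then uint16 items.
# Nothing in the bytes themselves reveals this layout to a reader.
blob = struct.pack("<H2H", 2, 10, 12)
(count,) = struct.unpack_from("<H", blob, 0)
bin_sizes = list(struct.unpack_from(f"<{count}H", blob, 2))

print(xml_sizes, bin_sizes)
```

      Both reads give the same list, but only the second one needed outside knowledge of the layout.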

    2. Re:Who cares if its XML? by ticktockticktock · · Score: 2, Insightful


      <data>
      AAAAAAAAAAABBBBBBBBBBBBBB BBCCCCCCCCCCDDDDDDDDDDDD
      </data>

      Someone could wrap a binary file with XML tags. Is it suddenly more readable than before?

    3. Re:Who cares if its XML? by arendjr · · Score: 5, Insightful

      I'm sorry, but here you are a bit mistaken. Most importantly there are 2 things which make XML special in this area:

      • Namespaces. XML allows you to use different XML schemas within one document. This makes it possible to embed, for instance, SVG data within an OpenOffice.org document (which it actually does if you're adding images). So, no need to reinvent the wheel here.
      • XSL. A technique making it possible to transform a document from one XML schema to another with very little programming effort. This makes XHTML export and import/export filters for Office 2003 XML files much less of a hassle. Again, this is actively being taken advantage of by OpenOffice.org. No need to reinvent all the parsing and generation code again.

      To say that the fact they're documenting the format is more important than the fact it's in XML is true, but that doesn't make it unimportant that they're using XML.
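
      The namespace point can be sketched in a few lines of Python. The document namespace `urn:example:doc` and the element names under it are made up for the example; only the SVG namespace URI is the real W3C one. XSL is left out because the standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DOC_NS = "urn:example:doc"  # hypothetical vocabulary, for illustration only

# One document, two vocabularies, kept apart by their namespaces.
xml_doc = f"""<d:document xmlns:d="{DOC_NS}" xmlns:svg="{SVG_NS}">
  <d:para>Hello</d:para>
  <svg:rect width="10" height="5"/>
</d:document>"""

root = ET.fromstring(xml_doc)
ns = {"d": DOC_NS, "svg": SVG_NS}

paras = [p.text for p in root.findall("d:para", ns)]
rects = root.findall("svg:rect", ns)

print(paras, [r.attrib for r in rects])
```

      A consumer that only understands the SVG vocabulary can pick out the `svg:rect` element and ignore everything else.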

    4. Re:Who cares if its XML? by kfg · · Score: 2, Insightful

      XML is not a file format. It is a text markup language.

      The file format of OOo XML files is gzipped ASCII.

      KFG

    5. Re:Who cares if its XML? by Tanktalus · · Score: 2, Insightful

      I have to agree with the GP post ... it's not the format, it's the documentation of the format that matters.

      Let's say that OOo were to disappear one day, replaced with another suite from somewhere else. If that new suite also documented its format, it would be simple, if not completely trivial, to write a conversion program to convert from OOo to the new suite. Nothing here is fundamentally different just because OOo uses XML.

      The only difference between XML and other formats is that with XML you may not need to write a parser. But that's not an incredibly difficult piece to write once. (Writing a generic XML parser is a bit more difficult.) Even if both suites used XML, but used different schema because they look at data completely differently, the difficult part would be the semantic conversion (from layouts based on, say, paragraphs to pages or something).

    6. Re:Who cares if its XML? by Brandybuck · · Score: 3, Insightful

      which it actually does if you're adding images

      Nonsense! OpenOffice adds images as files to its zipped archive. They do not get embedded in the XML. Thus SVG, PNG, TIF, JPG, and all the other image formats are treated the same.

      Do this experiment. Create an OpenOffice.org document. Embed an image in the document. Save the document. Rename the sxw file to zip. Open the zip file using your favorite method. Notice that the image is a separate file and not a part of the content.xml file.
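
      The same experiment can be scripted. This is only a sketch: Python's `zipfile` reads a zip archive whatever its extension, so no renaming is needed, and because no real document is at hand the demo builds a stand-in archive in memory. The member name `Pictures/demo.png` is an assumption for the example.

```python
import io
import zipfile

def list_members(source):
    """Return the member names of a zip-based document (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()

def build_demo_archive():
    """Stand-in for a saved document: one XML part plus one image part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", "<doc/>")
        zf.writestr("Pictures/demo.png", b"\x89PNG placeholder bytes")
    buf.seek(0)
    return buf

members = list_members(build_demo_archive())
print(members)
```

      On a real file you would call `list_members("report.sxw")` (a hypothetical file name) and look for the image among the entries.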

      --
      Don't blame me, I didn't vote for either of them!
    7. Re:Who cares if its XML? by arendjr · · Score: 2, Insightful

      Nonsense! OpenOffice adds images as files to its zipped archive. They do not get embedded in the XML. Thus SVG, PNG, TIF, JPG, and all the other image formats are treated the same.

      Sorry mate, your entire post is correct with the exception of the word "Nonsense!".

      While it's true the images themselves are saved as separate files inside the zip archive, their properties (like size and alternative text) are stored inside the content.xml file as SVG properties.

      With your experiment, try opening content.xml and search for the svg:width and svg:height properties for instance.
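
      A rough way to do that search in Python, assuming the attributes appear with the literal `svg:` prefix as described above. The sample element is invented; a stricter version would resolve the prefix to its namespace URI instead of matching text.

```python
import re

# Matches svg:width="..." and svg:height="..." by prefix, as a text search.
ATTR = re.compile(r'svg:(width|height)="([^"]*)"')

def svg_dimensions(xml_text):
    """Return (attribute, value) pairs for every svg:width / svg:height found."""
    return ATTR.findall(xml_text)

# Hypothetical fragment standing in for part of a content.xml file.
sample = '<draw:image svg:width="2.5cm" svg:height="1.5cm" xlink:href="Pictures/demo.png"/>'
dims = svg_dimensions(sample)
print(dims)
```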

    8. Re:Who cares if its XML? by frisket · · Score: 3, Insightful
      This can't be said enough: file formats are what determine whether and how easily data is portable, or whether the user is just stuck.
      ...
      The fact that the data format is documented (and the commitment to keep it so) is what's important.

      Amen. I blogged more open file formats for my wishlist just last week and I've just received abuse from the anti-XML faction ("too hard", "too fiddly", "just a fad"). OK, so I haven't exactly been polite about programmers who don't grok XML in the past, but believe me there is still a hard core of non-Microsofties out there who still want XML to die :-)

      The fact that the format is XML is rather meaningless [...] For many things XML is unsuitable/non-optimal...

      Yes, it could have been a number of formats (ODIF, anyone? :-) but XML was explicitly designed for (well, inherited its application to) textual information, so it's a little captious to say it's unsuitable for binary data, but the important long-term reason is not just that it's documented, it's that it's based on an international standard, so it's public, stable, and cannot be hijacked by corporate factions (they'll try).

      You should care that it's XML...

    9. Re:Who cares if its XML? by Decaff · · Score: 2, Insightful

      You can obfuscate XML content just as easily as you do with binary structures, while still having a perfectly valid XML document.

      So what?

      With XML you have to put work into obfuscating it, and you have the possibility of having a clear and reasonably self-documenting format.

      With binary formats it's already obfuscated from the start. If I hand someone a binary file, where is the built-in indication of endianness, word length, data labelling or structure?
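
      A small Python illustration of the ambiguity: the same four bytes give three different answers depending on which byte order and word length the reader assumes. The bytes are arbitrary and stand for no particular file format.

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x00])

as_le32 = struct.unpack("<I", raw)[0]   # one little-endian 32-bit integer
as_be32 = struct.unpack(">I", raw)[0]   # one big-endian 32-bit integer
as_le16 = struct.unpack("<2H", raw)     # two little-endian 16-bit integers

print(as_le32, as_be32, as_le16)
```

      Nothing in `raw` says which reading is the intended one; that knowledge has to come from documentation or from guessing.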

  3. Re:Righto Mate by PincheGab · · Score: 4, Insightful
  4. Stability by scrote-ma-hote · · Score: 4, Insightful

    I wish people would stop touting stability as a superiority of software products. I use OO and MS Office regularly, and both have crashed on me, or done very flaky things, such as refusing to save a file for some unknown reason. I'm a more than average user, but not some elitist who has configured my machine perfectly, and if I can't get things not to crash, then your average user isn't going to be able to either. They'll try the program, excited by its superior crash record, it'll crash once, and then they'll get burned, blame the software and never try again. There are plenty of good reasons to use OSS software, but stability-wise, it's no better, and note, no worse, in my books than an MS product.

  5. Formatting Woes by Thats_Pipe · · Score: 2, Insightful

    It's funny how a free piece of software like OpenOffice.org can out-do Microsoft Office. Every format that Office produces can be read by OOo but anytime you try opening a non-Office-formatted document in Office, it freaks out and asks you to define the encoding. But it doesn't have a single encoding that will work, ever. Yes, regular text and even RTF can be opened by Office but the point is Office just can't handle anything that wasn't originally created by MS.

    --
    "You see them trees out back, I take care of them. I'm a tree, I'm a tree wizard." - Crazy Homeless Guy
    1. Re:Formatting Woes by figleaf · · Score: 2, Insightful

      You are right, non-Office products don't always write proper Office-compatible documents.
      That's why I just use MS Office.
      At least I am assured that everybody can read my documents.

  6. Too Bad OO Sucks So Bad by Crispin+Cowan · · Score: 5, Insightful
    I love the open document format concept. I think it is vitally important. I can't believe that enterprises and governments are willing to store critical archival documents in Microsoft Office format, and put themselves at risk of being unable to open these documents as little as 10 years hence.

    However I have tried hard to switch to OpenOffice. Even our business people have tried to use it. And the sad truth is that it just sucks. There is no way in hell that OpenOffice competes with Microsoft Office for usability. The PowerPoint clone is especially weak: in PP, common buttons like "make the font bigger" are prominently displayed, while in OO you have to hunt hard for the button in the customization menus, and even then it doesn't work right.

    This is not to say that OO is not a valuable asset. Clearly a lot of people have worked hard on it. But don't kid ourselves, this beast has a long way to go yet just to compete with MS Office 97, never mind 2003.

    Crispin

  7. The sad thing is... by beeglebug · · Score: 4, Insightful

    ... almost every file I save in Open Office gets saved as a .doc/.xls rather than an OOo format (I can't even think of the file extensions off the top of my head, that's how infrequently I use them). If the file I am saving has to be sent to anyone, or opened on a machine other than my own, I have to go with Microsoft compatibility, even though it annoys me intensely.

    1. Re:The sad thing is... by bladesjester · · Score: 4, Insightful

      Because, for whatever reason, most people specifically ask for doc and xls files. They tend to get snippy when you send them pdfs.

      When dealing with businesses that you wish to continue dealing with in a positive manner (be it for commerce, looking for a job, or any other reason), you try not to do things to annoy them overmuch. Just shrug, show them what they want to see while you do what needs to be done in the background. Most of them will be happy as long as they get the results that they wanted and what *they* see is what they expected to (there are exceptions to this, but as a general rule it's not a bad guideline).

      --
      Everything I need to know I learned by killing smart people and eating their brains.
    2. Re:The sad thing is... by 99BottlesOfBeerInMyF · · Score: 2, Insightful

      Actually they don't. If someone is technically inclined enough to know what a doc and xls file is, they are 15 geek shame seconds away from downloading Acrobat.

      ...and then they have Acrobat for Windows, which is a piece of garbage. Reading PDF files on Windows is a painful experience for many. Acrobat Reader is slow and clunky. You can scroll bitmaps faster.

      That said, I send only PDF files for security reasons. If your company does not require you to clean all outgoing word files, or convert them to PDF, well they are probably going to be burned by it eventually. They probably won't even figure out that is the problem.

  8. How to speed OpenOffice file-format adoption by CdBee · · Score: 4, Insightful

    Write a Firefox Extension that enables OpenOffice documents to be viewed in the browser, or edited if OOo is present on the system? (yes, this would be a lot of work)

    Suddenly you have an alternative to the traditional recipe of using .Doc files and the free MS Word Viewer to distribute written documents.

    --
    I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU
    1. Re:How to speed OpenOffice file-format adoption by Prof.+Pi · · Score: 2, Insightful
      Bottom line, if in doubt, HTML. If HTML won't work because the person posting it is too anal about formatting...

      Caller: "I'd like to ask some questions about the document you sent me. OK, in the second paragraph starting on page 4, which starts with 'In case of a system problem...'"

      You: "In my copy, that paragraph starts with 'If you need to reformat the disk...' You need to set your font size to 10, and make sure you have 1-inch margins when you print. Oh, and be sure you use a variable-width font. Because I don't want to be anal about format!"

    2. Re:How to speed OpenOffice file-format adoption by srleffler · · Score: 2, Insightful
      The web is a powerful platform for deployment of information precisely because there are a very limited number of standard formats for contents, and a single standard environment for viewing them. It pisses me off to no end when I see a PDF file without an HTML version alongside it. The last thing I want to do is deal with a whole different environment to view content---whether it's Acrobat or a viewer plug-in makes no difference. Ditto for Word, OOo, etc. (As I always say, "Repeat after me: 'HTML is for Viewing, PDF is for Printing'.")

      Unfortunately, in the real world people often want to both view and print documents. Anyone posting a static document online that is likely going to be printed by a large fraction of the people who view it, needs to consider PDF rather than HTML as an option.

    3. Re:How to speed OpenOffice file-format adoption by wannabgeek · · Score: 2, Insightful

      It pisses me off to no end when I see a PDF file without an HTML version alongside it. The last thing I want to do is deal with a whole different environment to view content---whether it's Acrobat or a viewer plug-in makes no difference.

      My learning for the day : HTML does not require _any software_ to view!!

      --
      I'm much more funny, interesting and insightful than the moderators think
  9. Who cares if its XML?-XML Grouch. by Anonymous Coward · · Score: 2, Insightful

    "The fact that the format is XML is rather meaningless... "

    To those who don't understand XML, but that's OK. We love you in spite of your faults.

  10. 50 years from now by mslinux · · Score: 4, Insightful

    Open, well-documented formats will allow governments and businesses to access documents/info many years from now. It's unfortunate that most IT managers don't realize how closed formats will hinder them in the future.

    1. Re:50 years from now by Tim+C · · Score: 2, Insightful

      I really don't think many people (let alone most managers) think or care about how accessible their data will be in 50 years' time.

      I agree with you, but in 50 years' time, I'll be retired or dead. Most people simply don't think about things like that in the time frame of "many years from now".

  11. Open document formats vs accepted document formats by staeiou · · Score: 2, Insightful

    One of the largest problems I have had with coworkers/friends/family when they switch to OO.o is the document format. Sure, it works great on their own computer, and even takes up less space. However, I was phoned at one o'clock in the morning from a Kinko's because someone had to print up a report and the computers there didn't have OO.o.

    The problem (IMO) with OO.o is that it saves the documents in its own format by default. Sure, you can select to save it to any number of formats, but most people just type in a name and click "OK." This leads to many, many problems when it comes time to interact with other computers.

    Some might say that having the .sxw format be the default will help OO.o get into the mainstream. However, this is faulty logic. The person I talked about above ended up switching back to MS Office because she just wanted things to work all the time. Even though she had no previous problems with OO.o, and I explained to her that you _could_ save in .doc format, she switched anyway. Her words: "I just can't stand being stranded."

    I think that the open source community should really take those words to heart. If OS wants to grow, developers are going to have to step away from their niche market of people who really care about software being free and all that jazz. People just want things to work.

  12. Not to be negative but...Looke here. by Anonymous Coward · · Score: 5, Insightful

    There's SVG support. It's just not particularly good.

    http://graphics.openoffice.org/svg/svg.htm

    However someone is working on it, and there's enough documentation out there, you can too.

  13. "...nothing more than...:" by aquarian · · Score: 3, Insightful

    "XML is nothing more than a human-readable data file format..."

    I'd say that's a pretty good reason right there, especially compared to a non-human-readable one (MS).

  14. Re:file size by pseudochaotic · · Score: 2, Insightful

    But that doesn't really matter, does it? It takes up less space, for the same amount of user effort, which is really the only important metric in office apps.

    --
    And the l33t shall inherit the 34r7h.
  15. Re:[OT] devolution of MS Office by aldoman · · Score: 2, Insightful

    People tend not to 'upgrade'. Usually, every 3 years when the computers are replaced, people get the latest Windows and Office on them, which happens to be WinXP and Office2k3.

    I have to say the most impressive thing about Office is VBA. It works in all Office apps and is very very simple yet exceedingly powerful. Any replacement needs perfect VBA understanding.

  16. Is this newsworthy? by Beetle+B. · · Score: 2, Insightful

    I mean, really. The article is very terse, and says nothing that hasn't been beaten to death on Slashdot every month or so.

    Heck, if the article had even been somewhat comprehensive, I wouldn't have minded. But it appears to me that this article was approved simply to get Open Office more exposure (with nothing new promised).

    --
    Beetle B.
  17. Re:Righto Mate by marcello_dl · · Score: 2, Insightful

    Ok, so the great innovators at Microsoft patented using XML to store a word processing document.

    If you are going to take into account all things that have been patented you can well stop developing software altogether (I found your comment informative, anyway, sorry if I sounded offensive).

    --
    ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
  18. Integration is the holy grail by pfunkmallone · · Score: 3, Insightful

    Whether a file format is closed or open isn't what's going to drive users' preferences. Users generally don't care.

    The place where the open OOo format can rule is in integration with other open software. Things like an Apache server that can *create* the document format based on data it holds, or PHP scripts that can output their data directly into spreadsheets that contain formulas etc. Imagine a web application that allows the user to modify the spreadsheet online, without having to download/upload the whole thing. Think collaboration. This is where MS is trying to get to.

    The power lies in finding the advantage of documented file formats. But, the first step is creating and documenting them. We just don't have that *killer* app yet.

  19. Re:Might other word processors adopt the format?? by JimDabell · · Score: 2, Insightful

    I wonder how feasible it would be for other word processors, such as AbiWord, to use this format natively. Or, at least appear to use the format natively.

    The OpenOffice format is being standardised by OASIS and the KOffice developers have decided to use it as the native format in future.

  20. Batch conversion tools are desperately needed by tweedlebait · · Score: 2, Insightful

    Even if they required MS office on the conversion machine (for mass conversions). Yes, even OOo doesn't handle everything perfectly and has to deal with a moving target.

    Part of the problem in migration (last I checked) was that there was no nice and reliable way to massively convert the piles of MS Office files to OOo. If users found a DOC file, they'd just go hunting for a machine with Word on it. They would also freak out dealing with .DOC email attachments, despite good efforts to educate.

    If the users only saw properly rendered OOo files, this problem of adoption would disappear.

    Ideally I'd love to see something that would search a whole network for MS Office docs and convert them, archive the MS Office files as originals and only leave OOo files 'easily' accessible. I'd write one but my skills in this type of thing are too rusty at the moment.
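
    A rough Python sketch of the find-and-archive half of such a tool, under loud assumptions: the share is visible as a local directory tree, the archive directory lives outside that tree, and the actual conversion is whatever callable gets passed in as `convert` (a placeholder here; no real converter is invoked). All names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

MS_SUFFIXES = {".doc", ".xls", ".ppt"}

def find_ms_office_files(root):
    """Return every MS Office file under root, sorted for stable output."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in MS_SUFFIXES
    )

def convert_and_archive(root, archive_dir, convert):
    """Convert each file, then move the original into archive_dir."""
    root, archive_dir = Path(root), Path(archive_dir)
    archived = []
    for src in find_ms_office_files(root):
        convert(src)  # placeholder: raises on failure, so the original stays put
        dest = archive_dir / src.relative_to(root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        archived.append(dest.relative_to(archive_dir).as_posix())
    return archived

# Demo on a throwaway directory tree; the "converter" only records its input.
with tempfile.TemporaryDirectory() as tmp:
    share = Path(tmp) / "share"
    (share / "reports").mkdir(parents=True)
    (share / "memo.doc").write_text("x")
    (share / "reports" / "budget.XLS").write_text("x")
    (share / "notes.txt").write_text("x")
    seen = []
    archived = convert_and_archive(share, Path(tmp) / "originals", seen.append)
    left_behind = sorted(p.name for p in share.rglob("*") if p.is_file())

print(archived, left_behind)
```

    Converting before moving means a failed conversion leaves the original where users can still find it.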

    --
    Firefox & /. ? Use this often:
  21. Re:The persistance of Monopolies. by westlake · · Score: 2, Insightful
    Not as faulty as letting a convicted monopoly persist.

    In the real world civil suits usually end in settlements that leave both parties more or less where they began. There is compensation for damages, but life goes on.

    It is a waste of time to dwell upon an argument that fundamentally leads nowhere.

  22. Abuse of XML by tepples · · Score: 2, Insightful

    There is a difference between the letter of the law and the spirit of the law, between implementing "buzzword compliance" and actually encoding the structure of an object in XML. I can see how publishers of proprietary programs could abuse the letter of the W3C Recommendations by having their programs shove a base64 encoded binary in an undocumented format into an XML element and then trying to sell their programs using a misleading claim that the result benefits from being XML. Should that practice become commonplace, W3C will probably issue a release that strongly deprecates that practice, if it hasn't already.
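
    A minimal Python sketch of that "letter of the law" trick. The payload bytes are arbitrary and stand for an undocumented binary format. The wrapper parses as well-formed XML, yet the parser learns nothing about what is inside.

```python
import base64
import xml.etree.ElementTree as ET

# Arbitrary bytes standing in for an undocumented binary structure.
opaque = bytes([0x01, 0x00, 0x02, 0x00, 0xFF])

# Shove it into a single element, base64 encoded.
wrapped = "<data>" + base64.b64encode(opaque).decode("ascii") + "</data>"

root = ET.fromstring(wrapped)            # parses without complaint
child_count = len(root)                  # no structure is exposed
recovered = base64.b64decode(root.text)  # same opaque bytes as before

print(child_count, recovered == opaque)
```

    The file is "XML" in name, but reading it still requires reverse-engineering the binary payload.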

  23. Re:file size by Pyroja · · Score: 2, Insightful

    I don't understand why people feel like pointing this out. Sure, yes, it's a zipped file. It's compressed but... it's still the format. It's akin to someone shooting down FLAC, even though it's about half the size of WAV, just because, well, it's compressed. OpenOffice.org's format results in smaller files than Word's. And there you have it.

    Or heck, maybe I'm totally off, in which case feel free to alert me to that fact. However, that's how it seems to me.
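
    For a feel of why compressed XML ends up small, here is a tiny Python sketch. The markup is invented and deliberately repetitive, so the ratio it prints says nothing about real documents; it only shows that verbose, repeated tags compress well.

```python
import zlib

# Invented, repetitive markup standing in for a document's XML part.
xml_text = ("<p style='body'>Lorem ipsum</p>\n" * 200).encode("utf-8")

compressed = zlib.compress(xml_text)
ratio = len(compressed) / len(xml_text)

print(len(xml_text), len(compressed), round(ratio, 3))
```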

    --
    [Trojan.]
  24. will this future proof doc formats? by jedi63 · · Score: 2, Insightful
    As time passes, older proprietary formats get used less and less, until the s/w needed to open these documents is no longer available, or the s/w has moved on to supporting newer formats, leaving the older formats unusable without conversion.

    This sounds like it may be important for historical and archival uses, too, where you want to keep your older documents over time and not have to worry about them becoming useless bits.