OpenDocument Voted In By ISO
cduffy writes "OpenDocument has been voted in as ISO/IEC 26300, with no dissenting votes and a small number of abstentions. There are still several formalities to take place before final issuance. Now the question: Will OpenXML get the same treatment, despite its technical weaknesses? There's also coverage on Groklaw."
Although ODF is a bit nicer standard from a human point of view, and builds on existing standards, I hope OpenXML isn't accepted simply because having two standards doing the exact same thing is nonsense. They're much more similar than they are different at many levels.
ECMA are welcome to OpenXML; I don't think ISO should accept it.
"Elmo knows where you live!" - The Simpsons
If you look at the history of standards, such as done at NIST, usually people try to choose the best thing, but it is hard to foresee what is the best. A good example is the standards associated with how to quantify vibrations in static structures, such as bridges. Looked good in 1948, turned out bad (Tacoma bridge).
Well, I would agree with you about XSLT - but that's an XML technology, you realise? XSLT is actually one of the handy tools which you have access to. As an example, I was able to convert a large number of documents from HTML to OpenDocument using XSLT, and I would have had to write my own parsers etc. if the files on both sides weren't XML.
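To illustrate the "no need to write your own parser" point, here is a minimal sketch. It is not XSLT (Python's standard library has no XSLT engine, so plain ElementTree is swapped in), and it only maps XHTML paragraphs to ODF-style text:p elements. The wrapper element and the sample document are assumptions made up for illustration, not the conversion described above.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def xhtml_paragraphs_to_odf(xhtml_source):
    """Map each XHTML <p> to a <text:p>, flattening any inline markup."""
    ET.register_namespace("text", TEXT_NS)
    root = ET.fromstring(xhtml_source)  # the stock XML parser does the hard work
    # Wrapper chosen for illustration only; a real ODF file needs much more.
    body = ET.Element("{%s}section" % TEXT_NS)
    for paragraph in root.iter("{%s}p" % XHTML_NS):
        out = ET.SubElement(body, "{%s}p" % TEXT_NS)
        out.text = "".join(paragraph.itertext())
    return body


# Hypothetical input document.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>Hello <em>open</em> formats</p><p>Second paragraph</p>"
    "</body></html>"
)
converted = xhtml_paragraphs_to_odf(sample)
paragraphs = [p.text for p in converted]
```

Because both sides are XML, the whole job is tree walking; nothing here tokenizes angle brackets by hand.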
XML is handy because there's a lot of wheel reinvention that you just don't need to do. Also, it's not just a way of structuring data - comparison to JSON or YAML isn't really well-founded, they're not feature equivalent.
"Elmo knows where you live!" - The Simpsons
As far as I know, ISO only has standards organisations as members, each representing a country (ANSI for the United States). As I remember, Microsoft took part in a working group, which only makes minor edits (IANASG). See http://www.iso.org/iso/en/aboutiso/isomembers/MemberCountryList.MemberCountryList
Interoperability.
I agree there's much overhead having to translate between text and binary data, but the point is that XML isn't used exclusively for processing. It's for INFORMATION INTERCHANGE.
OpenDocument is an xml format, but it's an OPEN format, completely documented and with no loose ends. Furthermore, it's very similar to HTML, so the algorithms to process it are similar, too.
On the other hand, Microsoft's "Open"XML... eew.
I believe the parties involved will know that this standard is free for all to implement. Kudos to ODF.
That is perhaps the biggest mistake developers make when they design their XML schema (or DTD), and leads to ...
I hate XML. It's not easy for humans to read as a wire protocol.
If you keep the things that are supposed to be human readable as the text within nodes, and move the rest (formatting instructions etc) into attributes, your XML will be much more readable after some simple processing to remove the nodes. Using attributes for all those small name-value pairs that XML documents are full of also reduces the size and makes parsing more efficient.
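A rough sketch of that "simple processing", assuming a made-up document where the readable words are node text and the formatting lives in attributes (the element and attribute names are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: human-readable words as text, formatting as attributes.
doc = (
    '<para align="left" style="body">The quick '
    '<span weight="bold" color="red">brown</span> fox</para>'
)

root = ET.fromstring(doc)

# Dropping the nodes (and their attributes with them) leaves only the prose.
plain = "".join(root.itertext())
```

Had the formatting been stored as child elements with text of their own, the same stripping step would have mixed the formatting values into the prose.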
I hate XML. We should be using something like JSON or YAML.
JSON and YAML are more focused formats intended for lightweight transmissions and compatibility with existing computer languages, and tend to complement XML rather than supplant it.
XML is designed as a "catch-all" format that is capable of storing any form of data. That makes it extremely powerful, yet sometimes quite unwieldy.
Each format has its tradeoffs, and as a result it is hard to say that one is "better" than the other. For example, XML's verbosity allows for parsing errors to be much more easily identified and repaired while simultaneously preventing accidental errors from going unnoticed. In YAML and JSON it is much easier to place unintended characters or data structures without the parser noticing. Neither one (to my knowledge) has the ability to check the structure of the transmission like XML DTDs and Schemas do.
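A small sketch of the kind of difference meant here, using only Python's standard library. The data is made up, and note the hedge: this checks XML well-formedness only, not DTD or Schema validity, which the standard library does not validate.

```python
import json
import xml.etree.ElementTree as ET

# A duplicated key is accepted silently by Python's json module;
# the later value simply replaces the earlier one.
parsed = json.loads('{"total": 10, "total": 99}')


def xml_is_well_formed(source):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


ok = xml_is_well_formed("<order><total>10</total></order>")
# The closing tag for <total> is missing, so the parser refuses the document.
broken = xml_is_well_formed("<order><total>10</order>")
```

The redundant closing tags are what let the parser notice the slip in the second document.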
However, DOM and XSLT are both awesome ideas - especially for parsing documents.
You've just given two reasons for the existence of XML. Both concepts are extensions of the XML concept, and are not necessarily applicable to other data-exchange formats. (At least not without massive changes.)
XML was designed with the DOM in mind so that any type of flat or hierarchical data could easily be loaded and stored programmatically. This cuts down on the number of programs that attempt to construct an interchange document manually. This rigid structure thus makes way for the programmatic transformation of such documents, à la XSLT.
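For anyone who hasn't used it, loading and storing through the DOM looks roughly like this with Python's bundled xml.dom.minidom. The document and element names are invented for the example.

```python
from xml.dom import minidom

# Hypothetical hierarchical data.
source = (
    "<library><shelf id='a'>"
    "<book>Genesis</book><book>Exodus</book>"
    "</shelf></library>"
)

dom = minidom.parseString(source)

# Load: walk the tree instead of scanning the text by hand.
titles = [book.firstChild.data for book in dom.getElementsByTagName("book")]

# Store: build new nodes through the DOM rather than gluing strings together.
new_book = dom.createElement("book")
new_book.appendChild(dom.createTextNode("Leviticus"))
dom.getElementsByTagName("shelf")[0].appendChild(new_book)

serialized = dom.documentElement.toxml()
```

Since the program never writes angle brackets itself, the serialized result stays well-formed.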
Javascript + Nintendo DSi = DSiCade
Actually if you think in terms of who's really interested in processing, archiving, and disseminating large volumes of text to people all over the world, it's not hard to imagine that religious organizations would be at the top of the list. They have huge archives, and probably desire both interoperability and stability (no "format of the week" syndrome).
It's honestly tough to find many organizations that really are thinking past the next quarter or fiscal year; in most industries people are buying software and hardware for the here-and-now. If that document isn't accessible in 15 years, who cares? Outside of their mandated recordkeeping obligations (Sarbanes-Oxley, etc.) a lot of large commercial organizations probably wouldn't care if their documents were written with magic disappearing ink that rendered them unreadable in a few years or a decade. (To be fair, the majority of commercial text is probably nothing that you'd want to read in a decade -- memos, meeting minutes, reams of emails; most of it probably makes little sense outside its original context anyway.)
I think this attitude is shortsighted, but it's pervasive. Nobody wants to think about long-term storage, nobody wants to think about accessibility 10 or 20 or 100 years from now, except libraries, governments, and religious institutions. (And perhaps some of the very largest and longest-lived corporations.) So it makes sense that if you're designing a data format that you want to be around for a while, you'd want to bring on board the people who have the most interest in making it successful.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
If Microsoft implements OpenDocument (or anything like it) in Office 2007 it will make a lot of people very happy.
A blank Word document takes up eleven kilobytes, and a one page document takes up about forty. If this becomes the de facto standard for documents rather than the Word document format, then document file sizes will shrink significantly, and a lot of bandwidth and disk space on office networks will be saved as a result.
Experts from Boeing, bring them on.
Experts from the Society of Biblical Literature?? What have they got to do with a computer data formatting standard??
Isn't it obvious? Literary organizations have massive numbers of documents that need to be digitized and archived in perpetuity. As a result, they have a vested interest in using standardized formats that will be guaranteed to meet their needs for years to come. The Society of Biblical Literature is no different in these respects, especially as more and more fragments of apocryphal and gnostic texts continue to be found.
Javascript + Nintendo DSi = DSiCade
"The movement toward OpenDocument in the free world, warms the open cockles of my heart. (Emphasis mine)
I sure hope the chambers of your heart aren't open, you might want to visit the doctor if so.
But if the cockles you're referring to are the bivalve mollusc kind, they are always open -- cockles don't shut. However, they are hermaphroditic and they can jump. Which still presents a problem for your cardiac health.
Seriously, though, formal recognition of this standard removes one of the obstacles to widespread implementation of non-MS office software. The bigger hurdle, of course, is retraining & support expenses (for businesses) and factory (or pre-purchase, anyway) installation of the software (for home users).
This doesn't change the fact that MS formats are the de facto standards in use, but it may help unify the communities that use non-MS formats, leading to a larger install base.
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
I just hope that OpenDocument gets its formula standards in order. I've read in a few places that there is very little documentation in the standard proper about how formulas (for spreadsheets) should be stored and used, which could in time cause some compatibility problems. That being said, I'm glad that it was approved by the ISO... maybe in a few years I'll not have to worry about converting from one office format to another ad absurdum.
Well, for one, ODF is an ISO standard and is implemented in a bunch of different programs now, and OpenXML isn't. Lack of interoperability is a pretty crippling technical weakness, since we're talking about a document format.
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
So did ODF folks finally decide how to store formulas? Currently every single spreadsheet that supports ODF (not that there are many) stores those as they wish with no defined standard.
> Experts from the Society of Biblical Literature?? Wtf?? What the hell
> have they got to do with a computer data formatting standard??
Oh I dunno. ODF had as design goals support for long-term document storage and seamless internationalization support. I suspect the Society of Biblical Literature has an interest in both. Unless you are so ignorant that you believe Moses and Jesus spoke the English of King James, that is. You probably wouldn't believe just how many languages and scripts the original texts are written in. If ODF can deal with all of those it shouldn't have a problem with any of the modern encodings.
And if you know of anyone with older documents, and likely to still be using them a thousand years hence, speak up.
Democrat delenda est
MathML is the worst way to store formulas ever. Anything that takes 5k of text to specify int(from 0 to infinity, exp(-lambda*x**2)dx) correctly is simply stupid. It means hand coding mathML just isn't a viable option for more than a couple of very simple equations. We should agree on something similar to a C, Fortran, Matlab, or other programming language notation as the standard way to store equations in the file. The added benefit of potentially being able to actually execute at least some of the functions is just icing on the cake.
On a related, but somewhat less relevant, note: I can't find any inexpensive programs that allow the generation of mathML easily. There are a few out there that generate mathML at all, but they seem to concentrate on the typesetting aspect of mathML* and on having an obtuse interface. Why isn't there an easy-to-find, cheap or free (beer or speech), mathML editor that is as easy to use as the equation interface in LyX? (And yes, I've tried the export-html options in LyX, and attempted to manually convert with commandline utilities, but my latex2html functions all seem to be completely braindead.)
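To make the "able to actually execute" point concrete, here is a sketch in Python, picked only as one example of a programming-language notation. The integrand is the one above; `lam` stands in for lambda (a reserved word in Python), and the midpoint rule, the cutoff at x = 10, and the value of lam are assumptions of this sketch.

```python
import math


def integrand(x, lam):
    # The same expression as in the post: exp(-lambda*x**2).
    return math.exp(-lam * x**2)


def integrate_midpoint(f, lower, upper, steps=100_000):
    """Approximate the integral of f over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (i + 0.5) * width) for i in range(steps))


lam = 2.0  # arbitrary positive value for the demonstration
# The tail beyond x = 10 is negligible for this lam, so a finite cutoff is used.
approx = integrate_midpoint(lambda x: integrand(x, lam), 0.0, 10.0)

# Known closed form for lam > 0: (1/2) * sqrt(pi / lam).
exact = 0.5 * math.sqrt(math.pi / lam)
```

The stored expression is a single short line, and the same text can be evaluated as well as displayed.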
*iirc, there is a way to use mathML to store calculable functions, but I have yet to see this implemented, and it takes even MORE text to store the equations.
I think the lack of available editors, and tex converters, especially considering the potential academic utility of mathML is pretty good evidence that it is a poor standard: it hasn't generated enough interest for someone to scratch the itch and write a decent converter/generator/editor.
Can you be Even More Awesome?!
OpenXML is patent ridden, and in a way that is problematic at best compared to OpenDocument. ODF is also patent ridden, but unlike MS' offering, the patents have free licensing for conformant implementations, and conformant means conformant to the officially stated spec, with the possibility of extensions becoming part of it. MS' offering instead requires you to meet MS' shifting definition of what is or isn't compliant (i.e. it's not explicitly stated...), and you don't get to add improvements unless MS embraces and extends them themselves (i.e. if you've got extensions and MS doesn't approve of them, you're NOT at all compliant and can be sued for patent infringement...).
Technically, they're the same. This is the reason why people can't understand why MS is insistent on NOT supporting ODF as a format and trying to push OpenXML- unless they've got some ulterior motive. Now, they've little valid excuse for it.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
When comparisons between formats set mixed content models against non-mixed ones and ask "which would you rather transform", expecting the answer "mixed", you know that a lot of people throwing opinions around on this issue have never actually worked on transforming XML.
If you're wanting a human readable document format you have XHTML. Use it and enjoy. If you're producing an interchange format for word processing applications I'll take unambiguous and explicit over ambiguous and implicit even if that is at the expense of human readability.
The MS model uses a manifest to resolve link references, the ODF uses absolute references... this is criticised by Groklaw on the basis of human readability. Not maintainability, application use, refactoring or normalisation of data.
There are valid problems that can be cited for both formats (I wish for instance MS had stuck with XLink), but this is quickly resolving into another round of MS bad, anything else good. It's emotive and is in most cases prejudged before technical merits are weighed.
I guess I just resent being asked whether I'd prefer to transform a mixed content model by somebody I know has never done so.
The Groklaw article points out a few of them. The most damning is that Microsoft has chosen to ignore existing, reusable standards like XLink, SVG, Dublin Core, etc. for their own proprietary tags. These standards were expressly produced because they represent reusable patterns that many document formats need but which shouldn't be respecified by each of them. The upshot is that parsing OpenXML will be a massive pain in the butt because none of your existing scripts / tools / editors etc. that may have built-in knowledge of existing standards will work with OpenXML.
I believe that some information that will help explain this is to be found here. It's best to read that article for yourself, but I'll provide a little abstraction of some of the details myself, although this isn't really my area of expertise:
The main point revolves around the fact that MS' OpenXML uses a non-mixed content model, while OpenDocument uses a mixed content model. This means that OpenDocument can have tags interspersed with regular text, or tags within text delimited by other tags, etc. However, OpenXML cannot do this: all text must reside within a tag, and only text or tags can reside within other tags. The article gives a textual example of this. To the computer, the MS one is probably closer to the internal representation of the data: object-oriented programmers will probably recognise the structure as an object encoding its member variables. However, it pretty much removes the benefit of using XML in the first place: source readability. If you look at HTML, it's fairly easy to change a couple words around, and make a few italic, or bold. But in the OpenXML format, that becomes a more laborious task.
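Here's a rough sketch of the difference, using simplified, made-up tags rather than the real OpenXML or ODF vocabularies (every element name below is an assumption for illustration):

```python
import xml.etree.ElementTree as ET

# Mixed content: text and tags interleaved inside the same element.
mixed = "<p>This is <b>bold</b> and this is not.</p>"

# Non-mixed content: every piece of text sits in its own leaf element,
# and formatting is a sibling property block.
runs = (
    "<p>"
    "<run><text>This is </text></run>"
    "<run><props><bold/></props><text>bold</text></run>"
    "<run><text> and this is not.</text></run>"
    "</p>"
)


def text_of_mixed(source):
    return "".join(ET.fromstring(source).itertext())


def text_of_runs(source):
    return "".join(t.text or "" for t in ET.fromstring(source).iter("text"))


def has_mixed_content(source):
    """True if any element holds both non-blank text and child elements."""
    for element in ET.fromstring(source).iter():
        children = list(element)
        own_text = (element.text or "").strip()
        tails = any((child.tail or "").strip() for child in children)
        if children and (own_text or tails):
            return True
    return False
```

Both documents carry the same sentence; the first is easier to edit by hand, while the second maps directly onto a list of run objects with properties.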
The article goes on to make arguments which back up the basic premise given here. You can also see from the examples how the tags differ in type. They give examples in OpenXML, ODF, and XHTML. Just looking at the tags in the OpenXML source doesn't give you any real idea what they're doing-- I mean, what does <w:rPr> mean? However, the tags used in ODF are longer and easier to read and understand for a human.
Of course, you could say that human-readability isn't an issue, and that's a fairly valid argument. However, if human-readability isn't an issue, why use XML? Why not do what Office was doing before, and write memory out to disk, or basically serialize the document object tree in binary? It'll be smaller and easier for the computer. The whole point of using XML is to make the data easily understandable to humans, to the point where we can make numerous (albeit potentially quite small) changes without needing a program to interpret the data for us. Or where it's possible for us to write an app that understands the data, which pretty much requires that we personally understand it. As it stands, just about any XML data format is quite self-explanatory in itself, which is why we have XML.
Maybe that doesn't answer everyone's questions, but I hope it proves at least a decent starting point.
-Q
On the contrary, KOffice will run on Windows very soon. Kdelibs is being ported to Qt4 as we speak, and almost runs under Windows already. The same is happening with KOffice, and I think we will see a proof of concept of KOffice running on Windows before summer this year.
DUH of course, it's binary vs XML file format!