OpenDocument Voted In By ISO
cduffy writes "OpenDocument has been voted in as ISO/IEC 26300, with no dissenting votes and a small number of abstentions. There are still several formalities to take place before final issuance. Now the question: Will OpenXML get the same treatment, despite its technical weaknesses? There's also coverage on Groklaw."
If you look at the history of standards, such as the work done at NIST, people usually try to choose the best option, but it is hard to foresee what the best will turn out to be. A good example is the standards for quantifying vibrations in static structures such as bridges: what looked good on paper turned out badly (the Tacoma Narrows Bridge, which collapsed in 1940).
That is perhaps the biggest mistake developers make when they design their XML schema (or DTD), and leads to ...
I hate XML. It's not easy for humans to read as a wire protocol.
If you keep the human-readable content as the text within elements and move everything else (formatting instructions, etc.) into attributes, your XML will be much more readable after some simple processing to strip out the tags. Using attributes for all the small name-value pairs that XML documents are full of also reduces file size and makes parsing more efficient.
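To make the "strip out the tags" idea concrete, here's a minimal sketch using Python's standard ElementTree. The element and attribute names (para, span, style, align, weight) are made up for illustration; the point is only that when prose lives in element text and formatting lives in attributes, joining the text nodes gives you readable prose back.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: human-readable text lives in element content,
# formatting details live in attributes.
doc = ('<para style="body" align="left">Attributes keep '
       '<span weight="bold">formatting</span> out of the way of the text.</para>')

root = ET.fromstring(doc)

# "Removing the nodes": itertext() yields only the character data
# (including text that follows a child element), so the attributes
# disappear and the prose remains.
plain = "".join(root.itertext())
print(plain)

# The small name-value pairs are still there for programs to use.
print(root.attrib)
print(root.find("span").get("weight"))
```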
I hate XML. We should be using something like JSON or YAML.
JSON and YAML are more focused formats intended for lightweight transmissions and compatibility with existing computer languages, and tend to complement XML rather than supplant it.
XML is designed as a "catch-all" format that is capable of storing any form of data. That makes it extremely powerful, yet sometimes quite unwieldy.
Each format has its tradeoffs, and as a result it is hard to say that one is "better" than the other. For example, XML's verbosity allows parsing errors to be identified and repaired much more easily, while also preventing accidental errors from going unnoticed. In YAML and JSON it is much easier to slip in unintended characters or data structures without the parser noticing. Neither of them (to my knowledge) can check the structure of a transmission the way XML DTDs and Schemas do.
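A small illustration of the "parser notices" point, as a sketch using only Python's standard library. Note that this checks XML well-formedness only; validating against a DTD or XML Schema needs a validating parser (lxml, for example), which this sketch doesn't use. The JSON behavior shown is specific to Python's json module, which by default keeps the last value for a repeated key.

```python
import json
import xml.etree.ElementTree as ET

def xml_accepts(text):
    """Return True if the XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A mismatched closing tag is rejected by a conforming XML parser.
print(xml_accepts("<order><item>1</order>"))
# So is the same attribute given twice on one element.
print(xml_accepts('<item qty="1" qty="2"/>'))

# Python's json module accepts a duplicated key without complaint
# and silently keeps the last value.
print(json.loads('{"qty": 1, "qty": 2}'))
```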
However, DOM and XSLT are both awesome ideas - especially for parsing documents.
You've just given two reasons for the existence of XML. Both concepts are extensions of the XML concept, and are not necessarily applicable to other data-exchange formats. (At least not without massive changes.)
XML was designed with the DOM in mind so that any type of flat or hierarchical data could easily be loaded and stored programmatically. This cuts down on the number of programs that attempt to construct an interchange document manually. This rigid structure in turn makes way for the programmatic transformation of such documents, à la XSLT.
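For anyone who hasn't used it, here's roughly what "constructing the document programmatically" looks like, as a sketch with Python's built-in xml.dom.minidom. The inventory/item/qty names are hypothetical. XSLT isn't in Python's standard library (lxml provides it), so this only covers the DOM side.

```python
from xml.dom.minidom import getDOMImplementation

# Build an interchange document through the DOM API instead of
# concatenating strings by hand. Element and attribute names are
# hypothetical.
impl = getDOMImplementation()
doc = impl.createDocument(None, "inventory", None)
root = doc.documentElement

for name, qty in [("bolt", 40), ("nut & washer", 12)]:
    item = doc.createElement("item")
    item.setAttribute("qty", str(qty))
    item.appendChild(doc.createTextNode(name))
    root.appendChild(item)

# The DOM takes care of escaping: the "&" in the text is written as "&amp;",
# which is exactly the kind of detail hand-built documents tend to get wrong.
xml_text = root.toxml()
print(xml_text)
```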
If Microsoft implements OpenDocument (or anything like it) in Office 2007 it will make a lot of people very happy.
A blank Word document takes up eleven kilobytes, and a one page document takes up about forty. If this becomes the de facto standard for documents rather than the Word document format, then document file sizes will shrink significantly, and a lot of bandwidth and disk space on office networks will be saved as a result.
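Part of the reason ODF files stay small despite verbose XML is that an ODF document is a ZIP package: the XML parts are deflate-compressed, and a "mimetype" entry comes first, stored uncompressed, so tools can identify the file type from the start of the file. A rough sketch of that layout with Python's zipfile module follows. It is NOT a complete ODF file (a real one also needs META-INF/manifest.xml, styles, and proper office attributes), and the repeated paragraph is filler, so the byte counts say nothing about the Word sizes quoted above.

```python
import io
import zipfile

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Simplified, highly repetitive stand-in for content.xml.
paragraph = "<text:p>Lorem ipsum dolor sit amet.</text:p>"
content_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>" + paragraph * 500 + "</office:text></office:body>"
    "</office:document-content>"
).encode("utf-8")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as pkg:
    # "mimetype" goes first and is stored without compression.
    pkg.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                 compress_type=zipfile.ZIP_STORED)
    # The XML itself is deflated.
    pkg.writestr("content.xml", content_xml, compress_type=zipfile.ZIP_DEFLATED)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as pkg:
    infos = pkg.infolist()
    body = pkg.getinfo("content.xml")
    print(infos[0].filename, infos[0].compress_type == zipfile.ZIP_STORED)
    print(body.file_size, "bytes of XML stored as", body.compress_size, "bytes")
```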
Nicer from a human point of view means fewer bugs down the line. I just spent a week trying to get a .wsdl file to parse with both Axis AND .NET's wsdl.exe. Any format that is less opaque, less verbose and more understandable gets my vote.
I just hope that OpenDocument gets its formula standards in order. I've read in a few places that the standard proper contains very little documentation about how formulas (for spreadsheets) should be stored and used, which could in time cause some compatibility problems. That being said, I'm glad that it was approved by the ISO... maybe in a few years I won't have to worry about converting from one office format to another ad nauseam.
So did the ODF folks finally decide how to store formulas? Currently every spreadsheet application that supports ODF (not that there are many) stores them however it likes, with no defined standard.
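For those who haven't looked inside a spreadsheet file: in ODF 1.0 a cell's formula is kept as a string in the table:formula attribute, and the formula syntax itself is left to the application, with a leading prefix naming which syntax is in use. A minimal sketch with Python's standard library, assuming a hand-written fragment; the cell values are invented and the "oooc:" prefix mimics what OpenOffice.org Calc writes. Other applications may use a different prefix and grammar, which is exactly the compatibility worry.

```python
import xml.etree.ElementTree as ET

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hand-written fragment modeled on a spreadsheet row in content.xml.
row_xml = f"""<table:table-row xmlns:table="{TABLE_NS}" xmlns:office="{OFFICE_NS}">
  <table:table-cell office:value-type="float" office:value="2"/>
  <table:table-cell office:value-type="float" office:value="3"/>
  <table:table-cell table:formula="oooc:=SUM([.A1:.B1])"
                    office:value-type="float" office:value="5"/>
</table:table-row>"""

row = ET.fromstring(row_xml)
formulas = []
for cell in row.findall(f"{{{TABLE_NS}}}table-cell"):
    formula = cell.get(f"{{{TABLE_NS}}}formula")
    if formula is None:
        continue  # plain value cell, no formula
    # Split off the prefix that names the formula syntax; the rest is
    # only meaningful to an application that understands that syntax.
    prefix, _, expression = formula.partition(":")
    formulas.append((prefix, expression))

print(formulas)
```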
I believe some information that helps explain this can be found here. It's best to read that article for yourself, but I'll summarize some of the details, although this isn't really my area of expertise:
The main point is that MS' OpenXML uses a non-mixed content model, while OpenDocument uses a mixed content model. This means that OpenDocument can have tags interspersed with regular text, or tagged text nested inside text delimited by other tags, etc. OpenXML cannot do this: all text must reside within its own tag, and a tag holds either text or other tags, never both. The article gives a textual example of this. To the computer, the MS format is probably closer to the internal representation of the data: object-oriented programmers will probably recognise the structure as an object encoding its member variables. However, it pretty much removes the benefit of using XML in the first place: source readability. If you look at HTML, it's fairly easy to rearrange a couple of words and make a few of them italic or bold. In the OpenXML format, that becomes a more laborious task.
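To see the difference side by side, here's a sketch with simplified fragments for the sentence "This is bold text.", parsed with Python's ElementTree. The fragments are hand-written approximations, not output from either office suite, and the style name "T1" is hypothetical (in real ODF it would refer to a style defined elsewhere). In the ODF version the paragraph element itself carries text around a styled span; in the OpenXML version every piece of text sits in its own run (w:r), with formatting in a run-properties element (w:rPr).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# ODF (mixed content): text and a styled span sit side by side in <text:p>.
odf = (f'<text:p xmlns:text="{TEXT}">This is '
       '<text:span text:style-name="T1">bold</text:span> text.</text:p>')

# OpenXML (non-mixed): text only appears inside <w:t>, one run per format.
ooxml = (f'<w:p xmlns:w="{W}">'
         '<w:r><w:t xml:space="preserve">This is </w:t></w:r>'
         '<w:r><w:rPr><w:b/></w:rPr><w:t>bold</w:t></w:r>'
         '<w:r><w:t xml:space="preserve"> text.</w:t></w:r>'
         '</w:p>')

def odf_text(p):
    # Mixed content: the paragraph's own text, each child's text, and the
    # "tail" text after each child all belong to the paragraph.
    return "".join(p.itertext())

def ooxml_text(p):
    # Non-mixed content: collect the text of every <w:t> element.
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))

odf_p = ET.fromstring(odf)
ooxml_p = ET.fromstring(ooxml)
print(odf_text(odf_p))
print(ooxml_text(ooxml_p))
# Text directly inside the paragraph element exists only in the ODF version.
print(repr(odf_p.text), repr(ooxml_p.text))
```

Both yield the same sentence, but only the ODF source still reads like the sentence when you open it in a text editor.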
The article goes on to make arguments that back up the basic premise given here. You can also see from the examples how the tags differ in style; it gives examples in OpenXML, ODF, and XHTML. Just looking at the tags in the OpenXML source doesn't give you any real idea what they're doing. I mean, what does <w:rPr> mean? The tags used in ODF, by contrast, are longer and easier for a human to read and understand.
Of course, you could say that human readability isn't an issue, and that's a fairly valid argument. However, if human readability isn't an issue, why use XML at all? Why not do what Office was doing before: write memory out to disk, or basically serialize the document object tree in binary? That would be smaller and easier for the computer. The whole point of using XML is to make the data easily understandable to humans, to the point where we can make numerous (albeit potentially quite small) changes without needing a program to interpret the data for us, or where we can write an app that understands the data, which pretty much requires that we personally understand it. As it stands, just about any XML data format is fairly self-explanatory, which is why we have XML.
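The tradeoff is easy to see on a toy record. This sketch serializes a hypothetical paragraph (a style number, a bold flag, and some text) once as packed binary with Python's struct module and once as XML. The binary layout here is invented for illustration and has nothing to do with the actual .doc format; it just shows that the binary form is smaller but only decodable if you already know the exact layout.

```python
import struct
import xml.etree.ElementTree as ET

# A hypothetical paragraph record.
style_id, bold, text = 7, True, "Hello, world"

# Binary: little-endian header (unsigned short style, unsigned char flag,
# unsigned int text length) followed by the UTF-8 bytes. Compact, but opaque.
HEADER = "<HBI"
data = text.encode("utf-8")
binary = struct.pack(HEADER, style_id, bold, len(data)) + data

# XML: the same fields, spelled out with names a human can read.
elem = ET.Element("paragraph", {"style": str(style_id),
                                "bold": "true" if bold else "false"})
elem.text = text
xml_bytes = ET.tostring(elem)

print(len(binary), binary)
print(len(xml_bytes), xml_bytes.decode())

# Reading the binary form back requires knowing the layout in advance.
sid, b, n = struct.unpack_from(HEADER, binary)
start = struct.calcsize(HEADER)
decoded = binary[start:start + n].decode("utf-8")
```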
Maybe that doesn't answer everyone's questions, but I hope it proves at least a decent starting point.
-Q