Dark Corners of the OpenXML Standard
Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"
ODF spec page count: 722.
OpenXML spec page count: 6000 !!
No. ODF has several real, factual, benefits. It might have been originated in a single product but... it reuses existing standard technologies (SVG, CSS...). It has properly designed XML tags that act as "markup", in OpenDocument xml tags act as container for chunks of data. ODF tries to separate content from style.
And about your RTF suggestion... can I draw diagrams with RTF? Can I have a ToC? Can I do complex styling? Can I have a "galery" of styles? Can I include images? No. RTF is not a solution.
The behavior of years-old proprietary word processing software is included by reference into OOXML. How is any spec that includes by reference the behavior of proprietary software exactly "open"? True, implementors could produce a partial implementation of the spec that degrades away the legacy baggage (more or less) gracefully, but some standards' patent licensors forbid implementors to publish a partial implementation. I don't know if this applies to OOXML's license.
Once it is ratified as an ISO Standard, the standard is locked up and anyone that does want to a copy has to buy it from ISO. These are copyrighted. They're not cheap; thousands of dollars. Out of the reach of the average hobbyist, and not listed anywhere on the Internet. That 6,000 page draft will vanish into the mists of time.
Larger Companies can afford this, but garage companies and hobbyists definitely can't. So what's the chance of an open source or even small upstart challenging Microsoft's Documentonopoly? Zero.
Want another example? ISO country codes. The country codes (e.g. .us, .jp) are actually ISO, and ISO ended up backing off on a demand for royalties for this(!) But if you want state codes (e.g. California, Kantou), well, forget it unless you want to buy them off ISO. http://www.alvestrand.no/pipermail/ietf-languages/ 2003-September/001472.html
ISO aren't the only ones guility of doing this. IEEE do it as well. Want the latest simulation standard? Then get out your checkbook: http://standards.ieee.org/catalog/olis/compsim.htm l
ISO and the IEEE are enemies of openness. Microsoft is taking a page out of their gamebook.
ISO or IEEE certification is a *bad* thing.
The hitch here is that *not* having them means tons and tons of reverse engineering, and that's only after tracking down every release of every version of every MS Office ever.
The real hitch, as the article hints, is that the releases are contradictory. For instance, the Mac version of small caps is different from others. This is part of the reason Word is so bloated and does not preserve printing type setting from one machine to the next.
Ten years ago, a state agency I was working for was forced to move from Word Perfect to Word. Hundreds, if not thousands, of documents were painstakingly converted from one format to the other. The typesetting, which they had never had a problem with previously, was easily broken by moves from one machine to the other or by changing printers. That is the kind of thing that no program can account for - it was broken from then and can not be created correctly today. It's also probably the reason for all of the nebulous "guidance" sections that don't tell you anything other than to look at, and presumably measure, old printed examples. Not even M$ knows what it was really doing in the field. As I saw at the time, no two were alike.
Of course, the time to get things right is not in your XML it's when you import the document. The author tells us this in so many words. The XML should be general enough to encompass any kind of typesetting. It is the importing program's task to figure out what the old format wanted things to look like. As the author points out, the spec does not do anything other create something impossible to follow. It's not going to magically make things look right no matter how hard they wish it would.
DMCA, Hollings, Palladium. What might have sounded like paranoia is now common sense.