Slashdot Mirror


Dark Corners of the OpenXML Standard

Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"

9 of 250 comments

  1. The power of legacy systems... by Anonymous Coward · · Score: 5, Insightful

    The power of legacy systems is at once both Microsoft's greatest strength and greatest weakness. Nobody in OSS is going to have the patience to rebuild the level of backwards compatibility needed to displace them, but the code must be an absolute tarpit of accumulated cruft and security holes that's incredibly difficult for them to keep going.

  2. Basically by DrYak · · Score: 5, Insightful

    ODF is the former SXW format that was taken and transformed into a standard by a committee comprising several office software makers. It's supposed to describe the normal features that anyone should expect from any word processing application, be it OpenOffice.org, KWord, AbiWord, Corel WordPerfect, etc., all in a perfectly neutral way. It was designed with a function in mind (storing word processing documents in an open and interoperable way). Its benefits are comparable to the standardisation of HTML.

    OpenXML is Microsoft trying to translate its proprietary DOC format into an XML container (because it's a big buzzword) and propose it as a standard to ECMA (because everyone is talking about ODF being an ISO standard). It describes not only what is to be expected from a word processor, but also all the MS-Word-specific Microsoftisms. It was designed with a specific piece of software in mind (and partly derives from the internal functioning of MS-Word). It's only a small improvement over the previous MS XML format (which had a lot of information hidden in a binary blob).

    The good thing for Microsoft is that they can pretend this limitation is "Not-a-bug-but-a-feature", and brag that there is a lot of stuff that MS-Word couldn't store inside an ODF and that only OpenXML can carry.

    Microsoft's plan:
    1. Embrace
    2. Extend <- They are here
    3. Extinguish

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  3. Re:MIcrosoft sucks. by Aadain2001 · · Score: 4, Insightful

    But they broke plenty of laws to keep their monopoly :) And while their actions during their rise to the top may not have been illegal, they could easily be called 'strong-armed'.

    --
    Space for rent, inquire within
  4. Re:The author is exactly right. by _|()|\| · · Score: 4, Insightful

    Prior to reading this article, I was ambivalent about Office XML. The push to standardize Office's "DNA sequence" seemed disingenuous, but at least the format was described in detail. Now I see that the table-sagging 6,000 pages is just the tip of the iceberg: this "standard" effectively includes, by reference, the source code for every prior version of Office, to which only Microsoft has access.

  5. Re:MIcrosoft sucks. by TrekkieGod · · Score: 5, Insightful

    If you get to the point where you build up a company that can even consider garnering the term "monopoly", then get back to us... At that point, maybe, just maybe, you may come to think that you earned what you got, and the government has no right to tell you how to run your business...

    Yeah. Because the people best suited to decide what a company should or should not be allowed to do are the people who own the company. Of course you're going to want to be completely unrestricted to mow down your competitors using whatever advantages you have if you are in a position to do so. What you're missing is that no one should be allowed to use unfair practices to do it. Some people think we should idolize the free market as some sort of religion. We don't like the free market economy because it was given to us by the gods. We like it because it tends to result in better products and lower prices. That ceases to be true when you have a monopoly in the mix.

    That being said, I'm not really informed about any Microsoft specifics, so I'm not going to argue in favor of or against any "federal laws" as they apply to them (or failed to apply to them). However, suggesting that only people who have built a company that holds a monopoly should be able to decide what is fair regulation isn't rational. It may even be that the current federal laws regarding monopolies are unfair and in need of reform, but the fact remains that the existence of a set of laws to regulate businesses is necessary.

    --

    Warning: Opinions known to be heavily biased.

  6. Re:Unfair by DavidTC · · Score: 4, Insightful

    Tools that don't care about legacy support are unaffected by this; they can just pick the closest modern option to whatever the legacy flag calls for on input, and not output documents that use them.

    And thus those tools, legally, are not OOXML, and won't qualify for purchase by companies that specify OOXML. Which is the entire point.

    There's a difference between 'We need to make sure that old documents can be converted correctly.', and 'We will literally convert old documents into a new representation that contains all their weirdness, and we won't explain how to implement said weirdness in the standard.'.

    What Microsoft has produced is not even a standard. Standards must specify everything, or reference other standards that specify everything. They can't reference applications.

    If Microsoft wants to keep secret how to turn Office 95 documents into OOXML, fine. Producing a standard doesn't mean you have to explain how to convert things into that standard.

    It does, however, mean you have to explain exactly what should happen if mwSmallCaps is true, to the pixel. You can't just pawn it off on the unexplained hypothetical behavior of some other application.

    --
    If corporations are people, aren't stockholders guilty of slavery?
  7. Re:I can't see this being too big of a problem by Askmum · · Score: 5, Insightful

    You seem to be missing the point.

    You do not need these features to begin with in a new format that is inherently incompatible with an old format. You don't want to say "now I'm going to do WP style linespacing and my linespacing is 1".
    If you want to convert a WP document to an XML document, the conversion program should know that the linespacing in WP is 0.9 times the linespacing in the XML document (or whatever it really may be) and will then use linespacing=0.9 in the XML document. This is not a task of the new wordprocessor or its specification.

    By adding this so-called "backward compatibility" to your specification, you make the spec overly difficult, and in fact you build the conversion program into the new application when this is absolutely not necessary.
    And on top of that, you require that the programmer who uses this spec have knowledge of all these old versions and be able to program them without error. And as the application will grow because of these unnecessary features, the number of bugs will also rise. So this is not a blueprint for a good application, this is a blueprint for a very buggy implementation of a wordprocessor.
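
    To make the point concrete, here is a minimal sketch in Python of what "the converter does the work" could look like. Everything in it is an assumption for illustration: the names (LegacyParagraph, convert_paragraph, WP_TO_XML_LINESPACING) are made up, and the 0.9 factor is just the placeholder number from above, not a measured WordPerfect value.

```python
from dataclasses import dataclass

# Hypothetical factor taken from the example above; the real ratio between
# WordPerfect line spacing and the target format's line spacing is not known here.
WP_TO_XML_LINESPACING = 0.9


@dataclass
class LegacyParagraph:
    """A paragraph as read from the old document (illustrative structure only)."""
    text: str
    linespacing: float  # expressed in the legacy application's own terms


def convert_paragraph(par: LegacyParagraph) -> dict:
    """Resolve the legacy spacing to an explicit value at conversion time.

    The output carries no "behave like the old application" flag, so a reader
    of the new format never needs to know how the old application laid out text.
    """
    return {
        "text": par.text,
        "linespacing": round(par.linespacing * WP_TO_XML_LINESPACING, 3),
    }
```

    With this, convert_paragraph(LegacyParagraph("Hello", 1.0)) yields a paragraph with linespacing 0.9 and nothing else to interpret; the legacy knowledge stays inside the converter instead of leaking into the format.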

  8. Documents outlive applications by Geof · · Score: 4, Insightful

    There's nothing wrong with saving in a file format that matches your internal representation; in fact, it's a darn good idea (see .ABW for AbiWord and .DOC for Word; I would wager .WPD for WordPerfect is the same idea).

    Documents are worth far more than software, and they outlive the applications used to create them. See the comment to the original article - reading documents after 5, 20, 30, 100 years or more is not optional. You can pay the price of developing an independent format now, or you can pay the price of reverse engineering over and over again every time you change your internal representation.

    Repeated implementation limits future change and innovation. It's expensive: it likely costs more even for Microsoft. But they can afford it; their competitors may not be able to. Plus, Microsoft already has their first implementation.

    Interoperability seems to work best when taken from the ground up - when working with another application's data structure of any complexity, you simply can't do a lossless roundtrip without losing before you've started.

    Perhaps so. But compare that cost to the cost I've just outlined. It is in the best interest of users and software developers (maybe even of Microsoft) to bite the bullet now, do the conversion once, and develop a clean format for the future.

    Maybe you have in mind an argument you're not making, but I don't see any sufficient basis for your broad contention that using a file format based on an internal representation is a "darn good idea". In specific cases, yes (e.g. where development time or effort is the most important factor). In general, I very much doubt it. That successful applications in the past have taken that approach is weak evidence. They were developed when the up-front cost of development in a time of rapid innovation, the loss of customer lock-in, and a lack of open-format competition were good business reasons for making such a choice - even if it was inferior technically, increased cost in the long term, and was bad for consumers. In today's climate of slower innovation, competition from open formats, and customers who are running into their own long-term interests, the situation is different.

    Which is not to say Microsoft's apparent attempt to set the rules of the game and throw sand in the gears of change is not in their interests, or that it will be unsuccessful.

  9. Re:MIcrosoft sucks. by hachete · · Score: 4, Insightful

    Yes, they got into trouble for bundling, but that misses the point every time. The secret sauce that Microsoft uses is to strong-arm the OEMs into bundling Windows with PCs, especially for consumers. I'm also thinking that the Windows Tax is levied even if you buy Linux on a Dell. This is the linchpin of Microsoft domination; without it, all their other strategies wither on the vine. Without the bundling of Windows with new PCs, the bundling of IE (and all the other software), the resistance against interoperability, the mysterious file formats, etc. wither on the vine. I've been disappointed that none of the investigations I've read about have gone after the OEM-Microsoft link. Break that, and you'll have a free market again.

    I think the Office XML format style is a play straight out of IBM's handbook: make the standard complex and incomprehensible, and the little players - that's you - will find it hard to compete. In a way, that's a good sign: Microsoft is now lumbering into middle age, hoist by their own ever more complex petard.

    The other thing about middle-age is that every little technological step away from their established base-line is treated as a revolution. In reality, it's no such thing, just a small stepping stone to shouting "pesky kids. Get off my lawn." Or maybe they've reached that stage already.

    --
    Patriotism is a virtue of the vicious