Slashdot Mirror


Dark Corners of the OpenXML Standard

Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"

17 of 250 comments

  1. The author is exactly right. by JoshJ · · Score: 3, Insightful

    This is why the Microsoft Office XML format (let's not kid ourselves, it is far from "open") should not become an ISO standard.

    1. Re:The author is exactly right. by _|()|\| · · Score: 4, Insightful

      Prior to reading this article, I was ambivalent about Office XML. The push to standardize Office's "DNA sequence" seemed disingenuous, but at least the format was described in detail. Now I see that the table-sagging 6,000 pages is just the tip of the iceberg: this "standard" effectively includes, by reference, the source code for every prior version of Office, to which only Microsoft has access.

  2. The power of legacy systems... by Anonymous Coward · · Score: 5, Insightful

    The power of legacy systems is at once both Microsoft's greatest strength and its greatest weakness. Nobody in OSS is going to have the patience to rebuild the level of backwards compatibility needed to displace them, but the code must be an absolute tarpit of accumulated cruft and security holes that's incredibly difficult for them to keep going.

  3. Basically by DrYak · · Score: 5, Insightful

    ODF is the former SXW format that was taken and transformed into a standard by a committee comprising several office-software makers. It's supposed to describe the normal features that anyone should expect from any word-processing application, be it OpenOffice.org, KWord, AbiWord, Corel WordPerfect, etc., all in a perfectly neutral way. It was designed with a function in mind (storing word-processing documents in an open and interoperable way). Its benefits are comparable to those of the standardisation of HTML.

    OpenXML is Microsoft trying to translate its proprietary DOC format into an XML container (because XML is a big buzzword) and propose it as a standard to ECMA (because everyone is talking about ODF being an ISO standard). It describes not only what is to be expected from a word processor, but also every MS Word-specific Microsoftism. It was designed with one specific piece of software in mind (and partly derives from the internal functioning of MS Word). It's only a small improvement over the previous MS XML format (which hid a lot of information in a binary blob).

    The good thing for Microsoft is that they can pretend this limitation is "not a bug but a feature", and brag that there is a lot of stuff that MS Word couldn't store in an ODF file and only OpenXML can carry.

    Microsoft's plan:
    1. Embrace
    2. Extend <- They are here
    3. Extinguish

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Basically by megabyte405 · · Score: 3, Insightful

      ODF is a nice idea in theory, but really, it's a similar situation (OpenOffice.org's internal data format jammed into a standard, so by necessity designed with OO.o in mind) just with more OSS-positive karma associated. There's nothing wrong with saving in a file format that matches your internal representation, in fact, it's a darn good idea (see .ABW for AbiWord, .DOC for Word, .WPD for WordPerfect I would also wager is the same idea). However, interoperability seems to work best when taken from the ground up - when working with another application's data structure of any complexity, you simply can't do a lossless roundtrip without losing before you've started.

      There is, however, a format that can do this sort of thing. Yes, I'm talking about the dark horse of the "file format wars," the non-glamorous workaholic format that even WordPad and TextEdit.app can read with ease: RTF. It may not get press attention, but it's actually a fairly well-documented standard, has been working as an interchange format for years, and yet is designed with enough expandability that it's still useful with the kinds of documents produced today. It's a true de facto standard.

      This may not be an exciting idea, but for those who really want interoperability, RTF is the way to go with today's software. Not to say that import/export of ODF, Word, WPD, etc. isn't important (AbiWord, a project that I contribute to but do not purport to speak for, has very good to great support for those formats and many others), just that an unnecessary dichotomy is drawn between OpenXML and ODF with regard to their design goals - both are repurposed native formats for a single application.

      --
      I recognize people by their sigs. Is that a bad thing?
    2. Re:Basically by blincoln · · Score: 3, Insightful

      There's nothing wrong with saving in a file format that matches your internal representation, in fact, it's a darn good idea (see .ABW for AbiWord, .DOC for Word, .WPD for WordPerfect I would also wager is the same idea).

      I would argue that when it's taken to the extreme of Office prior to 2007, it *is* a bad thing. AFAIK, the old Word format is more or less a (very) partial RAM dump (which is why you can often find all sorts of interesting stuff in Word files that the authors think they've deleted). That makes for faster dev times, but because the load and save functions don't really "understand" the content of the file, IMO the developers made things a lot harder for themselves in the big picture. I imagine reproducing issues in testing is a particular nightmare.

      --
      "...always new atoms but always doing the same dance, remembering what the dance was yesterday." -Richard Feynman
  4. Re:MIcrosoft sucks. by Aadain2001 · · Score: 4, Insightful

    But they broke plenty of laws to keep their monopoly :) And while their actions during their rise to the top may not have been illegal, they could easily be called 'strong-armed'.

    --
    Space for rent, inquire within
  5. My favorite quote by IvyKing · · Score: 3, Insightful
    From TFA


    This is not a specification; this is a DNA sequence.


    Outrageously funny and to the point.

  6. Re:MIcrosoft sucks. by edwdig · · Score: 3, Insightful

    Microsoft broke no laws getting DOS onto every PC. They happened to be in the right place at the right time, and the market fell onto them. But from there, Microsoft bent and broke the law every chance they got to ensure that there never was any competition.

    Also don't forget that although MS's purchase of DOS was perfectly legal, it was ethically horrible. They arrived at a handshake agreement to license the code from Seattle Computer Products (SCP). While the MS paperwork was being finalized by the lawyers, SCP made arrangements to finance other business ventures using the MS money. MS then presented them with a contract to buy the code rather than license it, and told SCP to take it or leave it. As SCP had already committed to the other deals, they had no choice but to take MS's offer. Sure, no one held a gun to the SCP executives' heads forcing them to take the deal; however, they didn't have any other reasonable alternatives. MS's behavior was legal, but certainly not ethical.

  7. Re:MIcrosoft sucks. by TrekkieGod · · Score: 5, Insightful

    If you get to the point where you build up a company that can even consider garnering the term "monopoly", then get back to us...At that point, maybe, just maybe, you may come to thinking that you earned what you got, and the government has no right to tell you how to run your business...

    Yeah. Because the people best suited to decide what a company should or should not be allowed to do are the people who own the company. Of course you're going to want to be completely unrestricted to mow down your competitors using whatever advantages you have if you are in a position to do so. What you're missing is that no one should be allowed to use unfair practices to do it. Some people think we should idolize the free market as some sort of religion. We don't like free market economy because it was given to us by the gods. We like it because it tends to result in better products and lower prices. That ceases to be true when you have a monopoly in the mix.

    That being said, I'm not really informed about any Microsoft specifics, so I'm not going to argue in favor of or against any "federal laws" as they apply to them (or failed to apply to them). However, suggesting that only people who have built a company that holds a monopoly should be able to decide what is fair regulation isn't rational. It may even be that the current federal laws regarding monopolies are unfair and in need of reform, but the fact remains that a set of laws to regulate businesses is necessary.

    --

    Warning: Opinions known to be heavily biased.

  8. Re:MIcrosoft sucks. by tyme · · Score: 3, Insightful
    Brandybuck wrote:
    Things that are illegal for a monopoly are perfectly legit for a non-monopoly. It's a crazy law, but that's how it works. Microsoft broke no federal laws to *gain* their monopoly.


    Unfortunately, you are wrong on almost all counts:
    1. Section 1 of the Sherman Act (Restraints of Trade) applies to everyone, not just to monopolists. If Microsoft engaged in any restraints of trade, even before they were a monopoly (which doesn't have to mean 100% market share, as so many people seem to believe; it only means that the company must have the power to set prices in the market*), they would have violated Sherman 1.
    2. Section 2 of the Sherman Act (Monopolization) makes it a crime to monopolize or attempt to monopolize. Hence, you can be held in violation of Sherman 2 for doing something that is otherwise perfectly legal, so long as you were trying to maintain or obtain a monopoly. Since (1) Microsoft is obviously a monopoly and (2) they have taken steps both to attain their monopoly position and to maintain it once they had it, they are clearly in violation of Sherman 2.

    The real problem, however, with the Sherman Act is that, in general, it can only be prosecuted by the Federal Trade Commission, and that is under the direct control of the executive power. Ever since the Reagan administration, there has been little or no desire on the part of the FTC to pursue antitrust litigation.

    * Courts have generally used the rule that anyone with more than 70% market share obviously has monopoly power, and anyone with less than 20% obviously lacks it, but that anything between 20% and 70% requires an examination of facts and circumstances before declaring that someone has monopoly power.

    --
    just a ghost in the machine.
  9. Re:Unfair by DavidTC · · Score: 4, Insightful

    Tools that don't care about legacy support are unaffected by this; they can just pick the closest modern option to whatever the legacy flag calls for on input, and not output documents that use them.

    And thus such tools, legally, are not OOXML-compliant, and won't qualify for purchase by companies that specify OOXML. Which is the entire point.

    There's a difference between 'We need to make sure that old documents can be converted correctly.', and 'We will literally convert old documents into a new representation that contains all their weirdness, and we won't explain how to implement said weirdness in the standard.'.

    What Microsoft has produced is not even a standard. Standards must specify everything, or reference other standards that specify everything. They can't reference applications.

    If Microsoft wants to keep secret how to turn Office 95 documents into OOXML, fine. Producing a standard doesn't mean you have to explain how to convert things into that standard.

    It does, however, mean you have to explain exactly what should happen if mwSmallCaps is true, to the pixel. You can't just pawn it off on the unexplained hypothetical behavior of some other application.

    --
    If corporations are people, aren't stockholders guilty of slavery?
  10. Re:I can't see this being too big of a problem by Askmum · · Score: 5, Insightful

    You seem to be missing the point.

    You do not need these features to begin with in a new format that is inherently incompatible with an old format. You don't want to say "now I'm going to do WP style linespacing and my linespacing is 1".
    If you want to convert a WP document to an XML document, the conversion program should know that a WP line spacing of 1 corresponds to a line spacing of 0.9 in the XML document (or whatever the real factor is), and will then write linespacing=0.9 in the XML document. This is not a task for the new word processor or its specification.

    By adding this so-called "backward compatibility" to your specification, you make the spec overly difficult, and in effect you build the conversion program into the new application when this is absolutely not necessary.
    And on top of that, you require that the programmer who uses this spec must have knowledge of all these old versions and be able to implement them without error. And as the application grows with these unnecessary features, the number of bugs will rise too. So this is not a blueprint for a good application; it is a blueprint for a very buggy implementation of a word processor.
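    A minimal sketch of the conversion-time approach this comment argues for, in Python. Everything in it is hypothetical: the class and function names are invented for illustration, and the 0.9 factor is only the comment's illustrative figure, not a documented WordPerfect or Word value. What it shows is that the legacy quirk is resolved once, inside the converter, so the new format only ever stores a plain, fully specified line spacing.

```python
from dataclasses import dataclass

# Hypothetical conversion factor: the comment's illustrative figure, not a
# documented WordPerfect or Word value.
WP_TO_MODERN_LINE_SPACING = 0.9


@dataclass
class LegacyParagraph:
    line_spacing: float  # value as stored by the legacy application
    source_app: str      # hypothetical tag, e.g. "wordperfect" or "word"


@dataclass
class ModernParagraph:
    # Only a plain, fully specified value; no "emulate app X" flag survives.
    line_spacing: float


def convert(par: LegacyParagraph) -> ModernParagraph:
    """Resolve the legacy quirk once, at import time."""
    if par.source_app == "wordperfect":
        # Bake the (hypothetical) WordPerfect scaling into the stored value.
        return ModernParagraph(round(par.line_spacing * WP_TO_MODERN_LINE_SPACING, 4))
    # Other sources are assumed, for this sketch, to need no adjustment.
    return ModernParagraph(par.line_spacing)


if __name__ == "__main__":
    print(convert(LegacyParagraph(1.0, "wordperfect")))  # ModernParagraph(line_spacing=0.9)
```

    The output record carries no field saying "behave like WordPerfect", so a program reading the new format never needs to know, or reverse-engineer, the application the document came from.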

  11. Documents outlive applications by Geof · · Score: 4, Insightful

    There's nothing wrong with saving in a file format that matches your internal representation, in fact, it's a darn good idea (see .ABW for AbiWord, .DOC for Word, .WPD for WordPerfect I would also wager is the same idea).

    Documents are worth far more than software, and they outlive the applications used to create them. See the comment on the original article: reading documents after 5, 20, 30, 100 years or more is not optional. You can pay the price of developing an independent format now, or you can pay the price of reverse engineering over and over again every time you change your internal representation.

    Repeated implementation limits future change and innovation. It's expensive: it likely costs more even for Microsoft. But they can afford it; their competitors may not be able to. Plus, Microsoft already has their first implementation.

    interoperability seems to work best when taken from the ground up - when working with another application's data structure of any complexity, you simply can't do a lossless roundtrip without losing before you've started.

    Perhaps so. But compare that cost to the cost I've just outlined. It is in the best interest of users and software developers (maybe even of Microsoft) to bite the bullet now, do the conversion once, and develop a clean format for the future.

    Maybe you have in mind an argument you're not making, but I don't see any sufficient basis for your broad contention that using a file format based on an internal representation is a "darn good idea". In specific cases, yes (e.g. where development time or effort is the most important factor). In general, I very much doubt it. That successful applications in the past have taken that approach is weak evidence. They were developed when the up-front cost of development in a time of rapid innovation, the potential loss of customer lock-in, and a lack of open-format competition were good business reasons for making such a choice, even if it was technically inferior, increased cost in the long term, and was bad for consumers. In today's climate of slower innovation, competition from open formats, and customers who are becoming aware of their own long-term interests, the situation is different.

    Which is not to say Microsoft's apparent attempt to set the rules of the game and throw sand in the gears of change is not in their interests, or that it will be unsuccessful.

  12. Re:Unfair by Askmum · · Score: 3, Insightful

    So they did it wrong.

    You need to let a conversion program worry about converting Word 2006 documents to XML documents. You need to let the maker of Word 2006 worry about making this conversion program. This can be in the form of a "save as XML" option, but also an external program.
    You cannot say "oh, this is an old feature, let's put it in the spec and let the programmer who uses this spec worry about it, because we can't be bothered to convert it or don't know how to convert it".
    Sorry, but XML should be clear to everyone, and if you include an option, you should document the behaviour of that option.

    But even so, you do not want the specification of a new document format to have all the quirks of all the old formats. That is just silly. That is like saying that a car should have a 6V battery system too, because old cars have 6V battery systems and you might come across an accessory that uses 6V.

  13. Re:MIcrosoft sucks. by ArsenneLupin · · Score: 3, Insightful

    If you get to the point where you build up a company that can even consider garnering the term "monopoly", then get back to us. Until then, you have no idea what you're talking about, especially when quoting arbitrary and esoteric "federal laws". Call me nuts, but if you ever got to that point, you might even get a crazy idea in your head that those "federal laws" that you are so damned proud of are about as fair and just as our drug laws. At that point, maybe, just maybe, you may come to thinking that you earned what you got, and the government has no right to tell you how to run your business that you started in your teens, and proceeded to build to make it one of the most successful companies in the history of capitalism.

    Until you get to that point, I suggest that you those "federal laws" out your ass, Mr. Ashcroft.

    I agree 100% with you. However, for fairness' sake, we should then abolish all those unjust business-hampering federal laws, including copyright and patent law.

    Oh, and also those so-called "computer misuse" laws. Indeed, if I want to set up a consultancy where I propose to convert customers' ASP scripts to PHP, I should be allowed to demo to my prospective customers in great graphical detail why ASP is so insecure, even if I don't yet have an existing business relationship with them. Why should I tolerate the government telling me how I may and may not recruit new customers?

    Anything less would be one-sided and unfair.

  14. Re:MIcrosoft sucks. by hachete · · Score: 4, Insightful

    Yes, they got into trouble for bundling, but it misses the point every time. The secret sauce that Microsoft uses is to strong-arm the OEMs into bundling Windows with PCs, especially for consumers. I'm also thinking that the Windows Tax is levied even if you buy Linux on a Dell. This is the linchpin of Microsoft's domination; without it, all their other strategies wither on the vine. Without the bundling of Windows with new PCs, the bundling of IE (and all the other software), the resistance to interoperability, the mysterious file formats, etc. wither on the vine. I've been disappointed that none of the investigations I've read about have gone after the OEM-Microsoft link. Break that, and you'll have a free market again.

    I think the Office XML format style is a play straight out of IBM's handbook: make the standard complex and incomprehensible, and the little players (that's you) will find it hard to compete. In a way, that's a good sign: Microsoft is now lumbering into middle age, hoist on their own ever more complex petard.

    The other thing about middle age is that every little technological step away from their established baseline is treated as a revolution. In reality, it's no such thing, just a small stepping stone to shouting "pesky kids. Get off my lawn." Or maybe they've reached that stage already.

    --
    Patriotism is a virtue of the vicious