Slashdot Mirror


Dark Corners of the OpenXML Standard

Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"

8 of 250 comments

  1. Backwards compatibility by Bob54321 · · Score: 3, Interesting

    I thought most people considered themselves lucky if their documents could open in successive versions of Office. Why would anyone want to implement support for really old versions if Microsoft does not do it themselves?

    --
    :(){ :|:& };:
  2. Don't forget the page counts... by Anonymous Coward · · Score: 5, Interesting


    ODF spec page count: 722.

    OpenXML spec page count: 6000 !!

  3. Re:Basically by Nicopa · · Score: 4, Interesting

    No. ODF has several real, factual benefits. It might have originated in a single product but... it reuses existing standard technologies (SVG, CSS...). It has properly designed XML tags that act as "markup"; in OpenDocument, XML tags act as containers for chunks of data. ODF tries to separate content from style.

    And about your RTF suggestion... can I draw diagrams with RTF? Can I have a ToC? Can I do complex styling? Can I have a "gallery" of styles? Can I include images? No. RTF is not a solution.

  4. Forbidden partial implementation? by tepples · · Score: 4, Interesting

    OOXML is just as open as ODF

    The behavior of years-old proprietary word processing software is included by reference into OOXML. How is any spec that includes by reference the behavior of proprietary software exactly "open"? True, implementors could produce a partial implementation of the spec that degrades away the legacy baggage (more or less) gracefully, but some standards' patent licensors forbid implementors to publish a partial implementation. I don't know if this applies to OOXML's license.

  5. Disadvantages of ISO by BillGatesLoveChild · · Score: 5, Interesting

    Once it is ratified as an ISO Standard, the standard is locked up and anyone who wants a copy has to buy it from ISO. These are copyrighted. They're not cheap; thousands of dollars. Out of the reach of the average hobbyist, and not listed anywhere on the Internet. That 6,000 page draft will vanish into the mists of time.

    Larger Companies can afford this, but garage companies and hobbyists definitely can't. So what's the chance of an open source or even small upstart challenging Microsoft's Documentonopoly? Zero.

    Want another example? ISO country codes. The country codes (e.g. .us, .jp) are actually ISO, and ISO ended up backing off on a demand for royalties for this(!) But if you want state codes (e.g. California, Kantou), well, forget it unless you want to buy them off ISO. http://www.alvestrand.no/pipermail/ietf-languages/2003-September/001472.html

    ISO aren't the only ones guilty of doing this. IEEE do it as well. Want the latest simulation standard? Then get out your checkbook: http://standards.ieee.org/catalog/olis/compsim.html

    ISO and the IEEE are enemies of openness. Microsoft is taking a page out of their gamebook.

    ISO or IEEE certification is a *bad* thing.

  6. MS are slow learners by WebCowboy · · Score: 3, Interesting
    ...but they do learn....slowly...eventually.

    Their "open" XML format for office docs is a prime example of this.

    I think Steve Jobs was the one who first said "Microsoft just doesn't get it". Microsoft was probably the very first third-party software developer for the Mac and this was Jobs' reaction to Microsoft's first Mac applications (I think a port of Multiplan--which was re-incarnated into Excel IIRC, and MSBasic). They really WERE "tasteless", ugly and took almost no advantage of the revolutionary GUI interface--their DOSness really showed through--I think in the case of Multiplan the mouse could be used only to jump the cursor to a certain cell and that was it--the rest was all like in DOS.

    MS Windows is another example--Microsoft didn't "get it" well enough until the third major release. Now MS is SLOWLY "getting it" with the beneficial characteristics of XML standards. Microsoft's early XML efforts are like Windows 1.0--there is some very rudimentary understanding of the mechanics but not the philosophy of XML, and I wonder if this is why SOAP ended up NOT so simple (given Microsofties were involved in its creation and seemed to be trying to make it a DCOM-in-XML-but-dumber thing). Microsoft's "Version 1" XML might look like this:

    <Soap:Envelope>
    <Soap:Body>
    <wsWriteLegacyData>
      <encodedBinaryData>
    SDFgkdfkljSDFJLDFSJKLkjdfbks df jklsdfklj;hk/jkjnb.kndf
    jk.sdfjkldfsddfsdfkkjsdfh kvbkjnkjkjksdfkjsdfkeuieru903
    oijooeoefvkmefmklef lmkseflkvfeklmlmermklemleflmdvldflk
      </encodedBinaryData>
    </wsWriteLegacyData>
    </Soap:Body>
    </Soap:Envelope>
    "See? We're using XML and SOAP! We're hip! We're cooool! You can't say we don't play by the rules now!"

    Of course, this is an obtuse, opaque and obfuscated way to use XML and totally NOT in the spirit of interoperability and openness. I won't even go into the nifty XML tools MS has made...nifty to use but they've done a lot to obliterate the S out of SOAP in their crazy output.
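
    To make the complaint concrete, here is a minimal Python sketch contrasting a blob-in-XML payload with a structured one. The record, the <payment> element and its children are invented for illustration; only the wrapper element names are borrowed from the mock envelope above. It is not taken from any real Microsoft output.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical record, invented purely for this illustration.
record = b"name=Alice;amount=42"

# "Version 1" style: the real data is hidden inside an encoded blob.
opaque_doc = (
    "<wsWriteLegacyData><encodedBinaryData>"
    + base64.b64encode(record).decode("ascii")
    + "</encodedBinaryData></wsWriteLegacyData>"
)

# Structured style: each field is visible as markup (element names are assumptions).
structured_doc = "<payment><name>Alice</name><amount>42</amount></payment>"


def element_names(xml_text):
    """Return the tag names a generic XML tool can see, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


opaque_names = element_names(opaque_doc)
structured_names = element_names(structured_doc)

# A generic tool can query the structured form directly...
amount = ET.fromstring(structured_doc).findtext("amount")

# ...but the blob gives up its fields only to code that knows the private encoding.
blob_text = ET.fromstring(opaque_doc).findtext("encodedBinaryData")
decoded = base64.b64decode(blob_text)
```

    Both documents are well-formed XML, but only the second one exposes anything an XPath query, a schema or a database import could work with.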

    OOXML (Opaque and Obfuscated XML) standard is "version 2.0"--they're doing their best to eliminate ambiguity but now we've gone over to hyper-specificity, and the standard is being shared a bit better...problem is that they don't fully describe the interpretation of the standard elements so as to keep its advantage. All they've done is taken every formatting option and mapped it to an XML element--it is monolithic and completely non-extensible. But hey, at least it's publicly available and doesn't involve weirdness like encoded-binary-blobs.

    In a few years MS will reach version 3.0 of "getting" XML...
  7. the real hitch - it never was clear by Erris · · Score: 4, Interesting

    The hitch here is that *not* having them means tons and tons of reverse engineering, and that's only after tracking down every release of every version of every MS Office ever.

    The real hitch, as the article hints, is that the releases are contradictory. For instance, the Mac version of small caps is different from others. This is part of the reason Word is so bloated and does not preserve printing type setting from one machine to the next.

    Ten years ago, a state agency I was working for was forced to move from Word Perfect to Word. Hundreds, if not thousands, of documents were painstakingly converted from one format to the other. The typesetting, which they had never had a problem with previously, was easily broken by moves from one machine to the other or by changing printers. That is the kind of thing that no program can account for - it was broken from then and can not be created correctly today. It's also probably the reason for all of the nebulous "guidance" sections that don't tell you anything other than to look at, and presumably measure, old printed examples. Not even M$ knows what it was really doing in the field. As I saw at the time, no two were alike.

    Of course, the time to get things right is not in your XML, it's when you import the document. The author tells us this in so many words. The XML should be general enough to encompass any kind of typesetting. It is the importing program's task to figure out what the old format wanted things to look like. As the author points out, the spec does not do anything other than create something impossible to follow. It's not going to magically make things look right no matter how hard they wish it would.
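
    A rough Python sketch of that idea, under loudly stated assumptions: the application names, the spacing rules and the <p space-before="..."> output are all hypothetical placeholders, not ODF, not OOXML, and not the measured behaviour of any real word processor. The point is only where the quirk gets resolved.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Paragraph:
    text: str
    space_before_pt: float  # explicit, application-neutral spacing


# Hypothetical table of how each legacy source treats spacing at the top of a page.
# These rules are placeholders, not the behaviour of any real product.
LEGACY_TOP_OF_PAGE_SPACING = {
    "legacy-app-a": lambda requested: 0.0,        # pretend it drops the spacing
    "legacy-app-b": lambda requested: requested,  # pretend it keeps the spacing
}


def import_paragraph(text, requested_space_pt, source_app, at_top_of_page):
    """Resolve the legacy quirk once, at import, into an explicit value."""
    space = requested_space_pt
    if at_top_of_page:
        space = LEGACY_TOP_OF_PAGE_SPACING[source_app](requested_space_pt)
    return Paragraph(text=text, space_before_pt=space)


def serialize(p):
    """Write only general properties; no 'behave like application X' flag."""
    return f'<p space-before="{p.space_before_pt}pt">{escape(p.text)}</p>'


a = import_paragraph("Title", 12.0, "legacy-app-a", at_top_of_page=True)
b = import_paragraph("Title", 12.0, "legacy-app-b", at_top_of_page=True)
```

    Here serialize(a) and serialize(b) differ only in the spacing value. A reader of the saved file never needs to know which old program the document came from, which is exactly what a compatibility flag like suppressTopSpacingWP fails to achieve.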

    --
    DMCA, Hollings, Palladium. What might have sounded like paranoia is now common sense.
  8. OOXML's Origin Is Not The Problem by NickFortune · · Score: 3, Interesting
    ODF is a nice idea in theory, but really, it's a similar situation (OpenOffice.Org internal dataformat jammed into a standard, so designed with OO.o in mind by necessity)
    The ODF format must necessarily describe the structure and layout of an office document. There's no need for it to reflect the internal data structures of any specific application, except to the extent that they too describe office documents.

    OOXML includes data elements that should be part of internal import routines rather than being enshrined in the document format, and it includes elements that are not specified except by reference to applications for which no public specs exist. This is the problem, not the fact that OOXML is derived from MS Office file formats.

    RTF. It may not get press attention, but it's actually a fairly well-documented standard, has been working as an interchange format for years, and yet is designed with enough expandability that it's still useful with the kinds of documents produced today. It's a true de-facto standard.
    Well, I was a big fan of RTF at one time. But a few years back I found that documents with any kind of formatting more complex than paragraph+justification+font just weren't working between MS Office and back. I don't know if this was because the format couldn't cope, or because of faulty implementations. In either case, it led me to give up on RTF.

    In any event, to be a replacement, RTF would need to work for spreadsheets and presentations at a minimum - something I don't think there's a lot of support for in the current RTF specification. We'd also lose the benefits of an XML based format, which given the amount of work on the seamless integration of XML documents into databases, web services and other data management applications means losing a lot of functionality.

    for those who really want interoperability, RTF is the way to go with today's software
    Interoperability is only part of the problem. We also want a spec that can be fully and freely implemented by anyone, which isn't under the control of any single vendor. We want a format to which we can entrust documents, knowing that in twenty years' time there will be an application capable of reading them.

    an unnecessary dichotomy is drawn between OpenXML and ODF with regard to their design goals - both are repurposed native formats for a single application.
    I don't know what you mean by native in this case, but the repurposing of OOXML isn't the problem. It's one of size and obfuscation, and, as TFA points out, specification by reference to closed formats and the behaviour of extinct proprietary software. These are non-trivial problems with OOXML which are not (to the best of my knowledge) found in ODF.

    There's nothing wrong with ODF. Re-creating it based on the non-XML RTF would be a waste of time and effort.

    --
    Don't let THEM immanentize the Eschaton!