Slashdot Mirror


Stephane Rodriguez Dismantles Open XML

Elektroschock writes "Stephane Rodriguez, a reengineering specialist who became popular for his article on MS Office 2007 binary data, now comprehensively debunks Microsoft's new Open XML format. With small case studies he demonstrates the impossible challenges third-party developers will face. His conclusion: it is 'defective by design.' Next week members of the International Organization for Standardization are likely to approve the format as a second official ISO standard for office documents, even though most nations have submitted comments. Rodriguez claims he is 'not affiliated to any pro-MS or anti-MS party/org[anization]/ass[ociation].'"

13 of 188 comments

  1. Re:This is not proof of OOXML being defective by d by darkatom · · Score: 4, Insightful

    But that's still a problem. Microsoft's implementation becomes the de facto standard and all others must (attempt to) conform to the behavior of that implementation or be judged defective. This is what happened when MS published the MAPI (Mail API) spec and then released an implementation alongside it. Lotus and others could never fully mimic what the MS implementation did, so they eventually languished.

  2. Re:This is not proof of OOXML being defective by d by Anonymous Coward · · Score: 5, Informative

    "by design" is of course about motivation which we can know in OOXML from emails, quotes, obtuse or brittle design, and lack of specification.

    The document contains all of these. I suggest that you read it.

    By the way -- there's newly discovered undocumented Microsoft tech present in OOXML, such as SSPI ("Security Support Provider Interface"), which is a proprietary Microsoft-developed protocol for security providers, and OLE ("Object Linking and Embedding"), which is for embedding (e.g., taking an Excel spreadsheet and putting it into a Word document). These are undefined in OOXML and only available on Microsoft Windows.

  3. Re:This is not proof of OOXML being defective by d by bomanbot · · Score: 4, Interesting

    This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly

    Um, isn't the fact that not even Microsoft's own software can handle OOXML, which by the way was designed by Microsoft itself, proof enough that something is seriously wrong with the design of OOXML?

    I mean if not even the maker of OOXML can get it to work properly in its own products, how are third parties supposed to do it? And if no one is able to implement OOXML correctly, what is this "standard" good for besides being a great smoke-and-mirrors tactic by Microsoft themselves?
  4. Re:This is not proof of OOXML being defective by d by David+Gerard · · Score: 5, Funny

    OOXML is a theoretically perfect standard that just happens to have no implementations whatsoever.

    --
    http://rocknerd.co.uk
  5. Re:This is not proof of OOXML being defective by d by setagllib · · Score: 5, Insightful

    It's deliberate. The standard is just a distraction, to keep competitors busy trying to implement it, while documents are actually being created in the Office 2007 variant of OOXML. A few months of legacy documents almost guarantee that a transition to the real OOXML would be an uphill battle, especially with no real documentation of how *either* format works. So even with a supposed 'standard' and a near-enough implementation, the vendor lock-in is just as strong as it was with the binary formats.

    --
    Sam ty sig.
  6. Personally.. by nrgy · · Score: 5, Interesting
    Personally I like this link (pdf) in the article.

    From: Bill Gates
    Sent: Saturday, December 5 1998
    To: Bob Muglia, Jon DeVann, Steven Sinofsky
    Subject : Office rendering

    One thing we have got to change in our strategy - allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company.

    We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities.

    Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows.

    I would be glad to explain at a greater length.

    Likewise this love of DAV in Office/Exchange is a huge problem. I would also like to make sure people understand this as well.

    I'm not saying this as some Linux nut job, but it's things like that which just drive me nuts. Regardless of whichever OS I prefer, that kind of thinking just makes my blood boil.

    How can any committee deciding on open standards take seriously a company which has been proven time and time again to play by its own rules, and whose offerings labeled OPEN are about as open as the doors to Fort Knox are to the average person?

  7. Smokescreen for Sharepoint by Sweetshark · · Score: 4, Interesting

    This "OpenXML" stunt is just a smokescreen covering Microsoft's controlled retreat in the office format battle. It only needs to keep parties distracted until Microsoft has reclaimed control over business content by means of vendor lock-in v2.0, aka Microsoft Office Sharepoint Server.

    http://weblog.infoworld.com/openresource/archives/2007/04/while_you_were.html
    http://www.itbusinessedge.com/blogs/mia/?p=198

  8. Re:ODF specifies ASCII number IEEE float value? by The+New+Andy · · Score: 4, Informative

    The relevant code from an ODF spreadsheet:

    <table:table-row table:style-name="ro1">
      <table:table-cell/>
      <table:table-cell office:value-type="float" office:value="123456.123456789">
        <text:p>123456.12</text:p>
      </table:table-cell>
    </table:table-row>
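
    To make the point of that fragment concrete, here is a minimal Python sketch of how a third-party reader might pull the stored value and the displayed text out of such a row. The namespace URIs are the standard ODF ones, but the xmlns declarations, the function name read_float_cells and the constant names are assumptions added for illustration, since the quoted fragment omits its namespace bindings.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs; the quoted fragment omits its xmlns
# declarations, so they are added here (an assumption for this sketch).
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

FRAGMENT = """\
<table:table-row xmlns:office="{office}" xmlns:table="{table}" xmlns:text="{text}"
                 table:style-name="ro1">
  <table:table-cell/>
  <table:table-cell office:value-type="float" office:value="123456.123456789">
    <text:p>123456.12</text:p>
  </table:table-cell>
</table:table-row>
""".format(**NS)


def read_float_cells(row_xml):
    """Return (stored_value, displayed_text) pairs for the float cells in a row."""
    row = ET.fromstring(row_xml)
    value_type_attr = "{%s}value-type" % NS["office"]
    value_attr = "{%s}value" % NS["office"]
    cells = []
    for cell in row.findall("table:table-cell", NS):
        if cell.get(value_type_attr) != "float":
            continue  # skip empty or non-numeric cells
        shown = cell.findtext("text:p", default="", namespaces=NS)
        # The attribute carries the full-precision number as plain decimal
        # text; the paragraph is only the formatted rendering.
        cells.append((float(cell.get(value_attr)), shown))
    return cells


CELLS = read_float_cells(FRAGMENT)
```

    The stored number comes from the office:value attribute at full precision, while the text:p child only holds what the spreadsheet showed on screen, so a reader never has to reverse-engineer the value from its formatted form.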

  9. Re:This is not proof of OOXML being defective by d by swillden · · Score: 5, Insightful

    This is not proof of OOXML being defective by design. It only shows that apparently MS's software isn't able to handle OOXML properly.

    If Office can't read OOXML files produced by other tools, and other tools can't read Office OOXML files, where do you suppose end users will place the blame?

    And what do you suppose users will do when faced with incompatibilities?

    It's a brilliant strategy: Define a new "standard" but don't quite implement it yourself, ensuring that no one can implement a competitive office suite that is compatible with yours. Further, make the standard complex and weird enough that you can always blame inconsistencies on the other implementations. Voila! You get to proclaim to the world that your de facto standard office suite supports an open, ISO-blessed international standard format -- but with no worries about losing your lock-in.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  10. Re:Can anyone repro? by Karellen · · Score: 4, Insightful

    Uh, UTF-8 files do not need a BOM. What the fuck is the point of a byte-order-mark on an encoding that is byte-order neutral?

    One of the advantages of UTF-8 for text files is that you don't need a BOM. With XML it's even easier because, as you point out, the XML declaration ("XMLDecl" in the spec) header can contain the "EncodingDecl" to tell you explicitly that the file is in UTF-8. If the EncodingDecl says UTF-8, and the file is encoded in UTF-8, then if an XML parser cannot handle that, it's seriously fucked and needs to be fixed.

    You might also want to go read STD-63 at some point. It points out that there are a few problems with using BOMs in UTF-8, and that if there is a way for UTF-8 to be determined in a way other than with the use of a BOM, that should be used instead. Given that XML specifically includes support for an "EncodingDecl" in the "XMLDecl", it is clear that best practices dictate that you *shouldn't* use a BOM when working with UTF-8 encoded XML files. Even if your tools _insist_ on writing BOMs to such files, they had *better* still be able to work if the BOM is missing.

    Heck, with OOXML, you could also use the ZIP's manifest file to keep track of file metadata like the character encoding.
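
    As a rough illustration of the "work with or without a BOM" behaviour described above, here is a small Python sketch. The helper name parse_utf8_xml and the sample document are made up for the example, and stripping the BOM by hand is a defensive choice for this sketch, not a statement about what any particular office suite or parser does.

```python
import codecs
import xml.etree.ElementTree as ET

# Sample document: the encoding is stated in the XML declaration itself.
XML_DECL = b'<?xml version="1.0" encoding="UTF-8"?>'
BODY = "<note>caf\u00e9</note>".encode("utf-8")


def parse_utf8_xml(data):
    """Parse UTF-8 XML bytes, tolerating but never requiring a leading BOM."""
    if data.startswith(codecs.BOM_UTF8):
        # Drop the three BOM bytes; the declaration already names the encoding.
        data = data[len(codecs.BOM_UTF8):]
    return ET.fromstring(data)


WITHOUT_BOM = parse_utf8_xml(XML_DECL + BODY)
WITH_BOM = parse_utf8_xml(codecs.BOM_UTF8 + XML_DECL + BODY)
```

    Both calls yield the same element with the same text, which is the property a well-behaved consumer should have: the BOM is optional noise, and the EncodingDecl is what carries the information.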

    --
    Why doesn't the gene pool have a life guard?
  11. Disingenuous by golodh · · Score: 4, Informative
    I see four questions here:

    -Q(1) What does Rodriguez's article show?

    -Q(2) Is OOXML in and by itself flawed?

    -Q(3) What's the practical relevance of the question whether OOXML is flawed?

    -Q(4) So what's in it for Microsoft? Why do they bother?

    -

    - Q(1) : What does Rodriguez's article show?

    - A(1) : Rodriguez's article shows that the OOXML format written by the latest Microsoft Office applications, among them MS Excel, is:

    - sorely defective in that you can't be sure to get your original data back after saving it to OOXML

    - impossible to change outside MS Office applications

    - tied to the MS Office way of representing internationalised versions of documents because "of the way Microsoft chose to store XML using the US English locale, no matter how good your implementation is, you have to retrofit it to work just like Office does" in order to accommodate internationalised documents

    - burdened with support for MS Office legacy formats throughout, which greatly (and unnecessarily) contributes to the size and complexity of the 6,000 page standard.

    - Q(2): Is OOXML flawed in and by itself?

    - A(2): Yes, I think so, partly because of Rodriguez's article, partly because of flaws documented elsewhere (see http://www.noooxml.org/petition). Points 2, 3, 4 and 5 listed there seem especially crippling to me:

    (2) There is no provable implementation of the OOXML specification: Microsoft Office 2007 produces a special version of OOXML, not a file format which complies with the OOXML specification;

    (3) There is information missing from the specification document, for example how to do a autoSpaceLikeWord95 or useWord97LineBreakRules;

    (4) More than 10% of the examples mentioned in the proposed standard do not validate as XML;

    (5) There is no guarantee that anybody can write software that fully or partially implements the OOXML specification without being liable to patent lawsuits or patent license fees by Microsoft;

    - Q(3): What's the practical relevance of the question whether OOXML is flawed?

    - A(3): Enormous. We currently see that Microsoft is trying to convince the world to accept OOXML as an ISO "standard", whereas it's no such thing. It's too loosely defined, and unlike the existing OpenDocument standard it has no open-source reference implementation. So there will be a morass of possible implementations, of which only Microsoft's own implementations will be guaranteed mutually compatible. That's a polite way of saying that Microsoft simply aims at continuing its format lock-in, only this time under the name of OOXML.

    - Q(4) : So what's in it for Microsoft? Why do they bother?

    - A(4) : Well ... Microsoft has a policy whereby it quite explicitly does not want other people's software, let alone Open Source software, to render MS Office documents correctly.

    For reference, see this email, (cited from Rodriguez's article):

    From: Bill Gates

    Sent: Saturday, December 5 1998

    To: Bob Muglia, Jon DeVann, Steven Sinofsky

    Subject : Office rendering

    One thing we have got to change in our strategy - allowing Office documents to be rendered very well by other peoples browsers is one of the most destructive things we could do to the company.

    We have to stop putting any effort into this and make sure that Office documents very well depends on PROPRIETARY IE capabilities.

    Anything else is suicide for our platform. This is a case where Office has to avoid doing something to destroy Windows.

    I would be glad to explain at a greater length.

    Likewise this love of DAV in Office/Exchange is a huge problem. I would also like to make sure people understand this as well.

    Is that

  12. Re: US English not "canonical" by Jeremy_Bee · · Score: 4, Insightful

    However, other things seem either wrong or have a bias towards hand editing of the files, e.g. "International, but US English first and foremost". He complains that it uses U.S. English settings. He may not like the U.S., but it's called picking a canonicalized format. This is offensive bull.

    I don't think you intended it that way, but you should be aware of the vast number of people you just insulted. US English and US dates are only "canonical" in the minds of US citizens. If not for Microsoft purposely and determinedly screwing up the implementation of anything but US standards in their software the usage would have no traction at all.

    The majority of the "English speaking" world still uses the English language and English formats and standards, not US variant ones. The fact that the USA has seen fit to re-invent English, still refer to that as English, and then foist it on the rest of the world doesn't make it "canonical."

    As the author of this article so aptly describes, date formats and language implementations are a multi-stage nightmare in Office, to the point that the majority of users even in English speaking countries like Canada, Australia, New Zealand and the UK itself often end up using American English and American dates, simply because Office is the only game in town and you can only bash your head against the wall on these things for so long. That doesn't make it right, and that doesn't mean that those users wouldn't be happier and more productive if they were not forced to use a US standard when they may not even have traveled to the US.

    Any kind of English except the US variant is severely broken in Office and always has been. Your answer sounds to me a lot like: "So what, they should all be using our standards and language anyway." Not helpful at all, and illogical as well.
  13. Re:This is not proof of OOXML being defective by d by QuestorTapes · · Score: 4, Insightful

    > But that's still a problem. Microsoft's implementation becomes the de facto standard
    > and all others must (attempt to) conform to the behavior of that implementation or
    > be judged defective.

    It's worse than that. Since MS defines a number of aspects of the specification solely
    in terms of compliance with MS application software, the MS implementation is not only
    the -de facto- standard, but the very explicit standard. Not only can no one conform
    to a sufficient level to be judged compliant in the marketplace, for all contractual
    specifications, -nothing- but MS software can -ever- be 100% compliant.

    This means on big, contract-driven projects, such as many government projects, MS
    and vendors using MS tools are effectively the only possible competitors, unless
    the contracts and specifications specifically waive vendor compliance with those
    parts of the spec.

    And I strongly doubt anyone would ever write a contract like that.