Slashdot Mirror


Dark Corners of the OpenXML Standard

Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"

1 of 250 comments (clear)

  1. The site seems to be slow... by junglee_iitk · · Score: 5, Informative
    You want to hire a new programmer and you have the perfect candidate in mind, your old college roommate, Guillaume Portes. Unfortunately you can't just go out and offer him the job. That would get you in trouble with your corporate HR policies which require that you first create a job description, advertise the position, interview and rate candidates and choose the most qualified person. So much paperwork! But you really want Guillaume and only Guillaume.

    So what can you do?

    The solution is simple. Create a job description that is written specifically to your friend's background and skills. The more specific and longer you make the job description, the fewer candidates will be eligible. Ideally you would write a job description that no one else in the world except Guillaume could possibly match. Don't describe the job requirements. Describe the person you want. That's the trick.

    So you end up with something like this:

    * 5 years experience with Java, J2EE and web development, PHP, XSLT
    * Fluency in French and Corsican
    * Experience with the Llama farming industry
    * Mole on left shoulder
    * Sister named Bridgette

    Although this technique may be familiar, in practice it is usually not taken this extreme. Corporate policies, employment law and common sense usually prevent one from making entirely irrational hiring decisions or discriminating against other applicants for things unrelated to the legitimate requirements of the job.

    But evidently in the realm of standards there are no practical limits to the application of the above technique. It is quite possible to write a standard that allows only a single implementation. By focusing entirely on the capabilities of a single application and documenting it in infuriatingly useless detail, you can easily create a "Standard of One".

    Of course, this begs the question of what is essential and what is not. This really needs to be determined by domain analysis, requirements gathering and consensus building. Let's just say that anyone who says that a single existing implementation is all one needs to look at is missing the point. The art of specification is to generalize and simplify. Generalizing allows you to do more with less, meeting more needs with few constraints.

    Let's take a simplified example. You are writing a specification for a file format for a very simple drawing program, ShapeMaster 2007. It can draw circles and squares, and they can have solid or dashed lines. That's all it does. Let's consider two different ways of specifying a file format for ShapeMaster.

    In the first case, we'll simply dump out what ShapeMaster does in the most literal way possible. Since it allows only two possible shapes and only two possible line styles, and we're not considering any other use, the file format will look like this:

    <document>
    <shape iscircle="true" isdotted="false"/>
    <shape iscircle="false" isdotted="true"/>
    </document>

    Although this format is very specific and very accurate, it lacks generality, extensibility and flexibility. Although it may be useful for ShapeMaster 2007, it will hardly be useful for anyone else, unless they merely want to create data for ShapeMaster 2007. It is not a portable, cross-application, open format. It is a narrowly-defined, single application format. It may be in XML. It may be reviewed by a standards committee. But it is by its nature, closed and inflexible.

    How could this have been done in a way which works for ShapeMaster 2007 but also is more flexible, extensible and considerate of the needs of different applications? One possibility is to generalize and simplify:

    <document>
    <shape type="circle" lineStyle="solid"/>
    <shape type="square" lineStyle="dotted"/>
    </document>