Office 2007 Fails OOXML Test With 122,000 Errors
I Don't Believe in Imaginary Property writes "Groklaw is reporting that some people have decided to compare the OOXML schema to actual Microsoft Office 2007 documents. It won't surprise you to know that Office 2007 failed miserably. If you go by the strict OOXML schema, you get a 17 MiB file containing approximately 122,000 errors, and 'somewhat less' with the transitional OOXML schema. Most of the problems reportedly relate to the serialization/deserialization code. How many other fast-tracked ISO standards have no conforming implementations?"
If you can change a vote of "no with comments" to "yes" I don't see why you couldn't change "fails with 122,000 errors" to "passes." I mean, when your standard passes through sheer lobbying and politics with little technical analysis, it's going to take a lot to surprise me with how epically it fails.
My work here is dung.
the Open Document Format? Just curious.
Technical details mean absolutely nothing in this discussion. I thought we established this.
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
You just use this conversion tool called Open Office.
Engineering is the art of compromise.
Men in Black? What happened to good old megabytes? The article says 17MB!
In a blog posting this week, Alex Brown, leader of the International Organization for Standardization (ISO) group in charge of maintaining the Office Open XML (OOXML) standard, revealed that Microsoft Office 2007 documents do not meet the latest specifications of the ISO OOXML draft standard. "Word documents generated by today's version of Microsoft Office 2007 do not conform to ISO/IEC 29500," said Brown in a blog post recounting the process of testing a document against the "strict" and "transitional" schema defined in the standard.
Ahem. Let me be the first to say:
Brownie, you're doing a heck of a job!
Without a reference implementation, how do you know a standard is valid?
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.
Seriously... did anyone not see this coming? Submitting Office 2007 to this test is like taking a "Will it float?" test with your hands tied and the good ol' cement shoes strapped on.
which is that it's the standard that's deficient. I'm sure that the standard will soon be "improved" so it conforms with Office 2007
OOXML is such a fraud that it's disgusting that we continue to waste such time on it. If it could win on the merits it wouldn't need such underhanded tactics by its (very few) supporters. It's clearly intended as an ODF-killer by creating an unnecessary parallel "standard".
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
While it's hardly unexpected that the Office 2007 document format isn't *cough* ISO compliant, 122k errors for a 60 MB file works out to a remarkable ~500 bytes of markup per error.
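The ~500 figure holds up under either reading of the size, assuming "Mb" here means megabytes rather than megabits (the function name is illustrative only):

```python
def bytes_per_error(file_bytes: int, error_count: int) -> float:
    """Average amount of markup per reported validation error."""
    return file_bytes / error_count

ERRORS = 122_000
decimal_size = 60 * 1000**2  # "60Mb" read as 60 MB (SI megabytes)
binary_size = 60 * 1024**2   # "60Mb" read as 60 MiB

print(round(bytes_per_error(decimal_size, ERRORS)))  # 492
print(round(bytes_per_error(binary_size, ERRORS)))   # 516
```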
I really do not understand where Microsoft is heading. They've rammed their miserable OOXML format through - supposedly so they could advertise their product as ISO compliant. But what's their advantage now that their product is shown to be so horribly incompatible?
It's not a fast-tracked ISO standard, but HTML and CSS have no conforming implementations either. I'm not sure, but Links might conform to HTML.
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
I don't want to destroy the mood that the Slashdot editor wanted to create by posting this sensational piece of propaganda, but this is not 122,000 bugs, is it? This is a parser generating 122,000 error results. Sure, it's bad, but anyone who has ever tried to make code W3C-compliant, or debugged any piece of code, will know that a single error can produce many, many error reports. So (despite my wish that it were otherwise) it does not really give you much insight into Microsoft's compatibility with its own standard.
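A toy illustration of how one error can balloon into hundreds of reports. This assumes a deliberately naive checker that compares element names slot by slot and never resynchronizes; real XML validators are smarter, but cascades still happen. A resynchronizing comparison built on Python's difflib finds the single real problem. Both function names are hypothetical.

```python
import difflib

def naive_slot_errors(expected, actual):
    """Report every position where the element names differ (never resyncs)."""
    errors = []
    for i in range(max(len(expected), len(actual))):
        want = expected[i] if i < len(expected) else None
        got = actual[i] if i < len(actual) else None
        if want != got:
            errors.append(f"slot {i}: expected {want!r}, got {got!r}")
    return errors

def resyncing_errors(expected, actual):
    """Count real edits by aligning the two sequences first."""
    matcher = difflib.SequenceMatcher(a=expected, b=actual, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

expected = [f"p{i}" for i in range(1000)]
actual = expected[:1] + expected[2:]  # one element missing near the start

print(len(naive_slot_errors(expected, actual)))  # 999
print(len(resyncing_errors(expected, actual)))   # 1
```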
For one example where this has worked well, consider vehicle networking. Bosch invented/designed the Controller Area Network (CAN). This was standardised by SAE as part of the in-vehicle networking specification. ISO then just adopted the SAE work and extended it in some new areas. It all works well and is based on proven technology (i.e. the technology existed before the standards).
In other words, if you're validating against the TRANSITIONAL spec, the OOX documents aren't horribly far off. And they're wrong in a way that's easy to compensate for in code (e.g. accept either "true" or "on" as a truth value). That's a markedly different situation from the one described by the headline's "'somewhat less' with the transitional OOXML schema" claim.
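A minimal sketch of that compensation, assuming a reader that accepts the true/false and on/off spellings (plus 1/0) for a boolean attribute value. Which tokens each schema actually allows should be checked against the standard itself; parse_onoff is a hypothetical helper name.

```python
TRUE_TOKENS = frozenset({"true", "on", "1"})
FALSE_TOKENS = frozenset({"false", "off", "0"})

def parse_onoff(value: str) -> bool:
    """Leniently map a boolean attribute value to a Python bool.

    Surrounding whitespace is ignored; matching is case-sensitive, as in
    XML Schema's own boolean type.
    """
    token = value.strip()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError(f"not a recognized boolean value: {value!r}")

print(parse_onoff("on"), parse_onoff("false"))  # True False
```

Raising on anything else, rather than defaulting to False, keeps a lenient reader from silently misreading genuinely broken input.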
And in case anyone claims that ODF doesn't have the same sort of problem, I refer you to AbiWord bug 11359/OpenOffice bug 64237. This one is a show-stopper.
> Wha? Valid in what respects?
Valid as in possible to implement. How could a standard not be possible to implement, you ask? Well, that is simple. For example, write a program that follows this standard:
1. It must print "1" on exit
2. It must print "2" on exit
As you can see, it would not be possible to write a program that conforms to that standard. That is why someone needs to write a reference implementation of a standard: to catch errors like this before the standard is given to the whole world to implement.
It is better that only one team has to discover a standard's errors than the whole world.
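The two-rule example can be made concrete with a brute-force check. This sketch models each rule as a predicate on the program's complete output and searches short candidate outputs. Under a strict reading ("the output is exactly X") nothing satisfies both rules, and no longer search would help, since one string cannot equal both "1" and "2". Under a looser reading ("the output contains X") outputs such as "12" do, which is the kind of ambiguity a reference implementation tends to expose. The helper names are illustrative.

```python
from itertools import product

def prints_exactly(text):
    """Strict reading: the whole output on exit must equal `text`."""
    return lambda output: output == text

def prints_somewhere(text):
    """Loose reading: `text` must appear somewhere in the output."""
    return lambda output: text in output

def satisfying_outputs(rules, alphabet="12", max_len=3):
    """Enumerate every output up to max_len characters that meets all rules."""
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            output = "".join(chars)
            if all(rule(output) for rule in rules):
                found.append(output)
    return found

print(satisfying_outputs([prints_exactly("1"), prints_exactly("2")]))  # []
print(satisfying_outputs([prints_somewhere("1"), prints_somewhere("2")], max_len=2))  # ['12', '21']
```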
You need at least one coded reference implementation or else you'll end up with something in the standard which is difficult/impossible to implement. Especially in a 6,000+ page standard.
ISO would be well advised to adopt the method the IETF uses, which is to require two independent teams to implement the standard from the documentation before an RFC can reach Draft Standard status. I suspect ODF would only have benefited from this process, which would have smoothed its rough edges, while OOXML would have been so cumbersome that it would simply have been dropped.
Not a typewriter
After the first error, are the remaining errors meaningful, or are they false positives? I believe most errors after the first are false positives that follow from the first error.
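One rough way to estimate how many independent problems hide behind a huge error count is to strip location details from each message and tally what remains. This sketch assumes a hypothetical "line N: message" format; real validator output would need its own pattern, and the sample messages are invented for illustration.

```python
import re
from collections import Counter

LOCATION_PREFIX = re.compile(r"^line \d+: ")  # assumed message format

def distinct_problems(messages):
    """Tally messages after removing their line-number prefix."""
    return Counter(LOCATION_PREFIX.sub("", m) for m in messages)

sample = [  # invented example messages, not real validator output
    "line 10: attribute 'val' has invalid value 'on'",
    "line 42: attribute 'val' has invalid value 'on'",
    "line 97: attribute 'val' has invalid value 'on'",
    "line 120: unexpected element 'foo'",
]
for message, count in distinct_problems(sample).most_common():
    print(count, message)
```

This only groups identical complaints; it cannot tell whether a later error is a genuine cascade from an earlier one.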
That explains why OSI is such a trainwreck compared to IP.
> Not a bottom up
So why was ODF approved, then? Or ISO C?
> adopt the lowest common denominator of what's already out there
"Lowest common denominator" is not equivalent to bottom-up design.
http://outcampaign.org/
Obligatory: 122,000 errors should be enough for anybody.
Seven puppies were harmed during the making of this post.
Ha!
Then there are those of us who think the prank is the people who refuse to use it (and who trot out the tired "hard drive manufacturers are stealing my disk space" myth/meme).
Seriously, the one thing we can agree on is that there is often confusion regarding whether someone meant "1000" or "1024" when they used a prefix. The difference in approach between the two camps is:
1. Stick with the status quo (where one tries to guess the convention being used from context). That is, just accept the confusion/inaccuracy.
2. Use SI prefixes in the original SI sense (powers of 10) and use the new binary prefixes when you really mean powers of 2. That is, create a convention and adhere to it.
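A small sketch of option 2 in practice: the same byte count formatted with SI prefixes (powers of 1000) and with IEC binary prefixes (powers of 1024). The summary's 17 MiB comes out at about 17.8 MB in SI terms. The function name is illustrative.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]

def format_size(n_bytes: int, binary: bool = False) -> str:
    """Format a byte count with SI (1000-based) or IEC (1024-based) prefixes."""
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.1f} {unit}"
        value /= base
    return f"{value:.1f} {units[-1]}"

size = 17 * 1024**2  # 17 MiB
print(format_size(size, binary=True))  # 17.0 MiB
print(format_size(size))               # 17.8 MB
```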
Interesting that in a discussion about standards (and failures thereof) you would argue that a standard meant to reduce confusion is a prank! I agree, by the way, that "mebibyte" sounds kinda silly... but who cares? It gets the job done. ("Quark" was a silly name, but it's now deeply ingrained in science and no one thinks twice about it.)
For what it's worth, many software products now use the binary prefix notation (e.g. Konqueror).
The details are trivial and useless; The reasons, as always, purely human ones.
> 1. It must print "1" on exit
> 2. It must print "2" on exit

onExit() {
    print("1");
    print("2");
}
What's so hard about that?
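Taken literally, that pseudocode maps onto Python's standard atexit module. This sketch follows the reply's reading, in which each rule only requires that the digit be printed, not that it be the sole output; exit_lines and on_exit are hypothetical names.

```python
import atexit

def exit_lines():
    """What the exit handler prints, one element per line."""
    return ["1", "2"]

def on_exit():
    for line in exit_lines():
        print(line)

# Run on_exit when the interpreter shuts down normally.
atexit.register(on_exit)
```

Note that atexit handlers do not run if the process is killed by a signal or leaves via os._exit().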
There's a fundamental difference between the IETF and ISO. IETF makes standards of stuff that has been proven to work (or at least be implementable), whereas ISO wants to write specs to tell people what should work.
A bit like comparing TCP/IP and whatsitsname (X.400?). It doesn't really matter how nice something looks on paper if there's no good implementation of it.
Isn't that what file formats do?
ROMANES EUNT DOMUS
The referenced article claims that "the English had imposed GMT on the rest of the world by force when Britain was a big colonial power", which is bogus.
The English had a major sea trading infrastructure, at a time when improvements in clocks finally made accurate determination of longitude by celestial navigation practical for trans-Atlantic voyages.
They established an observatory at a major port (Greenwich) to provide a time-hack for ships in port (both military and commercial) to set their clocks, and distributed navigational charts with that observatory's longitude as the basis for the coordinate system (thus simplifying navigational calculations).
This quickly became the de facto standard on a voluntary basis among commercial shipping, along with the cities that grew up around major seaports (with offsets chosen to approximate local noon: typically whole hours, sometimes a half or quarter hour), just as the coordinate system became the standard for shoreline mapping in other locations (to simplify navigation near shores by ships using the Greenwich meridian for their ocean charts). Then, when railroads drove time standardization, it spread from the seaport cities to inland locations.
Of course the empire's military and government used it internally. But the rest of the world adopted it voluntarily.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Oh wait! It wasn't!
The fast track is for de facto standards that are already so widespread (i.e. supported by multiple vendors) and consistent that there's little point in pushing out a divergent standard, even though a divergent standard might be better. Something like TCP/IP would be a good example of the sort of thing where the fast track might be appropriate. ODF wasn't fast-tracked, so the standards committee came up with the best standard it could, irrespective of what might actually be out there in the wild. Now it's up to the vendors to catch up. That's the usual way this is done (e.g. the C++ standard, where most vendors took a few years to catch up, or the C standard, where most vendors took a few months to catch up and MS took a few years).
Of course, if MSOOXML had gone through the regular track, it probably would have taken years to finish (since it's so large, complex, and poorly defined), and MS couldn't afford to wait. So instead they bought themselves a standards committee or twelve.
And that's what's been going on. However, a lot of governments and other organizations are now realizing that keeping all the data they've been gathering for the better part of two decades in a closed, proprietary format could lead to disaster. That's the whole point of trying to get an internationally recognized open standard that anyone can implement. ODF is supposed to fulfill the function of a published, implementable office document standard so that, theoretically, in 2100 AD, when someone needs to open a document created in 2010, it's in an openly available format that, at the very worst, someone has to reimplement, but that at least has clear, concise documentation that isn't thousands of pages long and doesn't include references to proprietary standards.
The problem is that an open document format standard is a direct threat to Microsoft's near-monopoly in office applications. If anyone can implement a cross-compatible document format, then they can easily implement a competitor to Office, and if they decide to undercut Office or (as with OO.org) give the damn thing away, then Microsoft's monopoly is one breath from collapse, and believe me, if Microsoft loses Office, they're in serious, serious trouble within five years. So OOXML, a "standard" that not even Microsoft can implement, was pushed through ISO using all sorts of peculiar and ultimately nefarious methods, and that means Microsoft and its partners can go around telling Small Town, USA that Office saves in an ISO standard format. In reality, the poor bastard in 2100 AD who needs to open one of these files is going to spend many months trying to figure out this monster, which is in direct violation of the whole notion of an open standard.
That you have no problems is irrelevant. That's not what the point of an open standard is.
The world's burning. Moped Jesus spotted on I50. Details at 11.
The Microsoft implementation would print "1" on Vista Home, "2" on Professional and "12" on Premium. It prints "4" on Linux just to prove it's Linux that is broken. On Mac OS X it would print "1" and then "2" if you paid $50 more.
Actually, what am I saying. An M$ program exiting cleanly... ha ha
I thought the idea behind the fast track was to have a less fussy way of ratifying standards that were already widely used.
If that is correct, then how does the MSOOXML standard qualify? This is a "standard" that is used by absolutely nobody; not even its creator uses it.
Do I not understand the idea behind the fast-track process?
Do ya think?
Governments started demanding documents in open formats. That threatened Microsoft's monopoly, so they paid to get their XML schema called one. Now governments go back to buying exclusively Office again... MS wins.
End users don't give a shit about open. Governments do, but only on paper. Once it comes down to the buying decision, all they need is a checkmark on a list. It doesn't actually have to mean anything (cf. POSIX compatibility in NT4: damned near useless, but it was a requirement at the time).
The point of the article is that there are no conforming implementations. There never will be a conforming implementation and everyone knows it.
No calls now, I'm
ODF is the tip of a very big iceberg. It's an important and public facing tip but it is a small part of both government and business wasting money on the upgrade treadmill and all the intentional waste of M$ Office. It's all downhill from here.
twitter now has six known accounts on Slashdot, three of which have negative or near-zero karma.
The twitter monologues. Click on my homepage and be amazed.
>How many other fast-tracked ISO standards have no conforming implementations?
C++?
Try out the "export" keyword next time you write any C++.
ISO 25436 describes a version of the Eiffel programming language that has never been fully implemented. The standard contains lots of "blue-sky" "would-be-nice-to-have" sections which are planned to be implemented in the future.
ECMA gives the document author a lot of control, so things can become ECMA standards that would not become ISO standards. But then the fast track ISO process (for existing ECMA standards) makes it easier for them to become ISO standards.
Paid Q&A/Research
As far as I know, Open Office produces valid ODF documents (with the odd extension for things like spelling and grammar checker options that are application-dependent), but it doesn't necessarily implement 100% of the latest version of the ODF spec. (In fact, IIRC sometimes other word processors add support for new ODF features before it does.) Since ODF is a committee-developed standard not based on what any one word processor does, this really shouldn't be surprising.
I wouldn't agree with your statement.
The point of the article is that MS Office isn't conformant to the STRICT version. This shouldn't come as a surprise: the change from the original OOXML to the strict version happened, but no new version of MS Office has been released since. The best anyone could reasonably expect of a company is that it would update Office in the next Office 2007 service pack.
Office comes in a 2-4 year release cycle, and the change in ISO from the transitional version to the strict version happened after Office 2007 SP1 was already done.
How could MS have known in advance what changes would be made to the standard? They can't see into the future.
Don't forget that the STRICT version is NOT representative of what any version of Office produces. We already knew that.
It was an ISO evolution of the submitted version (the transitional one). The vendor would need some time and a release cycle to adapt their products to it.
What _will_ be interesting is how/when/if MS does conform to the strict format.
On the other hand, the MS Word conformance to the transitional format seems reasonable. TFA only noted one problem, where an attribute value was using on/off rather than true/false. This is minor and easily fixed and/or recorded as a known issue.
Facts? Try this fact: this is not an external standard that Microsoft is supposed to bring their software into line with, this standard was presented by Microsoft as accurately describing what their software actually did. That's the whole reason it was "fast tracked", because it was supposed to be a description of a conforming implementation.
If it's not, then it shouldn't have been "fast tracked", it should have gone through the same process as current HTML standards... you know, the ones Acid3 are testing...
That is, the issue is not whether Office conforms to the standard, but that Microsoft lied about its status.