Dark Corners of the OpenXML Standard
Standard Disclaimer writes "Most here on Slashdot know that Microsoft released its OpenXML specification to counter ODF and to help preserve its market position, but most people probably aren't aware of all the interesting legacy code the OpenXML specification has brought to light. This article by Rob Weir details many of the crazy legacy features in the dark corners of OpenXML. As it concludes after analyzing specification requirements like suppressTopSpacingWP, 'so not only must an interoperable OOXML implementation first acquire and reverse-engineer a 14-year old version of Microsoft Word, it must also do the same thing with a 16-year old version of WordPerfect.'"
The crazy amount of backwards compatibility is what allowed Microsoft to rise to the position it holds today...
This is why the Microsoft Office XML (let's not kid ourselves, this is far from "open") format should not become an ISO standard.
The power of legacy systems is at once both Microsoft's greatest strength and greatest weakness. Nobody in OSS is going to have the patience to rebuild the level of backwards compatibility needed to displace them, but the code must be an absolute tarpit of accumulated cruft and security holes that is incredibly difficult for them to keep going.
ODF is the former SXW format, taken and transformed into a standard by a committee comprising several office software makers. It's supposed to describe the normal features that anyone should expect from any word processing application, be it OpenOffice.org, KWord, AbiWord, Corel WordPerfect, etc., all in a perfectly neutral way. It was designed with a function in mind (storing word processing documents in an open and interoperable way). Its benefits are comparable to the standardisation of HTML.
OpenXML is Microsoft trying to translate its proprietary DOC format into an XML container (because it's a big buzzword) and propose it as a standard to ECMA (because everyone is talking about ODF being an ISO standard). It describes not only what is to be expected from a word processor, but also all the MS-Word-specific Microsoftisms. It was designed with a specific piece of software in mind (and partly derives from the internal functioning of MS-Word). It's only a small improvement over the previous MS XML format (which had a lot of information hidden in a binary blob).
The good thing for Microsoft is that they can pretend this limitation is "Not-a-bug-but-a-feature", and brag that there are a lot of things MS-Word couldn't store inside an ODF file that only OpenXML can carry.
Microsoft's plan:
1. Embrace
2. Extend <- They are here
3. Extinguish
The crazy amount of backwards compatibility is what allowed Microsoft to rise to the position it holds today...
Or maybe it was their illegal business tactics?
It would be pretty easy for me to run a successful business too if I could break federal law with impunity.
Life is too short to proofread.
Things that are illegal for a monopoly are perfectly legit for a non-monopoly. It's a crazy law, but that's how it works. Microsoft broke no federal laws to *gain* their monopoly.
Don't blame me, I didn't vote for either of them!
"It would be pretty easy for me to run a successful business too if I could break federal law with impunity."
What's insightful about pretending that federal laws are the only ones a company has to deal with? And when you go international...
But they broke plenty of laws to keep their monopoly :) And while their actions during their rise to the top may not have been illegal, they could easily be called 'strong-armed'.
"This is not a specification; this is a DNA sequence." Outrageously funny and to the point.
No, actually, I think you'd find it takes the skill of many people, good timing, and luck to be successful in business, even if you could break very many laws. Creating and sustaining a business for many years is hard. Not very many businesses make it.
You're missing the point. By defining their "standard" in this manner they can now say "Application X doesn't implement OOXML". Naturally, by "implement OOXML" they mean "fully implement OOXML", so if even the most obscure and bizarre tag is not supported, that's that. At that point they can demand that application X not claim on its packaging that it supports their "standard", they can create one of those cute little "OOXML Compatible" seals and refuse to let anyone who doesn't fully support the "standard" use it, or they can simply use it as a marketing tool.
If you want to think in a more paranoid manner, one could also speculate that MS might cause its future versions of Word to use one or more of the supposedly deprecated tags regularly (or, nastier thought, at random) so that any competing product that attempts to open an OOXML document produced (or even saved) in MS Word will not render it properly. Joe User will assume that since it renders properly in MS Word, but not in application X, and it is an open standard after all, it just proves that MS Word is better.
MS does what is best for MS, not what is best for its customers and certainly not what is best for documents in general.
"Mission Accomplished" -- George W. Bush May 1, 2003
I understand that these tags will be needed when converting legacy documents, but how many people are going to meet all the following conditions to even be affected by this:
If it gets adopted as a standard (ISO or similar, not a de facto standard) then everyone. The point is not whether people need the features; the point is that MS is trying to get this accepted as a standard. It still can only be implemented by MS, and therefore should not be accepted as a standard. If a government body had "Complies with ISO XXXX (MS OpenXML)" as part of a software procurement requirement, then by default only MS Office could fill that requirement. As opposed to ODF, which can be supported by any company that chooses to do so.
MS can have any features it likes in its file format; that's not the issue. How well it works or whether people need it are also not the issues. The issue is that for it to be considered a standard, anyone should be able to implement it, and the format as currently documented is practically impossible for anyone but MS to implement. Therefore it should not be considered a standard by bodies such as ISO.
Microsoft broke no laws getting DOS onto every PC. They happened to be in the right place at the right time, and the market fell onto them. But from there, Microsoft bent and broke the law every chance they got to ensure that there never was any competition.
Also don't forget that although MS's purchase of DOS was perfectly legal, it was ethically horrible. They arrived at a handshake agreement to license the code from Seattle Computer Products (SCP). While the MS paperwork was being finalized by the lawyers, SCP made arrangements to finance other business ventures using the MS money. MS then presented them with a contract to buy the code rather than license it, and told SCP to take it or leave it. As SCP had already committed to the other deals, they had no choice but to take MS's offer. Sure, no one held a gun to the heads of the SCP executives forcing them to take the deal; however, they didn't have any other reasonable alternatives. MS's behavior was legal, but certainly not ethical.
Yeah. Because the people best suited to decide what a company should or should not be allowed to do are the people who own the company. Of course you're going to want to be completely unrestricted to mow down your competitors using whatever advantages you have if you are in a position to do so. What you're missing is that no one should be allowed to use unfair practices to do it. Some people think we should idolize the free market as some sort of religion. We don't like a free market economy because it was given to us by the gods. We like it because it tends to result in better products and lower prices. That ceases to be true when you have a monopoly in the mix.
That being said, I'm not really informed about any Microsoft specifics, so I'm not going to argue in favor or against any "federal laws" as it applies to them (or failed to apply to them). However, suggesting that only people who have built a company that holds a monopoly should be able to decide what is fair regulation isn't rational. It may even be that the current federal laws regarding monopolies may be unfair and in need of reform, but the fact remains that the existence of a set of laws to regulate businesses is necessary.
Warning: Opinions known to be heavily biased.
Unfortunately, you are wrong on almost all counts:
The real problem, however, with the Sherman Act is that, in general, it can only be prosecuted by the Federal Trade Commission, and that is under the direct control of the executive power. Ever since the Reagan administration, there has been little or no desire on the part of the FTC to pursue anti-trust litigation.
* Courts have generally used the rule that anyone with more than 70% market share obviously has monopoly power, and anyone with less than 20% obviously lacks it, but that anything between 20% and 70% requires an examination of facts and circumstances before declaring that someone has monopoly power.
just a ghost in the machine.
I think they may have implemented it, and then made a spec to take into account their horrible implementation.
Tools that don't care about legacy support are unaffected by this; they can just pick the closest modern option to whatever the legacy flag calls for on input, and not output documents that use them.
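To make that concrete, here is a rough sketch of such a tool in Python. Everything in it is an assumption for illustration: the fallback table, the setting name `space_before_first_line`, and the flag `exampleLegacyFlag` are made up, and mapping suppressTopSpacingWP to "no extra top spacing" is only a guess at the nearest modern option, not what the spec requires.

```python
# Hypothetical table: legacy compatibility flag -> nearest modern setting.
# Only suppressTopSpacingWP is a real flag name; its fallback here is a guess.
LEGACY_FALLBACKS = {
    "suppressTopSpacingWP": {"space_before_first_line": 0},
    "exampleLegacyFlag": {},  # made-up flag with no modern equivalent: simply dropped
}

def load_settings(raw_flags):
    """Return (modern_settings, dropped_flags) for the flag names read from a file."""
    settings = {}
    dropped = []
    for flag in raw_flags:
        if flag in LEGACY_FALLBACKS:
            # Approximate the legacy behaviour with the closest modern option.
            settings.update(LEGACY_FALLBACKS[flag])
            dropped.append(flag)
    return settings, dropped

def save_settings(settings):
    """Serialize only modern settings; legacy flags are never written back out."""
    return sorted(settings.items())
```

The round trip is lossy on purpose: the document comes back out without the legacy flag, which is exactly the behaviour the reply below argues would not count as conforming.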
And thus those tools, legally, do not implement OOXML, and won't qualify for purchasing by companies that specify OOXML. Which is the entire point.
There's a difference between 'We need to make sure that old documents can be converted correctly.', and 'We will literally convert old documents into a new representation that contains all their weirdness, and we won't explain how to implement said weirdness in the standard.'.
What Microsoft has produced is not even a standard. Standards must specify everything, or reference other standards that specify everything. They can't reference applications.
If Microsoft wants to keep secret how to turn Office 95 documents into OOXML, fine. Producing a standard doesn't mean you have to explain how to convert things into that standard.
It does, however, mean you have to explain exactly what should happen if mwSmallCaps is true, to the pixel. You can't just pawn it off on the unexplained hypothetical behavior of some other application.
If corporations are people, aren't stockholders guilty of slavery?
You seem to be missing the point.
You do not need these features to begin with in a new format that is inherently incompatible with an old format. You don't want to say "now I'm going to do WP style linespacing and my linespacing is 1".
If you want to convert a WP document to an XML document, the conversion program should know that the line spacing in WP is 0.9 times the line spacing in the XML document (or whatever it really is) and should then write linespacing=0.9 in the XML document. This is not a task for the new word processor or its specification.
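A minimal sketch of that idea in Python, assuming the 0.9 factor from the paragraph above (a placeholder, not a measured value) and hypothetical function and field names. The point is that the quirk is resolved once, in the converter, and the output carries only an explicit number.

```python
# Placeholder factor taken from the example above; the real ratio is unknown here.
WP_TO_XML_LINE_SPACING = 0.9

def convert_line_spacing(wp_spacing, factor=WP_TO_XML_LINE_SPACING):
    """Turn a WordPerfect-style line spacing into an explicit value for the new format."""
    if wp_spacing <= 0:
        raise ValueError("line spacing must be positive")
    return round(wp_spacing * factor, 4)

def convert_paragraph(legacy_paragraph):
    """Return a new-format paragraph dict that no longer mentions the source application."""
    return {
        "text": legacy_paragraph["text"],
        "line_spacing": convert_line_spacing(legacy_paragraph["line_spacing"]),
    }
```

For example, `convert_paragraph({"text": "hi", "line_spacing": 1})` yields a paragraph with `line_spacing` 0.9, and nothing downstream needs to know WordPerfect ever existed.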
By adding this so-called "backward compatibility" to your specification, you make the spec overly difficult, and in effect you build the conversion program into the new application when this is absolutely not necessary.
And on top of that, you require that the programmer who uses this spec should have knowledge of all these old versions and is able to program them without error. And as the application will grow because of these unnecessary features, the number of bugs will also rise. So this is not a blueprint for a good application, this is a blueprint for a very buggy implementation of a wordprocessor.
Documents are worth far more than software, and they outlive the applications used to create them. See the comment to the original article - reading documents after 5, 20, 30, 100 years or more is not optional. You can pay the price of developing an independent format now, or you can pay the price of reverse engineering over and over again every time you change your internal representation.
Repeated implementation limits future change and innovation. It's expensive: it likely costs more even for Microsoft. But they can afford it; their competitors may not be able to. Plus, Microsoft already has their first implementation.
Perhaps so. But compare that cost to the cost I've just outlined. It is in the best interest of users and software developers (maybe even of Microsoft) to bite the bullet now, do the conversion once, and develop a clean format for the future.
Maybe you have in mind an argument you're not making, but I don't see any sufficient basis for your broad contention that using a file format based on an internal representation is a "darn good idea". In specific cases, yes (e.g. where the cost of development time or effort is the most important factor). In general, I very much doubt it. That successful applications in the past have taken that approach is weak evidence. They were developed when the up-front cost of development in a time of rapid innovation, the risk of losing customer lock-in, and a lack of open-format competition were good business reasons for making such a choice, even if it was inferior technically, increased cost in the long term, and was bad for consumers. In today's climate of slower innovation, competition from open formats, and customers who are running up against their own long-term interests, the situation is different.
Which is not to say Microsoft's apparent attempt to set the rules of the game and throw sand in the gears of change is not in their interests, or that it will be unsuccessful.
So they did it wrong.
You need to let a conversion program worry about converting Word 2006 documents to XML documents. You need to let the maker of Word 2006 worry about making this conversion program. This can be in the form of a "save as XML" option, but also an external program.
You cannot say "oh, this is an old feature, let's put it in the spec and let the programmer who uses this spec worry about it, because we can't be bothered to convert it or don't know how to convert it".
Sorry, but XML should be clear to everyone, and if you include an option, you should document the behaviour of this option.
But even so, you do not want the specification of a new document format to have all the quirks of all the old formats. That is just silly. That is saying that a car should have a 6V battery system too, because old cars have 6V battery systems and you might come across an accessory that uses 6V.
Until you get to that point, I suggest that you those "federal laws" out your ass, Mr. Ashcroft. I agree 100% with you. However, for fairness' sake, we should then abolish all those unjust business-hampering federal laws, including copyright and patent law.
Oh, and also those so-called "computer misuse" laws. Indeed, if I want to set up a consultancy where I propose to convert customers ASP scripts to PHP I should be allowed to demo to my prospective customers in great graphical detail why ASP is so insecure, even if I don't yet have an existing business relationship. Why should I tolerate that the government tells me how I may and may not recruit new customers?
Anything less would be one-sided and unfair.
Where is the problem in doing the conversion (for the legacy features) in the converter, so that the new format is free from this bloat? OK, it's harder to write the converter (which has to implement these old behaviors), but it's Microsoft who wants to have the backward compatibility. So it only needs to be done once.
My mind must be failing. I seem to recall that a free market is based on fairly competing businesses, hence no monopoly can be tolerated. We allow monopolies to form and exist as long as competitors have a chance to emerge.
Yes, they got into trouble for bundling, but that misses the point every time. The secret sauce that Microsoft uses is to strong-arm the OEMs into bundling Windows with PCs, especially for consumers. I'm also thinking that the Windows Tax is levied even if you buy Linux on a Dell. This is the linchpin of Microsoft's domination; without it all their other strategies wither on the vine. Without the bundling of Windows with new PCs, the bundling of IE (and all the other software), the resistance against interoperability, the mysterious file formats etc. wither on the vine. I've been disappointed that none of the investigations I've read about have gone after the OEM-Microsoft link. Break that, and you'll have a free market again.
I think the Office XML format style is a play straight out of IBM's handbook: make the standard complex and incomprehensible, and the little players (that's you) will find it hard to compete. In a way, that's a good sign: Microsoft is now lumbering into middle age, hoist on their own ever more complex petard.
The other thing about middle-age is that every little technological step away from their established base-line is treated as a revolution. In reality, it's no such thing, just a small stepping stone to shouting "pesky kids. Get off my lawn." Or maybe they've reached that stage already.
Patriotism is a virtue of the vicious
You're missing his point: when converting the file to OOXML, one can and should add generic tags indicating the specific (broken) behavior which should be emulated (such as "scale small caps by this percentage") rather than just specifying a generic "Do What I Mean" marker without any useful guidance on how rendering of documents containing this marker should be implemented.
As long as the tags indicate all the relevant changes (like scaling small caps), the document will then look the same even without the DWIM markers.
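Something like the following Python sketch shows the difference, using the standard library's `xml.etree.ElementTree`. The element name `smallCaps`, the attribute `scalePercent`, and the 80 percent figure are all invented for illustration; only the flag name mwSmallCaps comes from this thread, and what it really does is precisely what the spec leaves unstated.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: opaque quirk flag -> explicit, measurable tag.
# The 80 percent scale is an invented number, not the real legacy behaviour.
QUIRK_TO_EXPLICIT = {
    "mwSmallCaps": ("smallCaps", {"scalePercent": "80"}),
}

def expand_quirks(run, quirk_flags):
    """Append explicit tags to a run element in place of opaque quirk flags."""
    for flag in quirk_flags:
        tag, attrs = QUIRK_TO_EXPLICIT[flag]
        ET.SubElement(run, tag, attrs)
    return run

# The converter writes the explicit tag; no DWIM marker reaches the output.
run = expand_quirks(ET.Element("run"), ["mwSmallCaps"])
print(ET.tostring(run, encoding="unicode"))
```

Any renderer can honour a stated percentage; none can honour "do what the old program did" without owning the old program.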
If it isn't specified, how can the format be a specification?
Things that are illegal for a monopoly are perfectly legit for a non-monopoly. It's a crazy law, but that's how it works.
I think your logic is more than a little broken. Monopolies have a great deal of power that others don't have. They can undermine capitalism in a market and destroy innovation in entire industries. They can spread, causing that damage to other markets. Think of it like this: people piloting airplanes aren't allowed to drink or step outside for a cigar, while those behaviors are perfectly legal for people who aren't piloting planes. Isn't that crazy?
After all, if Microsoft wrote the spec, and has implemented the spec, then how difficult could it be?
Did you read the article? Some of the spec is things like "do what MS Word 5.1.4 did with line spaces." How exactly is anyone other than MS supposed to implement that? By reverse engineering a whole slew of old products that are not even available on the market anymore?
I once spent 18 months writing a 3000 page spec, and it only took a team of 5 another year to implement it.
That's fine but this spec isn't even a spec in the proper sense. It references specific closed implementations by MS and other vendors. Since those other implementations are not themselves specs, neither is this one.
"I think your logic is more than a little broken. Monopolies have a great deal of power that other's don't have. They can undermine capitalism in a market and destroy innovation in entire industries. They can spread causing that damage to other markets. Think of it like this, people piloting airplanes aren't allowed to drink or step outside for a cigar, while those behaviors are perfectly legal for people who aren't piloting planes. Isn't that crazy?"
A pilot knows that he's drinking at the time that he's doing it, and knows that it's against the law to do so while flying.
But a company doesn't know that it has a monopoly until some judge declares so. So while a company is engaging in normal business activity, some judge years later can rule that the company had a monopoly years ago, and rule that those normal business activities were therefore illegal. So, in order for a company to be sure not to run afoul of antitrust law, the company has to second guess everything it does on the off-chance that at some point in the future, a judge *might* rule that the company had a monopoly at some point in the past. Well, you cannot run a company that way. It's best to engage in normal business practice, and if some judge rules in the future that it was illegal because he declares that you had a monopoly at the time, then deal with it at that point. And doing that would not be "evil". Second guessing whether you can engage in normal business practice or not in order to avoid what a judge might say in the future is not prudent.
Taking MS, specifically, at what point, what day and date, did they knowingly achieve monopoly status in the "desktop OSes for Intel CPU" market? IBM was selling and heavily advertising OS/2 throughout the 90's. So when should MS have thought to itself, "OK, now I have a monopoly, so I'll no longer offer OEM discounts"? Even when OS/2 faltered, MS subsidized Apple, and many said that part of the motivation was to ensure that MS did NOT have a monopoly (everyone (certainly Mac advocates) assumed that Mac OS and Windows were competitors; MS didn't imagine that a judge would rule that Mac OS isn't even in the same market). So it would seem that MS never thought they had a monopoly, and even took steps to keep it that way.
Take Apple or Google, for other examples. Is it really so unimaginable that a judge could rule in the future that Apple or Google have monopolies *today* in mp3 players or online music (in the case of Apple) or web search advertising (in the case of Google)? In which case the same judge could rule that things Apple and Google are doing today are illegal? In such a case, would you demonize Apple or Google as "evil"? Should Apple and Google curtail their normal business activity because a judge in the future *might* rule this way? Do you see what I'm getting at?
BTW, this is why antitrust law is so screwed up. IMO, you should be able to engage in normal business activity until a judge officially rules you have a monopoly. Once that happens, then you can alter your business activities accordingly. But you should not be punished for things you did before you were officially declared to enjoy monopoly status in a particular market, nor should you be demonized for it. This is much cleaner since everyone would know upfront what standard they're being judged against. No second guessing what would be normal business practice, no subsidizing competitors to make sure they stay in business so that you don't get a monopoly, etc.
-- "I never gave these stories much credence." - HAL 9000
I read the back cover. Looked derivative. Put it back.
Aide-toi, le Ciel t'aidera - Jeanne D'Arc.