MS Office XML Format Now In TextEdit
computerdude33 writes "Apparently, Apple heard of Microsoft Office changing to XML formats. If you have OS X 10.4.2, you can save documents in TextEdit in Word XML Format. They are saved with a *.xml extension, and are riddled with references to Word. Here is an example of one of these documents."
Now you just have to find a Microsoft product to read the future Microsoft Word XML file!
I wonder if OpenOffice.org or KOffice will start supporting this format any time soon..
So a simple two word text file has the following 33 XML tags pasted here with the greater and less than signs removed...
?xml version="1.0" encoding="UTF-8" standalone="yes"?
?mso-application progid="Word.Document"?
w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve"o:DocumentProperties/o:DocumentPropertiesw:fontsw:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"//w:fontsw:docPr/w:docPrw:bodywx:sectw:pw:pPr/w:pPrw:rw:rPrw:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/wx:font wx:val="Helvetica"/w:sz w:val="24"/w:sz-cs w:val="24"//w:rPrw:tHot time!/w:t/w:r/w:pw:sectPrw:pgSz w:w="12240" w:h="15840"/w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"//w:sectPr/wx:sect/w:body/w:wordDocument
http://tinyurl.com/4ny52
It really concerns me that MS is able to create a "standard" due to their market share. What ensures they continue to maintain or even _use_ their own standard?
I think of the browser wars. MS loves it that everyone but them is W3C compliant, because that ensures they can break all other browsers simply by being incompatible with one standard. Because of their market share, developers will just 'give up' and code CSS, JavaScript, and the like as IE compatible. Out of frustration with incompatible websites, users won't use Firefox et al. and MS maintains control.
So I welcome the compatibility, but I'd like to see an independent standards body regulate the XML DTD.
I only came here to do two things; kick some ass, and drink some beer...looks like we're almost out of beer.
XHTML has its place: web.
If you were looking for something witty (and Slashdot-approven) to say, you meant Oasis.
I understood that the new office XML formats had an extension the same as the original with an x at the end, as in .docx.
Possibly this was a wrapper for the format to encapsulate images etc? Can anyone who has actually looked at this clarify?
Thanks,
Stuart
It's all fun and games until a 200' robot dinosaur shows up and trashes Neo-Tokyo... Again
I don't really see the problem with "bloated" XML when the files are zipped by default. Instead of smushing your efficiency requirements in with your readability and standardization requirements (and screwing all three), you first handle readability and standardization and then wrap it in a standard efficiency layer. The upshot is, not only are the files often *smaller* than the old Word equivalent, but I can also hack through them using a couple of standard Perl packages that have come with Linux, OS X and Cygwin for years.
Where's the downside?
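For anyone who wants to see the "standard efficiency layer" idea in action, here is a minimal sketch in Python rather than Perl. It is only an illustration: the repeated paragraph markup is a made-up stand-in for a verbose document, and the names `zip_document` and `unzip_document` are hypothetical helpers, not part of any Office or TextEdit tooling.

```python
import io
import zipfile

# Hypothetical stand-in for a verbose XML document: one paragraph repeated.
# It is only compressed here, never parsed, so namespaces are left undeclared.
paragraph = '<w:p><w:r><w:rPr><w:sz w:val="24"/></w:rPr><w:t>Hot time!</w:t></w:r></w:p>'
xml_bytes = ("<w:body>" + paragraph * 200 + "</w:body>").encode("utf-8")


def zip_document(data: bytes, name: str = "document.xml") -> bytes:
    """Wrap the XML in a deflate-compressed zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()


def unzip_document(archive_bytes: bytes, name: str = "document.xml") -> bytes:
    """Read the XML back out of the archive, unchanged."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


zipped = zip_document(xml_bytes)
print(len(xml_bytes), "bytes raw,", len(zipped), "bytes zipped")
```

Repetitive tag soup like this compresses very well, and the readable XML comes back byte for byte, which is the whole point of keeping the two layers separate.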
I don't see where XML files are bigger than RTF. I just performed a test, and the RTF file was 3 times as large as the XML file.
Ironically, the word ironically is often used incorrectly.
It's only "ugly" if you are not used to XML. It's certainly not "bloated" at all.
"Verbose" perhaps... but verbosity is kind of the whole point of XML in the first place.
I hate MS as much as the next guy, but I'm thrilled with the fact that they are finally creeping towards some open document standards.
When you consider that their main profit strategy for the last 5-10 years has been "force pointless upgrade sales by screwing with the document format and breaking compatibility with everybody, including our old customers," I think the fact that they are suddenly playing nice like this (even though it may open opportunities for other people to chop them off at the knees) shows that Ballmer & Co. seriously believe that their future does not lie in merely maintaining the MS-Office monopoly. Maybe Cringely's right, and the boys from Redmond are betting the company on the X-Box evolving into a ubiquitous media console.
Then again, maybe they are so cock-sure that they have the best & brightest programmers in the world, that they think they will be able to open the format and still maintain their lead on quality alone. I find it hard to believe that they are that delusional, but you never know.
Information wants to be anthropomorphized.
"Riddled with references to Word"? Whatever you mean, I don't see it. There's a reference to the Word XML namespace. But all XML applications have to define a namespace.
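To make the namespace point concrete, here is a rough sketch of reading such a file with Python's standard `xml.etree.ElementTree`. The namespace URI is pieced together from the dump quoted earlier in the thread; the sample document is a trimmed-down, hypothetical version of it, and `extract_text` is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URI pieced together from the dump quoted earlier in the thread.
W_NS = "http://schemas.microsoft.com/office/word/2003/2/wordml"

# Trimmed-down, hypothetical sample; a real file declares more namespaces.
SAMPLE = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hot time!</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def extract_text(xml_source: str) -> str:
    """Join the text of every w:t element, one line per w:p paragraph."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    ns = {"w": W_NS}  # map the w: prefix to the Word namespace
    paragraphs = []
    for p in root.iterfind(".//w:p", ns):
        runs = [t.text or "" for t in p.iterfind(".//w:t", ns)]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


print(extract_text(SAMPLE))
```

The only Word-specific thing the parser needs is the namespace URI; the `mso-application` processing instruction is simply skipped by a generic XML parser.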
TextEdit can load and save XHTML (1.0 Strict or 1.0 Transitional) with Embedded CSS, Inline CSS or no CSS.
This is just an additional format.
where's the standard xml header? hell, yeah, it was extended by a proprietary mso-application header. just M$ embrace-and-extend style all over...
any sign of a xmlns attribute anywhere? nope, and yet, they use the ns:tagName notation...
stupid.
has M$ at least released the XML Schemas for the formats? If not, forget it: it's just as illegible as binary...
and let's not forget it'll only display correctly inside MSWord itself...
I don't feel like it...
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word
Mine opens just fine.. FF 1.0.4, OSX 10.2.8, it shows a warning: This XML file does not appear to have any style information associated with it. The document tree is shown below.
(followed by the source code)
I can't get Firefox to open the file in TextEdit.
Save to disk.
Open in Finder.
Open in TextEdit from Firefox? Please tell me that isn't possible.
I think Microsoft is noticing that the open source movement is not a fad. They were big proponents of XML-based web services, and now, by supporting XML document formats for Office, they are finally giving users a choice. So, in the future, instead of saying .NET/J2EE or Office/OpenOffice, people can say both. They might not gain market share with this strategy, but they might lose less in the long run.
I believe that it would be more difficult to have a valid XHTML document that is as flexible as a valid XML document. The nature of self-describing data is that at any point you can add tags that bring new functionality while still maintaining a valid document, whereas you have to get a new XHTML tag ratified by the W3C.
What's more, it is a logical step to use XML, as it is the little brother of the SGML system that dominated documentation for larger companies that could afford development of a SGML system. SGML and XML even have roots implanted in products such as Adobe Designer, Adobe Illustrator, and a myriad of other vector drawing programs via SVG or PDF in the case of Adobe Designer.
What's more, Apple uses XML (though it has defected a bit in the latest release for some of its property lists) as the cornerstone of its customization, so supporting it in their little APSL gem that is TextEdit is logical. Apple's Pages and Keynote also use XML as the holder of their data.
I don't know if it would be more efficient, but it would probably be invalid or not have as good a look, unless you have a little Eric Meyer in your machine making CSS that can accomplish what you can with a commercial XML product.
One thing to note is that the Microsoft XML formats and schemas, either those exported by TextEdit or by the .docx format, are not necessarily done by Microsoft by choice. They're not even in response to OpenOffice.org. In my opinion, they are the result of "government forced technology", similar to how the California clean air regulations back in the 70s started to force Detroit to pour more money into catalytic converters and environmentally friendly cars.
There have been numerous government proposals and mandates that require open document formats. Some of the Massachusetts proposals come to mind. I believe the EU also has proposals on the table that require the use of open document formats. The trick with the EU proposal is that it actually mentioned XML (I believe it's the ISIS proposal, but may have the wrong acronym). Governments are large Microsoft customers and Microsoft doesn't want to lose their business. Including the ability to save in publicly documented XML formats gives them a loophole to continue selling to governments, even if all of the open document format requirements are adopted.
The ability of OpenOffice.org (and NeoOffice/J) to support these formats really is dependent on two things. First, the schemas are licensed from Microsoft on non-OSS compatible terms. Each individual person or application has to enter into a licensing agreement with Microsoft individually. This is directly against the terms of either BSD style or GPL style licensing. Secondly, Microsoft may have software patents involved with their schemas according to their licensing terms. While the patentability of a schema itself is questionable, they seem to have several patents revolving around the interpretation of XML schemas that may apply to their Office schemas. This goes against the CDDL style licensing Sun is now fond of.
Because of these terms, the only ways that OOo/NeoOffice could legally support them would be if either the schemas are clean room reverse engineered from example documents or if Microsoft turns a blind eye to open source folk using their schemas. Since I wouldn't want to rely on Microsoft's generosity, the clean room solution is the only way I can see. Sun won't be the one to clean room them either; they don't have to. StarOffice (and Sun built OpenOffice.org for Linux/Solaris/Win) would be covered under Sun's cross-licensing arrangements with Microsoft as a result of their settlement. Those licenses don't extend to non-Sun OOo developers like me, however, so we're all up shit creek.
Just because you can read it and the format is "open" doesn't mean it's "free". You can be sure that Microsoft's lobbyists will make sure that all of those government directives still refer to "open" and no "free" gets snuck in there by mistake.
ed
An interesting thing is that trying to open one of those files in Pages results in a dialog that says "This XML files was created with an unsupported beta version of Word" and it doesn't open it. I'm not drawing any conclusions, I just think it's interesting.
Elmo knows where you live!
i guess you've never seen what a regular word file generates. you should be thankful!
I guess you're too young to remember bitching about the "BLINK" tag?
The problem is that they have a patent on it. If you create software and sell it without a license, they can sue you, with or without making the software GPL.
What's wrong with that? It's a document, it shouldn't be able to pose any kind of security risk at all.
It violates compartmentalization. It should not be possible for anything in a web page to cause a document to be passed off to any application that has not specifically registered itself as a handler for web content.
In the case of Firefox on the Mac, that means it shouldn't trust LaunchServices, because LaunchServices includes any application that wants to handle local content. It should only trust "Library/Internet Plugins", and then when necessary (such as for itms:) add specific cases of LaunchServices entries that are known to be intended for web content. So if YOU want to allow TextEdit to be usable as a handler, YOU should be able to _explicitly_ add it to the list firefox maintains.
If this isn't done, then they haven't learned the lesson of the help: hole and the x-man-page: hole. There will be more holes like that in the future, and a web browser that's supposed to be "secure" should simply close off that whole avenue of attack.
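As a rough sketch of the policy described above (not how Firefox or LaunchServices actually work), the idea boils down to an allowlist with an explicit opt-in. Everything here is hypothetical: the `HandlerPolicy` class, its method names, and the example application names are illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class HandlerPolicy:
    """Hypothetical allowlist deciding which helper apps a browser may launch."""

    # Handlers that registered specifically for web content (e.g. plugins).
    web_handlers: set = field(default_factory=set)
    # Handlers the user added by hand, such as TextEdit for .xml files.
    user_approved: set = field(default_factory=set)

    def approve(self, app: str) -> None:
        """Record an explicit user decision to trust an application."""
        self.user_approved.add(app)

    def may_dispatch(self, app: str) -> bool:
        """Allow a hand-off only to web handlers or user-approved apps."""
        return app in self.web_handlers or app in self.user_approved


policy = HandlerPolicy(web_handlers={"iTunes"})
print(policy.may_dispatch("TextEdit"))  # False until the user opts in
policy.approve("TextEdit")
print(policy.may_dispatch("TextEdit"))  # True after explicit approval
```

The design choice is that the default answer is "no": an application that merely handles local files never becomes reachable from a web page unless the user says so.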
That's what I thought he was talking about: setting up Firefox to open .xml documents in TextEdit.
Of course it doesn't. It edits property lists, which are occasionally stored as XML. It's not an arbitrary XML editor.
Why not fork?
I just tried to load that page in Safari 1.2.4 under Mac OS X 10.3.8 and it displayed just the content of the file (no XML code), so I suppose that:
1. This is a part of NSTextEdit class (or whatever its name is) and is not specific to TextEdit.app
2. It's been around a bit longer, at least since 10.3.8, it just wasn't exposed in TextEdit.app
The good thing is that all the Cocoa apps that use this class will also get the ability to handle Word XML docs - for free.