MS Office XML Format Now In TextEdit
computerdude33 writes "Apparently, Apple heard of Microsoft Office changing to XML formats. If you have OS X 10.4.2, you can save documents in TextEdit in Word XML Format. They are saved with a *.xml extension, and are riddled with references to Word. Here is an example of one of these documents."
Now you just have to find a Microsoft product to read the future Microsoft Word XML file!
So a simple two-word text file has the following 33 XML tags, pasted here with the greater-than and less-than signs removed...
?xml version="1.0" encoding="UTF-8" standalone="yes"?
?mso-application progid="Word.Document"?
w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve"o:DocumentProperties/o:DocumentPropertiesw:fontsw:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"//w:fontsw:docPr/w:docPrw:bodywx:sectw:pw:pPr/w:pPrw:rw:rPrw:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/wx:font wx:val="Helvetica"/w:sz w:val="24"/w:sz-cs w:val="24"//w:rPrw:tHot time!/w:t/w:r/w:pw:sectPrw:pgSz w:w="12240" w:h="15840"/w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"//w:sectPr/wx:sect/w:body/w:wordDocument
http://tinyurl.com/4ny52
OpenDocument from OASIS
Don't forget that in the days before IE, Netscape was the market leader and they defined the standard. Nobody cared about that then.
-- Cheers!
I understood that the new Office XML formats use the original extension with an x appended, as in .docx.
Possibly this was a wrapper for the format to encapsulate images etc? Can anyone who has actually looked at this clarify?
Thanks,
Stuart
It's all fun and games until a 200' robot dinosaur shows up and trashes Neo-Tokyo... Again
I don't really see the problem with "bloated" XML when the files are zipped by default. Instead of smushing your efficiency requirements in with your readability and standardization requirements (and screwing all three), you first handle readability and standardization and then wrap it in a standard efficiency layer. The upshot is, not only are the files often *smaller* than the old Word equivalent, but I can also hack through them using a couple of standard Perl packages that have come with Linux, OS X and Cygwin for years.
Where's the downside?
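For anyone who wants to try the same unzip-then-parse trick without Perl, here is a minimal sketch in Python using only the standard library. The archive is built in memory as a stand-in for a real document, and the member name `content.xml`, the `doc`/`p` element names, and both function names are assumptions made for illustration; a real file would have its own layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_sample_archive():
    """Create a tiny zipped XML document in memory (hypothetical stand-in for a real file)."""
    xml_text = "<doc><p>Hot time!</p><p>Second paragraph</p></doc>"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # "content.xml" is an assumed member name, chosen for illustration.
        archive.writestr("content.xml", xml_text)
    return buffer.getvalue()


def extract_paragraphs(archive_bytes, member="content.xml"):
    """Unzip one member and return the text of every <p> element in it."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        root = ET.fromstring(archive.read(member))
    return [p.text or "" for p in root.iter("p")]


if __name__ == "__main__":
    print(extract_paragraphs(build_sample_archive()))
```

The compression layer and the markup layer stay separate here: `zipfile` handles the size problem, and the XML parser never needs to know the data was compressed.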
I don't see where XML files are bigger than RTF. I just performed a test, and the RTF file was 3 times as large as the XML file.
Ironically, the word ironically is often used incorrectly.
Nor was it true that "nobody cared". Lots of people bitched about it.
OpenOffice 2.0 Beta already supports WordML.
http://www.openoffice.org/issues/show_bug.cgi?id=33450
"Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
Yes, w: at the start of the XML tag indicates that the tag is part of a namespace, which would be defined somewhere in the file by adding an xmlns attribute to a tag. In this case, it's in the w:wordDocument tag, and in fact several namespaces are defined:
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve">
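To see what the prefix mapping means in practice, here is a rough sketch in Python's standard `xml.etree.ElementTree`. The two namespace URIs are the `w` and `wx` values from the tag quoted above (with the stray line breaks joined); the trimmed-down `SAMPLE` document and the helper names `root_tag` and `text_runs` are made up for illustration and are not the full TextEdit output.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as quoted in the w:wordDocument tag above.
W_NS = "http://schemas.microsoft.com/office/word/2003/2/wordml"
WX_NS = "http://schemas.microsoft.com/office/word/2003/2/auxHint"

# Hypothetical, heavily trimmed document: only the elements around the text run.
SAMPLE = (
    f'<w:wordDocument xmlns:w="{W_NS}" xmlns:wx="{WX_NS}">'
    "<w:body><wx:sect><w:p><w:r><w:t>Hot time!</w:t></w:r></w:p></wx:sect></w:body>"
    "</w:wordDocument>"
)


def root_tag(xml_text):
    """Return the root tag as the parser sees it: {namespace-uri}localname."""
    return ET.fromstring(xml_text).tag


def text_runs(xml_text):
    """Collect the text of every w:t element, resolving the w: prefix via a mapping."""
    root = ET.fromstring(xml_text)
    prefixes = {"w": W_NS}  # the prefix is local to this query, only the URI matters
    return [t.text or "" for t in root.findall(".//w:t", prefixes)]


if __name__ == "__main__":
    print(root_tag(SAMPLE))
    print(text_runs(SAMPLE))
```

The parser replaces each prefix with its URI, so `w:` is just shorthand; a document that bound the same URI to a different prefix would parse to identical tags.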
End of Line.
For some reason Firefox is hiding the standard xml header and the xmlns declaration. Just save the file to disk and open it in your favorite text editor, and you'll see it's there.
End of Line.
Which is why even the dedicated MS-haters blanched at having to use NN4. It was bloated, buggy, crappy.
MS didn't achieve browser dominance just through (mis)use of their monopoly. Netscape helped them by releasing NN4.
When they came for the communists, I said "He's next door. Take him away. Goddam commies."
One thing to note is that the Microsoft XML formats and schemas, either those exported by TextEdit or by the .docx format, are not necessarily done by Microsoft by choice. They're not even in response to OpenOffice.org. In my opinion, they are the result of "government forced technology", similar to how the California clean air regulations back in the 70s started to force Detroit to pour more money into catalytic converters and environmentally friendly cars.
There have been numerous government proposals and mandates that require open document formats. Some of the Massachusetts proposals come to mind. I believe the EU also has proposals on the table that require the use of open document formats. The trick with the EU proposal is that it actually mentioned XML (I believe it's the ISIS proposal, but may have the wrong acronym). Governments are large Microsoft customers and Microsoft doesn't want to lose their business. Including the ability to save in publicly documented XML formats gives them a loophole to continue selling to governments, even if all of the open document format requirements are adopted.
The ability of OpenOffice.org (and NeoOffice/J) to support these formats really is dependent on two things. First, the schemas are licensed from Microsoft on non-OSS compatible terms. Each individual person or application has to enter into a licensing agreement with Microsoft individually. This is directly against the terms of either BSD style or GPL style licensing. Secondly, Microsoft may have software patents involved with their schemas according to their licensing terms. While the patentability of a schema itself is questionable, they seem to have several patents revolving around the interpretation of XML schemas that may apply to their Office schemas. This goes against the CDDL style licensing Sun is now fond of.
Because of these terms, the only ways that OOo/NeoOffice could legally support them would be if either the schemas are clean room reverse engineered from example documents or if Microsoft turns a blind eye to open source folk using their schemas. Since I wouldn't want to rely on Microsoft's generosity, the clean room solution is the only way I can see. Sun won't be the one to clean room them either; they don't have to. StarOffice (and Sun built OpenOffice.org for Linux/Solaris/Win) would be covered under Sun's cross-licensing arrangements with Microsoft as a result of their settlement. Those licenses don't extend to non-Sun OOo developers like me, however, so we're all up shit creek.
Just because you can read it and the format is "open" doesn't mean it's "free". You can be sure that Microsoft's lobbyists will make sure that all of those government directives still refer to "open" and no "free" gets snuck in there by mistake.
ed
I guess you're too young to remember bitching about the "BLINK" tag?
Ah, Pages. The program has some neat features, but has all of the hallmarks of being rushed out of the door for the 1.0 release. It's a nifty program for making flyers, and maybe short newsletters, but it's pretty much a loss to do any serious word processing in the thing, as it currently stands. In a way, it doesn't surprise me to hear that TextEdit is leading the way on the XML front, despite the fact that Pages has an XML native format...
Babar