Slashdot Mirror


MS Office XML Format Now In TextEdit

computerdude33 writes "Apparently, Apple heard of Microsoft Office changing to XML formats. If you have OS X 10.4.2, you can save documents in TextEdit in Word XML Format. They are saved with a *.xml extension, and are riddled with references to Word. Here is an example of one of these documents."

24 of 86 comments

  1. Beating MS... at their own game. by jpsowin · · Score: 3, Funny

    Now you just have to find a Microsoft product to read the future Microsoft Word XML file!

    1. Re:Beating MS... at their own game. by sycotic · · Score: 2, Informative

      I get the same thing in Microsoft Office Word 2003 :\

      --
      -- If I were a fish, I'd be wet
  2. in case you're curious... by ubiquitin · · Score: 3, Interesting

    So a simple two word text file has the following 33 XML tags pasted here with the greater and less than signs removed...


    ?xml version="1.0" encoding="UTF-8" standalone="yes"?
    ?mso-application progid="Word.Document"?
    w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve"o:DocumentProperties/o:DocumentPropertiesw:fontsw:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"//w:fontsw:docPr/w:docPrw:bodywx:sectw:pw:pPr/w:pPrw:rw:rPrw:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/wx:font wx:val="Helvetica"/w:sz w:val="24"/w:sz-cs w:val="24"//w:rPrw:tHot time!/w:t/w:r/w:pw:sectPrw:pgSz w:w="12240" w:h="15840"/w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"//w:sectPr/wx:sect/w:body/w:wordDocument

    --
    http://tinyurl.com/4ny52
    1. Re:in case you're curious... by That's+Unpossible! · · Score: 4, Insightful

      So a simple two word text file has the following 33 XML tags pasted here with the greater and less than signs removed...

      What is your point? Oh lord, this file is 1200 bytes long, for "just two words of text."

      I created the same two-word document and saved it in several text-based formats that preserve the formatting. HTML (2700 bytes), RTF (3600 bytes), PDF (16,600 bytes), and of course, Word .doc format (20,000 bytes).

      The XML version is smaller than all four, and I daresay, easier to parse and manipulate with a 3rd party program.

      Yeah, if you don't want any formatting information stored with your text, use plain text. But otherwise, XML seems to be as good a format as any of the other markup doc formats commonly used in Office.
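
      To make the "easier to parse and manipulate" claim concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the namespace URI from the sample posted in this thread (with the stray spaces from the paste removed); the cut-down SAMPLE document and the helper names extract_text and font_sizes are illustrative, not anything TextEdit or Word provides.

```python
import xml.etree.ElementTree as ET

# Namespace URI as it appears in the sample posted in this thread
# (assumption: the spaces inside the pasted URI are paste damage).
W_NS = "http://schemas.microsoft.com/office/word/2003/2/wordml"

# A cut-down document modeled on the posted sample; a real file
# declares more namespaces and carries more elements.
SAMPLE = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r>
        <w:rPr><w:sz w:val="24"/></w:rPr>
        <w:t>Hot time!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def extract_text(xml_source):
    """Return the text of every w:t (text run) element, in document order."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]


def font_sizes(xml_source):
    """Return the w:val attribute of every w:sz element, as strings."""
    root = ET.fromstring(xml_source.encode("utf-8"))
    return [sz.get(f"{{{W_NS}}}val") for sz in root.iter(f"{{{W_NS}}}sz")]


if __name__ == "__main__":
    print(extract_text(SAMPLE))
    print(font_sizes(SAMPLE))
```

      Nothing here depends on Word itself: any namespace-aware XML parser can walk the same tree.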

      --
      Ironically, the word ironically is often used incorrectly.
    2. Re:in case you're curious... by ubiquitin · · Score: 2, Interesting


      Well, sir, you made the point nicely, although the HTML file that I came up with in vi came in at around 48 bytes. The 33 tags that TextEdit produces for its doc-like XML are actually a pretty compact way of describing a document along with its formatting.

      Here's my $.02 on the bigger picture here: instead of fighting about document formats with Microsoft, we will now be fighting over XML data structures. Same old bully, just a different playground.

      --
      http://tinyurl.com/4ny52
    3. Re:in case you're curious... by Tim+Browse · · Score: 3, Interesting

      XML files can be a little ungainly if you want to partially update them, or just append data. Binary files can be better for this (note: 'can').

      This is evidenced by the lovely pause that happens whenever I close an MSN Messenger window for someone I chat to often: it appends the chat history to the 1.5 MB XML file by reading and writing the whole XML file again... wugga wugga wugga.

      (Either that, or their append code sucks!)

      But other than that, yes. The size argument doesn't stand up - a counter-intuitive result, but seems to be true. Especially when you start zipping XML files.
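
      For what it's worth, appending to an XML file does not have to mean rewriting it. The sketch below assumes a log whose last bytes are always the closing root tag, and overwrites only that tag. The file layout, the <log>/<msg> element names and the helper names are all made up for illustration; this is not how MSN Messenger stores its history.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumption: the file always ends with exactly these bytes.
CLOSING = b"</log>\n"


def create_log(path):
    """Write an empty log: XML declaration, open root tag, closing root tag."""
    with open(path, "wb") as f:
        f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<log>\n' + CLOSING)


def append_message(path, text):
    """Append one <msg> element by overwriting only the closing root tag."""
    entry = ("  <msg>" + escape(text) + "</msg>\n").encode("utf-8")
    with open(path, "r+b") as f:
        # Check that the tail really is the closing tag before touching it.
        f.seek(-len(CLOSING), os.SEEK_END)
        if f.read() != CLOSING:
            raise ValueError("file does not end with the expected closing tag")
        # Cut the closing tag off, then write the new entry and the tag back.
        f.seek(-len(CLOSING), os.SEEK_END)
        f.truncate()
        f.write(entry + CLOSING)


def demo():
    """Append two messages and parse the result to show it is still valid XML."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "history.xml")
        create_log(path)
        append_message(path, "hello")
        append_message(path, "a < b & c")
        return [m.text for m in ET.parse(path).getroot()]


if __name__ == "__main__":
    print(demo())
```

      The cost per append is proportional to the new entry, not to the size of the file, at the price of relying on a fixed tail.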

    4. Re:in case you're curious... by Trillan · · Score: 2, Insightful

      I thought he was demonstrating different exports from Word. Word 2004 (Mac) makes it 2,167 bytes. Granted, that's horrible HTML...

    5. Re:in case you're curious... by NutscrapeSucks · · Score: 3, Insightful

      Granted, that's horrible HTML...

      It's also a fair example, because Word-HTML can "round-trip" back to Word with no loss in fidelity. A barebones HTML file cannot.

      --
      Whenever I hear the word 'Innovation', I reach for my pistol.
    6. Re:in case you're curious... by commodoresloat · · Score: 5, Funny

      <x-html><!x-stuff-for-pete base="" src="" id="0" charset=""><DIV></DIV> <DIV></DIV> <o:DocumentProperties/> <w:fonts> <w:defaultFonts w:ascii="Times New Roman" w:fareast="Times New Roman" w:h-ansi="Times New Roman" w:cs="Times New Roman"/> </w:fonts> <w:docPr/> <w:body> <wx:sect> <w:p> <w:pPr/> <w:r> <w:rPr> <w:rFonts w:ascii="Helvetica" w:h-ansi="Helvetica" w:cs="Helvetica"/> <wx:font wx:val="Helvetica"/> <w:sz w:val="24"/> <w:sz-cs w:val="24"/> </w:rPr> <w:t>I agree.</w:t> </w:r> </w:p> <w:sectPr> <w:pgSz w:w="12240" w:h="15840"/> <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/> </w:sectPr> </wx:sect> </w:body>

    7. Re:in case you're curious... by Trillan · · Score: 2, Informative

      Well, that's often the case, but I'm betting you could encapsulate two words in a way that could be transported back to Word (with formatting intact) a lot more efficiently.

      A lot of the bulk seems to be Word saving unused style sheets, which arguably doesn't need to be done to keep the document true.

    8. Re:in case you're curious... by rohanl · · Score: 2, Informative

      The PDF format is a particularly good example of this. The file contains a set of atoms, and finally at the end of the file is an index that selects which atoms to include and in what order.

      Multiple indexes can be included, and the last one found is used.

      This means that you can actually save, and update a PDF file, by just appending to the end. You can even save the file on a WORM device that allows multiple sessions.

      Doing this also maintains a full file history too. You can retrieve any version of the file by selecting one of the many indexes.

      Of course, whether any programs do this is another matter...
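
      The append-only idea can be sketched in a few lines of Python. To be clear, this is a toy layout and not PDF syntax: real PDF incremental updates append new objects followed by a new cross-reference section and trailer. The JSON index, the fixed-size footer and every function name below are assumptions made for illustration.

```python
import json
import struct

# Toy footer written at the end of every revision:
# (offset of that revision's index, length of that index).
FOOTER = struct.Struct("<qq")


def read_index(buf, end):
    """Read the index whose footer finishes at byte position `end`."""
    offset, length = FOOTER.unpack_from(buf, end - FOOTER.size)
    return json.loads(bytes(buf[offset:offset + length]))


def append_revision(buf, new_atoms, order):
    """Append new atoms and a new index; earlier bytes are never modified."""
    prev_end = len(buf)
    # Start from the previous revision's atom table, if there is one.
    table = dict(read_index(buf, prev_end)["atoms"]) if prev_end else {}
    for name, payload in new_atoms.items():
        table[name] = [len(buf), len(payload)]
        buf.extend(payload)
    index = json.dumps({"atoms": table, "order": order, "prev": prev_end}).encode("utf-8")
    index_offset = len(buf)
    buf.extend(index)
    buf.extend(FOOTER.pack(index_offset, len(index)))


def render(buf, end=None):
    """Concatenate the atoms selected by the index ending at `end` (default: the last one)."""
    end = len(buf) if end is None else end
    index = read_index(buf, end)
    spans = (index["atoms"][name] for name in index["order"])
    return b"".join(bytes(buf[o:o + n]) for o, n in spans)


def history(buf):
    """Yield every revision, newest first, by following the 'prev' links."""
    end = len(buf)
    while end:
        yield render(buf, end)
        end = read_index(buf, end)["prev"]


def demo():
    buf = bytearray()
    append_revision(buf, {"a": b"Hello ", "b": b"world"}, ["a", "b"])
    first = bytes(buf)
    append_revision(buf, {"c": b"there"}, ["a", "c"])
    # The first revision's bytes are still an untouched prefix of the file.
    return bytes(buf[:len(first)]) == first, render(buf), list(history(buf))


if __name__ == "__main__":
    print(demo())
```

      The second revision only adds bytes, so the file would be writable on append-only media, and the older index stays reachable for retrieving the earlier version.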

  3. Re:Who is maintaining the "standard"? by mroch · · Score: 3, Informative

    OpenDocument from OASIS

  4. Re:Who is maintaining the "standard"? by tsa · · Score: 2, Informative

    Don't forget that in the days before IE, Netscape was the market leader and they defined the standard. Nobody cared about that then.

    --

    -- Cheers!

  5. .xml? by Stuart+Gibson · · Score: 2, Interesting

    I understood that the new office XML formats had an extension the same as the original with an x at the end, as in .docx.

    Possibly this was a wrapper for the format to encapsulate images etc? Can anyone who has actually looked at this clarify?

    Thanks,
    Stuart

    --
    It's all fun and games until a 200' robot dinosaur shows up and trashes Neo-Tokyo... Again
  6. Re:Ugly format.. by Heisenbug · · Score: 4, Insightful

    I don't really see the problem with "bloated" xml, when the files are zipped by default. Instead of smushing your efficiency requirements in with your readability and standardization requirements (and screwing all three), you first handle readability and standardization and then wrap it in a standard efficiency layer. The upshot is, not only are the files often *smaller* than the old Word equivalent, but I can also hack through them using a couple of standard perl packages that have come with linux, OS X and cygwin for years.

    Where's the downside?
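
    The "readable format first, standard compression layer second" approach can be sketched with nothing but Python's standard library (Python rather than perl here, but the idea is the same). The <document>/<p> element names and the content.xml member name are made up for the example and are not taken from any particular office format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_document(paragraphs):
    """Serialize paragraphs as verbose, readable XML (returns bytes)."""
    root = ET.Element("document")
    for text in paragraphs:
        p = ET.SubElement(root, "p", {"font": "Helvetica", "size": "24"})
        p.text = text
    return ET.tostring(root, encoding="utf-8")


def pack(xml_bytes):
    """Wrap the XML in a zip archive using standard deflate compression."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", xml_bytes)
    return out.getvalue()


def unpack_paragraphs(package):
    """Read the XML back out of the archive and return the paragraph texts."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text for p in root.iter("p")]


if __name__ == "__main__":
    paragraphs = ["Paragraph number %d of a fairly repetitive document." % i
                  for i in range(200)]
    raw = build_document(paragraphs)
    packed = pack(raw)
    print(len(raw), "bytes of XML,", len(packed), "bytes zipped")
    assert unpack_paragraphs(packed) == paragraphs
```

    The repeated tag and attribute names are exactly what deflate compresses well, which is why verbose markup costs little once it is zipped.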

  7. Re:terrible moderation by That's+Unpossible! · · Score: 2, Interesting

    I don't see where XML files are bigger than RTF. I just performed a test, and the RTF file was 3 times as large as the XML file.

    --
    Ironically, the word ironically is often used incorrectly.
  8. Re:Who is maintaining the "standard"? by fm6 · · Score: 2, Insightful
    Netscape never "defined the standard". There have always been W3C specs for HTML. The problem was that in the middle 90s, W3C was taking forever to define specs for more than the most trivial web pages, and Netscape wasn't willing to wait on them.

    Nor was it true that "nobody cared". Lots of people bitched about it.

  9. Re:OO.Org by EddWo · · Score: 3, Informative

    OpenOffice 2.0 Beta already supports WordML.
    http://www.openoffice.org/issues/show_bug.cgi?id=33450

    --
    "Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
  10. Re:Holy Riddler, Batman! by hawaiian717 · · Score: 2, Informative

    Yes, w: at the start of the XML tag indicates that the tag is part of a namespace, which would be defined somewhere in the file by adding an xmlns attribute to a tag.  In this case, it's in the w:wordDocument tag, and in fact several namespaces are defined:

    <w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:SL="http://schemas.microsoft.com/schemaLibrary/2003/2/core" xmlns:aml="http://schemas.microsoft.com/aml/2001/core" xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xml:space="preserve">
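
    A quick way to see the prefix-to-namespace mapping at work is to let a parser report it. The sketch below uses Python's standard xml.etree.ElementTree on a cut-down version of the tag above (only three of the nine declarations, with the stray spaces in the URIs removed); the helper names are illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down sample: three of the namespace declarations from the tag above.
SAMPLE = b"""<w:wordDocument
    xmlns:w="http://schemas.microsoft.com/office/word/2003/2/wordml"
    xmlns:wx="http://schemas.microsoft.com/office/word/2003/2/auxHint"
    xmlns:o="urn:schemas-microsoft-com:office:office">
  <o:DocumentProperties/>
  <w:body><wx:sect><w:p><w:r><w:t>Hot time!</w:t></w:r></w:p></wx:sect></w:body>
</w:wordDocument>"""


def declared_namespaces(xml_bytes):
    """Map each declared prefix to its namespace URI."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=["start-ns"])
    return {prefix: uri for _event, (prefix, uri) in events}


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


if __name__ == "__main__":
    for prefix, uri in declared_namespaces(SAMPLE).items():
        print(prefix, "->", uri)
    print(split_tag(ET.fromstring(SAMPLE).tag))
```

    The parser replaces the w: prefix with the full URI, so two documents that use different prefixes for the same namespace parse to identical element names.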

    --
    End of Line.
  11. Re:that's gotta be the worst XML ever by hawaiian717 · · Score: 2, Informative

    For some reason Firefox is hiding the standard xml header and the xmlns declaration. Just save the file to disk and open it in your favorite text editor, and you'll see it's there.

    --
    End of Line.
  12. Re:Who is maintaining the "standard"? by martinX · · Score: 2, Insightful

    Which is why even the dedicated MS-haters blanched at having to use NN4. It was bloated, buggy, crappy.

    MS didn't achieve browser dominance just through (mis)use of their monopoly. Netscape helped them by releasing NN4.

    --
    When they came for the communists, I said "He's next door. Take him away. Goddam commies."
  13. Word XML not necessarily a voluntary move... by soullessbastard · · Score: 4, Informative
    Disclaimer: I am a Mac OS X OpenOffice.org developer and a NeoOffice project founder

    One thing to note is that the Microsoft XML formats and schemas, either those exported by TextEdit or by the .docx format, are not necessarily done by Microsoft by choice. They're not even in response to OpenOffice.org. In my opinion, they are the result of "government forced technology", similar to how the California clean air regulations back in the 70s started to force Detroit to pour more money into catalytic converters and environmentally friendly cars.

    There have been numerous government proposals and mandates that require open document formats. Some of the Massachusetts proposals come to mind. I believe the EU also has proposals on the table that require the use of open document formats. The trick with the EU proposal is that it actually mentioned XML (I believe it's the ISIS proposal, but may have the wrong acronym). Governments are large Microsoft customers and Microsoft doesn't want to lose their business. Including the ability to save in publicly documented XML formats gives them a loophole to continue selling to governments, even if all of the open document format requirements are adopted.

    The ability of OpenOffice.org (and NeoOffice/J) to support these formats really is dependent on two things. First, the schemas are licensed from Microsoft on non-OSS compatible terms. Each individual person or application has to enter into a licensing agreement with Microsoft individually. This is directly against the terms of either BSD style or GPL style licensing. Secondly, Microsoft may have software patents involved with their schemas according to their licensing terms. While the patentability of a schema itself is questionable, they seem to have several patents revolving around the interpretation of XML schemas that may apply to their Office schemas. This goes against the CDDL style licensing Sun is now fond of.

    Because of these terms, the only ways that OOo/NeoOffice could legally support them would be if either the schemas are clean room reverse engineered from example documents or if Microsoft turns a blind eye to open source folk using their schemas. Since I wouldn't want to rely on Microsoft's generosity, the clean room solution is the only way I can see. Sun won't be the one to clean room them either; they don't have to. StarOffice (and Sun built OpenOffice.org for Linux/Solaris/Win) would be covered under Sun's cross-licensing arrangements with Microsoft as a result of their settlement. Those licenses don't extend to non-Sun OOo developers like me, however, so we're all up shit creek.

    Just because you can read it and the format is "open" doesn't mean it's "free". You can be sure that Microsoft's lobbyists will make sure that all of those government directives still refer to "open" and no "free" gets snuck in there by mistake.

    ed

  14. "Nobody cared about that then"? by commodoresloat · · Score: 2, Funny

    I guess you're too young to remember bitching about the "BLINK" tag?

  15. Re:Interesting... by King+Babar · · Score: 2, Insightful
    An interesting thing is that trying to open one of those files in Pages results in a dialog that says "This XML files was created with an unsupported beta version of Word" and it doesn't open it. I'm not drawing any conclusions, I just think it's interesting.

    Ah, Pages. The program has some neat features, but has all of the hallmarks of being rushed out of the door for the 1.0 release. It's a nifty program for making flyers, and maybe short newsletters, but it's pretty much a loss to do any serious word processing in the thing, as it currently stands. In a way, it doesn't surprise me to hear that TextEdit is leading the way on the XML front, despite the fact that Pages has an XML native format...

    --

    Babar