Slashdot Mirror


Is the New Microsoft Office Really Open?

joesklein asks: "From CNET, there is an article about the new Microsoft Office 11. In summary 'Microsoft says it's opening its Office desktop software by adding support for XML--a move that should help companies free up access to shared information. But there's a catch: It has yet to disclose the underlying XML dialect.' Could this be grounds for another anti-trust suit against Microsoft?"

20 of 485 comments

  1. Here, I'll answer this simply. by Anonymous Coward · · Score: 0, Informative

    But there's a catch: It has yet to disclose the underlying XML dialect.' Could this be grounds for another anti-trust suit against Microsoft?"

    No.

  2. InfoWorld articles by andynms · · Score: 5, Informative

    There are a couple of good articles on this at InfoWorld. Try here and here.
    Good quote:
    THE GOOD NEWS is that Office 11 supports XML Schema. The bad news is that XML Schema has been described even by XML experts as "confusing," "impenetrable," "fuzzy," and "as user-friendly as a stick in the eye."

    1. Re:InfoWorld articles by frisket · · Score: 5, Informative
      I was at the launch presentation of Office-11 by Jean Paoli at XML 2002 in Baltimore MD last week, and I'm also a late sign-up to MS's extended beta list for the product (now closed).

      To clear up some points people have commented on (based on a very preliminary inspection plus a lot of discussion at the conference):

      1. The default save format is still .doc (ie you have to go the extra click to save in XML format)
      2. If you pick to click it, the default XML format is MS's own office-document vocabulary, which retains all the formatting, held in attributes. Hairy but processable, and they will be shipping their schema for it so people can reprocess it externally. But this format will (of course) only represent the appearance, not any structure.
      3. It will also let you specify your own schema (or an industry standard one) and let you supply a binding of named styles to your element types, so you can edit using what look like styles but actually get represented in the saved file as XML markup. There is some debate as to whether this constitutes "being an XML editor" or just "being a wordprocessor that saves data in XML" (my money is on the latter).
      4. It will not support DTDs, so you're stuck with W3C Schemas whether you like them or not*
      5. The discussion over a [more?] suitable schema/DTD for handling office documents (wordprocessing, spreadsheet, presentation) continues at the OASIS TC on Open Office XML Formats **
      With Office-11, Microsoft has nearly caught up with Corel's WordPerfect (which has had a fully-fledged SGML and XML editor built-in for years) and XMetaL (which Corel took over from SoftQuad earlier this year). MS still has a long way to go to match industrial-strength applications like ArborText's EPIC or even Emacs with psgml-mode et al., but Office-11 will be a solution for the masses who believe the Word interface to be more desirable, or the Microsoft licensing régime to be more attractive, or the software to be more stable.

      * [Bias note] I think W3C schemas were a big mistake; provision for data content typing and validation, namespaces, and extended grouping could have been achieved by extending DTD syntax; and wimpy programmers who moan about having two syntaxes to handle should get a life - it's not a big deal, the code is free and has been in use for 15 years :-)

      ** Sun has donated the OpenOffice (aka StarOffice) XML file formats to the public domain. It's worth remembering that {Star|Open}Office has been saving in XML as its native format for some time now, and has a lot more experience at this than MS.

  3. Open? by Grip3n · · Score: 4, Informative

    I'd say the title of this article (Is the New Microsoft Office Really Open?) is extremely misleading. Microsoft isn't even trying to be open, they're just adding support for another open-source language. A truly open program would have its source code available. What this article is about has nothing to do with that. Microsoft Office is closed. Period.

    --
    To make a pun demonstrates the highest understanding of a language
  4. Re:That's still to be seen... by JebusIsLord · · Score: 3, Informative

    No, because the dtd and/or namespace will have to be referenced in plain text in the xml document. So, even if they use absurdly complex element names, they have to use a valid dtd or namespace uri which can be easily referenced, or it just ain't xml at all. Also you aren't allowed to put binary data in an xml document, but even if they did reference their dtd by memory address for instance, it's an easy task to just read that address. In conclusion, they would have to break xml pretty hard-core in order to make their doc types proprietary. Besides, then what would be the point of going xml in the first place?
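
    To illustrate the point about namespaces being declared in plain text, here is a minimal Python sketch that lists the namespace URIs a document uses. The sample document, its element names and the example.com URI are all made up for illustration; note also that a namespace URI is only an identifier and need not resolve to a published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and element names are invented.
SAMPLE = """<d:document xmlns:d="http://example.com/hypothetical/office-doc">
  <d:p>Hello</d:p>
</d:document>"""


def namespaces_used(xml_text):
    """Return the set of namespace URIs found on element tags."""
    root = ET.fromstring(xml_text)
    uris = set()
    for elem in root.iter():
        # ElementTree stores qualified names in Clark notation: {uri}localname
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            uris.add(elem.tag[1:].split("}", 1)[0])
    return uris


if __name__ == "__main__":
    for uri in sorted(namespaces_used(SAMPLE)):
        print(uri)
```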

    --
    Jeremy
  5. XML-Dev thread on WordML by watchful.babbler · · Score: 4, Informative
    There was a fairly recent thread on this issue over at the XML-Dev list (see here). The upshot, according to W3C XMLWG member (and occasional Microsoft foe) Tim Bray, is that Word is capable of saving documents in a WordML format that is parsable even without a DTD:
    I didn't see anything that I couldn't pick apart straightforwardly with Perl, and if someone asked me to write a script to pull all the paragraphs out of a Word doc that contain the word "foo" in bold, well you could do that. Which seems pretty important to me.
    So, from a technical perspective, there isn't much to worry about right now. From a legal perspective, no, there's no grounds for another antitrust suit, any more than there's grounds for suing Quark for not disclosing their file format.
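
    The kind of script Bray describes might look roughly like the sketch below. It is written in Python rather than the Perl he mentions, and the vocabulary is an assumption: the namespace URI and the p / r / rPr / b / t element names are a hypothetical stand-in, not the actual WordML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: p = paragraph, r = run, rPr = run properties,
# b = bold flag, t = text. None of this is taken from the real schema.
NS = {"w": "http://example.com/hypothetical/wordml"}

SAMPLE = """<w:doc xmlns:w="http://example.com/hypothetical/wordml">
  <w:p><w:r><w:t>plain foo here</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>bold foo here</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:b/></w:rPr><w:t>bold bar</w:t></w:r><w:r><w:t> and plain foo</w:t></w:r></w:p>
</w:doc>"""


def paragraphs_with_bold_word(xml_text, word):
    """Return the text of each paragraph that has `word` inside a bold run."""
    root = ET.fromstring(xml_text)
    hits = []
    for para in root.iter("{%s}p" % NS["w"]):
        for run in para.findall("w:r", NS):
            is_bold = run.find("w:rPr/w:b", NS) is not None
            run_text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if is_bold and word in run_text.split():
                # Collect the whole paragraph's text, not just the bold run.
                hits.append("".join(t.text or "" for t in para.iter("{%s}t" % NS["w"])))
                break
    return hits


if __name__ == "__main__":
    print(paragraphs_with_bold_word(SAMPLE, "foo"))
```

    Only the second sample paragraph qualifies: the first has "foo" in a plain run, and the third has a bold run without "foo".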
    --
    "Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
  6. Re:That's still to be seen... by EnVisiCrypt · · Score: 3, Informative

    The hell you can't put binary data in an XML document. As long as it's base64 encoded you can put anything in there.
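
    A minimal Python sketch of that round trip, assuming a made-up "attachment" element (nothing here reflects what Office actually emits):

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(payload: bytes) -> str:
    """Wrap arbitrary bytes in a hypothetical <attachment> element as base64 text."""
    root = ET.Element("attachment", {"encoding": "base64"})
    root.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def extract_binary(xml_text: str) -> bytes:
    """Parse the element back and decode its base64 text to the original bytes."""
    root = ET.fromstring(xml_text)
    return base64.b64decode(root.text)


if __name__ == "__main__":
    raw = bytes(range(256))  # every possible byte value, including NUL
    doc = embed_binary(raw)
    print(extract_binary(doc) == raw)
```

    The document stays well-formed because the base64 alphabet contains no characters that need escaping in XML text.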

    --


    *everything* is Orwellian to cats.
  7. Yes it could be grounds. by GOD_ALMIGHTY · · Score: 4, Informative

    This is a monopoly. They have been found in violation of Anti-Trust laws, and that finding was upheld on appeal. The government has a legitimate reason to tell them how to conduct their business and every right to do so.

    Simply because the Anti-Trust trial focused on the OS rather than Office software does not mean that the government has no reason to impose restrictions to keep MS from shifting their monopoly power. MS's monopoly has been under government scrutiny for almost 10 years, but we still get a bunch of posts on here about how the government shouldn't be able to tell 'a company' what to do. Either the trolls are really busy or you guys decided to skip Economics 101 for Libertarian Fanaticism 101.

    In order to maintain a capitalist system, we must have competition. Without healthy competition, we don't have capitalism. The government has an obligation to step into an otherwise free market to ensure that competition stays healthy. There is no magical 'Free Market Fairy' that is going to come along and restore health to the industry.

    So yes, depending on the result of the States' AG cases and the DOJ's settlement, MS could very much be liable for making their document formats some sort of completely bastardized XML. If you want to know the probability, then you should go read the settlements, and the grievances in the new filings against MS.

    --
    Arrogance is Confidence which lacks integrity. -- me
  8. Even grep replacing doesn't help by burgburgburg · · Score: 5, Informative
    Word HTML output was always atrocious. It failed at everything from correct tag order (as is shown above) to properly quoting attribute values (sometimes it uses ", sometimes it uses ', sometimes nothing). Multiple tags, all with different styles, one after another (actual example below):
    <b style='mso-bidi-font-weight:normal'><i style='mso-bidi-font-style:normal'><span
    style='font-size:12.0pt;mso-bidi-font-size:10.0pt;font-family:Arial;mso-fareast-font-family:
    "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
    mso-ansi-language:EN-US;mso-fareast-language:EN-US;mso-bidi-language:AR-SA'><br
    clear=all style='page-break-before:right;mso-break-type:section-break'>
    </span></i></b>

    Even with grep replace tools, cleaning up this crap takes hours.
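
    For what it's worth, part of that cleanup can be scripted. The Python sketch below strips the mso-* declarations out of style attributes and drops any style attribute left empty. It is a regex pass, not a real HTML parser, so treat it as a first cut under that assumption; it only handles quoted style values.

```python
import re

# Matches a quoted style attribute, including the whitespace (or line break)
# before it. Group 1 is the quote character, group 2 the attribute value.
STYLE_ATTR = re.compile(r"""\s+style=(['"])(.*?)\1""", re.S | re.I)


def strip_mso_styles(html: str) -> str:
    """Remove mso-* declarations from style attributes; drop emptied attributes."""
    def clean_style(match):
        quote, body = match.group(1), match.group(2)
        kept = [decl.strip() for decl in body.split(";")
                if decl.strip() and not decl.strip().lower().startswith("mso-")]
        if not kept:
            return ""  # nothing left, so remove the attribute entirely
        return " style=%s%s%s" % (quote, ";".join(kept), quote)
    return STYLE_ATTR.sub(clean_style, html)


if __name__ == "__main__":
    messy = ("<b style='mso-bidi-font-weight:normal'>"
             "<span style='font-size:12.0pt;mso-bidi-font-size:10.0pt;color:black'>x</span></b>")
    print(strip_mso_styles(messy))
```

    This leaves the redundant nesting of tags untouched; collapsing that would need an actual parser.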

    1. Re:Even grep replacing doesn't help by sgarrity · · Score: 2, Informative

      I use this Word HTML cleaner web service. Works well. Drop a penny in the paypal bucket if you like it.

    2. Re:Even grep replacing doesn't help by kazad · · Score: 2, Informative

      Dreamweaver has a "clean up word html" option. But then again, another proprietary solution.

  9. Re:That's still to be seen... by Anonymous Coward · · Score: 1, Informative

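    The XML specification talks about "well-formed XML" and "valid XML", where the former means the document follows XML's syntax rules (a single root, properly nested and closed tags, quoted attributes), and the latter means it is well-formed and also conforms to its declared DTD, i.e. "can be validated by a program".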
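
    A small Python sketch of the well-formedness half of that distinction. It relies on the standard library's ElementTree parser, which is non-validating: it rejects documents that break XML syntax but never checks them against a DTD, so validity would need a validating parser on top of this.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; says nothing about validity against a DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))   # properly nested and closed
    print(is_well_formed("<a><b></a>"))    # <b> is never closed
```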

  10. exactly by ink · · Score: 3, Informative

    I wish I had some mod points for you; that's exactly what Microsoft means when they say that their documents are saved using XML. They include Win32 class-ID objects all over the place.

    --
    The wheel is turning, but the hamster is dead.
  11. Re:Could new .XML doc format be LESS open than .DO by AnyoneEB · · Score: 2, Informative

    Someone will end up with a leaked alpha or beta copy of Office 11 and start working on the file format. Whether they will be able to figure it out fast enough is the question. It's possible, but if it's not done completely enough by Office 11's release, what you describe will happen. Someone else said that Microsoft won't change .doc anymore, partially because Google supports returning .doc's in search results... of course that just requires stripping all formatting, which would probably be pretty easy.

    --
    Centralization breaks the internet.
  12. Re:Defaults by dillon_rinker · · Score: 5, Informative

    Yup. Government standards are why you can buy screws and nuts from different manufacturers and have them work together. They are why you can buy "orange juice" at the grocery store and know that it's not "juice" wrung out of a pile of autumn leaves (hey, it's juice, it's orange, what more do you want?). Government standards are why you can fly in an airplane and know it won't crash.

    Sure, all these needs could be fulfilled by voluntary industry standards, if it weren't for those pesky human beings, fallible and greedy creatures that they are.

  13. Re:LOL by loconet · · Score: 4, Informative

    I know exactly what you mean. Word spits out complete garbage when it converts .doc => .html. Microsoft attempted to address this issue by releasing an HTML filter plugin that you can install to clean up the HTML Word spits out. It does clean up the HTML, but it's still kinda messy.

    --
    [alk]
  14. MIRROR: Original XML by gazbo · · Score: 2, Informative
    I've mirrored the actual xml file that has not been mutilated by slashcode policies.

    Look here using a browser that will display the raw xml nicely formatted - IE works fine, supposedly Mozilla does too but I can't seem to get it to work; it parses the file and just displays the text.

    Shame this is all so hidden away in the story.

  15. It's XML, get over it. by Ankh · · Score: 5, Informative

    Wow, what a lot of false information. Maybe this will help a little. Disclaimer: I am XML Activity Lead at W3C, so I have a bias.

    The new Visio is using SVG.

    The new Word lets you use any XML vocabulary you like. How obfuscated it is is *entirely* up to you.

    It's not using base64 to put binary proprietary data into XML documents. It's using plain XML.

    It's well-formed, and Word appears not to make up thousands of elements. The person in charge of this project is actually clueful, and was in the W3C XML Working Group (1996-1998 by the way).

    The tools all use XSLT extensively.

    It wouldn't surprise me if you could get Word to read and write the OpenOffice format just fine. There's a restriction that you can't re-order content in Word right now, I think.

    People claiming to have "insider info" and then posting blatant falsehoods, or claiming you can put binary data directly in XML, aren't helping here. Even if you get high from hating Microsoft, the open source community and Free software world need to understand that the goalposts have moved a little.

    The extent of corporate assets tied up in memos, reports and other documents is very large, massively higher than the collective value of relational databases.

    Yes, it looks as if Microsoft has suddenly discovered XML just as they suddenly discovered the Web. In fact, they were involved heavily in XML from the start, were among the first to ship commercial support for XML, and have been working on XML in Office 11 for a long time.

    --
    Liam Quin

    --
    Live barefoot!
    free engravings/woodcuts
  16. Re:LOL by Mike Schiraldi · · Score: 3, Informative

    Dude: mmencode -u

  17. Matter Of Air Superiority by Puu · · Score: 2, Informative

    The testing is sickening. But it's us or them, really.