Slashdot Mirror


Is the New Microsoft Office Really Open?

joesklein asks: "From CNET, there is an article about the new Microsoft Office 11. In summary 'Microsoft says it's opening its Office desktop software by adding support for XML--a move that should help companies free up access to shared information. But there's a catch: It has yet to disclose the underlying XML dialect.' Could this be grounds for another anti-trust suit against Microsoft?"

29 of 485 comments

  1. sure it is! by Anonymous Coward · · Score: 5, Funny

    it supports .DOC, the de facto standard for documents. What's this XML you're talking about?

  2. Defaults by Snoe · · Score: 5, Insightful

    RTF has been in Office for years, and it is an open, portable standard readable on many platforms and with many programs. The problem is that Microsoft chooses to retain their obfuscated binary format as the default save type for documents.

    If the XML files Office produces are not made the default save type, or if the XML merely encapsulates large portions of binary code, it will not matter one lick that Office can save these XML documents, because the majority of people will be stuck on the default, unreadable formats.

    1. Re:Defaults by EisPick · · Score: 5, Insightful

      It's not "obfuscated" so much as it's "optimized." The whole idea seems to be for Word to save as quickly as possible--which the doc file is best at for Word for some reason, probably because it's derived from how the program structures documents, and not how some document spec says documents should be handled.

      In an era of 2+ GHz computers with 7200+ rpm hard drives, it seems odd that Microsoft would be unable to write an application that can quickly save and open text files that, on average, run well under 50 kilobytes.

    2. Re:Defaults by MadAhab · · Score: 5, Insightful
      You are goddamned fucking lucky that the government tells you what the default values for things should be. That's what the government is there for, mostly; to tell you that the default value for a building is to have a fire exit and that it may not be locked. And without standards, there is no interchangeability of parts. And without that, every consumer and customer gets assraped by manipulative vendors. And since you can never tell precisely how this battery differs from that battery, you just have shit exploding battery acid all over the place.

      But if you really think they have no right doing these things, go live in a 3rd world country; they generally have the government telling you less about what to do. Except once in a while when they kill your family. You could be armed of course. You know what a totally armed society with a weak government looks like? Afghanistan.

      That being said, it's hard to see what business the government has engineering document formats. They could, on the other hand, specify disclosure of formats as a remedy in an anti-trust case, but they generally fall into one of two categories which precludes this: stupid or bought.

      --
      Expanding a vast wasteland since 1996.
    3. Re:Defaults by dillon_rinker · · Score: 5, Informative

      Yup. Government standards are why you can buy screws and nuts from different manufacturers and have them work together. They are why you can buy "orange juice" at the grocery store and know that it's not "juice" wrung out of a pile of autumn leaves (hey, it's juice, it's orange, what more do you want?). Government standards are why you can fly in an airplane and know it won't crash.

      Sure, all these needs could be fulfilled by voluntary industry standards, if it weren't for those pesky human beings, fallible and greedy creatures that they are.

  3. Can you copyright/patent a schema ? by aron_wallaker · · Score: 5, Insightful

    The big question (to me) is whether Microsoft can put a legal encumbrance on the XML schema they use for a new file format. Could you publish a schema but have it so wrapped in legalese that (for example) open source projects would not be allowed to use it?

  4. "Could this be grounds for another lawsuit?" WTF? by Wakko+Warner · · Score: 5, Funny

    Yes, mister Hairtrigger, we should sue Microsoft simply because they won't release trade secrets. We will surely win.

    - A.P.

    --
    "Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
  5. what does it matter by greechneb · · Score: 5, Insightful

    No matter what Microsoft does, all they will get is a slap on the wrist. Microsoft will just point to StarOffice and OpenOffice and say, hey, there's competition, it's not a monopoly.

    Big deal if they don't open it up anyway (I don't really expect them to), StarOffice/OpenOffice will crack it to a certain extent anyway. For most people's file conversions, it's not that much of a difference to convert documents. Doesn't always look pretty, but it works fairly well.

    Wake me up when something Microsoft does is surprising...

  6. InfoWorld articles by andynms · · Score: 5, Informative

    There are a couple of good articles on this at InfoWorld. Try here and here.
    Good quote:
    THE GOOD NEWS is that Office 11 supports XML Schema. The bad news is that XML Schema has been described even by XML experts as "confusing," "impenetrable," "fuzzy," and "as user-friendly as a stick in the eye."

    1. Re:InfoWorld articles by frisket · · Score: 5, Informative
      I was at the launch presentation of Office-11 by Jean Paoli at XML 2002 in Baltimore MD last week, and I'm also a late sign-up to MS's extended beta list for the product (now closed).

      To clear up some points people have commented on (based on a very preliminary inspection plus a lot of discussion at the conference):

      1. The default save format is still .doc (ie you have to go the extra click to save in XML format)
      2. If you pick to click it, the default XML format is MS's own office-document vocabulary, which retains all the formatting, held in attributes. Hairy but processable, and they will be shipping their schema for it so people can reprocess it externally. But this format will (of course) only represent the appearance, not any structure.
      3. It will also let you specify your own schema (or an industry standard one) and let you supply a binding of named styles to your element types, so you can edit using what look like styles but actually get represented in the saved file as XML markup. There is some debate as to whether this constitutes "being an XML editor" or just "being a wordprocessor that saves data in XML" (my money is on the latter).
      4. It will not support DTDs, so you're stuck with W3C Schemas whether you like them or not*
      5. The discussion over a [more?] suitable schema/DTD for handling office documents (wordprocessing, spreadsheet, presentation) continues at the OASIS TC on Open Office XML Formats **
      With Office-11, Microsoft has nearly caught up with Corel's WordPerfect (which has had a fully-fledged SGML and XML editor built in for years) and XMetaL (which Corel took over from SoftQuad earlier this year). MS still has a long way to go to match industrial-strength applications like ArborText's EPIC or even Emacs with psgml-mode et al., but Office-11 will be a solution for the masses who believe the Word interface to be more desirable, or the Microsoft licensing régime to be more attractive, or the software to be more stable.

      * [Bias note] I think W3C schemas were a big mistake; provision for data content typing and validation, namespaces, and extended grouping could have been achieved by extending DTD syntax; and wimpy programmers who moan about having two syntaxes to handle should get a life - it's not a big deal, the code is free and has been in use for 15 years :-)

      ** Sun has donated the OpenOffice (aka StarOffice) XML file formats to the public domain. It's worth remembering that {Star|Open}Office has been saving in XML as its native format for some time now, and has a lot more experience at this than MS.

  7. well, of course by Planesdragon · · Score: 5, Interesting

    Could this be grounds for another anti-trust suit against Microsoft?

    Of course it could. But so could any bit of news about MS on /. in the past twenty years, from EULA alterations to Palladium.

    But "could" and "is" are different things. I suspect MS will decide that closing XML will render it useless, and make it at least as open and usable as their MS-HTML files.

    So, at the worst, we'll have a new "save as" option that's a bit sloppy--but since MS won't have to extend XML to get their office functionality, they probably won't do it just to spite a few OSS coders who'll figure it out in a year anyway.

  8. Re:Reverse Engineer by Phroggy · · Score: 5, Insightful

    I suppose they could put some weird binary or encrypted data in the files, but that would defeat the purpose of XML.

    The purpose of XML is to have buzzword compliance, and this doesn't defeat that.

    (Of course that's not the purpose most other people use XML for, but we're talking about Microsoft.)

    --
    $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
    $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
  9. Points to remember... by MosesJones · · Score: 5, Insightful


    1) XML, SOAP and all these new technologies were pioneered by Microsoft

    2) They killed all the standards they didn't pioneer (CORBA anyone ?).

    3) There is NOTHING in the XML spec that _requires_ people to open up their schema definitions. It's purely a structure definition, in the same way as Microsoft's old Word documents were stored; it's just that now the markers are in text format and any standard XML parser will be able to read the file.

    4) Open Office can already read word documents even though they aren't in XML.

    5) So can Word Perfect.

    6) Using XML doesn't stop you embedding binary into the document, often people do this to store data (images for instance), thus an OLE reference might still be binary.

    7) Pure XML and XSLT are great ways to use up all the power on your processor. Binary has previously been used here because XML is inefficient; if MS had opened the format up, everyone would just complain that it's too inefficient and that it's quicker to save using an older format. So MS are either trying to burn cycles or are customising the XML or their application for speed. Is that wrong? Would it be wrong if KDE did it?

    8) People won't switch to or from Word because of XML. Open Office and other tools will be able to read the Word files because other tools (Google for instance) need the format and MS can see a real business need to allow them to see it.

    9) XML is a meta-language; as such, anything can be written in it. Hell, they could have a bitch of an external format and then a simple parser that makes it useful, but not tell anyone about the simple parser, so everyone else's documents take years to load.

    10) XML is the buzzword of today, OLE to be replaced by SOAP as the buzzword for Office next ?

    Get off the high horse guys, whether it's binary or XML is irrelevant; making something XML doesn't make it open. That's like saying that everything you do makes sense, but just because people don't understand the Mayan Calendar and Ancient Greek they complain.

    MS will always use Mayan and Ancient Greek, and we _can_ understand them; it's just easier for them as it's their native language and calendar.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  10. Adoption of standard no guarantee of interop... by Sigh+Phi · · Score: 5, Insightful

    Microsoft (and Netscape) essentially tried the same thing with HTML. Sure, we're using HTML, but to actually view our HTML, you have to use our browser.

    Adoption of a "standard" is no guarantee of interoperability. Understanding the conceptual underpinnings of the standard is just as important. The question is, when Microsoft says they are using XML as a document format, are they doing it because they believe in the principles underlying it, or solely for the cynical "this is what is selling now" aspect?

    The body of HTML out there is an unparseable babble of a mess, largely because the two dominant browser makers did not respect many of the underlying notions of markup and hypertext to begin with. The state of the art progressed, but not in the way a lot of people wanted it to go.

    This could bode poorly if the meme survives somehow that the Office format is now equivalent to XML. When it "doesn't work," who knows where the blame will fall?

  11. Re:LOL by Anonymous Coward · · Score: 5, Funny



    <head>
    <META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
    charset=3Dus-ascii">

    <meta name=3DGenerator content=3D"Microsoft Word 10 (filtered)">

    <style>
    <!-- /* Font Definitions */
    @font-face
    {font-family:Tahoma;
    panose-1:2 11 6 4 3 5 4 4 2 4;} /* Style Definitions */
    p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0in;
    margin-bottom:.0001pt;
    font-size:12.0pt;
    font-family:"Times New Roman";}
    a:link, span.MsoHyperlink
    {color:blue;
    text-decoration:underline;}
    a:visited, span.MsoHyperlinkFollowed
    {color:purple;
    text-decoration:underline;}
    span.emailstyle17
    {font-family:Arial;
    color:windowtext;}
    span.emailstyle18
    {font-family:Arial;
    color:navy;}
    span.EmailStyle19
    {font-family:Arial;
    color:navy;}
    @page Section1
    {size:8.5in 11.0in;
    margin:1.0in 1.25in 1.0in 1.25in;}
    div.Section1
    {page:Section1;}
    -->
    </style>

    </head>

    <body lang=3DEN-US link=3Dblue vlink=3Dpurple>

    <div class=3DSection1>

    <p class=3DMsoNormal><font size=3D2 color=3Dnavy face=3DArial><span =
    style=3D'font-size:
    10.0pt;font-family:Arial;color:navy'>

    I agree.

    </span></font></p>

  12. XML can be as cryptic as binary by Jelloman · · Score: 5, Insightful
    All the hype about XML seems to skip over the fact that XML is never guaranteed to be any less cryptic than binary data formats. For example:
    <?xml version='1.0' ?>
    <wordDoc>
    <base64 value='kjkjKJ+kyRgMhiuI9KqU/hjkj'/>
    <base64 value='OlRg8LKp8UI883Jjk+krNhjkj'/>
    <base64 value='pRhjjhO9asdJiQ99kjkjU8j=='/>
    </wordDoc>
    XML was designed to be machine-readable, not human-readable, much less human-understandable, or easily-reverse-engineerable.

    The Office file formats will be open if M$ decides to:
    • Document them, and
    • Not change them with every update.
    I doubt they will do either of those things.
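
A minimal Python sketch of the point above, assuming the made-up `wordDoc`/`base64` vocabulary from the example (nothing here reflects a real Office format). A standard parser accepts the document as well-formed XML, yet all it can hand back is the same opaque bytes that went in.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_opaque(payload: bytes) -> str:
    """Wrap arbitrary bytes in a well-formed XML document (hypothetical vocabulary)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"<?xml version='1.0'?><wordDoc><base64 value='{encoded}'/></wordDoc>"


def unwrap_opaque(document: str) -> bytes:
    """Parse the XML and return the raw bytes; the parser cannot explain them."""
    root = ET.fromstring(document)
    return b"".join(base64.b64decode(el.get("value")) for el in root.iter("base64"))


# Stand-in for an undocumented binary structure: meaningless without a spec.
blob = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x00, 0x01, 0x02, 0x03])
doc = wrap_opaque(blob)
recovered = unwrap_opaque(doc)
print(ET.fromstring(doc).tag, recovered == blob)
```

Parsing succeeds and the round trip is lossless, but interpreting `recovered` still requires documentation of the binary layout, which is the commenter's argument.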

  13. Re:Lovely... by ceejayoz · · Score: 5, Funny

    Oh no! Heaven forbid someone extend the eXtensible Markup Language!

  14. Re:"XML dialect"?!? by Frobnicator · · Score: 5, Funny
    Who died and made you incorrect corrector of common terms of speach?
    ahem. speech

    :-)

    frob.

    --
    //TODO: Think of witty sig statement
  15. Re:LOL by commodoresloat · · Score: 5, Interesting
    Or anything close to "standard." The best we can hope for is code that is recognized as valid, and I wouldn't hold my breath for that either. I've seen HTML like the following come out of Word:

    <B><A HREF="http://whatever.org"> Link </B></A>.

    I'm not kidding, either. Seems like an easy thing to avoid in an HTML generator. Validator routinely reports hundreds of coding errors in simple short documents generated by Word. Ugh. What really sucks is when you're working on a web page for someone and cleaning out all the crap that Word generates, then at the last minute they send you the same document with some minor errors corrected.... and all the same major errors generated by Word. Fun.
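
The mis-nesting in that snippet is mechanically detectable. As a rough sketch (not a full validator, and the class and function names are invented for illustration), Python's standard `html.parser` can track open tags on a stack and flag any closing tag that does not match the most recently opened one:

```python
from html.parser import HTMLParser

# Elements that never take a closing tag; a partial list, assumed for this sketch.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class NestingChecker(HTMLParser):
    """Record closing tags that do not match the most recently opened tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []  # list of (closing tag seen, tag that was expected)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            return
        expected = self.open_tags[-1] if self.open_tags else None
        self.errors.append((tag, expected))
        if tag in self.open_tags:
            self.open_tags.remove(tag)  # recover so later tags are still checked


def nesting_errors(html: str):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


print(nesting_errors('<B><A HREF="http://whatever.org"> Link </B></A>.'))
```

The parser lowercases tag names, so the example reports that `b` was closed while `a` was still open.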

  16. Open but Secure by mugnyte · · Score: 5, Interesting

    Something in my gut tells me that beyond all the extraneous tags, attributes and data types, the XML is going to have a hash code built into it.

    Edit this file outside of MS Office (invalidating the hash code) and suffer the consequences: MS treats it as "untrusted" input and rips out only the text content, no formatting.

    The hash will be a giant number created through a secure portion of the Intel-ish hardware calls. Keys hidden where? That'll be interesting to see who posts 'em first. Perhaps on a .NET server at MS hosting? Nah, this cripples offline Office. Keyless hash?
    Curious Curious.

    mug
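
To make the speculation concrete, here is a sketch of how such a tamper check could work in principle. It is purely hypothetical: nothing in the thread confirms Office does anything like this, and the key, function names, and sample document are invented. It uses a keyed hash (HMAC-SHA256) from Python's standard library.

```python
import hashlib
import hmac

# Hypothetical secret held by the application. Where such a key would live
# is exactly the open question raised in the comment.
SECRET_KEY = b"not-a-real-office-key"


def seal(content: str) -> str:
    """Return a hex HMAC-SHA256 tag for the document content."""
    return hmac.new(SECRET_KEY, content.encode("utf-8"), hashlib.sha256).hexdigest()


def is_trusted(content: str, tag: str) -> bool:
    """True only if the content is byte-for-byte what was sealed."""
    return hmac.compare_digest(seal(content), tag)


original = "<doc><p style='bold'>Quarterly report</p></doc>"
tag = seal(original)
edited = original.replace("Quarterly", "Annual")  # an edit made outside the application
print(is_trusted(original, tag), is_trusted(edited, tag))
```

Any outside edit changes the tag, so the scheme only holds as long as the key stays secret, which is the weak point the comment is circling.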

  17. Re:Embrace and Extend by Rick+the+Red · · Score: 5, Funny

    The difference between Microsoft and their competitors is that MS is willing to take a long-term view:

    1) Establish a monopoly on office productivity software
    2) Profit!
    3) See income drop once everyone has Office. Market saturation!
    4) Less Profit :-(
    5) Release new Office with new file formats; use monopoly to get it pre-loaded on all new PCs.
    6) Eventually everyone else upgrades Office in order to read new file formats they're getting from their co-workers.
    7) Profit!
    8) Release new OS with filesystem that looks like a database.
    9) Release YAO (Yet Another Office) [see 5 & 6] that only works with new database/filesystem in new OS.
    10) Now, not only do the masses have to upgrade Office to read co-workers files, they have to upgrade Windows as well.
    11) Profit!!!!!

    --
    If all this should have a reason, we would be the last to know.
  18. Draw you Own Conclusions by Alien54 · · Score: 5, Funny
    well, tongue in cheek

    the Love Calculator demonstrates that

    Draw your own conclusions. cute little widget.

    --
    "It is a greater offense to steal men's labor, than their clothes"
  19. Re:That's still to be seen... by 9jack9 · · Score: 5, Insightful
    But they can make it so massively complex that it is very difficult to implement interoperability with foreign tools, but that it is somehow much easier to implement with MS-centric tools.

    The registry in Windows NT/2000/XP is sort of like that. It makes a lot more sense from a Microsoft-centric viewpoint than it does from a non-Microsoft-centric viewpoint. Now that it's been around so long, there are lots of ways to get at registry data (for instance, using Perl modules), but when the registry was new the only way to do it was through the Microsoft API, and until many people went through the pain of encapsulating the MS API, the pain of accessing the registry from a non-MS-centric toolset was high.

    So maybe the XML format will be like that. If you're Linux-centric, for instance, the threshold of pain for accessing Word XML docs will be fairly high, but if you're Microsoft-centric, with all of their tools, code-snippets, documents, etc., then it won't be nearly as painful.

    This way MS gets to claim interoperability, make Word data easily accessible to MS-centric solutions, but put a damper on non-MS-centric solutions.

  20. Even grep replacing doesn't help by burgburgburg · · Score: 5, Informative
    Word HTML output was always atrocious. It failed at everything from correct tag order (as is shown above) to quoting attribute values consistently (sometimes it uses ", sometimes it uses ', sometimes nothing). It also emits multiple nested tags, each with its own style, one after another (actual example below):
    <b style='mso-bidi-font-weight:normal'><i style='mso-bidi-font-style:normal'><span
    style='font-size:12.0pt;mso-bidi-font-size:10.0pt;font-family:Arial;mso-fareast-font-family:
    "Times New Roman";mso-bidi-font-family:"Times New Roman";color:black;
    mso-ansi-language:EN-US;mso-fareast-language:EN-US;mso-bidi-language:AR-SA'><br
    clear=all style='page-break-before:right;mso-break-type:section-break'>
    </span></i></b>

    Even with grep replace tools, cleaning up this crap takes hours.
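
One part of that cleanup can be scripted. The sketch below is a rough illustration, not a complete Word-HTML cleaner: it assumes quoted `style` attributes and that no declaration value contains a semicolon, and the function name is invented. It drops the Word-specific `mso-*` declarations and removes a `style` attribute entirely if nothing else is left in it.

```python
import re

# A quoted style attribute, including the whitespace before it.
STYLE_ATTR = re.compile(r"""\s+style=(['"])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def strip_mso(html: str) -> str:
    """Drop Word-specific 'mso-*' declarations from inline style attributes."""

    def clean(match):
        quote, body = match.group(1), match.group(2)
        kept = [
            decl.strip()
            for decl in body.split(";")
            if decl.strip() and not decl.strip().lower().startswith("mso-")
        ]
        if not kept:
            return ""  # nothing but mso-* declarations: remove the attribute
        return f" style={quote}{';'.join(kept)}{quote}"

    return STYLE_ATTR.sub(clean, html)


sample = (
    "<b style='mso-bidi-font-weight:normal'>"
    "<span style='font-size:12.0pt;mso-bidi-font-size:10.0pt;color:black'>text</span></b>"
)
print(strip_mso(sample))
```

Unquoted attributes such as `clear=all`, and the empty wrapper tags themselves, would still need separate passes.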

  21. Re:That's still to be seen... by MrResistor · · Score: 5, Insightful

    No because the dtd and/or namespace will have to be referenced in plain text in the xml document. so, even if they use absurdly complex element names, they have to use a valid dtd or namespace uri which can be easily referenced

    I think an analogy to Frontpage is appropriate here. Sure, it produces HTML, but the result just doesn't look right unless it's viewed in IE. Maybe the dtd is referenced, but encrypted or otherwise proprietary. Maybe MSXMLVIEWER (whatever it may be called) doesn't need the reference to be in plain text.

    There are any number of things MS could do to ensure that the document just doesn't look right in other viewers. Since formatting is the whole point of XML, people will use MSXMLVIEWER and whatever it reads will be the de facto XML standard, just like whatever IE renders is the de facto HTML standard.

    or it just ain't xml at all.

    While technically correct, the point is sadly irrelevant. As long as MS is effectively a monopoly XML will be whatever they say it is, for the majority of people.

    Also you aren't allowed to put binary data in an xml document

    Not true. It's recommended that you don't put binary in an XML document, but nothing prevents you from doing so. This is exactly what will give MS the ability to hijack the standard.

    In conclusion they would have to break xml pretty hard-core in order to make their doc types proprietary.

    Only in spirit, I'm afraid, but that will likely be enough.

    Besides, then what would be the point of going xml in the first place?

    To make documents searchable. This is an ability which is extremely valuable to anyone who has a large amount of information they need to access. The upshot is that the actual content will likely be plain text, though important markups may not be. Sadly, format is more important than content for a lot of people.

    Of course, most people won't use the XML format at all, since it won't be the default.

    --
    Under capitalism man exploits man. Under communism it's the other way around.
  22. Re:LOL by Wolfier · · Score: 5, Funny
    <?xml version="1.0" encoding="base-64?>
    <!doctype MS_WORD
    <!ELEMENT WORD_DATA>
    ]>
    <WORD_DATA>SGFoYSwgaWYgeW91IHJlYWxseSBhcmUgdHJ5aW5nIHRvIGRlY29kZSB0aGlzLCB5b3UgaGF2ZSB0b28gbXVjaCB0aW1lIG9uIHlvdXIgaGFuZHMh<WORD_DATA>
    </xml>
  23. Re:Are you paying attention? It's Microsoft. by gmack · · Score: 5, Insightful

    That right there is one of the things that makes working with windows a pain.

    On any Unix or Unix clone you can just run standard tools or write your own.

    Unfortunately, with everything in a proprietary format you then end up having to build scripting languages into everything, making all of your data files potential entry points for malicious code.

    The move to XML has the potential to eliminate that sort of brain damage once and for all provided they actually open their file formats.

    I hope they do it, but given their past I'm not holding my breath, since the choice is between long-term financial security for MS and security for their customers with the risk of losing market share in the future.

  24. Re:That's still to be seen... by CondeZer0 · · Score: 5, Insightful

    How does this misinformed crap get moderated up?

    As some others have pointed out:

    1) You don't need a DTD or Schema to have XML
    2) The url used in a namespace declaration doesn't need to correspond to a real document
    3) Even if the document used a DTD or Schema, that DTD or Schema were available, and the document actually validated against it, you still don't know what the hell the tags mean; the DTD or Schema are just syntactic (and grammatical?) rules, and don't tell you how to interpret the tags or attributes.
    4) You can always include binary data in an XML document(ie., base64 encoded)
    5) The point of using XML is Buzzword compliance and *perceived* openness

    There are more reasons why XML does not necessarily = openness. But these are more than enough.

    XML means nothing; it's just a way to define languages. It's like a charset: just because I have a document that is ASCII doesn't mean that I understand what is written in it if I don't know the meaning of the words in it (e.g., just because you know the name of each letter doesn't mean that you know the meaning of "lkasdertunxsjd", right?)

    Even if a language is in XML, you still need to *document it* to be able to *understand* it.
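
Points 1, 2 and 4 are easy to check with any stock parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the namespace URI, element names, and payload are all made up for illustration. It also shows the one thing XML does refuse: raw control characters, which is why binary payloads are encoded (for example as base64) rather than embedded directly.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the namespace URI below is invented and never fetched.
DOC = (
    '<w:doc xmlns:w="http://example.invalid/no-such-schema">'
    "<w:x17>lkasdertunxsjd</w:x17>"
    "<w:blob>" + base64.b64encode(b"\x00\x01\x02\xff").decode("ascii") + "</w:blob>"
    "</w:doc>"
)

root = ET.fromstring(DOC)  # parses fine: no DTD, no schema, no network access
names = [child.tag for child in root]
blob = base64.b64decode(root[1].text)

# A raw control character, by contrast, is not a legal XML character.
try:
    ET.fromstring("<doc>\x01</doc>")
    raw_binary_ok = True
except ET.ParseError:
    raw_binary_ok = False

print(names, blob, raw_binary_ok)
```

The parser returns element names and bytes without complaint, and none of it says what `x17` or the blob is supposed to mean.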

    Sorry if I was a bit rough, but I'm sick of people that assume that because something is in XML it's automatically open. That is one of the biggest myths the XML buzz-wagon is based on, and it is spread by people that don't really understand what XML is.

    Please, before you post to /. make sure you know what you are talking about.

    Best wishes

    \\Uriel

    --
    "When in doubt, use brute force." Ken Thompson
  25. It's XML, get over it. by Ankh · · Score: 5, Informative

    Wow, what a lot of false information. Maybe this will help a little. Disclaimer: I am XML Activity Lead at W3C, so I have a bias.

    The new Visio is using SVG.

    The new Word lets you use any XML vocabulary you like. How obfuscated it is is *entirely* up to you.

    It's not using base64 to put binary proprietary data into XML documents. It's using plain XML.

    It's well-formed, and Word appears not to make up thousands of elements. The person in charge of this project is actually clueful, and was in the W3C XML Working Group (1996-1998 by the way).

    The tools all use XSLT extensively.

    It wouldn't surprise me if you could get Word to read and write the OpenOffice format just fine. There's a restriction that you can't re-order content in Word right now, I think.

    People claiming to have "insider info" and then posting blatant falsehoods, or claiming you can put binary data directly in XML, aren't helping here. Even if you get high from hating Microsoft, the open source community and Free software world need to understand that the goalposts have moved a little.

    The extent of corporate assets tied up in memos, reports and other documents is very large, massively higher than the collective value of relational databases.

    Yes, it looks as if Microsoft has suddenly discovered XML just as they suddenly discovered the Web. In fact, they were involved heavily in XML from the start, were among the first to ship commercial support for XML, and have been working on XML in Office 11 for a long time.

    --
    Liam Quin

    --
    Live barefoot!
    free engravings/woodcuts