Slashdot Mirror


Tim Bray on Microsoft Office

jgeelan writes "The co-inventor of XML, Tim Bray, has been talking about the newly XML-enabled version of Microsoft Office, code-named 'Office 11' and tells XML-Journal that 'when the huge universe of MS Office documents becomes available for processing by any programmer with a Perl script and a bit of intelligence, all sorts of wonderful new things can be invented that you and I can't imagine.'"

25 of 495 comments

  1. Yay Evil Monopoly Of Doom! by Sneftel · · Score: 3, Interesting

    Wow, I was way off when I predicted that Microsoft would further obfuscate their Word format. This seems to be in all respects a Good Thing.

    StarOffice has used XML for their native file formats for some time now; I wonder if this means we'll see an even better-quality translator between the two formats?

    --
    The opinions stated herein do not necessarily represent those of anybody at all. Deal with it.
    1. Re:Yay Evil Monopoly Of Doom! by OrangeSpyderMan · · Score: 3, Interesting

      It will indeed be harder to mount the partition. It may also be harder to use that XML data, since what we may be talking about is XML encapsulation of binary, proprietary, encrypted file formats. Don't necessarily think you're going to receive at the other end a plaintext file with a few tags - what you will receive will have been through a closed kernel "request" to an encrypted database "filesystem", a proprietary DRM system (hardware and software) - and you genuinely believe they're just gonna bang it out as plaintext at the other end?

      --
      Try NetBSD... safe,straightforward,useful.
  2. Historical turning point? by haeger · · Score: 5, Interesting
    I just thought about someone saying that somewhere, when you look back in history, you can see some historical turning point where things just went wrong or right.

    One small such point is when IBM gave out the specs to their hardware for PC allowing everyone to clone it, while Apple did not.

    This could be such a point. Maybe in 10 years we'll look back at this and ask ourselves, "Why the heck did MS XML-enable their Office app, releasing the hold that they had?"

    Only time will tell I guess.

    .haeger


    I Play Hattrick

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
  3. XML takes away Microsoft's main advantage by Zeddicus_Z · · Score: 5, Interesting

    As far as I can tell, one of the major reasons many businesses refuse to change over from Microsoft Office to cheaper options is file compatibility. As our company's IT admin put it recently on the suggestion of using OpenOffice, "I get sent hundreds of Microsoft Word, Excel and Access documents a week. I need to know that I can open and access every single one of those without problems". An example of proprietary file formats helping Microsoft keep the monopoly.

    However, if Microsoft Office documents become "built around an open, internationalized standard", i.e. XML, would this not enable the people behind OpenOffice, StarOffice etc. to achieve total 100% file compatibility and thus negate Microsoft's largest advantage with Office?

    Of course, this could be yet another Microsoft "embrace and extend" tactic, à la Kerberos. Incorporate the standard in a bastardised form, claim standards compatibility, then pollute it so you must be using Microsoft technology to properly interact with it.

    --
    Janie took my gun...
  4. HTML from Word by e8johan · · Score: 5, Interesting

    Just look at an HTML file exported from Word2k. I would not call that compatible with any HTML I've ever learned. Most probably the XML file exported from Office 11 will be a Microsoft-specific file, specifying lots of Office-specific ActiveX (aka OLE) info that cannot be emulated. And, hey, they can probably store binary data in XML. The only change is that most competing products will emit files that Word can easily read, i.e. M$ will get the biggest benefits.

  5. Re:read through "EULA" in the XML? by McCall · · Score: 2, Interesting
    COULD it make it illegal to "reverse engineer" the document format? I can very easily see that if it could, Microsoft could include a clause that explicitly prohibits GPL programs from interpreting the XML...

    No way. What happens when I receive an MS 11 XML Word document on my Linux system via email? I haven't accepted any sort of EULA, and I can start hacking out the DTD straight away - and I must point out that a complex XML document is close to worthless without one.

    They may prevent MS users from reverse engineering the documents on their MS OSes and I suppose they could even forbid users emailing their documents to other OSes (EULAs are great eh?) - but I doubt they will do this, it would cripple Microsoft Office.

    Andrew McCall.
  6. Re:Too good to be true by MrHanky · · Score: 5, Interesting

    Maybe they need a migration path away from the win32-based format they use now. .NET also seems to follow that path. Remember that MS needs access to other platforms than the i386/desktop in the future - mobile devices for instance. Keeping a format that is basically a binary image from a PC is good for locking out competition, but not when you have to start competing with yourself.

  7. Too good to be true? by varslot · · Score: 2, Interesting

    The article states that:

    "The important thing," he explains, "is that Word and Excel (and of course the new XDocs thing) can export their data as XML without information loss..."

    Does this mean that MSO will have the same support for XML as currently for RTF? In that case I'm not that excited. If the default will be to save as MS-word format, and not XML (or MS-XML as the case may be), then we are no better off. Only Microsoft is, as they are now able to import OpenOffice/StarOffice documents.

    It's sort of like when Word could read WordPerfect documents in the old days.

    --
    There arises from a bad and unapt formation of words a wonderful obstruction to the mind. (Francis Bacon)
  8. What I heard.... by LarsBT · · Score: 3, Interesting
    I can't remember the reference, but I heard that they will embed binary code for different Word objects within XML tags, e.g.

    <equation> 0100100100111101010011010101101010010 </equation>
    which is allowed in XML (if I understand XML correctly). So not much gain if everything is still in a proprietary closed binary format.

    I think maybe it was the CEO of Microsoft Denmark. I'm NOT sure though
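
    A minimal Python sketch of the concern above, not anything Microsoft has announced: the `document` and `equation` element names and the payload bytes are made up for illustration. Raw bytes are not legal XML content, so the sketch assumes the binary would be text-encoded (base64 here). It shows that a file can be perfectly well-formed XML while the part that matters stays an opaque blob.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical proprietary payload: stands in for an undocumented binary object.
payload = bytes([0x49, 0x3D, 0x4D, 0x5A, 0x00, 0xFF])

# Raw bytes such as 0x00 are not legal XML characters, so binary data has to be
# text-encoded (base64 here) before it can sit inside an element.
root = ET.Element("document")
equation = ET.SubElement(root, "equation")  # element name is illustrative only
equation.text = base64.b64encode(payload).decode("ascii")

xml_text = ET.tostring(root, encoding="unicode")

# Any XML parser accepts the result...
parsed = ET.fromstring(xml_text)
recovered = base64.b64decode(parsed.find("equation").text)

# ...but all it hands back is the same opaque bytes; their meaning is still
# known only to whoever defined the binary format.
print(xml_text)
print(recovered == payload)
```

    The parser does its job either way; whether the result is useful depends entirely on what sits between the tags.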

  9. Re:Typical XML-proponent mistake by smallpaul · · Score: 5, Interesting

    As long as you don't get a DTD with extensive comments on how to interpret the elements, along with some promise/guarantee that the DTD won't change every minor release, there is no real improvement at all.

    Have you ever tried to reverse engineer a binary file format? And have you ever tried to do the same thing with an XML file format? I learned huge chunks of SVG yesterday _without_ opening an SVG book, just by mucking around in an existing SVG file and with an SVG viewer. Of course, Microsoft could do something clearly in violation of the spirit of XML, by making the whole thing one tag full of base64ed text or something. But as long as they use tags in a semi-sane way (which is the whole point, for integration with corporate systems), XML will be a big step forward.
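
    As a rough illustration of that kind of "mucking around", here is a small Python sketch that surveys an unfamiliar XML document by counting element names and the attributes used on them. The sample markup and the `survey` helper are invented for this example; they are not taken from any real Office or SVG file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A made-up sample; in practice this would be a file saved by the application.
SAMPLE = """<doc>
  <para style="Heading1">Quarterly report</para>
  <para style="Body">Sales rose.</para>
  <table><row><cell>Q1</cell><cell>42</cell></row></table>
</doc>"""


def survey(xml_text):
    """Count element names and the (element, attribute) pairs that occur."""
    tags = Counter()
    attrs = Counter()
    for elem in ET.fromstring(xml_text).iter():
        tags[elem.tag] += 1
        for name in elem.attrib:
            attrs[(elem.tag, name)] += 1
    return tags, attrs


tags, attrs = survey(SAMPLE)
for tag, count in tags.most_common():
    print(tag, count)
```

    A survey like this only gives you the vocabulary. If the names are sensible, that is already a long way ahead of staring at a hex dump.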

  10. They already did this for two other products by pvera · · Score: 5, Interesting

    SQL Server has had an XML web gateway since version 2000. You can run any query and output it as XML, or have an XML template pull the query and transform the results with XSL, all without one line of server-side script.

    ASP.NET uses XML for all the human-readable files, and the IIS in Windows .NET Server finally uses Apache-style configuration files, which are also XML.

    --
    Pedro
    ----
    The Insomniac Coder
  11. Hype! Hype! Hype! by RobotWisdom · · Score: 5, Interesting
    This article is pure PR, with no new content. The XML-cult will keep waving their hands and promising great payoffs 'RSN' (real soon now) until people actually start trying to implement uniform semantic tags in their data and documents... at which point universal disillusionment will set in because the problem is way too hard even for trained AI-PhDs. [more]

    The thread a couple of weeks ago about the death of META headers will apply 1000 times worse for semantic tags-- if the semantic web is going to work at all it needs to start from headers describing the webpage as a whole.

    (Also, what's with XML-Journal's claim the article has three pages when it only has two?)

  12. What we need is a ISO standard by javilon · · Score: 5, Interesting

    The OpenOffice group should get together with the rest of the guys (AbiWord, KOffice and maybe WordPerfect) and work out a format that can be submitted to the ISO. Possibly based on the OpenOffice format.
    Then governments and corporations will adopt it for official documents so they can read their own documents in ten years.

    --


    When his defense asked, "Which computer has Jon Johansen trespassed upon?" the answer was: "His own."
    1. Re:What we need is a ISO standard by pubjames · · Score: 3, Interesting


      This may interest you:

      http://www.1dok.org/eng/index.html

    2. Re:What we need is a ISO standard by Jason+O'Neil · · Score: 2, Interesting
      That's actually a really good idea. If all the OSS Word Processors created a file format that worked seamlessly from program to program, it would be a major plus for all the smaller word processors.

      It would allow for competition in Linux word processors, without having to worry about file format compatibility problems.

      Then if someone just creates a script which converts MS Office docs (en masse, like every one inside the directory structure) to this wonderful new format (should be possible thanks to OpenOffice), it would be much easier to then switch to OSS.

      I personally have no problems with the current open office format, but if they made it human readable, so it can be created from plain text editors if necessary...

      Quick, somebody suggest it to them.
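
      A hedged sketch of such a bulk-conversion script in Python. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts the `--headless`, `--convert-to` and `--outdir` options. The helper names, the suffix-to-target mapping and the dry-run default are choices made for this example, not part of any existing tool.

```python
import subprocess
from pathlib import Path

# Assumed mapping from legacy Office suffixes to open-format targets.
TARGETS = {".doc": "odt", ".xls": "ods", ".ppt": "odp"}


def find_office_docs(root):
    """Return every Office-style file under root, searching subdirectories too."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in TARGETS
    )


def convert_command(path):
    """Build (but do not run) a headless conversion command for one file."""
    # Assumes an `soffice` binary on PATH that accepts these options.
    return [
        "soffice", "--headless",
        "--convert-to", TARGETS[path.suffix.lower()],
        "--outdir", str(path.parent),
        str(path),
    ]


def convert_tree(root, dry_run=True):
    """Convert every document under root; with dry_run, only return the commands."""
    commands = [convert_command(p) for p in find_office_docs(root)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

      Calling `convert_tree("/some/share")` returns the commands without running anything, which makes it easy to review what would be touched before passing `dry_run=False`.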

  13. Re:I doubt it. by ianezz · · Score: 5, Interesting
    I'm guessing their XML document format will be just as hard to decipher as the current office formats.

    There are 2 problems with the current format of Microsoft Office files:

    1. Give the correct interpretation to the bytes representing the document content, in order to import the Office document into some other office suite using a different representation.
      This is mostly solved (thanks to years of trial and error).
    2. Give the correct interpretation to the bytes representing the document itself AND all the extra cruft having nothing to do with the document contents that the Microsoft Office suite puts in, in order to generate documents readable by the various versions of the Office suite.
      This is definitely more difficult, as nobody knows Office internals and how they expect such additional data to be. The StarOffice guys managed to do an acceptable job, at the price of years of trial and error. It's like looking at a dump of your computer's memory, guessing what's code, what's data, what's padding and the meaning of every byte...

    Now, does an XML format simplify things? Well, yes, just as RTF text is easier to manage than a pure binary format. But nothing prevents putting extra cruft in an XML document, so the only change is that instead of having to use a hex editor, you may now use a text editor. Giving a correct interpretation of tags and attributes is still something that only Microsoft can do, unless it publishes the full specifications (present and future: after all, XML is eXtensible, right?).

    Personally, I think that:

    • Microsoft is realizing that the current Office formats are getting out of control, so it wants to get rid of them, because maintaining backwards compatibility is becoming too painful.
    • An XML-based format may be the right answer for Microsoft, in that all the subtleties of parsing binary data simply disappear, while it may still make it difficult for everyone else to understand the real meaning of the data. Let's say <obscuretag_42 foobarizer="xyzzy"/>
    • Microsoft was not giving out the specifications of the formats of its Office suite before: should we now suppose it's giving out the DTD/Schema AND a good explanation of how to interpret it? I'd hope the answer is yes, but given the company's precedents...
  14. Re:I doubt it. by Penguin · · Score: 2, Interesting

    ... in fact, Microsoft has code examples for perl in their Knowledge Base:

    http://support.microsoft.com/default.aspx?scid=kb;en-us;Q214797

    (furthermore I'm impressed that a reply like "They'll probably do something evil..." would be rated as "Insightful")

    --
    - Peter Brodersen; professional nerd
  15. Re:What will be the default save format? by StefMeister · · Score: 5, Interesting
    According to this article on ZDNet, it will probably NOT be the primary file format:

    To make that happen, Microsoft is turning to what some analysts say is a risky strategy. The company is adopting Extensible Markup Language (XML) as a second file format in all Office applications, to enable better data exchange between the productivity suite and back-end software, such as databases.
    --
    "Son, in a sporting event, it's not whether you win or lose, it's how drunk you get" - Homer J. Simpson
  16. the most wonderful thing... but it's not happening by g4dget · · Score: 3, Interesting
    The most wonderful thing that could happen would be that people could finally dump that messy piece of software and move to a better toolset.

    Unfortunately, Microsoft won't let it happen. The data may be "in XML", but that doesn't mean you can read it or generate it well. Instead, Microsoft will give you just enough to serve their business interests and nobody else's.

    How? Office will probably stick undocumented base64 encoded binary stuff into the output, containing formatting information. You can use the document content, for example, with a database, but you can't load it into another word processor and preserve all the formatting. And in the other direction, sure, you can generate simple documents that Office will import, but you can't generate arbitrary Word documents--they will, again, have weird, undocumented tags and binary stuff.

    In short: don't hold your breath. Microsoft isn't stupid.

  17. Re:What are you all complaining about? by nmg196 · · Score: 2, Interesting

    "somewhere" - that really good reliable source of information.

    "about MONO, we'll see" - go and see then - you only had to click the fscking link that I put there for you. Even a Windows user should be able to manage that.

    "all kinds of IP rights" - and you reckon Sun doesn't have those for Java?

  18. Re:There is some documentation of Office XML alrea by eetu · · Score: 2, Interesting

    The document at MSDN doesn't seem to have anything to do with MS Office 11 or the new "built around XML" Office file formats. It simply explains how files can be imported to/exported from Access and Excel of MS Office XP.

    --
    "If I can't have a revolution, what is there to dance about?" - Albert Meltzer
  19. No, it doesn't by alispguru · · Score: 3, Interesting

    Look up at this. Putting information in XML makes the first baby step of reverse engineering easier, nothing else.

    XML helps only if the creator of the document wants the information to be easily accessible by programs other than their own.

    --

    To a Lisp hacker, XML is S-expressions in drag.
  20. Insightful my ass! by Anonymous Coward · · Score: 1, Interesting

    You haven't got a clue about this have you?

    Your post is just a bunch of paranoid, slashbot FUD. No wonder you got modded up!

  21. Government Contracts Might be The Reason by bobaferret · · Score: 4, Interesting

    I think the reason that they are switching over is probably the trend in emerging foreign markets, Peru being a prime example. Countries are starting to enact legislation that requires any government procurement of software to be only for software that uses an open file format, because of long-term storage problems.
    This is tied to the fact that US sales are going to slow down, or already are, due to the complete inundation of PCs; they need new markets, and unless they use an open format they won't be able to get them. I'd be panicked: Linux and Java are eroding their server market, and governments are eroding their Office market. The only way they can grow is to add value.

  22. Genuine XML? by J.+Random+Software · · Score: 4, Interesting
    Good in theory, but HTML support in Office 2000 was such a debacle that there are third-party tools designed just to unmangle the markup. They completely ignored Processing Instruction syntax, which is intended to do just what they wanted, and
    <![if !supportEmptyParas]>
    wasn't even well-formed SGML.
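
    A small Python sketch of the difference, checked against an XML parser rather than an SGML one (so it illustrates the same complaint under XML rules, not the SGML claim itself). The `office-hint` processing-instruction target is hypothetical; only the `<![if ...]>` form comes from Word's output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Word 2000-style conditional marker. In XML content, "<![" may only
# introduce a CDATA section, so a conforming parser rejects this.
word_style = "<p><![if !supportEmptyParas]>text<![endif]></p>"

# The same hint expressed as a processing instruction; the target name
# "office-hint" is made up for this example.
pi_style = "<p><?office-hint supportEmptyParas?>text</p>"

print(is_well_formed(word_style))
print(is_well_formed(pi_style))
```

    A processing instruction is passed over by tools that don't recognise its target, which is exactly the "application-specific hint" role the conditional markers were trying to fill.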