Slashdot Mirror


Microsoft Ends Era Of Closed File Formats

RzUpAnmsCwrds writes "According to an MSDN Channel 9 interview with an Office file-format developer, the next version of Microsoft Office (Office 12) will default to newly-developed XML file formats in Word, Excel, and PowerPoint. The new formats will apparently include XML files along with other files (images, etc) inside of a Zip file. Microsoft will also be providing extensive documentation of the new format to the public through MSDN. The developer likewise announced that Microsoft would be releasing updates for Office 2000, XP, and 2003 to read and write the new formats when the new version of Office is released. If this interview is correct, it could mean the beginning of the end of Microsoft's proprietary file formats." Coverage at Beta News, Information Week, and the Washington Post.
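
To make the "XML files plus other files inside a Zip file" idea concrete, here is a minimal sketch in Python. The part names (`document.xml`, `media/logo.png`) and the element names are hypothetical; the summary does not describe the actual Office 12 layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package():
    """Build a small Zip package in memory: one XML part plus one binary part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Hypothetical part names, chosen only for illustration.
        zf.writestr("document.xml", "<doc><p>Hello</p><p>World</p></doc>")
        zf.writestr("media/logo.png", b"\x89PNG placeholder bytes")
    return buf.getvalue()


def read_paragraphs(package):
    """Open the package and parse only the XML part; the image is never touched."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("document.xml"))
    return [p.text for p in root.findall("p")]


package = build_package()
print(read_paragraphs(package))
```

Any tool that can unzip an archive and parse XML could do the same, which is the practical point of the announcement.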

23 of 651 comments

  1. Patents? by Anonymous Coward · · Score: 1, Insightful

    Patents (n/t)

    1. Re:Patents? by rben · · Score: 5, Insightful

      I use Open Office exclusively and have for the past couple of years. Reading the files in certainly isn't a problem for me. The only files that are slow to load are the master document files, and that's because they link to dozens of other files.

      The XML specification is being expanded (it might already be done) to allow binary formats. There are good reasons, though, why it's best to keep data files in straight XML text format. It eliminates the need to worry about machine architecture. Little endian or big endian, it makes no difference to you. The files are perfectly portable across platforms, which is increasingly important these days. XML files zip very nicely, making them almost as small as a corresponding binary file.
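
      A small sketch of the byte-order point, using Python's standard `struct` module. The value 1025 is an arbitrary example: the same integer has two different binary layouts, while its text form has only one.

```python
import struct

value = 1025  # 0x0401, arbitrary example value

little = struct.pack("<I", value)  # 4-byte unsigned int, little endian
big = struct.pack(">I", value)     # same value, big endian

# Reading little-endian bytes with the wrong byte order gives a different number.
misread = struct.unpack(">I", little)[0]

# The text form is identical on every architecture.
as_text = str(value).encode("ascii")
roundtrip = int(as_text)

print(little, big, misread, as_text, roundtrip)
```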

      It is far easier to provide backwards compatibility with earlier file formats when you are using XML than when you are using binary file formats. With XML, if the parser sees a tag it doesn't understand, it ignores it. If a binary file format loader sees stuff it doesn't understand, it bails out with an illegal file format error.

      When you move to a new expanded file format with XML, you don't have to write a conversion utility. Since you are merely adding new tags, your program can read any of your old data just fine, then add the appropriate tags and new data. This saves a great deal of trouble for programmers.

      Machines are fast and cheap. People are slow and expensive. It is far better to have our computers do a little extra work on loading a text file and eliminate conversion utilities and complicated loading routines that are prone to bugs.
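
      A minimal sketch of the "ignore tags you don't understand" idea. The `note`, `title`, `body` and `revision` elements are made up for illustration. Strictly speaking, the skipping is done by the loading code written on top of the parser, not by the XML parser itself.

```python
import xml.etree.ElementTree as ET

# Tags that this (hypothetical) version 1 reader understands.
KNOWN_TAGS = {"title", "body"}


def load_note(xml_text):
    """Read the fields we know about and silently skip everything else."""
    note = {}
    for child in ET.fromstring(xml_text):
        if child.tag in KNOWN_TAGS:
            note[child.tag] = child.text
        # Unknown tags (added by newer versions) are ignored, not treated as errors.
    return note


V1_FILE = "<note><title>Memo</title><body>Ship it</body></note>"
V2_FILE = "<note><title>Memo</title><revision>7</revision><body>Ship it</body></note>"

print(load_note(V1_FILE))
print(load_note(V2_FILE))
```

      The old reader gets the same result from both files, so the newer file did not need a converter to stay readable.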

      --

      -All that is gold does not glitter - Tolkien
      www.ra

    2. Re:Patents? by Ruphuz · · Score: 2, Insightful
      What information do you have that suggests that the XML format will be bloated?

      Not to be a troll, but Microsoft Word's HTML output gives a good idea of how greatly they can bloat XML.

      --
      My other post is a First.
    3. Re:Patents? by jedidiah · · Score: 2, Insightful

      So you think that a 1Ghz cpu is going to be slowed down because someone's resume or board of directors presentation isn't binary anymore?

      Oh puuuulllleeze.

      Such concerns might be relevant for a C64 or perhaps even a MacPlus. However, for small consumer documents such notions are absurd.

      This isn't exactly someone's corporate data warehouse we're talking about here.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    4. Re:Patents? by Zeinfeld · · Score: 2, Insightful
      Implementing a file format as binary data or even a simple SGML structure such as RTF means less overhead. Using XML you have to run an XML parser, and the file is more freeform. There are no set data structures, it is just a stream of text.

      The term you are looking for here is 'self describing structure'.

      If you have data in the form of LISP S-Expressions you know where structures start and stop. If you have just one document in that format you can pretty much work out the entire file format - or at least the features being used.

      If you have a binary document you have to do a lot more digging and it can take you days to just work out the basic structure.

      This will make Word much more useful, it will be much easier to create documents with other applications and emit them in Word format. So for example if I have a report writer component in my server I could spit out a Word Document rather than HTML which I would use today.

      I can also write filters to automatically convert from Word format to other formats, so I can take HTML source and spit out Word, I can take an XML data structure and emit word.

      So why would I prefer Word format over HTML when I was one of the people who helped write HTML? Well the answer is that virtually all HTML editors are optimized for editing Web pages. I write books and other reports that really don't fit that structure. There is nothing in HTML land that I have seen that provides the power of the Word outline mode and has built in spell checking as you go.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
  2. Hopefully the end of .doc, etc incompatibilities by Mikito · · Score: 2, Insightful

    Hopefully this file format change will bring about the end of ever-changing file formats from one version of an app to the next. Who among us doesn't have files saved in an old version of, say, Word, which can no longer be read correctly in a newer version of Word?

    --
    Anakin Simpson: If you're not with me, then you're my enemy--ooh, donuts!
  3. Re:Convenient... by AKAImBatman · · Score: 4, Insightful

    Also, "Microsoft Ends Era Of Closed File Formats" is a little overreaching, don't you think?

    That's exactly what I was thinking. If Microsoft was really opening up Office, why didn't they go for the OASIS Spec? Methinks that this is an attempt by Microsoft to lead the industry around by the nose, thus solidifying their place as "Industry Leader". And with a proprietary document format, they can make minor, but frustrating, changes every version just to keep the competition on its toes.

  4. Catching up, but still missing OpenDocument by SgtChaireBourne · · Score: 2, Insightful
    It looks like MS's goal is to catch up with OpenOffice.org/StarOffice, which have had this kind of XML support for many years. Other, lesser, suites also have zipped XML files, like AbiWord.

    The one thing that these others have in common, that MS Office lacks, is support for the OpenDocument DTD. OpenOffice.org v2 will use OpenDocument as its main format.

    Note that many of the articles linked to by the original post express skepticism about how open MS' XML will actually be. Recall that in the last year, and even in the last weeks, MS has sought patents from the USPTO for XML and XML related functions. And is even now pushing to get legislation in Europe to make those same patents valid in the EU. That smacks more of a PR stunt rather than an actual opening up.

    Furthermore, since the articles don't mention the current leaders in productivity tools with XML-based formats (i.e. OpenOffice.org or StarOffice), that looks all the more like a warmed-over press release being passed off onto the public as news. What's next? A press release about MS suddenly supporting PDF export like in OOo or StarOffice?

    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
  5. Re:Convenient... by metlin · · Score: 1, Insightful

    They are a business, they need not do the altruistic best thing unless something was in it for them.

    Right now, they're choosing the middle ground - opening up a format in a way that they have the upper hand and yet, folks can't fault them.

    If you were a company whose motive was profit, would you really care about doing something that would make you part of the pack?

    MS sees itself as being different from the pack, and so, this is a logical choice for them. It may not be the perfect choice or the right one, but it does make a lot of sense from their perspective.

    (Need to open up the formats in a way that the competition can be kept on its toes, like you said - what better way?)

  6. Nice marketing ploy. Too bad it's a scam by doublem · · Score: 2, Insightful

    Access to the MSDN documentation will require an MSDN developer's subscription and a signed NDA. The NDA will of course forbid the use of the file format specification in unsecured software. Appropriate copyright, patent and other licensing fees will be required of developers writing commercial software to access the new file format.

    All kidding aside, I think any hope about this is misplaced. There will no doubt be numerous restrictions on the use of the format information.

    There's also the fact that MS has done quite a bit to document their Office Formats in the past. The major issue is that the documentation differs significantly from the implementations.

    In other words, this is a load of marketing, designed to grab a few buzz words so the sales staff can toss around the phrase "Open Format" when necessary.

    --
    "Live Free or Die." Don't like it? Then keep out of the USA
  7. Microsoft begins era of patent encumbered formats by dyfet · · Score: 5, Insightful
    Take a look at the Office 2003 XML Reference Schema Patent License and reconcile that with the claims and headline of this article.

    In particular, consider "Microsoft may have patents and/or patent applications that are necessary for you to license in order to make, sell, or distribute software programs that read or write files that comply with the Microsoft specifications for the Office Schemas." taken from the same page...

    What changed? How is that an "improvement" exactly?

  8. Re:Loosing lock-in capability? by IntlHarvester · · Score: 4, Insightful

    The future profitability of MS Office is as a component of network groupware systems. Because if you are primarily using Office in standalone mode, you are just fine with any version of Office released in the last 8 years. So, the "value" has to be in improved collaboration or document management.

    In this respect, Microsoft needs open formats just as much as anyone. Ever try to write a server-based system that reads information from DOC files? Using winword.exe with automation just doesn't really work. XML lets MS use a relatively lightweight parser in a server-based system.

    Oh, and changing the default file format will surely spur some upgrades, but from what I've seen the corporate market is generally not in a big hurry to get onto the latest version of Office. I don't foresee a repeat of Office 97.

    --
    Business. Numbers. Money. People. Computer World.
  9. Lock-in continues via DRM by SgtChaireBourne · · Score: 5, Insightful
    Wouldn't this approach cause MS to lose its lock-in ability based on file format?
    No. The lock-in continues via DRM and ties to "Office Servers". MS is really pushing the server based aspects of Office 12, so there will be hooks to the server like crazy. MS is also really pushing the DRM encumbrance in Office 12. In all likelihood, the XML files will still have key components encrypted so as to support MS' DRM and as a 'side effect' lock out competitors.

    The interesting thing is that all this server based control and logging of DRM'd functions gives an enormous boost to the type of information available for international and corporate espionage. Through backdoors, security holes or escrow keys it was possible before to get only the documents themselves for the most part. Now it's possible to monitor who's collaborating with who, and see everyone in the distribution chain.

    That much can be guessed even now during the vaporware stages. However, as more technical information becomes available it will be possible to guess whether these same functions can be used for more than monitoring and can actually be used to stifle or suppress dissent or specific individuals or groups.

    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
    1. Re:Lock-in continues via DRM by Anonymous Coward · · Score: 1, Insightful

      What a moronic thing to say. How typical of the reactionary MS-haters: put the hatred before the logic.

      Yes, there are DRM-like aspects to Office12.

      No, they are not meant to lock you in to Office. Rather, they specifically exist to help companies manage security concerns with their own data (some of which concerns are mandates like SOX and HIPAA). So companies will be able to turn on this DRM-like feature *at their option* for specific documents.

      Are you really dim enough to think that MS would sell a single copy of Office if you couldn't use it to send your resume to a potential employer, or send a quote or whitepaper to a client?

  10. Re:Convenient... by telbij · · Score: 3, Insightful

    Well I gotta hand it to you for what amounts to an absolutely brilliant troll. You had me nodding my head the whole way through, but actually your response is just as hyperbolic as the story title. I really don't care to get into all the details... but one thing you said,

    I can't see this as anything more than a much belated, empty gesture on Microsoft's part.

    is true from the MS perspective, but that doesn't mean nothing good can come of it. Having a documented XML format could do wonders for OpenOffice compatibility, which wouldn't necessarily put a dent in Microsoft's monopoly, but it would make life a lot easier for those of us who don't want to participate in it. I'm not saying it'll pan out, just that there are possible real benefits.

  11. So they claim. by mcc · · Score: 4, Insightful

    Watch the video - the entire file format is completely open.

    Honestly, I am not going to believe it until I see it.

    Microsoft has lied before.

    It's quite possible they don't intend to open their file formats at all, they just intend to make the Washington Post and its readers think they've opened their file formats. In the meantime, if Microsoft actually wanted to "end the era of closed file formats", all they'd have to do is, you know, actually comply with the letter of the antitrust decision currently handed down against them in the E.U. and the spirit of the toothless antitrust "settlement" currently in effect against them in the U.S.. Mysteriously, they haven't.

  12. Re:Convenient... by KagatoLNX · · Score: 2, Insightful

    Ummmm, hello people. This is an XML format.

    If MS needs extensions, that's what namespaces are for.

    As long as MS extensions don't change formatting functionality (this is really not rocket science, Word is not an innovator here), they can tack whatever metadata they need into the file format and still have it be portable.

    If you don't believe me, look at what Inkscape has done with SVG. Sodipodi built on it, adding a namespace to provide its needed data. Inkscape did the same on top of that. It produces one file with three XML namespaces containing interoperable metadata for two editors--and it's viewable in stock SVG viewers.

    Obviously it's up to Microsoft to "do the right thing" with their metadata, but this certainly levels the playing field so that others can do what they need to with the documents.
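
    A rough sketch of how namespaced extensions stay out of the way of a plain viewer. The SVG namespace URI is the real one; the `http://example.com/editor` namespace and its `settings` and `label` names are invented for illustration and are not what Sodipodi or Inkscape actually use.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
EDITOR_NS = "http://example.com/editor"  # hypothetical editor namespace

DOC = f"""<svg xmlns="{SVG_NS}" xmlns:ed="{EDITOR_NS}">
  <ed:settings zoom="2.0"/>
  <rect width="10" height="5" ed:label="background"/>
</svg>"""


def standard_view(xml_text):
    """Return (tag, attributes) for SVG-namespace elements only.

    Elements and attributes from any other namespace are dropped, which is
    roughly what a viewer that knows nothing about the editor would do.
    """
    prefix = "{" + SVG_NS + "}"
    shapes = []
    for elem in ET.fromstring(xml_text).iter():
        if not elem.tag.startswith(prefix):
            continue  # foreign-namespace element, e.g. ed:settings
        # ElementTree stores namespaced attributes as "{uri}name".
        attrs = {k: v for k, v in elem.attrib.items() if not k.startswith("{")}
        shapes.append((elem.tag[len(prefix):], attrs))
    return shapes


print(standard_view(DOC))
```

    The editor metadata rides along in the same file, but a reader that only cares about the base vocabulary never has to look at it.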

    Now the patent on XML-to-object-mappings, that's another story...

    --
    I think Mauve has the most RAM. --PHB (Dilbert Comic)
  13. Re:Loosing lock-in capability? by Trigun · · Score: 2, Insightful

    you have a rock solid (Yes, solid) platform for group work, communication and management that OSS can't even touch.

    Which, imho, is one of the two problems with the OSS business desktop. Gnome and KDE are great desktops, Linux has a long pedigree of network interoperability, but it is really nothing more than a chain of islands, each doing its own thing. There is not a large network collaboration software for Linux that takes care of all of the needs of business users. You can cobble the parts together, but if developers don't control all aspects of the equation, it makes things difficult.

    Microsoft is in a unique position where it can tie in parts of its operating system and application software together for a 'just works' solution. People can cobble together a 'works' solution, and even a 'works better than MS' solution, but there are a lot of issues with setting these solutions up. To date (and I have been looking) there is no single definitive solution for something as simple as network logon, and the preferred solution (LDAP, PAM and Kerberos) is not the easiest thing to deploy.

    Even if you were to create an LDAP-PAM-Kerberos network, with a document management system that used the Kerberos authentication, e-mail that used Kerberos authentication, and a plugin that allowed you to check out and check in documents into the DMS for OpenOffice, without using third-party middleware that added twenty extra steps into it, you would need a huge company or dedicated group to do it, do it right, and do it seamlessly.

    Novell's working on it, but because it's new, it isn't mature enough for business to see it as a viable solution. Novell still has its fanboys, and their stuff does warrant it, but they are not seen as a competitive threat to MS.

  14. Re:CPUs that run Gzip are not free(beer) by tepples · · Score: 2, Insightful

    XML data structures are all going to make life a whole lot easier now that memory and processing time are such commodities.

    Commodity != free, especially when you are trying to deploy something on a battery-powered device.

  15. Re:Loosing lock-in capability? No... by Kiryat Malachi · · Score: 2, Insightful

    Considering that even MS sometimes has problems interpreting their own binary formats from way back when, I could see it just being a move on their part to eliminate that problem in the future.

    That said, I don't believe the GPL incompatibility was intentional; I believe it to be a side-effect of what MS thought was an appropriate way to make sure that their patents and patent issues were appropriately labeled in software using their schema.

    --

    ---
    Mod me down, you fucking twits. Go ahead. I dare you.
    (I read with sigs off.)
  16. Visual Diff of application files by CarpetShark · · Score: 2, Insightful
    There are good reasons, though, why it's best to keep data files in straight XML text format. It eliminates the need to worry about machine architecture.
    One of my favourite features of XML vs. binary application files is that, if you run diff on two XML files -- say, SVG drawings --, you can actually read how that drawing has been changed, just like with code or any other text file. It gets difficult to visualise on larger changes of course, but that's not unlike code either. I think, if people start noticing that aspect of XML file formats, and using it more, then there might come a time when we develop visual diff for XML etc.
    1. Re:Visual Diff of application files by Anonymous Coward · · Score: 1, Insightful

      Of course, this doesn't help one whit if some goofball has decided that they need to spew their machine-generated XML all on one line, without any newlines; unfortunately, the line-oriented diff utility can only do so much in this situation. Personally, I'd like to see a successor to the dynamic diff/patch combo that can handle deltas more gracefully. Something that could also be a portable changeset interchange format for source repositories would be nice, too.
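
      One workaround for the everything-on-one-line problem is to re-serialize both files with one element per line before diffing. A minimal sketch using only the Python standard library (`xml.dom.minidom` and `difflib`); the two tiny SVG-like inputs are made up for the example.

```python
import difflib
from xml.dom import minidom


def pretty_lines(xml_text):
    """Re-serialize XML with one element per line so a line diff is usable."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ").splitlines()


def changed_lines(old_xml, new_xml):
    """Return only the added/removed lines of a unified diff of the pretty forms."""
    diff = difflib.unified_diff(pretty_lines(old_xml), pretty_lines(new_xml), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Machine-generated XML, all on one line (hypothetical example data).
OLD = '<svg><rect width="10"/><circle r="5"/></svg>'
NEW = '<svg><rect width="10"/><circle r="7"/></svg>'

for line in changed_lines(OLD, NEW):
    print(line)
```

      This is still a line diff, not a tree-aware one, so moved or reordered elements will show up as noise.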

  17. Re:ZIP patent... by evilviper · · Score: 2, Insightful
    The huge advantage of zip over compressed tar archives comes from the fact that you have random access (i.e. you can extract a single file from a potentially HUGE archive).

    Actually, I don't find much of an advantage. In my experience, even if you are trying to extract a tiny file from a large archive, it still seeks through the majority of the zip file, and is only slightly faster than uncompressing the entire thing.

    Despite that, tar and gzip could be even better. A little programming and you could modify tar to transparently compress individual files with gzip/bzip2 before adding them to the tar archive. In other words, instead of a "tar.gz" file, you'd have a "gz.tar" file...

    Why this hasn't been done yet, I don't know.
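
    The "gz.tar" idea can be sketched with Python's standard `tarfile` and `gzip` modules: compress each file on its own, then store the compressed blobs in a plain, uncompressed tar. The function names and the `.gz` member-name suffix are choices made for this example, not an existing tool or convention.

```python
import gzip
import io
import tarfile


def build_gz_tar(files):
    """Pack {name: bytes} into an uncompressed tar whose members are each gzipped."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            packed = gzip.compress(data)
            info = tarfile.TarInfo(name + ".gz")
            info.size = len(packed)
            tar.addfile(info, io.BytesIO(packed))
    return buf.getvalue()


def read_member(archive, name):
    """Decompress a single member; the other members are skipped, not inflated."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        member = tar.extractfile(name + ".gz")
        return gzip.decompress(member.read())


archive = build_gz_tar({"a.xml": b"<a/>", "b.xml": b"<b/>" * 1000})
print(read_member(archive, "a.xml"))
```

    One caveat: tar has no central index, so finding a member still means walking the headers in order. What this layout saves is decompressing everything that comes before the file you want.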
    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant