Slashdot Mirror


Can XML Replace Proprietary Document Formats?

Pauly asks: "My former profession of Technical Writer was made very painful by my customers' requirement to have their documents delivered in MS Office formats. PDF/FrameMaker was not acceptable, as they needed to be able to edit the documents as well. Let me tell you, it is painful watching a 3,000+ page Word97 manuscript, the fruit of weeks of hard labor, rendered into rubbish by my customer's Word95. I've missed deadlines, lost money, and will never forgive Microsoft for their abuse of me and my kind. My question: is it possible that XML-based standard file formats suitable for word processors, spreadsheets, etc. could be created that forever do away with proprietary binary formats and inadequate file conversion routines? This notion seems to be working for the graphics crowd in the form of SVG. The benefits are obvious, what are the drawbacks?"

6 of 291 comments

  1. Not Already Happened by Matts · · Score: 5

    No, it hasn't (already happened).

    Microsoft want you to believe that they are buzzword compliant, but in reality the output from Microsoft's "Save As HTML" looks like XML, smells like XML, but isn't. Try parsing it.

    See the recent Byte article "The cup is half full" for more details. I'm surprised you haven't heard about this. MS is using its proprietary XML Islands inside an HTML document. That means you need an HTML parser to be able to parse it. The content of the XML is just as proprietary: it's basically a conversion of their OLE Document objects into XML.
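
    To make "try parsing it" concrete, here is a minimal Python sketch. The sample markup is an invented imitation of HTML with an embedded XML island, not real Office output. A strict XML parser (xml.etree.ElementTree) rejects it because HTML permits unclosed tags such as <p> and <br>, while the lenient html.parser module from the standard library reads it without complaint.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented imitation of "Save As HTML" output: HTML carrying an XML island.
# The unclosed <p> and <br> tags are legal HTML but not well-formed XML.
SAMPLE = """<html>
<head><xml><DocumentProperties><Author>Pauly</Author></DocumentProperties></xml></head>
<body><p>First paragraph<br>second line<p>Next paragraph</body>
</html>"""

def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

class TagCollector(HTMLParser):
    """Lenient HTML parse that simply records every start tag it sees."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # html.parser lowercases tag names

collector = TagCollector()
collector.feed(SAMPLE)

print(is_well_formed_xml(SAMPLE))  # False: the XML parser gives up
print(collector.tags)              # the HTML parser gets through it all
```

    In other words, a consumer of such output has to start from an HTML parser and then dig the XML island out of it, which is the extra step complained about above.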

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  2. Re:XML needs to be integrated into Linux by Deven · · Score: 5

    > Why have you gotten so offended? If you don't like what I have to say then at least be polite; after all, it only reflects badly on you and hence Slashdot as a whole. I have commonly found an amazing resistance to different opinions amongst the "open" source community, which seems to me to be the antithesis of what you stand for.

    Comments such as "XML should be in the kernel" betray a lack of understanding as to the proper function of the kernel. Worse yet, putting an XML parser in the kernel (unlike, say, khttpd) wouldn't provide any benefit. All you're doing is encouraging the kind of useless feature bloat that Microsoft is rightly loathed for. That's why people get upset about remarks like this; they don't want this attitude to spread further than it already has.

    > Anyway, what you are clearly unaware of is that the perception of performance and stability is far more important in the corporate domain than the actuality of the situation. By integrating XML into the kernel, you have provided Linux with a major marketing point for the people who are actually in charge of what their company uses.

    You won't be able to maintain the perception of performance and stability if the actuality is the opposite. Even Microsoft, with its legendary marketing might, has begun to pick up on this fact a little. (Note how stability has become a marketing point for them; why would it need to be, but for the constant crashing of their existing products?)

    The exact breakdown of an operating system varies from one OS to another. In general, the purpose of any "operating system" is to arbitrate and manage hardware resources. Anything else is basically fluff. XML parsing is an application support issue, and detracts from the core function of managing hardware resources. Occasionally, an application function may be put in the kernel for good reasons, usually related to huge performance advantages gained by an in-kernel implementation. (khttpd is an example of this.) Even this is resisted strongly, because it "pollutes" the most critical code in the entire system, and poses an inherent risk to the stability, integrity and maintainability of the system as a whole.

    Basically, to add an application-specific function to the kernel, you had better have a really good reason to be suggesting it, one that can be justified (and defended) on a technological basis. If Linus were to allow marketing considerations (such as this) to drive kernel development, not only would he lose the respect of most of his supporters, but the end result would be just as crappy as Windows, sooner or later.

    > Given that Linus himself has talked about "world domination", doesn't it seem short-sighted to ignore a major selling point in favour of your petty-minded arguments?

    Keep in mind that "world domination" remarks are somewhat tongue-in-cheek. Yes, he's half-serious, but only half. He wants people to use Linux over Windows because it's a better system. It wouldn't remain better if this approach to kernel development were adopted. Keeping the kernel pure isn't a "petty-minded argument"; it's a critical element of good design.

    All that said, you would have received a much different response had you suggested that Linux systems (as a whole) start integrating XML support, use XML for system configuration and provide XML services for applications. There's a good argument to be made for that, and the marketing value should be similar. There are also technological arguments to be made in favor of it. The distinction here is that this support would all be in "user space" rather than the kernel, even though it might be an integral part of the operation of the system as a whole. The kernel is the core of the system, and the idea of integrating XML into Linux does not imply that it belongs in the kernel.
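
    As a rough illustration of the user-space approach described above, the sketch below has an ordinary program read its settings from XML using Python's standard-library parser; the kernel is not involved at any point. The configuration layout and the element and attribute names (<config>, <service name=...>, <port>, <loglevel>) are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML configuration. In practice it would be read from a file
# by the application itself, entirely in user space.
CONFIG_XML = """<config>
  <service name="httpd">
    <port>8080</port>
    <loglevel>info</loglevel>
  </service>
  <service name="ftpd">
    <port>21</port>
  </service>
</config>"""

def load_services(xml_text, default_log_level="warn"):
    """Parse the config into a {service name: settings} dictionary."""
    root = ET.fromstring(xml_text)
    services = {}
    for svc in root.findall("service"):
        services[svc.get("name")] = {
            "port": int(svc.findtext("port")),
            # findtext() falls back to the default when the element is absent
            "log_level": svc.findtext("loglevel", default=default_log_level),
        }
    return services

print(load_services(CONFIG_XML))
```

    Any other user-space tool could read the same file with its own XML parser, which is where most of the practical benefit would come from.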

    --

    Deven

    "Simple things should be simple, and complex things should be possible." - Alan Kay

  3. An open question by DonkPunch · · Score: 5

    The question is: Why do software consumers tolerate this?

    The compatibility breaking between different versions of Word is well-known and oft-maligned. I have a hard time seeing it as anything more than a forced upgrade cycle, where Word users MUST buy the latest version in order to exchange documents.

    There are other document formats which deliver the same power, have been around longer, have not *radically* changed, and are open to implementation by other vendors. HTML and XML-based grammars are only one example of this. PostScript would be an even better example.

    So why have business environments settled on a standard which seems clearly to not be in their best interests? Why do they blindly pay for new versions every few years when their current versions do everything they need and more?

    I'm all for letting the free market determine the best product, but Word strikes me as a solid example of the free market failing in this regard. Perhaps poor consumer education is preventing software from being a truly free market. The feature set of Word is nice, but the upgrade-ensuring file format should cause people to run away. I would be skeptical of a car that used non-standard gasoline and forced me to buy an engine upgrade each year to handle new gas.

    How has this been allowed to happen?

    --

    Save the whales. Feed the hungry. Free the mallocs.
  4. Sabotage by overshoot · · Score: 5

    What's the downside? Simple. Lack of tool support. There are lots of portable document formats out there already. MIF is published, WordPerfect doc format is published, even RTF is supposedly designed for portability, etc. Why not send your customers docs in these formats? Because the word processor that has 94% of the market has no incentive to enable competitors by supporting them, and even has a great deal of incentive to minimize compatibility between its own generations (as you found out.)

    Assuming that any open document standard emerges, you can pretty well bet that saving from the market leader to that format will be an ugly process (have you looked at the HTML that that turkey produces? Blech!) You can also bet that imports from it will be better but still a pain. For real fun, try repetitive translations between the native format and the portable one and compare the starting and end results.

    The sad fact is that monopolists have a huge stake in incompatibility (read the Halloween Documents) and every reason to maintain it. The rest of us will just have to survive in that environment until it changes. Changing it is another topic entirely, but for once I'll say, Vive la France!

    --
    Lacking <sarcasm> tags, /. substitutes moderation as "Troll."
  5. Nope by p3d0 · · Score: 5

    XML can't replace proprietary document formats. That's like asking if ASCII could replace proprietary document formats. XML and ASCII are not really file formats. They simply don't do the same job as file formats.

    If you have ever used lex or yacc, then you'll know what I mean when I say that XML parsers essentially do the job of lex, but not of yacc. An XML parser is little more than a scanner which breaks a file into chunks to simplify the next level of processing. The XML parser gives the illusion of hierarchical processing that lex can't do, but it's an illusion nonetheless.

    Your example of Word formats changing is a perfect one. If Word95 used XML, Word97 could still be incompatible if it used different elements and attributes.

    So no, XML will not replace proprietary file formats. XML + proprietary DTD specifications + proprietary semantics could replace proprietary file formats. Is this an improvement? Probably. Will it make backward (or forward, or sideways) compatibility problems go away? Nope.
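
    The point above can be made concrete with a small Python sketch. Both documents are well-formed, so any XML parser reads them, but a reader written against the version-1 vocabulary finds nothing in the version-2 file because the element names changed. Both "versions" and all of their element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Two invented releases of the same word processor's XML format.
V1_DOC = "<document><para>Hello</para><para>World</para></document>"
V2_DOC = "<doc><block type='p'>Hello</block><block type='p'>World</block></doc>"

def read_paragraphs_v1(xml_text):
    """A reader that only understands the version-1 vocabulary."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("para")]

# The parser (the "lex" half) accepts both files without complaint...
for doc in (V1_DOC, V2_DOC):
    ET.fromstring(doc)

# ...but the meaning (the "yacc" half) is tied to one particular vocabulary.
print(read_paragraphs_v1(V1_DOC))  # ['Hello', 'World']
print(read_paragraphs_v1(V2_DOC))  # []
```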

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  6. Biased rant from co-inventor of XML by tbray · · Score: 5

    First off, while there's a place for MS Word, a 3000-page document ain't it. In my experience it tends toward severe breakage in this situation.

    Office2K will already save docs in a kind of bastardized HTML++ format which truly sucks because it is neither rules-following HTML nor well-formed XML, when it could have been either without much trouble. A little bird has told me that a not-too-distant future release of Office will have a *real* XML save format, which would be cool. I mean, a lot of the tags will still be proprietary MS gibberish, but at least you can parse 'em, and it'll be way less susceptible to inter-version breakage.

    A basic part of the XML dream was the notion that proprietary per-application data formats are just as silly as the '80s notion that computer networks should have proprietary per-wire data formats (remember DECnet, Wangnet, SNA?). So what Pauly wants is exactly what XML is trying to do.

    Having said that, a lot of the infrastructure we need to make it easy to author and deliver XML isn't here yet.

    What I'm doing these days for complex documents is writing them in HTML++, by which I mean mostly well-formed HTML to which I add my own tags (e.g. , ) whenever I need to. This works because old browsers, which helpfully ignore the non-HTML tags, can still display what you've written; you can write Perl scripts or use XSL to turn it into RTF if you want to publish on paper; and with Mozilla you can write a CSS stylesheet and dress up your own tags the way you want.
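
    For the "turn it into RTF" step, here is a comparable sketch in Python rather than Perl or XSL, assuming the HTML++ source is well-formed XML. The custom tag <term> and the bold-italic styling chosen for it are hypothetical stand-ins (the original examples of custom tags did not survive), and only a handful of elements are handled; unknown tags keep their text and lose their markup, much as an old browser would treat them.

```python
import xml.etree.ElementTree as ET

# RTF group prefixes per element. "term" stands in for a hypothetical
# author-defined tag; the others are ordinary HTML inline elements.
INLINE_STYLES = {"b": r"\b ", "em": r"\i ", "i": r"\i ", "term": r"\b\i "}

def rtf_escape(text):
    """Escape RTF special characters; non-ASCII becomes \\uN? escapes."""
    out = []
    for ch in text or "":
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) > 127:
            # \uN takes signed 16-bit UTF-16 code units, followed by "?"
            # as the fallback character for readers without Unicode support.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit}?")
        else:
            out.append(ch)
    return "".join(out)

def convert(elem):
    """Recursively turn an element and its children into RTF."""
    inner = rtf_escape(elem.text)
    for child in elem:
        inner += convert(child)
        inner += rtf_escape(child.tail)  # text following the child element
    if elem.tag in INLINE_STYLES:
        return "{" + INLINE_STYLES[elem.tag] + inner + "}"
    if elem.tag == "p":
        return inner + "\\par\n"
    return inner  # unknown tags: keep the text, drop the markup

def html_plus_plus_to_rtf(source):
    """Wrap the converted body in a minimal RTF document header."""
    root = ET.fromstring(source)
    return "{\\rtf1\\ansi\n" + convert(root) + "}"

doc = "<body><p>An <term>XML island</term> is <em>not</em> {plain} XML.</p></body>"
print(html_plus_plus_to_rtf(doc))
```

    Because the source is well-formed, the converter can walk the tree directly instead of guessing where unclosed tags end, which is the practical payoff of keeping HTML++ close to XML.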

    Cheers, Tim Bray