Slashdot Mirror


Does the World Need Binary XML?

sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."

6 of 481 comments (clear)

  1. KISS by stratjakt · · Score: 5, Interesting

    On the face of it, compressing XML documents by using a different file format may seem like a reasonable way to address sluggish performance. But the very idea has many people -- including an XML pioneer within Sun -- worried that incompatible versions of XML will result.

    I agree with his point.

    What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:KISS by Ramses0 · · Score: 5, Informative

      On the surface that works, but it only solves a portion of the problem.

      Data => XML.

      XML == large (lots of verbose tags)

      XML == slow (have to parse it all [dom], or
      build big stacks [sax] to get at data)

      Solution:

      XML => .xml.gz

      You've solved (kindof) the large problem, but you still keep the slow problem.

      What they're suggesting is nothing more than:

      XML => .xml.gzxml

      Basically using a specialized compression schemes that understand the ordered structure of XML, tags, etc, and probably has some indexes to say "here's the locations of all the [blah] tags", attributes so you can just fseek() instead of having to do domwalking or stack-building. This is important for XML selectors (XQuery), and for "big iron" junk, it makes a lot of sense and can save a lot of processing power. Consider that Zip/Tar already do something similar by providing a file-list header as part of their specifications (wouldn't it suck to have completely to unzip a zip file when all you wanted was to be able to pull out a list of the filenames / sizes?)

      "Consumer"/Desktop applications already do compress XML (look at star-office as a great example, even JAR is just zipped up stuff which can include XML configs, etc). It's the stream-based data processors that really benefit from a standardized binary-transmission format for XML with some convenient indexes built in.

      That is all.

      --Robert

  2. Maybe this is like comparing assembly to C by Stevyn · · Score: 5, Insightful

    Programs written in assembly can run faster than programs written in C, but it's easier for someone to open a .c file and figure out what's going on.

    I'm sure when C came out, the argument was similar that the performance hit doesn't make up for the readability or cross compatibility. But as computers and network connections became faster, C becomes a more viable alternative.

  3. Amen To That by American+AC+in+Paris · · Score: 5, Insightful
    XML, as originally designed, is deliciously straightforward. Data is encoded into discrete, easy-to-process chunks that any given XML parser can make sense of.

    XML, as implemented today, is often little more than a thin wrapper for huge gobs of proprietary-format data. Thus, any given XML parser can identify the contents as "a huge gob of proprietary data", but can't do a damned thing with it.

    Too many developers have "embraced" XML by simply dumping their data into a handful of CDATA blocks. Other programmers don't want to reveal their data structure, and abuse CDATA in the same way. Thus, a perfectly good data format has been bastardized by legions of lazy/overprotective coders.

    The slew publications exist for the sole purpose of "clarifying" XML serves as testament to the abuse of XML.

    --

    Obliteracy: Words with explosions

  4. Re:there are already standards for this... by rootmonkey · · Score: 5, Insightful

    I'll say it again.. Its not the size of the document its the overhead in parsing.

    --

    Yes but every time I try to see it your way, I get a headache.
  5. The article doesn't go far enough... by Da+VinMan · · Score: 5, Insightful

    It doesn't tell us what the specific performance problems are with XML. Does it take too long to transmit? Does it take too long to validate? Does it take too long to parse? Does it take too long to format? What's the real problem here?

    From experience, I can state that using XML in any high performance situation is easy to screw up. But once you get past the basic mistakes at that level, what other inherent problems are there?

    Oh, and just stating "well, the format is obviously wasteful" just because it's human readable (one of its primary, most useful, features) is NOT an answer.

    I get the feeling that this perception of XML is being perpetuated by vendors who do not really want to open up their data formats. Allowing them to successfully propagate this impression would be a very real step backwards for all IT professionals.

    --
    Please mod this post only if you think others should/n't read this. I have enough ego^H^H^Hkarma. Thanks!