Slashdot Mirror


Does the World Need Binary XML?

sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."

2 of 481 comments

  1. Binary XML has been around a while... by PipianJ · · Score: 4, Informative

    Binary XML is nothing new; I'd wager that many people here are already using it, albeit unknowingly.

    One of the earliest projects to attempt a binary XML (as far as I'm aware) was EBML (Extensible Binary Meta-Language), which is used in the Matroska media container.

  2. Re:KISS by Ramses0 · · Score: 5, Informative

    On the surface that works, but it only solves a portion of the problem.

    Data => XML.

    XML == large (lots of verbose tags)

    XML == slow (have to parse it all [dom], or build big stacks [sax] to get at data)

    Solution:

    XML => .xml.gz

    You've solved (kind of) the large problem, but you still keep the slow problem.
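    A minimal sketch of that trade-off, using only Python's standard library. The sample document (1000 repetitive item records) is made up for illustration; the point is that gzip shrinks the bytes, yet getting at even one element still means decompressing and parsing the whole thing.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical sample data: 1000 repetitive <item> records.
xml_bytes = (
    "<items>"
    + "".join(f'<item id="{i}"><name>widget</name></item>' for i in range(1000))
    + "</items>"
).encode("utf-8")

compressed = gzip.compress(xml_bytes)

# gzip addresses the "large" problem...
print(len(xml_bytes), "->", len(compressed), "bytes")

# ...but the "slow" problem remains: to read the last item we still
# decompress everything and parse the full document.
root = ET.fromstring(gzip.decompress(compressed))
last = root.findall("item")[-1]
print(last.get("id"))
```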

    What they're suggesting is nothing more than:

    XML => .xml.gzxml

    Basically, it means using a specialized compression scheme that understands the ordered structure of XML (tags, attributes, etc.) and probably carries some indexes that say "here are the locations of all the [blah] tags", so you can just fseek() instead of having to do DOM-walking or stack-building. This is important for XML selectors (XQuery), and for "big iron" junk it makes a lot of sense and can save a lot of processing power. Consider that Zip/Tar already do something similar by storing file names and sizes in headers as part of their specifications (wouldn't it suck to have to completely unzip a zip file when all you wanted was to pull out a list of the filenames / sizes?).
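    The "index, then seek" idea can be sketched as a toy container. This is not any actual binary XML proposal: the layout (a record count, a table of offset/length pairs, then the payloads) and the names pack_records and read_record are invented here purely for illustration, with stream.seek() standing in for fseek().

```python
import io
import struct

def pack_records(records):
    """Serialize byte strings as: count, (offset, length) table, payloads."""
    header_size = 4 + 8 * len(records)
    index, offset = [], header_size
    for rec in records:
        index.append(struct.pack(">II", offset, len(rec)))
        offset += len(rec)
    return struct.pack(">I", len(records)) + b"".join(index) + b"".join(records)

def read_record(stream, k):
    """Fetch record k by seeking, without reading the other payloads."""
    stream.seek(0)
    (count,) = struct.unpack(">I", stream.read(4))
    if not 0 <= k < count:
        raise IndexError(k)
    stream.seek(4 + 8 * k)          # jump to the k-th index entry
    offset, length = struct.unpack(">II", stream.read(8))
    stream.seek(offset)             # jump straight to the payload
    return stream.read(length)

# Hypothetical records standing in for the [blah] elements.
records = [
    b'<blah id="0">alpha</blah>',
    b'<blah id="1">beta</blah>',
    b'<blah id="2">gamma</blah>',
]
blob = io.BytesIO(pack_records(records))
print(read_record(blob, 2))
```

    The lookup does three small reads no matter how many records precede the one you want, which is the property a plain .xml.gz cannot offer.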

    "Consumer"/desktop applications already compress XML (StarOffice is a great example; even a JAR is just zipped-up stuff that can include XML configs, etc.). It's the stream-based data processors that really benefit from a standardized binary transmission format for XML with some convenient indexes built in.

    That is all.

    --Robert