Does the World Need Binary XML?

Posted by michael on Friday January 14, 2005 @05:52AM from the using-gzip-would-be-too-easy dept.

sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."

Binary XML has been around a while... by PipianJ · 2005-01-14 05:57 · Score: 4, Informative

Binary XML is nothing new, and I'd wager that many people here are already using it, albeit unknowingly.

One of the earliest projects that tried to create a binary XML (as far as I'm aware) was EBML (Extensible Binary Meta-Language), which is used in the Matroska media container.

Re:KISS by Ramses0 · 2005-01-14 06:37 · Score: 5, Informative

On the surface that works, but it only solves a portion of the problem.

Data => XML.

XML == large (lots of verbose tags)

XML == slow (have to parse it all [DOM], or build big stacks [SAX] to get at data)

Solution:

XML => .xml.gz

You've solved (kind of) the large problem, but you still have the slow problem.
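A rough sketch of that trade-off, using only Python's standard library. The `build_xml` helper and the "inventory" document it produces are made up for illustration; nothing here comes from the article. Gzip shrinks the bytes, but getting at any value still means inflating the whole thing and running a full parse:

```python
import gzip
import xml.etree.ElementTree as ET


def build_xml(n):
    """Build a deliberately verbose, hypothetical XML document with n items."""
    root = ET.Element("inventory")
    for i in range(n):
        item = ET.SubElement(root, "item", id=str(i))
        ET.SubElement(item, "name").text = f"widget-{i}"
        ET.SubElement(item, "quantity").text = str(i % 7)
    return ET.tostring(root)  # bytes


raw = build_xml(1000)
packed = gzip.compress(raw)

# The "large" problem: repetitive tags compress well.
print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")

# The "slow" problem: to read even one value we still have to
# decompress everything and parse the full document.
restored = gzip.decompress(packed)
tree = ET.fromstring(restored)
names = [el.text for el in tree.iter("name")]
```

The parse at the end costs the same as it would on the uncompressed file, plus the time spent decompressing.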

What they're suggesting is nothing more than:

XML => .xml.gzxml

Basically, using a specialized compression scheme that understands the ordered structure of XML (tags, etc.) and probably has some indexes to say "here are the locations of all the [blah] tags and attributes", so you can just fseek() instead of having to do DOM-walking or stack-building. This is important for XML selectors (XQuery), and for "big iron" junk it makes a lot of sense and can save a lot of processing power. Consider that Zip/Tar already do something similar by providing a file-list header as part of their specifications (wouldn't it suck to have to completely unzip a zip file when all you wanted was to pull out a list of the filenames and sizes?).
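To make the "index plus fseek()" idea concrete, here is a toy sketch in Python. The container layout (body, then index, then an 8-byte pointer to the index) and the function names `write_indexed`, `read_index`, and `read_tag` are all invented for this example; this is not any real binary XML format, just one way an offset index could work:

```python
import io
import struct


def write_indexed(records):
    """Pack (tag, payload_bytes) pairs into a toy container.

    Hypothetical layout: [payloads][index][8-byte offset of index].
    """
    buf = io.BytesIO()
    index = {}  # tag -> list of (offset, length)
    for tag, payload in records:
        index.setdefault(tag, []).append((buf.tell(), len(payload)))
        buf.write(payload)
    index_offset = buf.tell()
    for tag, spans in index.items():
        name = tag.encode("utf-8")
        buf.write(struct.pack("<H", len(name)))
        buf.write(name)
        buf.write(struct.pack("<I", len(spans)))
        for offset, length in spans:
            buf.write(struct.pack("<QI", offset, length))
    buf.write(struct.pack("<Q", index_offset))
    return buf.getvalue()


def read_index(f):
    """Read only the trailing index; the payloads are never touched."""
    f.seek(-8, io.SEEK_END)
    end_of_index = f.tell()
    (index_offset,) = struct.unpack("<Q", f.read(8))
    f.seek(index_offset)
    index = {}
    while f.tell() < end_of_index:
        (name_len,) = struct.unpack("<H", f.read(2))
        tag = f.read(name_len).decode("utf-8")
        (count,) = struct.unpack("<I", f.read(4))
        index[tag] = [struct.unpack("<QI", f.read(12)) for _ in range(count)]
    return index


def read_tag(f, index, tag):
    """Seek straight to every payload stored under `tag`."""
    values = []
    for offset, length in index.get(tag, []):
        f.seek(offset)
        values.append(f.read(length))
    return values


blob = write_indexed([
    ("title", b"Binary XML"),
    ("author", b"PipianJ"),
    ("title", b"KISS"),
])
f = io.BytesIO(blob)
idx = read_index(f)
titles = read_tag(f, idx, "title")
```

Looking up all the "title" values costs one read of the index and one seek per hit, regardless of how much other data sits in between. Putting the index at the end is just a convenience for single-pass writing; a real format could equally put it up front.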

"Consumer"/desktop applications already do compress XML (look at StarOffice as a great example; even JAR is just zipped-up stuff which can include XML configs, etc.). It's the stream-based data processors that really benefit from a standardized binary-transmission format for XML with some convenient indexes built in.
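For the zipped-up-XML case, a small sketch with Python's standard `zipfile` module. The archive contents (a manifest and a `config/settings.xml`) are invented stand-ins for what a JAR-like file might hold. Listing names and sizes reads the archive's directory records, and only the one entry you ask for gets decompressed:

```python
import io
import zipfile

# Build a small JAR-like archive in memory (contents are hypothetical).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    zf.writestr("config/settings.xml", "<settings><debug>false</debug></settings>")

with zipfile.ZipFile(buf) as zf:
    # Names and uncompressed sizes come from the directory records;
    # no entry is decompressed to produce this list.
    listing = [(info.filename, info.file_size) for info in zf.infolist()]
    # Decompress just the one entry we care about.
    settings = zf.read("config/settings.xml")

for name, size in listing:
    print(f"{size:6d}  {name}")
```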

That is all.

--Robert