Does the World Need Binary XML?

Binary XML has been around a while... by PipianJ · 2005-01-14 05:57 · Score: 4, Informative

Binary XML is nothing new, and I'd wager that many people here are already using it, albeit unknowingly.

One of the earliest attempts at a binary XML (as far as I'm aware) is EBML (Extensible Binary Meta-Language), which is used in the Matroska media container.

Re:KISS by Ewan · 2005-01-14 06:09 · Score: 2, Informative

gzip decompression is built into Internet Explorer; it's used all the time to speed up the transfer of HTML to clients.

There's no reason why it couldn't be used for XML just as it is for HTML.

Ewan
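A minimal sketch of the gzip approach, using Python's standard gzip module on a made-up order document (the payload is hypothetical). Over HTTP the server would send these bytes with a `Content-Encoding: gzip` header; note that the client gets the original text back and still has to parse all of it.

```python
import gzip
import xml.etree.ElementTree as ET

# Made-up, deliberately repetitive payload standing in for a real XML response.
rows = "".join(
    f"<order><id>{i}</id><customer>ACME Corp</customer><total>{i * 1.5}</total></order>"
    for i in range(1000)
)
xml_bytes = f"<orders>{rows}</orders>".encode("utf-8")

# What a server does before sending the body with "Content-Encoding: gzip".
compressed = gzip.compress(xml_bytes)
print(f"plain: {len(xml_bytes)} bytes, gzip: {len(compressed)} bytes")

# The client gets the original text back, but still has to parse all of it.
root = ET.fromstring(gzip.decompress(compressed))
print(len(root), "orders parsed")  # → 1000 orders parsed
```

Repetitive tag names compress very well, which is why gzip handles the size problem; the parse step at the end is unchanged.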

Ummm zip is open by Anonymous Coward · 2005-01-14 06:15 · Score: 1, Informative

While I do like Bzip2 and Gzip better, zip is open. There are numerous open source compression/decompression libraries for it.

Check out the analysis at: by Anonymous Coward · 2005-01-14 06:18 · Score: 1, Informative

http://news.com.com/5208-7345-0.html?forumID=1&threadID=4163&messageID=23888&start=-1

How would you grade XML? by Anonymous Coward · 2005-01-14 06:19 · Score: 1, Informative

The design goals for XML are:

1. XML shall be straightforwardly usable over the Internet.
Grade: A

2. XML shall support a wide variety of applications.
Grade: B

3. XML shall be compatible with SGML.
Grade: don't know / don't care

4. It shall be easy to write programs which process XML documents.
Grade: F

5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
Grade: F

6. XML documents should be human-legible and reasonably clear.
Grade: F

7. The XML design should be prepared quickly.
Grade: F

8. The design of XML shall be formal and concise.
Grade: C

9. XML documents shall be easy to create.
Grade: C

10. Terseness in XML markup is of minimal importance.
Grade: A+

Re:Several points. by michaelggreer · 2005-01-14 06:28 · Score: 2, Informative

The problem is that XML is being used for web services, which are unlike HTML: the requesting machine will not like waiting 2-3 seconds for the response to a method call. These are interoperating applications, not people downloading text to read, so the response time is much more critical.

I agree that gzip compression is a simple solution to the network problem. It does not address the parsing time problem, and in fact exacerbates it, but in my opinion the network issue is the big one. Time works in favor of faster parsing (faster processors), but works against network issues (more congestion). I would go with compression, test the results, and only then look into a binary solution.

Re:KISS by Ramses0 · 2005-01-14 06:37 · Score: 5, Informative

On the surface that works, but it only solves a portion of the problem.

Data => XML.

XML == large (lots of verbose tags)

XML == slow (have to parse it all [dom], or build big stacks [sax] to get at data)

Solution:

XML => .xml.gz

You've solved (kind of) the large problem, but you still have the slow problem.

What they're suggesting is nothing more than:

XML => .xml.gzxml

Basically, a specialized compression scheme that understands the ordered structure of XML (tags, attributes, etc.) and probably has some indexes saying "here are the locations of all the [blah] tags", so you can just fseek() instead of walking the DOM or building stacks. This is important for XML selectors (XQuery), and for "big iron" junk it makes a lot of sense and can save a lot of processing power. Consider that Zip/Tar already do something similar by recording file metadata as part of their specifications (Zip keeps a central directory listing every entry). Wouldn't it suck to have to completely unzip a zip file when all you wanted was a list of the filenames / sizes?

"Consumer"/Desktop applications already compress XML (look at StarOffice as a great example; even a JAR is just zipped-up files, which can include XML configs, etc.). It's the stream-based data processors that would really benefit from a standardized binary transmission format for XML with some convenient indexes built in.

That is all.

--Robert
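A toy version of the "index header" idea, in the spirit of a zip directory: each child of the root is stored as an XML fragment, and a directory up front records where each one starts, so a reader can seek straight to the record it wants. This is not the format any of the binary XML proposals actually use; the 4-byte length prefix, the JSON directory, and the functions `write_indexed`, `read_index` and `fetch` are all hypothetical, and only the root's direct children are indexed.

```python
import io
import json
import struct
import xml.etree.ElementTree as ET

def write_indexed(root, out):
    """Write root's children as XML fragments, preceded by a directory
    mapping tag name -> list of [offset, length] within the body."""
    body = io.BytesIO()
    directory = {}
    for child in root:
        frag = ET.tostring(child)
        directory.setdefault(child.tag, []).append([body.tell(), len(frag)])
        body.write(frag)
    header = json.dumps(directory).encode("utf-8")
    out.write(struct.pack(">I", len(header)))  # 4-byte big-endian header size
    out.write(header)
    out.write(body.getvalue())

def read_index(f):
    """Read only the directory; the body is not touched."""
    f.seek(0)
    (size,) = struct.unpack(">I", f.read(4))
    return json.loads(f.read(size)), 4 + size  # body starts after the header

def fetch(f, tag, n):
    """Return the n-th <tag> child by seeking straight to it."""
    directory, body_start = read_index(f)
    offset, length = directory[tag][n]
    f.seek(body_start + offset)
    return ET.fromstring(f.read(length))

# Usage: three <order> records and a <note>, stored in an in-memory "file".
root = ET.Element("orders")
for i in range(3):
    ET.SubElement(root, "order", id=str(i)).text = f"item {i}"
ET.SubElement(root, "note").text = "hello"

buf = io.BytesIO()
write_indexed(root, buf)
print(fetch(buf, "order", 2).text)  # → item 2
```

Only the directory and the one requested fragment are ever parsed; the other records are skipped by the seek.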

XML performance by BillAtHRST · 2005-01-14 06:43 · Score: 2, Informative

The problem is that not everything in a typical XML message is text, so there can be a lot of translation going on between XML text and the binary format that an application needs (e.g., double). In our tests we've found XML to be 100x - 250x SLOWER than other approaches (e.g., JMS MapMessage). (FWIW, the 100x is using the MS parser, the 250x is with Xerces/Xalan). For high-volume, high-performance apps that's simply intolerable. Note that this has nothing to do with size on the wire, which is another consideration entirely.
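To make the text-to-binary translation cost concrete, here is a rough Python sketch comparing doubles carried as XML text (each one must go through `float()` after parsing) with the same values packed as 8-byte IEEE 754 doubles via `struct`. It is not BillAtHRST's benchmark: there is no JMS here, the data is made up, and the absolute timings depend on the machine and Python version, so they will not reproduce the 100x-250x figures.

```python
import struct
import timeit
import xml.etree.ElementTree as ET

values = [i / 7 for i in range(10_000)]  # made-up sample data

# Text route: every double becomes a decimal string inside an element...
xml_doc = "<v>" + "".join(f"<d>{v!r}</d>" for v in values) + "</v>"

def from_xml(doc):
    # ...and must be parsed and converted back with float() on the way in.
    return [float(e.text) for e in ET.fromstring(doc)]

# Binary route: fixed 8-byte big-endian doubles, no text conversion at all.
packed = struct.pack(f">{len(values)}d", *values)

def from_binary(buf):
    return list(struct.unpack(f">{len(buf) // 8}d", buf))

assert from_xml(xml_doc) == values == from_binary(packed)  # both are lossless
print("xml bytes:", len(xml_doc), "binary bytes:", len(packed))
print("xml seconds:", timeit.timeit(lambda: from_xml(xml_doc), number=10))
print("binary seconds:", timeit.timeit(lambda: from_binary(packed), number=10))
```

Both routes recover exactly the same values (Python's `repr` of a float round-trips), so the difference is purely parsing and conversion work.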

XML doesn't need to be non-ascii to be small by iabervon · 2005-01-14 06:59 · Score: 3, Informative

Three ideas, in order of increasing significance and increasing difficulty:

Stop using bad DTDs. There seems to be a DTD style in which you avoid attributes and instead add a whole lot of elements containing text. Any element whose content is just text (#PCDATA) should be an attribute on its parent, which improves the readability of documents and lets you use ID/IDREF to check cross-references automatically. Once you get rid of that cruft, it's not nearly so bad.
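A small illustration of the first idea, using a made-up record written both ways and read back with Python's ElementTree. ElementTree does not validate against a DTD, so the ID/IDREF checking mentioned above is not shown; this only demonstrates that the two forms carry the same data.

```python
import xml.etree.ElementTree as ET

# The same made-up record, first with text-only child elements...
verbose = "<person><id>p42</id><name>Ada</name><city>London</city></person>"
# ...then with those children moved into attributes on the parent.
compact = '<person id="p42" name="Ada" city="London"/>'

def fields_from_children(doc):
    return {child.tag: child.text for child in ET.fromstring(doc)}

def fields_from_attributes(doc):
    return dict(ET.fromstring(doc).attrib)

assert fields_from_children(verbose) == fields_from_attributes(compact)
print(len(verbose), len(compact))  # → 64 43
```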

Now that everything other than HTML is generally valid XML, it's possible to get rid of a lot of XML's verbosity, too. A new XML could shorten every close tag to "</", since the name of the element you're closing is predetermined and the only thing permitted after it is a >. The > could be dropped from empty tags, too. If you know that your DTD will be available and won't change during the life of the document, you could use numeric references in open tags to refer to child element types by their index among the types allowed in the element you're in, and numeric references to attributes by their index among those declared for the element they're on. If you then drop the spaces after close quotes, you've basically removed all of the superfluous size of XML without using a binary format, and string comparisons become unnecessary in the parser.

Of course, you could document it as if it were binary. An open tag is indicated by 0x3C, followed by the index of the element type plus 0x30 (for indices under 0xA). A close tag is (big-endian) 0x3C2F. A non-close tag is an open tag if it ends with 0x3E and an empty tag if it ends with 0x2F. Attribute indices are followed by 0x3D. And so forth.
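Here is a rough Python sketch of the encoding described in the last two paragraphs, with a decoder to show it round-trips. The `CHILDREN` and `ATTRS` tables are hypothetical stand-ins for the DTD, only single-digit indices (0x30-0x39) are handled, and omitting any space between the element index and the first attribute is an assumption, since the comment doesn't specify it. The output is still plain text, which is the point: the markup shrinks without a binary format.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Hypothetical stand-in for the DTD: child element types allowed under each
# parent (None = document level) and attributes allowed on each type, in index order.
CHILDREN = {None: ["orders"], "orders": ["order"], "order": ["item", "note"],
            "item": [], "note": []}
ATTRS = {"orders": [], "order": ["id", "date"], "item": ["sku", "qty"], "note": []}

def encode(elem, parent=None):
    idx = CHILDREN[parent].index(elem.tag)
    assert idx < 10, "this sketch only handles single-digit indices"
    out = ["<", chr(0x30 + idx)]              # open tag: '<' then index + 0x30
    for name, value in elem.attrib.items():   # attribute: index, '=', quoted value
        out.append(chr(0x30 + ATTRS[elem.tag].index(name)))
        out.append('="' + escape(value, {'"': "&quot;"}) + '"')
    if len(elem) == 0 and not elem.text:
        out.append("/")                        # empty tag ends with '/', no '>'
        return "".join(out)
    out.append(">")
    out.append(escape(elem.text or ""))
    for child in elem:
        out.append(encode(child, elem.tag))
        out.append(escape(child.tail or ""))
    out.append("</")                           # close tag: name and '>' dropped
    return "".join(out)

def decode(s):
    pos = 0

    def read_text():
        nonlocal pos
        end = s.index("<", pos)                # text never contains a raw '<'
        text = unescape(s[pos:end])
        pos = end
        return text or None

    def parse_elem(parent):
        nonlocal pos
        tag = CHILDREN[parent][ord(s[pos + 1]) - 0x30]
        elem = ET.Element(tag)
        pos += 2
        while s[pos] not in ">/":             # attribute: index char, then ="..."
            name = ATTRS[tag][ord(s[pos]) - 0x30]
            end = s.index('"', pos + 3)
            elem.set(name, unescape(s[pos + 3:end], {"&quot;": '"'}))
            pos = end + 1
        pos += 1
        if s[pos - 1] == "/":                  # empty tag: nothing follows
            return elem
        elem.text = read_text()
        while not s.startswith("</", pos):
            child = parse_elem(tag)
            child.tail = read_text()
            elem.append(child)
        pos += 2                               # skip the bare '</'
        return elem

    return parse_elem(None)

doc = ET.fromstring('<orders><order id="7" date="2005-01-14">'
                    '<item sku="A&amp;B" qty="2"/><note>ship &lt;fast&gt;</note>'
                    '</order></orders>')
wire = encode(doc)
print(wire)  # → <0><00="7"1="2005-01-14"><00="A&amp;B"1="2"/<1>ship &lt;fast&gt;</</</
assert ET.tostring(decode(wire)) == ET.tostring(doc)
```

Because every name is replaced by a position in the schema, the decoder never compares strings, but it also cannot read a document without the exact same tables, which is the trade-off the comment points out about needing a stable DTD.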

Re:WHO NEEDS FREAKING READABILITY ?! by rikkus-x · 2005-01-14 07:52 · Score: 2, Informative

What should we use instead of XML to encapsulate RPC calls? Something at least semi-human-readable, please. I don't need to be able to read a graphic image, but I'd like to see the name of the method I'm calling, and at least string and text parameters.

And when someone sends me a bunch of data they want importing into a database, in what format should they send it? I'd like to be able to ensure that their data is correct before giving it to my import routine, and when my validator says there's an error, I'd like to be able to see what's wrong by eye.

Suggestions?

Human readability makes it much easier by Baki · 2005-01-14 08:14 · Score: 2, Informative

to make inaccurate interpretations of the data and to skip proper, accurate specifications.

Many people claim that XML is so great because you can "just read and understand it" without having to use cumbersome, hard-to-understand specifications. That is exactly what makes XML nice for typesetting purposes like HTML, and maybe as an alternative for simple configuration files, but NOT for RPC and databases, as you write. I couldn't agree more.

I have seen so much time and money lost due to intuitive but false interpretations of XML schemas. People think that because it's human-readable, with "meaningful" tag names, they don't need a proper spec anymore. Well, I guess it fits in nicely with today's "cut and paste" programmers who don't really know what they're doing :(.

Re:Isn't this what ASN.1 was for? by PengoNet · 2005-01-14 13:32 · Score: 2, Informative

The "Fast Infoset Project" for creating Binary XML as mentioned in the article is using ASN.1. See this blog entry by Rick Jelliffe for details.

Fast Infoset is to ASN.1 what XML is to SGML. At least, it will be if it becomes the standard.
