W3C launches Binary XML Packaging
Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C just released the specs for XML-binary optimized packaging (XOP). In summary, they take binary data out of the XML, and put it in a separate section using MIME-Multipart. You can read the press release and the testimonials from MS, IBM and BEA."
Unless I'm horribly misreading the specification, it appears to be a way to bundle XML documents, together with the binary data they reference, into a single MIME package, not a way to convert a (text) XML document into a binary one.
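To make the "neat package" concrete, here is a minimal Python sketch that hand-assembles a multipart/related message in the XOP style: the root part is XML that points at the binary through an xop:Include element with a cid: URL, and the binary travels raw in its own MIME part. The content IDs, the `<doc>`/`<photo>` element names and the `build_xop_package` helper are made up for illustration; a real SOAP/MTOM stack would generate this for you.

```python
import uuid

XOP_NS = "http://www.w3.org/2004/08/xop/include"

def build_xop_package(binary: bytes, cid: str = "part1@example.org") -> tuple[str, bytes]:
    """Return (Content-Type header value, MIME body) for a toy XOP package."""
    # A random boundary makes a collision with the raw bytes very unlikely
    # (a production encoder would verify the boundary is absent from the data).
    boundary = ("xop-boundary-" + uuid.uuid4().hex).encode("ascii")
    crlf = b"\r\n"
    # Root part: the XML refers to the binary instead of inlining base64.
    root_xml = (
        f'<doc xmlns:xop="{XOP_NS}">'
        f'<photo><xop:Include href="cid:{cid}"/></photo>'
        "</doc>"
    ).encode("utf-8")
    body = b"".join([
        b"--" + boundary + crlf,
        b'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"' + crlf,
        b"Content-ID: <root@example.org>" + crlf + crlf,
        root_xml + crlf,
        b"--" + boundary + crlf,
        b"Content-Type: application/octet-stream" + crlf,
        b"Content-Transfer-Encoding: binary" + crlf,   # raw bytes, no base64
        b"Content-ID: <" + cid.encode("ascii") + b">" + crlf + crlf,
        binary + crlf,
        b"--" + boundary + b"--" + crlf,
    ])
    content_type = (
        f'multipart/related; boundary="{boundary.decode()}"; '
        'type="application/xop+xml"; start="<root@example.org>"; start-info="text/xml"'
    )
    return content_type, body
```

The XML stays small and readable, while the binary part is copied through byte for byte.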
I thought I was losing my mind.
.. if they just mimicked the CDATA section with an equivalent BDATA section. It would be so easy to just jump to the end of the binary data given the BDATA offset and continue on.
I know it's easy because that's exactly what I did: I hacked the GNOME libxml library and it worked nicely. It was easy to code (yank/paste from CDATA) and, best of all, it was *fast*, without consuming resources the way base64 does (I tried that too originally).
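The poster's patch isn't shown, so the following is only a rough Python sketch of the idea under an invented syntax, `<![BDATA[<length>:<raw bytes>]]>`. Because the length comes first, the scanner jumps straight over the payload without inspecting it, so the payload may even contain `]]>`. The syntax and the helper names are hypothetical; libxml supports nothing like this.

```python
BDATA_OPEN = b"<![BDATA["
BDATA_CLOSE = b"]]>"

def make_bdata(payload: bytes) -> bytes:
    """Wrap raw bytes in the hypothetical length-prefixed BDATA section."""
    return BDATA_OPEN + str(len(payload)).encode("ascii") + b":" + payload + BDATA_CLOSE

def split_bdata(doc: bytes) -> tuple[bytes, list[bytes]]:
    """Return the document with BDATA sections removed, plus their payloads."""
    text, blobs, pos = bytearray(), [], 0
    while True:
        start = doc.find(BDATA_OPEN, pos)
        if start < 0:
            text += doc[pos:]
            return bytes(text), blobs
        text += doc[pos:start]
        colon = doc.index(b":", start + len(BDATA_OPEN))
        length = int(doc[start + len(BDATA_OPEN):colon])  # decimal byte count
        body_start = colon + 1
        body_end = body_start + length  # skip the payload without scanning it
        if doc[body_end:body_end + len(BDATA_CLOSE)] != BDATA_CLOSE:
            raise ValueError("BDATA length does not match its terminator")
        blobs.append(doc[body_start:body_end])
        pos = body_end + len(BDATA_CLOSE)
```

Compared with base64, nothing is encoded or decoded; the cost is a single slice per section.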
Ummm... it's "OK". This is probably the least ambitious Binary XML spec imaginable. That may actually be good, but I don't know. Let's see what's up here...
First of all, it's completely impossible to stream this format. All the binary chunks have to be read at some point in the future, once the actual (non-opaque) XML content is complete. In a stream, that never happens. (Of course, XML isn't the most stream-friendly protocol anyway... you can't validate a stream.)
Secondly, this isn't wonderful for large files either; you're constantly seeking for binary data that can be many megabytes away. We solve this in web pages by having the images be completely separate (binary) files.
Thirdly, it's telling that they used a PNG as the example data type. Besides being yet another file format that needs its own custom binary parser (heh, I like PNG, I'm just complaining about it in the XML whinespace), it's big and simple and there's just one of it. One of the things I really liked about the various Binary XML formats was the degree to which they explicitly typed things like arrays of floating-point values or little-endian integers. Converting values between binary and string format is an enormously painful process, one that frankly I'm astonished hasn't received CPU acceleration at this point. Every other Binary XML format has seriously thought about how to efficiently but correctly manage large arrays of such values. XOP just says... heh... you wanna dump a lot of data efficiently? Check your typing at the door. Feel free to bring a buffer-overflow-ridden parser in with you if you like, though.
Don't get me wrong, there's a fundamental simplicity to XOP, and I can certainly understand why it's appealing. But it seems to go so massively against what XML represents that I'm not entirely sure XOP-encoded content deserves to be compliant with the very regulations that forced XML adoption in the first place: opaque formats are too expensive to maintain for any amount of time, so either self-describe or don't get deployed. A self-describing document that says "All performance-critical content is opaque" seems rather counter to this spirit.
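To make the typing complaint concrete, here is a small Python sketch using the standard `struct` module: an array of doubles packed as explicitly little-endian IEEE 754 values costs a fixed 8 bytes each and needs no string conversion, whereas the text form has to be formatted and re-parsed digit by digit. The function names are illustrative only; XOP itself defines no such typed encoding.

```python
import struct

def pack_f64_le(values: list[float]) -> bytes:
    """Pack floats as little-endian IEEE 754 doubles ('<' = little-endian, 'd' = double)."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack_f64_le(data: bytes) -> list[float]:
    """Inverse of pack_f64_le; the byte order is fixed, not guessed."""
    if len(data) % 8:
        raise ValueError("length is not a multiple of 8 bytes")
    return list(struct.unpack(f"<{len(data) // 8}d", data))

def to_xml_text(values: list[float]) -> str:
    """Text form: repr() gives the shortest string that parses back exactly."""
    return " ".join(repr(v) for v in values)

def from_xml_text(text: str) -> list[float]:
    return [float(tok) for tok in text.split()]
```

Both forms round-trip exactly, but a value like 1/3 costs 8 bytes as a double versus 18 characters (`0.3333333333333333`) as text, plus the formatting and parsing work.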
What you are talking about is CSV. CSV is great, but it's only any good for table structured data. You can't implement a tree or any arbitrary nested structure like you can in XML.
It is not binary XML. It is a method to extract binary data that is embedded in XML (e.g. CDATA) and store it outside the XML, but in the same document. It is NOT a method to reduce the text encoding (overhead) of XML to a binary format.
You, and whoever modded you up as "interesting", are an idiot.
This standard is not about representing XML in binary format.
This standard is about representing binary content in an XML document in binary format.
See, previously, if one wanted to include binary data in an XML file, one had to Base64-encode it. This takes space and processor time.
This standard moves the bloated Base64 content into a pure binary MIME object.
Maybe you should have RTFA first, eh?
Antti S. Brax - Old school - http://www.iki.fi/asb/
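The size cost of Base64 is easy to quantify: every 3 input bytes become 4 output characters, so embedded binary grows by about a third. A quick Python check (the `base64_size` helper is just for illustration):

```python
import base64

def base64_size(n: int) -> int:
    """Length of the base64 encoding of n bytes (no line breaks)."""
    # Every group of 3 input bytes becomes 4 output characters;
    # a final partial group is padded with '=' up to 4 characters.
    return 4 * ((n + 2) // 3)

if __name__ == "__main__":
    raw = bytes(range(256)) * 4          # 1024 bytes of sample binary data
    encoded = base64.b64encode(raw)
    assert len(encoded) == base64_size(len(raw))
    print(len(raw), "->", len(encoded))  # 1024 -> 1368
```

MIME-style base64 (`base64.encodebytes`) also inserts a newline every 76 characters, adding a little more on top.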
These specs (XOP and MTOM) were created because Web Services people wanted to be able to add binary attachments to XML messages (in SOAP). Initially the attachment technologies (like SOAP with Attachments) worked by just slapping the binary data alongside the XML message, without a clearly defined processing model for the receiver. Now, with XOP, attachments are logically in the XML document but physically transported outside it, without the bloat of base64 or other XML-safe encodings. It's important to note that XOP is just an optimization of the situation where binary data is put inside an XML document.
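That "logically inside, physically outside" model can be sketched in Python: given the root XML part and the binary parts keyed by Content-ID, replace each xop:Include with the base64 text it stands for, recovering the document a plain XML consumer would expect. The `reconstitute` helper and its dict-of-parts input are assumptions for illustration; MIME parsing is left out.

```python
import base64
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# The xop:Include element, in ElementTree's {namespace}local-name form
XOP_INCLUDE = "{http://www.w3.org/2004/08/xop/include}Include"

def reconstitute(root_xml: str, parts: dict[str, bytes]) -> str:
    """Inline each xop:Include as base64 text of the part it references."""
    root = ET.fromstring(root_xml)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != XOP_INCLUDE:
                continue
            href = child.get("href", "")
            if not href.startswith("cid:"):
                raise ValueError(f"unsupported href: {href!r}")
            cid = unquote(href[len("cid:"):])  # cid: URLs may be %-encoded
            encoded = base64.b64encode(parts[cid]).decode("ascii")
            # Assumes the Include is its parent's only child, as in XOP usage.
            parent.text = (parent.text or "") + encoded + (child.tail or "")
            parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```

The receiver can choose either view: work with the raw parts directly, or rebuild the base64-inlined document when something downstream insists on it.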
And you're right, most people don't want to include huge binary stuff in their XML. But sometimes you DO need to combine XML with huge amounts of binary data. So far, the alternatives have been non-standard wrappers (including people doing more or less what this standard does, by using MIME multipart documents), base64 or some other space-wasting encoding inside the XML document, or wrapping everything in an archive format (like OpenOffice does, for instance).
All this does is define a standard way of letting you keep a document and associated raw binary data together, while allowing you to treat it as if it is inlined in the XML if you so choose.
The principles are exactly the same as for sending an HTML e-mail containing images (or other data) as attachments and referring to them with URLs of the form "cid:foo" (they refer to the MIME part with the matching "Content-ID: <foo>" header).
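The lookup an HTML mail client performs can be sketched with Python's standard `email` package: parse the multipart message, then find the part whose Content-ID (minus the angle brackets) matches the cid: URL. `resolve_cid` is a hypothetical helper, not part of the library.

```python
import email
from urllib.parse import unquote

def resolve_cid(raw_message: bytes, url: str) -> bytes:
    """Return the decoded body of the MIME part that a "cid:" URL refers to."""
    if not url.startswith("cid:"):
        raise ValueError(f"not a cid: URL: {url!r}")
    wanted = unquote(url[len("cid:"):])   # cid: URLs may be %-encoded
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        # Content-ID values are written in angle brackets: "<foo>"
        content_id = str(part.get("Content-ID", "")).strip().strip("<>")
        if content_id == wanted:
            return part.get_payload(decode=True)  # undoes base64 etc.
    raise KeyError(wanted)
```

XOP reuses exactly this mechanism: `xop:Include href="cid:..."` plays the role of `<img src="cid:...">`.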
Then of course you have the problem that your data wants to be variable length. Then you want to have the delimiter actually in the data, so you have to invent escape codes. Then in some lines you want to allow multiple occurrences of some of the parameters, so you put in some basic markup. Then you want to be sure that any data users enter is in the correct format, so you write a verifier. Then you are basically back at XML again.
XML isn't that great. However, taken at face value, it saves time and programming errors, the same way I wouldn't expect to have to write my own doubly-linked list or hash table. Neither is complicated, but my language should come with pre-written ones that are safer and faster than anything I could knock together.
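Python's standard `csv` module shows how far down that road CSV already is: fields containing the delimiter, quotes or newlines get wrapped in quotes, with embedded quotes doubled. A short sketch (the two helper names are illustrative):

```python
import csv
import io

def to_csv_line(fields: list[str]) -> str:
    """Serialize one record; csv quotes fields containing ',', '"' or newlines."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()

def from_csv_line(text: str) -> list[str]:
    """Parse a single record (possibly spanning lines, if quoted) back into fields."""
    return next(csv.reader(io.StringIO(text)))
```

For example, `to_csv_line(["plain", "has,comma", 'has "quotes"'])` returns `'plain,"has,comma","has ""quotes"""\n'`: escape codes already, and still only flat rows.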
And n is smaller for binary data. In a best-case situation (the XML document consists largely of tags rather than text, and tags are 10-20 characters but can be reduced to single bytes in a binary encoding), switching to binary would shrink n by roughly a factor of ten: parsing is still O(n), but about an order of magnitude faster.
Ergo, binary XML could theoretically give you a considerable performance enhancement.
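The estimate can be illustrated with a toy encoder in Python: each distinct tag name gets a one-byte token, element ends are a 0x00 byte, and text is length-prefixed. The format is invented for this example (it ignores attributes, namespaces and text runs over 255 bytes) and is not any real Binary XML proposal.

```python
import xml.etree.ElementTree as ET
from typing import Optional

END = 0x00  # hypothetical token meaning "close the current element"

def encode(xml_text: str) -> tuple[bytes, dict[str, int]]:
    """Toy binary encoding: 1-byte tag tokens, 0x00 ends, length-prefixed text."""
    root = ET.fromstring(xml_text)
    table: dict[str, int] = {}
    out = bytearray()

    def emit_text(text: Optional[str]) -> None:
        data = (text or "").encode("utf-8")
        if len(data) > 255:
            raise ValueError("toy format: text runs limited to 255 bytes")
        out.append(len(data))   # one-byte length prefix
        out.extend(data)

    def walk(elem: ET.Element) -> None:
        # First occurrence of a tag name assigns the next token (1..255)
        token = table.setdefault(elem.tag, len(table) + 1)
        if token > 255:
            raise ValueError("toy format: at most 255 distinct tag names")
        out.append(token)
        emit_text(elem.text)
        for child in elem:
            walk(child)
            emit_text(child.tail)   # text that follows the child element
        out.append(END)

    walk(root)
    return bytes(out), table
```

For `<order><customer>Ann</customer><customer>Bob</customer></order>` the 63-byte text becomes 17 bytes. The parser still touches every byte, so the gain is a smaller n, not a better complexity class.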
I've lost count of the number of binary formats I've seen that, in a hex dump, had vast numbers of zero bytes and were thus highly inefficient. The people who work at a "high level" designing such file formats without checking such simple things are poor programmers.
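That hex-dump check is easy to automate. In this Python sketch, `zero_byte_ratio` measures the share of 0x00 bytes, and an unsigned LEB128-style variable-length integer (a standard technique swapped in here for illustration, not something the poster proposed) shows one common way to avoid padding small numbers out to four bytes.

```python
import struct

def zero_byte_ratio(data: bytes) -> float:
    """Fraction of bytes in `data` that are 0x00."""
    return data.count(0) / len(data) if data else 0.0

def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 value bits per byte, high bit set means 'more follows'."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

if __name__ == "__main__":
    values = [1, 2, 3, 4]
    fixed = struct.pack("<4i", *values)                    # 16 bytes, mostly zeros
    compact = b"".join(encode_varint(v) for v in values)  # 4 bytes
    print(zero_byte_ratio(fixed), zero_byte_ratio(compact))  # 0.75 0.0
```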
You demonstrate your ignorance once more.