W3C launches Binary XML Packaging

Posted by CowboyNeal on Thursday January 27, 2005 @03:52PM from the huge-config-files-scared dept.

Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C just released the specs for XML-binary optimized packaging (XOP). In summary, they take binary data out of the XML, and put it in a separate section using MIME-Multipart. You can read the press release and the testimonials from MS, IBM and BEA."

Uhhh... by Phexro · 2005-01-27 16:10 · Score: 5, Informative

Unless I'm horribly misreading the specification, it appears to be a way to package up XML documents and binary data that they reference into a neat package with MIME - not a way to convert a (text) XML document into a binary one.
1. Re:Uhhh... by Anonymous Coward · 2005-01-27 16:15 · Score: 1, Informative
  
  You'd be entirely correct.
  
  Of course, this is Slashdot.
2. Re:Uhhh... by Anonymous Coward · 2005-01-27 23:01 · Score: 1, Informative
  
  XOP doesn't define a packaging format; MTOM defines that the packaging used for SOAP is MIME.
  With XOP you can define that your packaging format is zip or tar or jar or whatever suits your application.
Thank you! by gammoth · 2005-01-27 16:23 · Score: 3, Informative

I thought I was losing my mind.
Could have been simpler by Anonymous Coward · 2005-01-27 16:27 · Score: 1, Informative

.. if they had just mimicked the CDATA section with an equivalent BDATA section. It would be so easy to just jump to the end of the binary data given the BDATA offset and continue on.

I know it's easy, because that's exactly what I did, hacked the gnome libxml library and it worked nicely, was easy to code (yank/paste from CDATA) and best of all it was *fast* without consuming resources like base64 does (tried that too originally).
Critiques by Effugas · 2005-01-27 16:29 · Score: 4, Informative

Ummm...it's "OK". This is probably the least ambitious Binary XML spec imaginable. That may actually be good, but I don't know. Let's see what's up here...

First of all, it's completely impossible to stream this format. All the binary chunks have to be read at some point in the future when the actual XML non-opaque content is complete. In a stream, that never happens. (Of course, XML isn't the most stream friendly protocol...you can't validate a stream.)

Secondly, this isn't wonderful for large files either; you're constantly seeking for binary data that can be many megabytes away. We solve this in web pages by having the images be completely separate (binary) files.

Thirdly, it's telling that they used a PNG as a data type. Besides being yet another file format that needs its own custom binary parser (heh, I like PNG, I'm just complaining about it in the XML whinespace), it's big and simple and there's just one there. One of the things I really liked about the various Binary XML formats was the degree to which they expressly typed things like arrays of floating point values or little-endian integers. Converting values between binary and string format is an enormously painful process, one that frankly I'm astonished hasn't received CPU acceleration at this point. Every other Binary XML format has seriously thought about how to efficiently but correctly manage large arrays of such values. XOP just says...heh...you wanna dump a lot of data efficiently? Check your typing at the door. Feel free to bring a buffer-overflow-ridden parser in with you if you like, though.
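To make the typed-array point concrete, here is a minimal Python sketch (not taken from any Binary XML spec; the sample values and variable names are made up for illustration). It compares a whitespace-separated text encoding of a few doubles with a packed little-endian binary encoding:

```python
import struct

# Hypothetical sample data: a small array of double-precision values.
values = [0.1, 2.5, -3.75, 1e-9]

# Text encoding: every value is formatted on write and re-parsed on read.
as_text = " ".join(repr(v) for v in values)
parsed = [float(token) for token in as_text.split()]

# Binary encoding: explicitly typed as little-endian 8-byte doubles.
layout = "<%dd" % len(values)
as_binary = struct.pack(layout, *values)
unpacked = list(struct.unpack(layout, as_binary))

print(len(as_text.encode("ascii")), "bytes as text")
print(len(as_binary), "bytes as packed doubles")
```

Both forms round-trip here, but the packed form has a fixed, declared layout (8 bytes per value), while the text form needs a number parser on every read.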

Don't get me wrong, there's a fundamental simplicity to XOP, and I can certainly understand how it's appealing. But it seems to go so massively against what XML represents that I'm not entirely sure XOP encoded content deserves to be compliant with the very regulations that forced XML adoption in the first place: Opaque formats are too expensive to maintain for any amount of time, therefore either self-describe or don't get deployed. A self-describing document that says "All performance-critical content is opaque" seems rather counter to this spirit.
Re:nothing else to work on? by lupin_sansei · 2005-01-27 17:04 · Score: 2, Informative

What you are talking about is CSV. CSV is great, but it's only any good for table structured data. You can't implement a tree or any arbitrary nested structure like you can in XML.

--
http://www.perthonline.net
RTFA - Re:Binary... XML... Nah! by Ignominious+Cow+Herd · 2005-01-27 17:19 · Score: 3, Informative

It is not binary XML. It is a method to extract binary data that is embedded in XML (e.g. CDATA) and store it outside the XML, but in the same document. It is NOT a method to reduce the text encoding (overhead) of XML to a binary format.

--
Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
Re:nothing else to work on? by asb · 2005-01-27 18:33 · Score: 2, Informative

You, and whoever modded you up as "interesting", are an idiot.

This standard is not about representing XML in binary format.

This standard is about representing binary content in an XML document in binary format.

See, previously, if one wanted to include binary data in an XML file one had to Base64 encode it. This takes space and processor time.

This standard moves the bloated Base64 content into a pure binary MIME object.
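For a rough sense of the space cost, here is a small Python sketch (the payload is random bytes standing in for an image, and the function name is only illustrative):

```python
import base64
import os

def base64_ratio(raw: bytes) -> float:
    """Return the base64-encoded size divided by the raw size."""
    return len(base64.b64encode(raw)) / len(raw)

# Hypothetical payload: 300,000 random bytes in place of real binary content.
payload = os.urandom(300_000)
ratio = base64_ratio(payload)
print(f"raw={len(payload)} bytes, ratio={ratio:.3f}")
```

Base64 emits four output characters for every three input bytes, so the encoded form is about a third larger before any line wrapping is added.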

Maybe you should have RTFA first, eh?

--
Antti S. Brax - Old school - http://www.iki.fi/asb/
Attachments, Not Binary XML by Kopretinka · 2005-01-27 20:28 · Score: 2, Informative

These specs (XOP and MTOM) were created because Web Services people wanted to be able to add binary attachments to XML messages (in SOAP). Initially the attachment technologies (like SOAP with Attachments) worked by just slapping the binary data alongside the XML message, without a clearly defined processing model for the receiver. Now, with XOP, attachments are logically in the XML document, but physically transported outside it without the bloat of base64 or other XML-safe encodings. It's important to notice that XOP is just an optimization of the situation where binary data is put inside an XML document.

--
Yesterday was the time to do it right. Are we having a REVOLUTION yet?
Re:base64? by vidarh · 2005-01-27 22:55 · Score: 3, Informative

Duh. Read the spec. Most people who include binary in XML DO base64 encode it. But base64 wastes a lot of space. If you want to include larger amounts of binary data, this standard lets you save space by using a MIME wrapper and referencing a MIME part containing the raw binary data from the document instead of inlining it directly.
And you're right, most people don't want to include huge binary stuff in their XML. But sometimes you DO need to combine XML with huge amounts of binary data. So far, the alternatives have been non-standard wrappers (including people doing more or less what this standard does, by using MIME multipart documents) or base64 or some other space wasting encoding inside the XML document, or wrapping everything in an archival format (like OpenOffice does, for instance).
All this does is define a standard way of letting you keep a document and associated raw binary data together, while allowing you to treat it as if it is inlined in the XML if you so choose.
The principles are exactly the same as for sending an HTML e-mail containing images (or other data) as attachments and referring to them with URLs of the format "cid:foo" (they refer to the MIME element with the matching "Content-ID: foo" header).
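A hand-rolled Python sketch of that idea follows. It is a simplification, not a conforming XOP implementation: the boundary string, the Content-ID values, the element names and the helper functions are all hypothetical, and the headers are reduced to the bare minimum. The XML part refers to the raw binary part by a "cid:" URL instead of carrying base64 text.

```python
XOP_NS = "http://www.w3.org/2004/08/xop/include"
BOUNDARY = b"example-boundary"  # hypothetical; must not occur in any part body

def mime_part(content_type: bytes, cid: bytes, body: bytes) -> bytes:
    """Serialize one MIME part: boundary line, headers, blank line, raw body."""
    return (
        b"--" + BOUNDARY + b"\r\n"
        + b"Content-Type: " + content_type + b"\r\n"
        + b"Content-ID: <" + cid + b">\r\n"
        + b"\r\n" + body + b"\r\n"
    )

def build_package(root_xml: bytes, attachments: dict) -> bytes:
    """Put the XML first, then each binary attachment as its own raw part."""
    out = [mime_part(b"application/xop+xml", b"root", root_xml)]
    for cid, data in attachments.items():
        out.append(mime_part(b"application/octet-stream", cid, data))
    out.append(b"--" + BOUNDARY + b"--\r\n")
    return b"".join(out)

def extract_part(package: bytes, cid: bytes) -> bytes:
    """Naive lookup of a part body by Content-ID (no real MIME parsing)."""
    for section in package.split(b"--" + BOUNDARY):
        headers, sep, body = section.partition(b"\r\n\r\n")
        if sep and b"Content-ID: <" + cid + b">" in headers:
            return body[:-2]  # drop the CRLF that precedes the next boundary
    raise KeyError(cid)

blob = bytes(range(256)) * 4  # stand-in for an image
xml = (
    '<doc xmlns:xop="%s"><photo>'
    '<xop:Include href="cid:photo@example.org"/>'
    '</photo></doc>' % XOP_NS
).encode("utf-8")

package = build_package(xml, {b"photo@example.org": blob})
print(extract_part(package, b"photo@example.org") == blob)
```

A receiver that wants the "inlined" view would replace each Include element with the content of the part whose Content-ID matches; one that does not care can hand the raw bytes straight to whatever consumes them.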
Re:nothing else to work on? by Chris_Jefferson · 2005-01-27 23:09 · Score: 3, Informative

Whatever happened to the virtues of simplicity, like a file containing a header record detailing the field names, and rows containing the data in either fixed-length or delimited form? Damn fast to implement, debug, read from and write to. Parsing? What parsing? Read the first line, split it to get your headers, and read 1 line per record.
Then of course you have the problem that your data wants to be variable length. Then you want to have the delimiter actually in the data, so you have to invent escape codes. Then in some lines you want to allow multiple occurrences of some of the parameters so you put in some basic markup. Then you want to be sure that any data users enter is of the correct format, so you write a verifier. Then you are basically back at XML again.
XML isn't that great. However, taken at face value, it saves time and programming errors, in the same way that I wouldn't expect to have to write my own doubly-linked list or hash table. Neither is complicated, but my language should come with one pre-written which is safer and faster than one I could knock together.
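The delimiter problem is easy to show in a few lines of Python (the sample row is made up). A naive split breaks as soon as a field contains a comma, and the standard csv module already carries the quoting rules you would otherwise have to invent:

```python
import csv
import io

# Hypothetical record: the second field contains the delimiter,
# the third contains the quote character.
row = ["42", "Smith, John", 'said "hi"']

# Naive approach: join and split on commas.
naive_line = ",".join(row)
naive_fields = naive_line.split(",")
print(len(naive_fields))  # more fields come back than were written

# Library approach: the csv module quotes and escapes as needed.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
decoded = next(csv.reader(io.StringIO(buffer.getvalue())))
print(decoded == row)
```

Even with correct quoting this only covers flat rows; nesting and validation still need something extra on top.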

--
Combination - fun iPhone puzzling
Re:More bloat! by Anonymous Coward · 2005-01-28 01:44 · Score: 1, Informative
Good parsers are damn fast and can operate in O(n) time.

And n is smaller for binary data; in a best-case situation (XML document consists largely of tags rather than text, tags are 10-20 characters but can be reduced to single bytes in a binary encoding), that would mean that switching to binary data would give you O(n/10) parsing, i.e. an order of magnitude faster.

Ergo, binary XML could theoretically give you a considerable performance enhancement.

I've lost count of the number of binary formats I've seen that in hex dump had vast numbers of zero bytes and were thus highly inefficient. The people who work at a "high level" designing such file formats without checking such simple things are poor programmers.

You demonstrate your ignorance once more.
- Those vast numbers of zero bytes are most likely being used to pad records to equal lengths. This can make lookups O(1).
- Those vast numbers of zero bytes (a) compress better than ASCII text in any decent compression program (so gzip would do much better on those formats than text-based XML) and (b) take up NO SPACE AT ALL on any modern file system that can handle sparse files.
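
Both points can be tried out with a short Python sketch (the record size, the sample names and the helper functions are hypothetical; sparse-file behaviour depends on the file system and is not shown):

```python
import io
import zlib

RECORD_SIZE = 64  # hypothetical fixed record length in bytes

def pack_record(name: bytes) -> bytes:
    """Pad a record with zero bytes up to the fixed length."""
    if len(name) > RECORD_SIZE:
        raise ValueError("record too long")
    return name.ljust(RECORD_SIZE, b"\x00")

def read_record(stream, index: int) -> bytes:
    """Jump straight to record `index` without scanning earlier records."""
    stream.seek(index * RECORD_SIZE)
    return stream.read(RECORD_SIZE).rstrip(b"\x00")

names = [b"alpha", b"beta", b"gamma", b"delta"]
data = b"".join(pack_record(n) for n in names)

third = read_record(io.BytesIO(data), 2)
compressed = zlib.compress(data)
print(third, len(data), len(compressed))
```

The seek position is computed directly from the index, which is what makes the lookup constant time, and the long runs of zero padding shrink considerably under a general-purpose compressor.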