W3C launches Binary XML Packaging
Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C just released the specs for XML-binary optimized packaging (XOP). In summary, they take binary data out of the XML, and put it in a separate section using MIME-Multipart. You can read the press release and the testimonials from MS, IBM and BEA."
I was drowning in debt. There was nowhere to turn. My wife left me, my friends all left me. Even my dog, he left me too. I had to do something.
That's when I found Binary XML. They were able to help with the debt. They got the creditors off my back and got me back on my feet.
Thanks Binary XML!
(I thought this was going to be about a standardization of compressing XML files that got rid of the excess bloat in the markup.)
The tech industry seems really starved for ideas lately.
Binary file formats are hard.
Let's use XML because it's easier.
No wait... let's represent that XML in a more efficient binary format.
Ah yeah that's the ticket - the best of both worlds!
Now let me just fire up my code-morphing processor which, through emulation, achieves x86 compatibility with "low" power consumption. Never mind it's slower overall and has worse MIPS/mW than an underclocked x86 - look Ma, we *invented* something!!!!
There are some real technical problems out there... why are people chasing non-problems like XML?
except SQL isn't very useful when it comes to technologies like RSS say......
if you're using an XML file in a place where you need a high performance SQL database then you're doing something wrong. If you're using XML as datastorage for some small webapp who cares so long as it's fast enough for that particular application.
Photos.
As you point out, it is the wrong tool for the job, much like using tables to lay out HTML pages (as the CSS religionists like to point out).
My 64 million dollar question is why they put an acronym inside another acronym: XOP stands for XMLOP? WTF??!!
They REALLY have too much time on their hands!
slashdot: A failed experiment.
Here's my binary XML-like file format which gives the best of both text and binary file formats. It's human readable and efficient at the same time! Finally, an end to the text-versus-binary wars. Here's an example file:
The following data is in binary.
UH)(&T^( @#t79nui**&tb x9#@ $Y*_@$ji[P{O@JIOHXIOU$HIIU#$hiuoHOP$UJ [etc.]
This seems like it would be an ideal fit for services such as Flickr as it would allow for image (or other binary media files) to be sent with xml data - in a compressed binary format.
As a software developer I find this particularly good.
While I myself would prefer to write a binary protocol and send the data through a TCP socket I can no longer do that.
When we land big contracts at work that deal in government and health the key thing they need now is interoperability with others. What does this mean? XML. Whether or not you like it, XML is here to stay. It's what everyone is pushing.
Therefore we had to adapt and start using it. Not just for B2B, our rich desktop clients now communicate with the server using XML web services.
The problem we've encountered is sending binary data. Right now we have to encode the data as base64 inside the XML, which uses lots of resources. I will take a closer look at this, but it looks particularly good.
Unless I'm horribly misreading the specification, it appears to be a way to package up XML documents and binary data that they reference into a neat package with MIME - not a way to convert a (text) XML document into a binary one.
And they're going to do what, say "gzip it" ? The amount of bandwidth and CPU time this wastes is abysmal.
Someone needs to stop these people.
o/~ Join us now and share the software
This is simply a way to reference binary data from within an XML document and to have that binary data included in the same payload (using MIME).
Passing binary data in XML is a big problem. Everybody just invents their own method of doing it (although most are just variations on the theme presented here).
There is a need for this specification but it is not ground breaking or even particularly /. newsworthy.
I thought I was losing my mind.
Ummm...it's "OK". This is probably the least ambitious Binary XML spec imaginable. That may actually be good, but I don't know. Let's see what's up here...
First of all, it's completely impossible to stream this format. All the binary chunks have to be read at some point in the future when the actual XML non-opaque content is complete. In a stream, that never happens. (Of course, XML isn't the most stream friendly protocol...you can't validate a stream.)
Secondly, this isn't wonderful for large files either; you're constantly seeking for binary data that can be many megabytes away. We solve this in web pages by having the images be completely separate (binary) files.
Thirdly, it's telling that they used a PNG as a data type. Besides being yet another file format that needs its own custom binary parser (heh, I like PNG, I'm just complaining about it in the XML whinespace), it's big and simple and there's just one there. One of the things I really liked about the various Binary XML formats was the degree to which they expressly typed things like arrays of floating point values or little-endian integers. Converting values between binary and string format is an enormously painful process, one that frankly I'm astonished hasn't received CPU acceleration at this point. Every other Binary XML format has seriously thought about how to efficiently but correctly manage large arrays of such values. XOP just says...heh...you wanna dump a lot of data efficiently? Check your typing at the door. Feel free to bring a buffer-overflow ridden parser in with you if you like, though.
Don't get me wrong, there's a fundamental simplicity to XOP, and I can certainly understand its appeal. But it seems to go so massively against what XML represents that I'm not entirely sure XOP encoded content deserves to be compliant with the very regulations that forced XML adoption in the first place: Opaque formats are too expensive to maintain for any amount of time, therefore either self-describe or don't get deployed. A self-describing document that says "All performance-critical content is opaque" seems rather counter to this spirit.
"Remember the recent discussion on Binary XML? Well, this has nothing to do with it, but we are proud to present a standard for larding out XML even more before attaching it to an email."
I, for one, welcome our new bandwidth eating plaintext overlords.
Dave
I write a blog now, you should be afraid.
It's time to stop thinking of "web sites" and start thinking along the lines of "web apps" - not the old-style form-based "web app", but more along the lines of gmail - heavily client-side-scripted, nice presentation and data manipulation.
What I see is very few pages (or even just 1 page) as the UI, data exchanged between server and client w/o page refreshes (can be done just w. javascript by sticking the data in iframes with a width and height of 0px, and reading/writing to the iframe. no need for a separate "data window" hidden from view - happy coincidence - I wrote code to do this last night).
These work without a hassle even with popup blockers, etc. It's not necessary to turn off *all* scripting capabilities. Just get a competent browser :-)
It is not binary XML. It is a method to extract binary data that is embedded in XML (e.g. CDATA) and store it outside the XML, but in the same document. It is NOT a method to reduce the text encoding (overhead) of XML to a binary format.
Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
In addition, the server code is written in Perl, so for storing status and configuration information I used serialized Perl data structures, and processing requirements fell dramatically. With serialized script you still have the clear-text editing and inspection capabilities without the speed and space issues. For example, instead of [...] It seems like serialized script code, in Perl, Python, or Java, provides the benefits of XML without the headaches.
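A rough Python analogue of the "serialized data structure" idea, for anyone who wants to see it: the configuration is written out as a plain literal that stays human-editable and is read back without an XML parser. The config keys and values here are made up for illustration, and the original poster used Perl, not Python.

```python
import ast
import pprint

# Hypothetical status/configuration data; the names are illustrative only.
config = {
    "host": "db.example.org",
    "port": 5432,
    "retries": [1, 2, 5],
    "debug": False,
}

# Serialize to readable text that can be opened and edited in any editor.
text = pprint.pformat(config)

# Read it back. literal_eval only accepts literals, so it will not
# execute arbitrary code the way a bare eval() would.
restored = ast.literal_eval(text)
```

The trade-off is that the file is only directly readable by programs in the same language, which is exactly the interoperability problem XML is meant to address.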
I've been thinking about the shortcomings of HTML (and everything else that followed it!) from the position of a computer scientist for YEARS... Those standards ARE shitty, big time.
Contrast this to IEEE standards -- they get developed when a bunch of companies are ready to invest several mega$$ for a chip spin -- and they just want to choose the best course, arguing with each other about technical merit of this or that approach. And in the whole HT|X/ML world there can be (almost) no competition on technical merits, just a bunch of guys arguing if it should be or BAR
I wish I'd have the time on my hands and their budgets to actually try something revolutionary. Like the original WWW, which was NOT designed by a committee...
Paul B.
XML has become at least two things since its evolution:
The interesting part of the story is that #2 came first. Since then, the W3C has recommended the Infoset abstract concept.
For the developers out there, think of how often you parse the "angle brackets" yourself. Most everyone these days (yes, I know there are exceptions) uses an API which presents elements and attributes in a wire-format-agnostic way.
As a developer, I would love to have the option to flip a switch in my code to permit Binary XML. If I can read and use the Infoset in exactly the same way, why would I object to the wire format being binary instead of text? My API is the same, but the transport is more compact and efficient.
Human-readable wire formats are great for debugging during development, but provide no real advantage in production systems (provided there are utilities available to produce human-readable XML from the binary wire format.)
"Power corrupts, and absolute power corrupts absolutely." -- Lord Acton
Incorrect.
XML, being a text format, requires proper text encoding. In particular, XML does not allow most of the code points (speaking in Unicode terms) between 0 and 31 (tab, newline, and carriage return excepted). If you use UTF-8, you cannot freely use byte values above 127, as those are used for forming higher-value Unicode characters. In addition, the five main XML markup characters (< > & ' and ") can only be used in some places.
So, to make a long story short, you base64 everything. For every three bytes you have, you output 4, giving you a 33% increase in space.
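A quick sketch of both halves of that argument, using only the Python standard library. The element name `data` and the test payload are arbitrary choices for the demo.

```python
import base64
import xml.etree.ElementTree as ET

# 768 bytes covering every possible byte value (768 is divisible by 3,
# so base64 needs no padding here).
raw = bytes(range(256)) * 3

encoded = base64.b64encode(raw)
ratio = len(encoded) / len(raw)   # 4 output characters per 3 input bytes

# A control character such as 0x01 is not a legal XML character,
# so raw binary cannot simply be pasted into the document.
try:
    ET.fromstring("<data>\x01</data>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

# The base64 form is plain ASCII and survives a trip through the parser.
doc = "<data>" + encoded.decode("ascii") + "</data>"
decoded = base64.b64decode(ET.fromstring(doc).text)
```

Here `len(encoded)` comes out at 1024 for 768 input bytes, which is the one-third inflation described above.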
Outside of the XML document you do not have to require text. Data can be considered 8-bit clean and sent in a big binary block.
So for example, an additional requirement of 200 bytes for specifying all the MIME information would be made up for within the first 600 bytes worth of binary data. Even without this space benefit, you get the benefits of a standard way of including binaries, and the ability to potentially access the binary data directly if the transport was indeed 8-bit clean.
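The break-even arithmetic can be checked in a few lines. The 200-byte MIME overhead is the figure from the comment above, not a measured value, and the function names are made up for this sketch.

```python
import math

def base64_size(n):
    """Bytes needed to base64-encode n raw bytes, padding included."""
    return 4 * math.ceil(n / 3)

def savings(n, mime_overhead):
    """Bytes saved by sending n raw bytes out-of-band instead of as base64.

    mime_overhead is the assumed fixed cost of the extra MIME headers
    and boundaries. A negative result means base64 is still cheaper.
    """
    return base64_size(n) - n - mime_overhead

at_600 = savings(600, 200)     # break-even point from the comment
at_6000 = savings(6000, 200)   # a larger attachment
small = savings(90, 200)       # tiny payloads still favour base64
```

With these numbers, 600 bytes is exactly where the two approaches cost the same; everything larger favours the separate binary part.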
Reminds me of a meeting I had a couple of years ago with some representatives for one of the largest market making houses in the US.
Basically we were promoting an automated trading system and the first question I get is...
"Does it use XML?"
There you have it.
It's too early yet. I'm waiting until MSBinary_XML comes out.
I hear it's going to introduce 263 special MS tags and nodes and extra layers into the standard that only work in MSWord on Windows XP. It won't validate as XML anymore but who cares. You will use a special version of Front Page to do this.
The files will be a little bigger too, so MSBinaryXML will add approx 257k thanks to the special proprietary MS extensions but will have superior functionality compared to other types.
It will be particularly good at carrying .pif, .src, .bat and .exe files within the context of an XML binary and what's more MS will be writing low level OS support into future XP updates and Longhorn and a special API to execute the contents of said MSBinaryXML files. It will also communicate with hooks in IE and Active X Controls and MS's excellent Java.
this is gonna be sweet. I can't wait
Amazingly, HTML compatibility was easier before it was "standards" this and "standards" that.
Are you *sure* about that ?
<blink >
<marquee >
<object >
<bgsound >
No-one forces you to validate your html (unless you work for me =). Where I come from, it's conformance first, compatibility second.
So, You're Against Innovation?
A common misconception is that folks who advocate HTML validation are retro-thinking, "backwater unix geeks" who stubbornly oppose innovation. It's true that many advocates of HTML validation are indeed seasoned computer professionals, who have learned the hard way that portability and compatibility are key elements to ensuring the longevity of any software product (including Web pages).
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
These specs (XOP and MTOM) were created because Web Services people wanted to be able to add binary attachments to XML messages (in SOAP). Initially the attachment technologies (like SOAP with Attachments) worked by just slapping the binary data alongside the XML message, without a clearly defined processing model for the receiver. Now with XOP, attachments are logically in the XML document, but physically transported outside without the bloat of base64 or other XML-safe encodings. It's important to notice that XOP is just an optimization of the situation where binary data is put inside an XML document.
Yesterday was the time to do it right. Are we having a REVOLUTION yet?
And you're right, most people don't want to include huge binary stuff in their XML. But sometimes you DO need to combine XML with huge amounts of binary data. So far, the alternatives have been non-standard wrappers (including people doing more or less what this standard does, by using MIME multipart documents) or base64 or some other space wasting encoding inside the XML document, or wrapping everything in an archival format (like OpenOffice does, for instance).
All this does is define a standard way of letting you keep a document and associated raw binary data together, while allowing you to treat it as if it is inlined in the XML if you so choose.
The principles are exactly the same as for sending an HTML e-mail containing images (or other data) as attachments and referring to them with URLs of the format "cid:foo" (they refer to the MIME element with the matching "Content-ID: foo" header).
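To make the "cid:" mechanism concrete, here is a minimal sketch of the receiving side. It assumes the MIME parts have already been split out into a dict keyed by Content-ID (a stand-in for real multipart parsing, and the angle brackets a real Content-ID header carries are left off), and the `photo` document is invented for the example. The `xop:Include` element and its namespace are the ones the XOP spec defines.

```python
import base64
import xml.etree.ElementTree as ET

XOP_INCLUDE = "{http://www.w3.org/2004/08/xop/include}Include"

# Stand-in for the binary MIME parts, keyed by Content-ID (hypothetical data).
attachments = {"photo1": bytes(range(256))}

# Stand-in for the root XML part of the package.
root_part = (
    '<photo xmlns:xop="http://www.w3.org/2004/08/xop/include">'
    '<name>cat</name>'
    '<data><xop:Include href="cid:photo1"/></data>'
    '</photo>'
)

def inline_attachments(xml_text, parts):
    """Replace each xop:Include with the base64 text of the part it points at."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so the tree can be edited safely.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == XOP_INCLUDE:
                content_id = child.get("href")[len("cid:"):]
                parent.remove(child)
                parent.text = base64.b64encode(parts[content_id]).decode("ascii")
    return root

doc = inline_attachments(root_part, attachments)
inlined = base64.b64decode(doc.find("data").text)
```

After the call, the document looks as if the binary had been base64-inlined all along, which is the "treat it as if it is inlined" option; an application that prefers the raw bytes can skip this step and read them straight from the part.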
I think everyone that's posted to this thread to this point has missed the point here. This XOP optimization has nothing to do with making XML more compact or anything. It has to do with delaying latency for large payload transfers and allowing the client application to decide if it wants the large binary payload.
Seriously, you guys need to re-read the article again.
The problem with XML binary payloads now is that you find out that you have a large chunk of payload too late in the game and can't avoid it if you don't need it.
This method allows you to know everything there is to know about the payload before you get to the payload.
In theory, you should be able to skip all the payload data until such time as you really need it, thereby speeding up large transfers of XML data when only the metadata about the payload is required.
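A small sketch of that "only fetch it if you need it" idea. The `LazyPart` class, the `fetch_part` function and the sample document are all hypothetical; whether a given transport really lets the receiver avoid pulling the bytes off the wire is a separate question this does not answer.

```python
import xml.etree.ElementTree as ET

XOP_INCLUDE = "{http://www.w3.org/2004/08/xop/include}Include"

class LazyPart:
    """Defers fetching a binary part until read() is first called."""

    def __init__(self, content_id, fetch):
        self.content_id = content_id
        self._fetch = fetch
        self._data = None
        self.loaded = False

    def read(self):
        if not self.loaded:
            self._data = self._fetch(self.content_id)
            self.loaded = True
        return self._data

fetched = []  # records which parts were actually pulled

def fetch_part(content_id):
    # Hypothetical fetcher; a real one would read from the MIME package.
    fetched.append(content_id)
    return b"\x89PNG" + bytes(1024)

root_part = (
    '<photo xmlns:xop="http://www.w3.org/2004/08/xop/include">'
    '<name>cat</name><width>640</width>'
    '<data><xop:Include href="cid:photo1"/></data>'
    '</photo>'
)

doc = ET.fromstring(root_part)
name = doc.findtext("name")             # metadata is available immediately
include = doc.find("data").find(XOP_INCLUDE)
part = LazyPart(include.get("href")[len("cid:"):], fetch_part)

fetched_before = list(fetched)          # nothing has been fetched yet
data = part.read()                      # the payload is pulled only now
again = part.read()                     # served from the cached copy
```

The metadata (`name`, `width`) is usable before any binary is touched, and the payload is fetched once, on first use.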
Make sense? I think so too.
P.S. Binary XML is an entirely different animal.
I create data-driven web apps for a living (i.e. data-driven graphics, UI and text via SVG and HTML), and I firmly believe that XML is the way to go for such creations. It offers a hierarchical structure that is excellent for temporarily storing data pulled from a database, which can then be converted to HTML or SVG or some UI markup (XUL, XForms, or your own thing) via XSLT.
I don't really care that XML is human-readable--I like the fact that because it is extremely well structured, it is therefore easy to create with authoring applications as well as being easy to manipulate in real time with script (i.e. manipulating its DOM).
I have long wished for a true binary XML spec to make the transmission and parsing/decoding quicker, and this spec isn't it. But I think one day we'll have it, and that won't mean that we've "come full circle" and therefore XML is useless. It just means that we'll have the best of both worlds--speed plus standardized, hierarchical data structures.
Looking for political forums? Check out "The World Forum".