W3C launches Binary XML Packaging
Spy der Mann writes "Remember the recent discussion on Binary XML? Well, there's news. The W3C just released the specs for XML-binary optimized packaging (XOP). In summary, they take binary data out of the XML, and put it in a separate section using MIME-Multipart. You can read the press release and the testimonials from MS, IBM and BEA."
The way I see it, XML's only benefit over something like SQL is that it -is- plain text and easily user modifiable. Binary XML seems to me more like a step backwards than a step forwards. Of course, I've never understood the buzzwordiness of XML anyway. Things like SOAP make it seem like a protocol when it's a format. I think that the W3C should be spending their time on XML implementations like SVG, MathML and XHTML, not on things like this.
My Systems
I was drowning in debt. There was nowhere to turn. My wife left me, my friends all left me. Even my dog, he left me too. I had to do something.
That's when I found Binary XML. They were able to help with the debt. They got the creditors off my back and got me back on my feet.
Thanks Binary XML!
(I thought this was going to be about a standardization of compressing XML files that got rid of the excess bloat in the markup.)
The tech industry seems really starved for ideas lately.
Binary file formats are hard.
Let's use XML because it's easier.
No wait... let's represent that XML in a more efficient binary format.
Ah yeah that's the ticket - the best of both worlds!
Now let me just fire up my code-morphing processor which, through emulation, achieves x86 compatibility with "low" power consumption. Never mind it's slower overall and has worse MIPS/mW than an underclocked x86 - look Ma, we *invented* something!!!!
There are some real technical problems out there... why are people chasing non-problems like XML?
I'll take a binary extra medium large to go please.
or is that our XML-binary Optimised Packaged overlords?
This sig is intentionally blank
Here's my binary XML-like file format which gives the best of both text and binary file formats. It's human readable and efficient at the same time! Finally, an end to the text-versus-binary wars. Here's an example file:
The following data is in binary.
UH)(&T^( @#t79nui**&tb x9#@ $Y*_@$ji[P{O@JIOHXIOU$HIIU#$hiuoHOP$UJ [etc.]
these testimonials are so much like the TWO THUMBS UP !! type things you see for movies like this one
This seems like it would be an ideal fit for services such as Flickr, as it would allow images (or other binary media files) to be sent with XML data in a compact binary form.
As a software developer I find this particularly good.
While I myself would prefer to write a binary protocol and send the data through a TCP socket I can no longer do that.
When we land big contracts at work that deal in government and health the key thing they need now is interoperability with others. What does this mean? XML. Whether or not you like it, XML is here to stay. It's what everyone is pushing.
Therefore we had to adapt and start using it. Not just for B2B, our rich desktop clients now communicate with the server using XML web services.
The problem we've encountered is sending binary data. Right now we have to encode the data as base64 inside the XML, which uses lots of resources. I will take a closer look at this, but it looks particularly good.
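To put a number on the base64 cost mentioned above, here is a minimal Python sketch. The helper name `base64_overhead` and the 300,000-byte payload are made up for illustration. It only measures size growth, not the CPU time spent encoding and decoding.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return the size of the base64 encoding divided by the raw size."""
    return len(base64.b64encode(raw)) / len(raw)


# Hypothetical payload: 300,000 random bytes standing in for an image.
payload = os.urandom(300_000)
encoded = base64.b64encode(payload)

# Base64 maps every 3 input bytes to 4 output characters.
print(f"raw: {len(payload)} bytes, base64: {len(encoded)} bytes, "
      f"ratio: {base64_overhead(payload):.3f}")
```

MIME-style base64 with a line break every 76 characters would add a few percent more on top of the 4/3 factor.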
.. virus scanner picking it up!
evil virus/xml?
Unless I'm horribly misreading the specification, it appears to be a way to package up XML documents and binary data that they reference into a neat package with MIME - not a way to convert a (text) XML document into a binary one.
Personally, I think it sounds like a good idea.
Large amounts of information stored in XML format can be costly in storage and network transmission without some kind of compression. Basically, it's a waste of space.
Just because it's compressed doesn't mean it can't be human readable... your favourite XML reader/editor will just have to implement the new standard to de/compress it, and it'll be back to its human readable state.
We all know how much space you can save if you zip a large text file or bmp, so what's wrong with compressing a bit of XML?
This sounds like the era of small/efficient XML is gonna end: here comes the bloat! (Remember Netscape)
All your Sybase are belong to us.
And they're going to do what, say "gzip it" ? The amount of bandwidth and CPU time this wastes is abysmal.
Someone needs to stop these people.
o/~ Join us now and share the software
XML:XOP:Is this right? So the benefit is just standardizing the binary representation using MIME? But that doesn't make the tags less verbose... so how is this faster than XML?
The ENIAC Demo Competition
This is simply a way to reference binary data from within an XML document and to have that binary data included in the same payload (using MIME).
Passing binary data in XML is a big problem. Everybody just invents their own method of doing it (although most are just variations on the theme presented here).
There is a need for this specification but it is not ground breaking or even particularly /. newsworthy.
I thought I was losing my mind.
I really don't understand. XML is great for information that might need to be easily parsed, read, or changed by a number of applications. In fact, it's probably something that we've needed a lot longer than we've had it. But that's where it ends.
Usually (at least, from my limited experience) a serialized object from a program only needs to be loaded up by that program (which usually results in their own .dat files). Still, when I have been toying around with my own serializer, I haven't found a need to actually hand change any of this information.
Why reinvent the wheel though? It seems this would only be useful in web application for a database of sorts. The only usefulness (actual useful ness) would be for a document format (Word, Excel, Access, etc). This is the type of format that you would want to be portable; easily changeable (if need be) by hand; and thrown across several different platforms where it can be read.
I guess that means there is a use for it then; but do we really need a standard for such a thing? Anyone who plans on using this idea will create their own standard anyway (look what Apple has done with their new format for Pages).
I'm f#$king magic!
[CSV is] Easy to manipulate via javascript on the client. Simple to display and manipulate via the DOM (Document Object Model).
Problem: A lot of users will try to browse your site with DOM scripting turned off to avoid the more annoying advertisements and malware installers. What will you put in the <noscript> element?
.. if they just mimic'd the CDATA section with an equivalent BDATA section. Would be so easy to just jump to the end of the binary data given the BDATA offset and continue on.
I know it's easy, because that's exactly what I did, hacked the gnome libxml library and it worked nicely, was easy to code (yank/paste from CDATA) and best of all it was *fast* without consuming resources like base64 does (tried that too originally).
Ummm...it's "OK". This is probably the least ambitious Binary XML spec imaginable. That may actually be good, but I don't know. Let's see what's up here...
First of all, it's completely impossible to stream this format. All the binary chunks have to be read at some point in the future when the actual XML non-opaque content is complete. In a stream, that never happens. (Of course, XML isn't the most stream friendly protocol...you can't validate a stream.)
Secondly, this isn't wonderful for large files either; you're constantly seeking for binary data that can be many megabytes away. We solve this in web pages by having the images be completely separate (binary) files.
Thirdly, it's telling that they used a PNG as a data type. Besides being yet another file format that needs its own custom binary parser (heh, I like PNG, I'm just complaining about it in the XML whinespace), it's big and simple and there's just one there. One of the things I really liked about the various Binary XML formats was the degree to which they expressly typed things like arrays of floating point values or little-endian integers. Converting values between binary and string format is an enormously painful process, one that frankly I'm astonished hasn't received CPU acceleration at this point. Every other Binary XML format has seriously thought about how to efficiently but correctly manage large arrays of such values. XOP just says...heh...you wanna dump a lot of data efficiently? Check your typing at the door. Feel free to bring a buffer-overflow ridden parser in with you if you like, though.
Don't get me wrong, there's a fundamental simplicity to XOP, and I can certainly understand how it's appealing. But it seems to go so massively against what XML represents that I'm not entirely sure XOP encoded content deserves to be compliant with the very regulations that forced XML adoption in the first place: Opaque formats are too expensive to maintain for any amount of time, therefore either self-describe or don't get deployed. A self-describing document that says "All performance-critical content is opaque" seems rather counter to this spirit.
At least this spec is somewhat useful. What is more useless is XML Signature, which feels like a total waste. Talk about a solution looking for a problem.
"Remember the recent discussion on Binary XML? Well, this has nothing to do with it, but we are proud to present a standard for larding out XML even more before attaching it to an email."
I, for one, welcome our new bandwidth eating plaintext overlords.
Dave
I write a blog now, you should be afraid.
Binary XML will enable superior buzz word compatibility to existing technologies. Here's your syntax for Word 2006 format:
<xml><bin src="oldWordCrap.doc"/></xml>
Now everyone can use Microsoft formats! It's open, it's flexible, it's human parsable and compatible with all XML-enabled solutions *.
*) Some assembly needed (pun intended).
Binary XML? Why not just gzip it?
Now I don't know much about XML, but I think you should check the link below.
Official news link
It will get out of control and we'll be lucky to live through it.
Seriously, this is strongly reminiscent of designing C++ APIs, called only in-process by C++ code, that use XML blobs for every single parameter type. I came across one of these and asked the "architect" why he chose to use XML for every parameter (at significant cost).
"Well, you know, it's XML," he said.
"And?" I asked.
"Well, it's... I mean, c'mon, it's extensible," he explained.
When all you have is a hammer, everything starts to look like a nail. And there are too many developers who are a few parts short of a toolbox.
If you mod me down, I shall become more powerful than you can possibly imagine.
This page is designed to be viewed with scripting enabled. Enable scripting and click here to refresh, or click here [goatse/tubgirl]
Which text-based web UA supports DOM scripting? Or would you deliberately shut out users with vision disabilities and run the risk of losing your U.S. government contracts under Section 508 and having to defend a lawsuit under the Americans with Disabilities Act or foreign counterparts?
It is not binary XML. It is a method to extract binary data that is embedded in XML (e.g. CDATA) and store it outside the XML, but in the same document. It is NOT a method to reduce the text encoding (overhead) of XML to a binary format.
Lump lingered last in line for brains, and the ones she got were sorta rotten and insane.
I'd tell them to switch to firefox :-)
How well do the major Gecko-based web UAs work with screen readers, compared to say Lynx/Links/w3m?
For those who didn't RTFA:
The main application of this XML-referencing-to-binary-attachments is SOAP, and that means web services.
In other words, you can simplify your God-help-me-XML-handling-and-parsing-code into something maybe 10% simpler. This means leaving the binary stuff OUT OF THE XML PARSER, putting it into the upper levels of processing. Cleaner, faster.
Also, it helps adaptive compression (gzip) by tightening up the textual data - remember web services are about information transfer, not storage.
Can anyone please explain what the heck all these buzzwords mean: like SOAP, XAML, etc. I understand XML (s-expressions but with neat angled brackets!) but the rest could use concise descriptions.
In addition, the server code is written in Perl, so for storing status and configuration information I used serialized Perl data structures, and processing requirements fell dramatically. With serialized script you still have the clear text editing and inspection capabilities without the speed and space issues. For example, instead of ... It seems like serialized script code, in Perl, Python or Java, provides the benefits of XML without the headaches.
Only 120 posts before someone said something more intelligent than, 'Binary XML, blecchho!'
:)
Only wish I had a prize for you.
Grab the data directly into JavaScript with XMLHttpRequest. However, the web has many UI conventions now which people don't want broken, even if it's better. I still keep hitting the back button in Gmail, and I think it should work that way.
Data:URL! Implement Data:URL!
Encodes the frickin MIME type and base64 encoding inline!
Hey, where's the frickin binary tokenization dictionary of the XML elements for compact XML, with encoding schemes?
This is just an advertisement for all 835 MIME RFCs.
Find me an XML file that is not already represented as binary data. Oh, not looking so revolutionary now, is it?
Wait, you say this allows xml to reference binary data? I say "href" attribute, bi-atch, look it up.
You say, but no, it allows you to send the binary data along in the same stream / document? Check out multipart/mime. It's been around a long time.
Here's a wild thought. Have the XML file reference its binary resources by relative filenames. Tar the XML file together with the resources. Now pay me $100,000 in consulting fees.
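For what it's worth, the tar idea above fits in a few lines with Python's standard `tarfile` module. This is only a sketch: the function name `bundle_as_tar`, the member name `document.xml` and the `images/logo.png` path are invented for the example, and nothing here is part of XOP.

```python
import io
import tarfile


def bundle_as_tar(xml_text: str, resources: dict[str, bytes]) -> bytes:
    """Pack an XML document and the binary files it references by
    relative filename into one uncompressed tar archive (in memory)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:

        def add_member(name: str, data: bytes) -> None:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        # The XML goes first so a reader finds it before the resources.
        add_member("document.xml", xml_text.encode("utf-8"))
        for name, data in resources.items():
            add_member(name, data)
    return buffer.getvalue()


archive = bundle_as_tar(
    '<doc><img src="images/logo.png"/></doc>',
    {"images/logo.png": b"\x89PNG\x00\x01"},  # stand-in bytes, not a real PNG
)
print(len(archive), "bytes in archive")
```

The binary members are stored raw, so there is no base64 expansion, but the receiver still needs its own convention for resolving the relative names.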
Gay!
So does this mean a 20k torrent can be actually inside an XML list now instead of just a link to it?
If so, I think distribution just got a whole lot easier.
Then again the poor servers sending 10k files have a hard enough time with people checking every 10 minutes, imagine if the file was 200k, crikey!
The circle is complete. We started with binary format, moved to XML for readability purposes and then switched XML back to binary for speed.
Obviously someone needs a knock on the head - when you design your application, don't you think about such things as a balance between performance and maintainability first and then implement what is suited better for your specific case? Obviously not! Just a little while ago everyone and their grandmother switched to XML for whatever reason but then they realized: -OMFG, XML processing is processor intensive! I probably shouldn't have handled every single internal type as an XML string that needs parsing and typecasting for every operation. I probably should have used more suitable memory structures for those data structures that are used within the same application on the same computer and not used a gigantic XML just because I can! What to do what to do? OH, I KNOW! Let's change this char based XML into a binary XML, that will make it faster! (It won't make it more human readable, that's for sure.)
So what's next? A char based XML that wraps around a binary XML for readability? A binary wrapper for a char based XML wrapper to a binary wrapper around a char based XML wrapper for recursive processing?
You can't handle the truth.
I was just involved in writing a routine that involved Base64-encoding something to be included in XML, then GZipping the whole XML file to recover the space the encoding tacks on. Now I can just do an XOP file. Awesome!
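A rough Python sketch of the base64-then-gzip workaround described above, to show how much of the encoding overhead compression claws back. The function name `size_report` is made up, and random bytes stand in for already-compressed binary data such as an image.

```python
import base64
import gzip
import os


def size_report(raw: bytes) -> dict[str, int]:
    """Compare the raw size, the base64 size, and the gzipped base64 size."""
    encoded = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "base64": len(encoded),
        "base64+gzip": len(gzip.compress(encoded)),
    }


# 99,999 random bytes: incompressible, like most image or archive data.
report = size_report(os.urandom(99_999))
for label, size in report.items():
    print(f"{label:>12}: {size} bytes")
```

Gzip gets most of the base64 expansion back, at the price of an encode step and a compress step on one side and the reverse on the other. Sending the bytes raw in a separate MIME part skips both.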
To blog is sublime
Amen brother. XML bloat bullsh-t needs to die the death it so richly deserves.
Wouldn't it make more sense to include the B for Binary, which is the essential purpose of the new "standard"? Plus, XBop sounds more natural when spoken than XOP does, and it's way more fun too! :)
putfwd.com - 1GB Free file storage with a twist
I've been thinking about the shortcomings of HTML (and everything else that followed it!) from the position of a computer scientist for YEARS... Those standards ARE shitty, big time.
Contrast this to IEEE standards -- they get developed when a bunch of companies are ready to invest several mega$$ for a chip spin -- and they just want to choose the best course, arguing with each other about technical merit of this or that approach. And in the whole HT|X/ML world there can be (almost) no competition on technical merits, just a bunch of guys arguing if it should be or BAR
I wish I'd have the time on my hands and their budgets to actually try something revolutionary. Like the original WWW, which was NOT designed by a committee...
Paul B.
Pfft. Everyone's still reinventing the wheel. Lispers have been throwing S-expressions back and forth long before XML started down the same path.
There's something fishy about developers who use XML for Rich Web Client Desktop-Like Application Applets.
You should be using a string and two cans instead.
Technology stinks when SnapperHeads get their hands on it.
It's the year of Linux! To celebrate I have x free hotmail accounts to give away
Oh goodie. You're reinventing Curl. Itself a bastardized version of Lisp.
XML has become at least two things since its evolution:
The interesting part of the story is that #2 came first. Since then, the W3C has recommended the Infoset abstract concept.
For the developers out there, think of how often you parse the "angle brackets" yourself. Most everyone these days (yes, I know there are exceptions) uses an API which presents elements and attributes in a wire-format-agnostic way.
As a developer, I would love to have the option to flip a switch in my code to permit Binary XML. If I can read and use the Infoset in exactly the same way, why would I object to the wire format being binary instead of text? My API is the same, but the transport is more compact and efficient.
Human-readable wire formats are great for debugging during development, but provide no real advantage in production systems (provided there are utilities available to produce human-readable XML from the binary wire format.)
"Power corrupts, and absolute power corrupts absolutely." -- Lord Acton
The idea of a plain text file to intercommunicate computers could easily be the worst idea in computer history that has succeeded.
Reminds me of a meeting I had a couple of years ago with some representatives for one of the largest market making houses in the US.
Basically we were promoting an automated trading system and the first question I get is...
"Does it use XML?"
There you have it.
It's too early yet. I'm waiting until MSBinary_XML comes out
I hear it's going to introduce 263 special MS tags and nodes and extra layers into the standard that only work on MSWord in Windows XP. It won't validate as XML anymore but who cares. You will use a special version of Front Page to do this.
The files will be a little bigger too: MSBinaryXML will add approx. 257k thanks to the special proprietary MS extensions, but will have superior functionality compared to other types.
It will be particularly good at carrying .pif, .src, .bat and .exe files within the context of an XML binary, and what's more, MS will be writing low level OS support into future XP updates and Longhorn, plus a special API to execute the contents of said MSBinaryXML files. It will also communicate with hooks in IE and ActiveX Controls and MS's excellent Java.
this is gonna be sweet. I can't wait
These specs (XOP and MTOM) were created because Web Services people wanted to be able to add binary attachments to XML messages (in SOAP). Initially the attachment technologies (like SOAP with Attachments) worked by just slapping the binary data alongside the XML message, without a clearly defined processing model for the receiver. Now, with XOP, attachments are logically in the XML document, but physically transported outside it without the bloat of base64 or other XML-safe encodings. It's important to notice that XOP is just an optimization of the situation where binary data is put inside an XML document.
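To make "logically in the document, physically outside" concrete, here is a hand-rolled Python sketch of what an XOP-style package looks like on the wire. The `xop:Include` element, its namespace and the `application/xop+xml` media type come from the XOP spec. Everything else (the function name `build_xop_package`, the boundary string, the Content-IDs, the `photo` element) is made up for the example, and a real implementation would do far more validation.

```python
CRLF = b"\r\n"

# Header a sender would put on the outer message (illustrative values).
OUTER_CONTENT_TYPE = (
    'multipart/related; boundary="MIME_boundary"; '
    'type="application/xop+xml"; start="<root@example.org>"; '
    'start-info="text/xml"'
)


def build_xop_package(xml_root: bytes, attachment: bytes, content_id: str,
                      boundary: str = "MIME_boundary") -> bytes:
    """Assemble a two-part multipart/related body: the XML root part,
    then the raw binary part it refers to by Content-ID."""
    delimiter = b"--" + boundary.encode("ascii")
    # The boundary must not occur inside any part, or the parts
    # could not be told apart by the receiver.
    if delimiter in xml_root or delimiter in attachment:
        raise ValueError("boundary occurs in the payload; pick another")
    lines = [
        delimiter,
        b'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"',
        b"Content-Transfer-Encoding: 8bit",
        b"Content-ID: <root@example.org>",
        b"",
        xml_root,
        delimiter,
        b"Content-Type: application/octet-stream",
        b"Content-Transfer-Encoding: binary",
        b"Content-ID: <" + content_id.encode("ascii") + b">",
        b"",
        attachment,  # raw bytes, no base64
        delimiter + b"--",
        b"",
    ]
    return CRLF.join(lines)


xml_root = (
    b'<photo xmlns:xop="http://www.w3.org/2004/08/xop/include">'
    b'<xop:Include href="cid:image1@example.org"/>'
    b"</photo>"
)
package = build_xop_package(xml_root, b"\x89PNG\r\n\x1a\n\x00\xff",
                            "image1@example.org")
print(len(package), "bytes in package")
```

The receiver treats the `xop:Include` element as if the base64 text of the referenced part were sitting in its place, which is why the commenter calls XOP an optimization rather than a new data model.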
Yesterday was the time to do it right. Are we having a REVOLUTION yet?
..if you're going to transfer a small fixed header, XML is not for you. If you actually have no binary data of significance, XOP is not for you. Let's say you want to include a 300kB picture in your XML. Your choices are:
1. External link (unpractical)
2. XML/Base64 encoding (~400kB)
3. XOP/binary encoding (~300kB)
In that case, your 30+ lines of extra code are completely irrelevant. That being said, I was under the impression that you could do this already by sending your binary data in a "document fragment" element, where the only valid characters to end it would be ]]> or so (been a while since I looked at it). A simple remap of that one combo using e.g. a yEnc variety should let you pass binary data with minimal overhead. It'd be a hack though, the "document fragment" is intended to let you place XML fragments there, hence it ignores normal XML tags.
Kjella
Live today, because you never know what tomorrow brings
Why not just base64 encode what you need to? Jeez, talk about a solution looking for a problem. Who in the world is encoding huge binary stuff in their XML anyway?
but isn't "binary XML" just "zipped XML"
What's the point of trying to compress an internal XML binary element if you have an external compressor? In fact, at that point, why even have a CDATA binary internal element?
Private binary formats are dead, guys.
Reading the newest raft of W3C standards, complete with examples showing the increases in message size and total complexity at each step, I feel as if I have FINALLY understood how UK-style socialism works. And why the US is in Iraq. And why the tax code is 250,000 pages long, and why New Coke was created, and why there are people who will genuinely refuse to read a document if it contains a diagram that is not in the most recent version of UML.
It's all a product of the same kind of thinking.
Whence? Hence. Whither? Thither.
Mod parent up (since I can't and I'm fed up of making the same point)
yes, but is it available in PDF format?
Why would an external link be unpractical?
It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
You just (re-?)invented TeX.
what about matroska ?
isn't it supposed to be binary XML ?
It is wearisome to see the same old misconceptions and plain old wrongness of most of the posts.
1) XML is defined as "machine-readable", not "human-readable". With an XML file and a schema, no more knowledge is needed to parse, transform or generally munge XML data using generic algorithms. That cannot be said for binary data formats. That is why XML exists - to avoid all your wrong-headed spaghetti coding and homebrew formats from preventing systems from speaking to each other!
2) XOP applies the same disciplines of XML to binary data representation, packaging and referencing. And I don't need to pay you $1,000 a day to tell me how to re-hack your already-hacked workaround binary packaging methods.
3) Plaintext data is no worse than binary data when sending over TCP/IP. Guess what? IP routers are optimised to compress plaintext pre-transmission. If you send binary, you slow down the internal compression engines in the router. Ergo, TCP/IP don't care about how your bits are arranged - it's information density that counts. A 100K text document containing nothing but the same character is faster to transmit than a 10K JPG. Try it - it's true!
Rant over, and out.
idiots.
-- ac at work
Buy binary vioxx! It'll flip your bit!
Ironic that CowboyNeal posts this article then...
It looks like we've come full circle.
Remember how XML used to be lauded as the solution to binary-only data files? Har dee har har. This is not to bash the utility of this sort of format -- XML is pretty much unwritable by hand unless you have the right kind of editor, and binary XML is still far more regular a format than, say, Microsoft Word's.
Personally, I still like gzip compressed XML better. At least it's easily decodable and most every application has access to the decompression routines if the used XML parser doesn't support it right off the bat.
There are two types of engineers. Those who can get the job done, and those who will try to use the buzzwords of the day to get the job done. The former tends to be a better engineer, while the latter tends to look better at the beginning of the development process.
...it is like running in circles. The buzzword engineers need to be put in their place. Let the real guys lead again, please.
XML will come full circle when true binary XML is a w3c standard. People will be using high-level GUIs to generate text-based XML files, which will be converted into binary XML. On the other end, somebody will receive binary XML, convert it to text-based XML, open it in their application that presents the data in a high-level graphical format.
Only then and only maybe, will everyone begin to realize what a farce XML really is. Its selling point was that it was easily human readable, as well as machine processable unlike other more simple formats. Everyone will realize that XML was never very readable (unless you were blind to angled brackets) nor was it easy to efficiently parse compared to existing formats. Everyone will realize how inefficient XML is, and with the roundabout nature of using GUIs to generate XML and then another tool to convert it to binary and then another for binary to text again and then a final GUI...
Hear hear. As soon as XMLHttpRequest is a proper standard I'm jumping on it like a shot. Like Gmail, it's time to raise the bar on which browsers we web developers are going to support. CSS2, DOM 2, and XHTML please. IE6 just about makes the cut ;)
I create data-driven web apps for a living (i.e. data-driven graphics, UI and text via SVG and HTML), and I firmly believe that XML is the way to go for such creations. It offers a hierarchical structure that is excellent for temporarily storing data pulled from a database, which can then be converted to HTML or SVG or some UI markup (XUL, XForms, or your own thing) via XSLT.
I don't really care that XML is human-readable--I like the fact that because it is extremely well structured, it is easy to create with authoring applications as well as easy to manipulate in real time with script (i.e. manipulating its DOM).
I have long wished for a true binary XML spec to make the transmission and parsing/decoding quicker, and this spec isn't it. But I think one day we'll have it, and that won't mean that we've "come full circle" and therefore XML is useless. It just means that we'll have the best of both worlds--speed plus standardized, hierarchical data structures.
Looking for political forums? Check out "The World Forum".
And we have come full circle.
computers: 1
language designers: 0
Yea right.... It will be MSMTOM and won't work with anything BUT IE and M$ products. Look at what they did with their version of XML.
Read this if you want to know about binary XML (http://www.w3.org/TR/wbxml/).
The XOP proposal is a mechanism to represent and refer to binary data in an XML document.
XOP is not a proposal to compress XML documents.
You might say, oh, I can use CDATA, right.
Unfortunately, no. CDATA cannot be reliably used because the character range for CDATA is loosely 0x9, 0xA, 0xD, and anything from 0x20 up. (http://www.w3.org/TR/REC-xml/#NT-Char)
Currently, you have to resort to your own scheme to reliably include binary data in an XML document. -- Rob
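The character-range point can be checked mechanically. Below is a small Python sketch of the XML 1.0 `Char` production from the linked spec section. The helper names `is_xml_char` and `forbidden_bytes` are made up, and treating each byte as a code point (as a Latin-1 reading would) is an assumption made purely for illustration.

```python
def is_xml_char(code_point: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def forbidden_bytes(data: bytes) -> list[int]:
    """Return the distinct byte values in data that XML cannot carry
    as characters, sorted ascending."""
    return sorted({byte for byte in data if not is_xml_char(byte)})


# Stand-in for the start of a binary file: contains 0x1A and 0x00.
sample = b"\x89PNG\r\n\x1a\n\x00"
print([hex(b) for b in forbidden_bytes(sample)])
```

Even when every byte happens to be a legal character, a `]]>` sequence in the data would still terminate the CDATA section early, so raw binary in CDATA fails on two counts.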
The poster has mixed things up. This is what the W3C is doing with binary XML:
http://www.w3.org/XML/Binary/
I can't find that they have published anything yet.
What the poster linked to is a spec for including binary data in XML messages, mostly for use in SOAP.
"XOP" and "XMLOP" are acronyms. "XOP" is pronounced "zahp". "XMLOP" is pronounced "ZIHM-lahp". Any abbreviation can be an acronym. For example, "WTF" is pronounced "whihtf", and "GWB" is pronounced "gwuhb".
who remind you that a 8 byte set of parameters returning a 2 byte answer is enhanced by its protective wrapping of 60KB.