Does the World Need Binary XML?
sebFlyte writes "One of XML's founders says 'If I were world dictator, I'd put a kibosh on binary XML' in this interesting look at what can be done to make XML better, faster and stronger."
On the face of it, compressing XML documents by using a different file format may seem like a reasonable way to address sluggish performance. But the very idea has many people -- including an XML pioneer within Sun -- worried that incompatible versions of XML will result.
I agree with his point.
What's wrong with just compressing the XML as it is with an open and easy-to-implement algorithm like gzip or bzip2?
I don't need no instructions to know how to rock!!!!
Somebody fill me in ...
... it's called zipping; most web servers have an option to zip the data up as it streams to the client browser.
I fail to see the need for a "binary XML" file format when there are already facilities in place to compress text streams.
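For anyone who wants to see what plain gzip does to markup-heavy data, here is a minimal sketch. The sample document is made up for illustration (the `rows`/`row` element names are assumptions, not anything from the article), and the numbers it prints will vary with the data.

```python
import gzip

# Hypothetical sample document: repetitive markup, like most data-oriented XML.
record = "<row><id>{0}</id><name>item{0}</name><price>9.99</price></row>"
xml_text = "<rows>" + "".join(record.format(i) for i in range(1000)) + "</rows>"
raw = xml_text.encode("utf-8")

# Compress and decompress in memory; gzip is lossless, so the round trip
# returns the original bytes.
compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)

print(len(raw), "bytes as text")
print(len(compressed), "bytes gzipped")
```

The receiver still has to decompress back to text and parse it, so this addresses transfer size, not parse time.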
IBM has actually tried to introduce some goofy stuff into the XML standards, like line breaks, etc, that should not be in a pure node-based system like XML. Why aren't you picking on them in your comment?
As far as SOAP and XML Web Services (standardized protocols for XML RPC transactions) go, Microsoft was way ahead of the pack. And I rather enjoy using their rich set of .NET XML classes to talk to our Unix servers. It helps my company interop.
Great ideas often receive violent opposition from mediocre minds. - Albert Einstein
However, if anything, XML has shown us the power of well-structured information. XML has given the possibility of universal interoperability. Developments in XML-based technologies have led us to the point where we know enough now to create a standard for structured information that will last for several decades.
It's time that we had a new ASCII. That standard should be binary XML.
When I think of the time that has been wasted by every developer in the history of Computer Science, writing and rewriting basic parsing code, I shudder. Binary XML would produce a standard such that an efficient, universal data structure language would allow significant advances in what is technically possible with our data. For example: why is what we put on disk any different from what's in memory? Binary XML could erase this distinction.
A binary XML standard needs to become ubiquitous, so that just as Notepad can open any ASCII file today, SuperNotepad could open any file in existence, or look at any portion of your computer's memory, in an informative, structured manner. What's more, we have the technology to do this now.
In my hands, bzip compresses better, but is somewhere between somewhat slower and orders of magnitude slower on my system, depending on the options used to invoke the command and the size of the file being compressed. gzip is fast, works on streams instead of blocks, and is available on nearly every system.
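To make the "works on streams" point concrete, here is a rough sketch that feeds chunks to a gzip compressor as they arrive and, for comparison, compresses the same bytes with bz2 in one shot. The `gzip_stream` helper and the sample rows are illustrative assumptions; which format comes out smaller or faster depends on your data and options, so run it on your own files before drawing conclusions.

```python
import bz2
import zlib

def gzip_stream(chunks):
    """Compress an iterable of byte chunks incrementally, yielding output as it is produced."""
    # wbits=31 asks zlib for the gzip container (header and trailer) around deflate.
    compressor = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:  # the compressor may buffer input and return nothing yet
            yield out
    yield compressor.flush()  # emit whatever is still buffered, plus the trailer

# Hypothetical data: many small XML fragments arriving one at a time.
chunks = [b"<row><id>%d</id></row>" % i for i in range(5000)]
raw = b"".join(chunks)

gz = b"".join(gzip_stream(chunks))
bz = bz2.compress(raw)

print("raw:", len(raw), "gzip:", len(gz), "bzip2:", len(bz))
```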
I think that's where the true problem lies. HTTP.
We need to look towards HTTP 2.0. What I would want:
- pipelining that works, so that it could be enabled for use on any server that supports HTTP 2.0
- gzip and 7zip support.
- All data is compressed by default (a few excludes such as .gz files, .zip files etc. since that would be pointless).
- Option to initiate a persistent connection (remove the stateless protocol concept), via an HTTP header on connect. This would allow for a whole new level for web applications via SOAP/XML.
There are tons of other things that could be enhanced for today's uses.
HTTP is the problem. Not XML
From Re: Lisp syntax, what about resynchronization?
Attributes in XML are inherited from SGML, whose designers were thinking of markup for textual documents. When you want to represent data, whether something is an attribute or not is completely irrelevant.
Deep explanation: From: The horror that is XML
Dyslexics have more fnu.
Had data to be delivered to a client, dumped from a database. As flat files they were ~20 MB in size. That bloated to ~120 MB after conversion to XML.
Client attempted to open it in a DOM-based application which I suspect used recursion to parse the data (easy to code, recursion). Needless to say it brought their server to its knees.
We switched to flat files shortly thereafter.
In my problem domain, where 20 MB is a small data set, XML is useless. XML does not seem to scale well at all (though using a SAX parser helps at times).
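For reference, the SAX approach looks roughly like this: the parser calls back on each element as it streams past instead of building the whole tree in memory the way a DOM does. This is only a sketch; the `RowCounter` class, the `count_rows` helper and the `row` element name are made up for the example.

```python
import io
import xml.sax

class RowCounter(xml.sax.ContentHandler):
    """Counts <row> elements as the parser streams past them; no tree is built."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def startElement(self, name, attrs):
        if name == "row":
            self.rows += 1

def count_rows(stream):
    handler = RowCounter()
    xml.sax.parse(stream, handler)  # reads the stream incrementally
    return handler.rows

# Hypothetical input: 1000 identical rows in an in-memory stream.
sample = io.BytesIO(b"<rows>" + b"<row><id>1</id></row>" * 1000 + b"</rows>")
print(count_rows(sample))
```

Memory use stays roughly flat as the file grows, at the price of having to track any state you need yourself.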
YMMV.
putting the 'B' in LGBTQ+
<SomeTagName>some character data</SomeTagName>
According to the XML spec, the closing tag must close the nearest opening tag. So why does it have to include the opening tag's name? This is 100% redundant information, and is included in every XML tag with children or cdata. An obvious compression would be to replace this with:
<SomeTagName>some character data</>
I really don't know why this wasn't done from the outset (backwards compatibility with HTML, where tags often overlap - although they're not meant to - I suppose). Either allow tags to overlap (which allows some more interesting data structures to be easily encoded in XML) or make the name optional in the closing tag.
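Since closing tags always match the nearest open element, the names can be dropped and recovered mechanically with a stack. Here is a toy sketch of that idea. It assumes a simplified input with no comments, CDATA sections, processing instructions or ">" inside attribute values, and `shorten`/`restore` are names invented for this example. The short form is not valid XML, so a real parser would reject it.

```python
import re

# Matches one tag: optional "/", the element name, then anything up to ">".
TAG = re.compile(r"<(/?)([^<>\s/]*)([^<>]*)>")

def shorten(xml_text):
    """Replace every named closing tag with the anonymous form </>."""
    return re.sub(r"</[^<>]+>", "</>", xml_text)

def restore(short_text):
    """Rebuild closing tag names by tracking open elements on a stack."""
    stack = []

    def fix(match):
        closing, name, rest = match.groups()
        if closing:
            return "</" + stack.pop() + ">"  # close the nearest open element
        if not rest.endswith("/"):           # self-closing tags open nothing
            stack.append(name)
        return match.group(0)

    return TAG.sub(fix, short_text)

doc = '<order><item id="1">widget</item><note/><item id="2">gadget</item></order>'
short = shorten(doc)
print(short)
print(len(doc) - len(short), "bytes saved")
```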
I am TheRaven on Soylent News
Take Microsoft's XML formats as an example. Word, or the MSN messages format... they're _NOT_ XML. They're proprietary formats DISGUISED as XML.
If Microsoft doesn't respect text-only XML, what do you think will happen when^H^H^H^Hif binary XML is out?
DNS is binary; does that make it proprietary? Not at all. It is a published open standard in RFC 883 and later documents. Other examples include ASN.1/BER as used in SNMP. It's not whether it is binary or text that matters; it's whether it is openly documented and unencumbered by intellectual property claims (a separate issue some of XML has).
The decision of binary vs. text for a format should be the result of specific needs. XML is verbose. XML can be compressed for transmission purposes, but it still has to be uncompressed to its verbose form for parsing. If speed in parsing is necessary (it might be, as I have noticed quite a few XML-based programs are rather slow), a binary format can have things like length prefixes and continuation tags, instead of having to detect and verify a collection of characters whose position is unknown. A parser that does not recognize a given tag, or does not need to process it, in a binary format can simply skip it by jumping the specified number of bytes. A binary format is well suited to machine processing.
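A minimal sketch of the length-prefix idea, to show what "skip it by jumping the specified number of bytes" means in practice. The record layout here (2-byte tag id, 4-byte length, then the payload) is invented for illustration and is not any proposed binary XML format.

```python
import struct

# Assumed layout: 2-byte tag id, 4-byte payload length, both big-endian.
HEADER = struct.Struct(">HI")

def encode(records):
    """Serialize (tag_id, payload_bytes) pairs as length-prefixed records."""
    return b"".join(HEADER.pack(tag, len(payload)) + payload
                    for tag, payload in records)

def decode(buffer, wanted):
    """Return payloads for tag ids in `wanted`; other records are never inspected."""
    found = []
    offset = 0
    while offset < len(buffer):
        tag, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        if tag in wanted:
            found.append((tag, buffer[offset:offset + length]))
        offset += length  # jump past the payload without scanning it
    return found

# Tag 99 stands in for something this reader does not understand.
data = encode([(1, b"alice"), (99, b"x" * 1000), (2, b"bob")])
print(decode(data, {1, 2}))
```

A text parser would have to scan all 1000 bytes of the unknown element to find its end; here the reader only looks at the 6-byte header.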
The usual argument for a text format spans the range of permitting humans to create the content for most things directly in an editor like vi or emacs (no wars here, I listed my favorite last), or reading that content directly, such as to diagnose the real cause of misunderstood errors. XML is too utterly complex for human creation or interpretation to be effective on a direct basis. There may be some argument that it can still be effective for diagnostic purposes (I have in fact needed to do so many times). Given that it is the powerful tools of XML that are used as the basis for the benefit of XML and promoting it, then what does it really matter what format is underneath as long as it is open and unencumbered?
A binary format for XML will absolutely not kill XML. DNS is obviously not dead (and you'll love it even more when IPv6 rolls into your network). What a binary format might do is weed out some of the weaker programmers who are sticking their fingers a bit too deep into the inner workings of some applications and tools.
now we need to go OSS in diesel cars