Celebrate the XML Decade
IdaAshley writes "IBM Systems Journal recently published an issue dedicated to XML's 10th anniversary. Take a look at XML application techniques, and general discussion of the technical, economic and even cultural effects of XML. Learn why XML has been successful, and what it would take for XML to continue its success."
<?xml version="1.0"?>
<content name="Shameless Self Promotion">
Good point, though there's a better way to edit binary files.
For example, I make a product called FileCarver, which lets you create a file format definition (in XML! heh) that describes the format of a binary file; the program then automatically provides you with a GUI to edit it. Check it out at http://fizzysoft.net/filecarver/
</content>
Pro
- Easy to change the schema, don't have to convert old data.
- They didn't know exactly what XML was, so if I recommended it,
... (a.k.a. "gee whiz" factor?)
- The other developers liked the idea
Con
Eventually we settled on gzipped XML. It required a little more code, but everyone seemed happy. Oh, and we stored images as separate
I think my experience is pretty common, though. And from experience, libxml2 + libz is still very, very fast, and there's not a (whole lot) of wasted space.
I'd like to hear other people's success stories, if anyone wants to reply... I liked reading the article, too.
XML and JSON are typically sent via servers that understand GZIP compression, which means they're only bloated at either end of the wire (the server and the client). Clients have enough memory to store even 100KB of XML (and that's a ridiculous scenario -- few Ajax apps send that much, people send much more HTML than they do XML, and no one complains about that). So then it comes down to server resources, and whether 100KB or whatever is going to matter to a server. IMO the complaints seem to be unwarranted (by all means, though, if someone's got a good scenario, let's hear it).
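To put a rough number on the "only bloated at either end of the wire" point, here is a minimal Python sketch that gzips a repetitive XML payload and reports the size ratio. The payload (1000 made-up `<item>` records) and the helper name `gzip_ratio` are assumptions for illustration only; real documents will compress differently.

```python
import gzip


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Hypothetical payload: 1000 small records with repetitive tag names.
records = "".join(
    f"<item><id>{i}</id><name>user{i}</name></item>" for i in range(1000)
)
xml_doc = f'<?xml version="1.0"?><items>{records}</items>'.encode("utf-8")

ratio = gzip_ratio(xml_doc)
print(f"original: {len(xml_doc)} bytes, gzipped/original ratio: {ratio:.2f}")
```

The repeated tag names are exactly what a dictionary-based compressor is good at, so markup-heavy documents like this one tend to shrink a lot on the wire. It says nothing about parse cost at either end, which is a separate question.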
A lot of people ask about using a different syntax, such as @name{....} as Scribe (and later LaTeX) did. Note that @element{xxx} is in fact a possible syntax that can be defined using SGML. But we were after something different.
;-)
When we designed XML, we had over a decade of solid experience with interoperability in the world of SGML, and we also knew about the kinds of problems that different sorts of users had with different sorts of syntax.
The primary users of SGML-based documentation systems were not programmers. They were people who were often not likely to know about a bracket-matching option in an editor or about code indenting, for example. But they were still legitimate users.
You can't easily test the markup in a declarative system: if I used H3 instead of P in an HTML document it might not look right, but it would still parse OK. If I muddle up Author and Title in a bibliography, same thing.
So, the redundancy of end tags in XML is there because we had learned that, in practice, without it our users had problems correcting their documents, and we knew that, in general, it was only rarely possible for software to give the users much help. There were some experiments early on with </>, allowed by SGML (with various options set) to end any element; it soon became obvious that this caused more problems than it was worth, and even Microsoft disabled the troublesome feature in their XML parser.
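As a small illustration of both points, here is a sketch using Python's standard `xml.etree.ElementTree` parser. The bibliography snippets and the helper name `check_well_formed` are made up for the example. Swapping Author and Title consistently still parses (the parser cannot know the content is wrong), but a mismatched named end tag is caught immediately.

```python
import xml.etree.ElementTree as ET


def check_well_formed(doc: str):
    """Return None if doc parses, otherwise the parser's error message."""
    try:
        ET.fromstring(doc)
        return None
    except ET.ParseError as err:
        return str(err)


# Hypothetical bibliography entries, for illustration only.
good = "<bib><author>Quin</author><title>XML</title></bib>"
# Author and title content muddled up: wrong, but still well-formed.
muddled = "<bib><author>XML</author><title>Quin</title></bib>"
# End tags crossed: the named end tag lets the parser point at the problem.
crossed = "<bib><author>Quin</title><title>XML</author></bib>"

for name, doc in [("good", good), ("muddled", muddled), ("crossed", crossed)]:
    print(name, "->", check_well_formed(doc) or "parses OK")
```

With an anonymous close such as `</>`, the crossed case would be indistinguishable from the good one, which is the kind of help the named end tag buys the author.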
It's true that today XML is used in lots of situations we didn't predict. We were amazed that by the time we got XML published as a Recommendation there were over 200 users. So no, we didn't predict the future perfectly. But the popularity of XML shows we can't have done all that badly, really.
Liam
(Liam Quin, currently W3C XML Activity Lead)
Live barefoot!
free engravings/woodcuts
The "slow processing" is caused by more than taking a lot of space. XML is basically a document markup but is frequently and regularly used as a wire protocol, which has very different design requirements if you want a good standard. And in fact we already have a good standard for this kind of thing called "ASN.1", which was actually engineered to be extremely efficient as a wire protocol standard. (There is also an ITU standard for encoding XML as ASN.1 called XER, which solves many of the performance problems.)
Arguably the single biggest problem with XML that causes slow processing is that software can predict almost nothing about an XML stream and therefore has to allow for anything. The opening bracket tells you very little about what to expect, and creates few implicit failure or non-conformance tests that would allow one to terminate processing, because there is no definition of "unreasonable". If I want to embed a terabyte of data between XML tags, there is no built-in basic mechanism to inform the software of how much data it should expect to see before a closing tag and no basic mechanism to cue the software as to the type of data to expect. (Yes, you can sort of do it with lots of other layers strapped on, but it isn't core and strapping it on adds complexity.) This is the primary reason it gives miserable performance as a wire protocol format -- the software cannot make decisions about the data without slurping most or all of it, with no way to predict what "most" or "all" actually is. Well-engineered standards such as ASN.1 use the good old tag-length-value (TLV) format. The "tag" tells you what to expect, the length tells you how many bytes to expect, and the value is the actual data. In short, the encoding tells the software exactly what it is about to do before it does it, in enough detail that the software can make smart and performant handling decisions.
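To make the TLV idea concrete, here is a minimal Python sketch. It is a simplified toy layout (1-byte tag, 4-byte big-endian length, then the value), not real ASN.1 BER/DER encoding, and the names `tlv_encode`, `tlv_decode` and the `max_len` limit are assumptions for illustration. The point is that the reader sees the tag and length first, so it can reject an "unreasonable" field before touching the value.

```python
import struct

HEADER = struct.Struct(">BI")  # assumed layout: 1-byte tag, 4-byte length


def tlv_encode(tag: int, value: bytes) -> bytes:
    """Prefix value with its tag and length."""
    return HEADER.pack(tag, len(value)) + value


def tlv_decode(buf: bytes, max_len: int = 1 << 20):
    """Yield (tag, value) pairs, checking each length before reading the value."""
    offset = 0
    while offset < len(buf):
        if offset + HEADER.size > len(buf):
            raise ValueError("truncated header")
        tag, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if length > max_len:
            # Known up front from the header; no need to read the payload.
            raise ValueError(f"field with tag {tag} is too large: {length} bytes")
        if offset + length > len(buf):
            raise ValueError("truncated value")
        yield tag, buf[offset:offset + length]
        offset += length


stream = tlv_encode(1, b"hello") + tlv_encode(2, b"")
decoded = list(tlv_decode(stream))
print(decoded)
```

A reader that only cares about tag 2 could also skip tag 1 by advancing `offset` by the declared length without inspecting the bytes, which is the kind of decision an XML parser cannot make until it has scanned to the closing tag.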
The only real advantage XML has is that it is (sort of) human readable. Raw TLV formatted documents are a bit opaque, but they can be trivially converted into an XML-like format with no loss (and back) without giving software parsers headaches. There are buckets of irony in the fact that the deficiencies of XML are being fixed by essentially converting it to ASN.1 style formats so that machines can parse them with maximum efficiency. Yet another case of computer science history repeating itself. XML is not useful for much more than a presentation layer, and the fact that it is often treated as far more is ridiculous.
Thank you for your kind words :-)
We weren't really aiming at HTML users.
I'm afraid the only usability studies of SGML tools that I saw were not released to the public. At the time I worked for a vendor of SGML-based software (including, e.g., an editor, a viewer, a development environment) and it was a matter of great concern to us.
It's possible we could open up the archives of the XML Working Group, but it would mean getting the permission of several hundred people. I'll ask some people at the upcoming XML conference in Boston and try and get a feeling for how hard that would be.
I agree with you that the involvement of Sun, Microsoft, Netscape and others was very helpful. There are, however, other file formats that were around in the past and had similar backing but which failed. Remember ODA? There were also some attempts early on to move the Web to a dialect of RTF, which was also supported by big companies.
Best,
Liam