Celebrate the XML Decade
IdaAshley writes "IBM Systems Journal recently published an issue dedicated to XML's 10th anniversary. Take a look at XML application techniques, and general discussion of the technical, economic and even cultural effects of XML. Learn why XML has been successful, and what it would take for XML to continue its success."
This year I'll be sending out christmas cards in XML and then placing a large banner outside my house with the appropriate schema.
//recipient[@name='mum']
Then with every following year, I'll be sending a stylesheet card which they can apply to the original XML.
And if they need to locate their names on the card, they can use
Task Mangler
This is slashdot. Nobody reads the links.
:(){
Wait... let me figure this one out...
MCMXC was 1990...
MDCCCLX was 1860...
I give up! Which decade was XML?
- RG>
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
Pro
- Easy to change the schema, don't have to convert old data.
- They didn't know exactly what XML was, so if I recommended it,
... (a.k.a. "gee whiz" factor?)
- The other developers liked the idea
ConEventually we settled on gzipped xml. It required a little more code, but everyone seemed happy. Oh, and we stored images as separate
I think my experience is pretty common, though. And from experience, libxml2 + libz is still very, very fast, and there's not a (whole lot) of wasted space.
I'd like to hear other people's success stories, if anyone wants to reply... I liked reading the article, too.
A lot of people ask about using a different syntax, such as @name{....} as Scribe (and later LaTeX) did. Note that @element{xxx} is in fact a possible syntax that can be defined using SGML. But we were after something different.
;-)
When we designed XML, we had over a decade of solid experience with interoperability in the world of SGML, and we also knew about the kinds of problems that different sorts of users had with different sorts of syntax.
The primary users of SGML-based documentation systems were not programmers. They were people who were often not likely to know about a bracket-matching option in an editor or about code indenting, for example. But they were still legitimate users.
You can't easily test the markup in a declarative system: if in an HTML document I used H3 instead of P in a document it might not look right, but it would still parse OK. If I muddle up Author and Title in a bibliography, same thing.
So, the redundancy of end tags in XML is there because, in practice, if you didn't have it, we had learned that our users had problems correcting their documents, and we knew that, in general, it was only rarely possible for software to give the users much help. There were some experiments early on with </>, allowed by SGML (with various options set) to end any element; it soon became obvious that this caused more problems than it was worth, and even Microsoft disabled the troublesome feature in their XML parser.
It's true that today XML is used in lots of situations we didn't predict. We were amazed that by the time we got XML published as a Recommendation there were over 200 users. So no, we didn't predict the future percfectly. But the popularity of XML shows we can't have done all that badly, really
Liam
(Liam Quin, currently W3C XML Activity Lead)
Live barefoot!
free engravings/woodcuts
I have to do this once per year or so, here's the 2006 iteration: I am not XML's inventor. There were 150 people in the debating society and 11 people in the voting cabal and 3 co-editors of the spec. Of the core group, I (a) was the loudest mouth, (b) was independent so I didn't have to get PR clearance to talk, and (c) don't mind marketing work.
-Tim
The "slow processing" is caused by more than taking a lot of space. XML is basically a document markup but is frequently and regular used as a wire protocol, which has very different design requirements if you want a good standard. And in fact we already have a good standard for this kind of thing called "ASN.1", which was actually engineered to be extremely efficient as a wire protocol standard. (There is also an ITU standard for encoding XML as ASN.1 called XER, which solves many of the performance problems.)
Arguably the single biggest problem with XML that causes slow processing is that software can predict almost nothing about an XML stream and therefore has to allow for anything. The opening bracket tells you very little about what to expect, and creates few implicit failure or non-conformance tests that allows one to terminate processing because there is no definition of "unreasonable". If I want to embed a terabyte of data between XML tags, there is no built-in basic mechanism to inform the software of how much data I should expect to see before a closing tag and no basic mechanism to cue the software as to the type of data to expect. (Yes, you can sort of do it with lots of other layers strapped on, but it isn't core and strapping it on adds complexity.) This is the primary reason it gives miserable performance as a wire protocol format -- the software cannot make decisions about the data without slurping most or all of it, with no way to predict what "most" or "all" actually is. In well engineered standards such as ASN.1, they use the good old tag-length-value (TLV) format. The "tag" tells you what to expect, the length tells you how many bytes to expect, and the value is the actual data. In short, the encoding tells the software exactly what it is about to do before it does it in enough detail that the software can make smart and performant handling decisions.
The only real advantage XML has is that it is (sort of) human readable. Raw TLV formatted documents are a bit opaque, but they can be trivially converted into an XML-like format with no loss (and back) without giving software parsers headaches. There is buckets of irony that the deficiencies of XML are being fixed by essentially converting it to ASN.1 style formats so that machines can parse them with maximum efficiency. Yet another case of computer science history repeating itself. XML is not useful for much more than a presentation layer, and the fact that it is often treated as far more is ridiculous.