An Overview of Modern XML Processing Techniques and APIs
Dare Obasanjo writes with a link to his article "A Survey of APIs and Techniques for Processing XML" on xml.net. It starts off "In recent times the landscape of APIs and techniques for processing XML has been in the process of reinventing itself as developers and API designers learn from their experiences and some past mistakes. APIs such as DOM and SAX which used to be the bread and butter of XML APIs are giving way to new models of examining and processing XML. However although some of these techniques have become widespread amongst developers who primarily work with XML they are still unknown to the general body of developers. Nothing highlights this better than a recent article by Tim Bray one of the co-inventors of XML entitled 'XML is too Hard for Programmers' and the subsequent responses on Slashdot." Read the entire article to learn more about the state of the XML art.
Added in the missing link.
This is a horrible post!
There is no link to the article, and the one link that comes close (to xml.net) points to a site that says:
xml.net will be online soon. Sign up now and we'll keep you posted on our progress.
Timothy, how did you read this as the editor?
I am interested in the topic: please fix the post so that we can read the article.
XML sucks because it's being used wrongly. It is being used by people who view it as being an encapsulation of semantics and data, and it's not. XML is purely a way of structuring files, and as such, really doesn't add much to the overall picture. XML came from a document preparation tradition. First there was GML, a document preparation system, then SGML, a document preparation system, then HTML, a document preparation system, and now XML. All were designed as ways humans could structure documents. Now we've gotten to the point where XML has become so obscure and so complex to write, that it can no longer be written by people. If you talk to people in Sun about their libraries that generate XML, they say humans cannot read this. It's not designed for human consumption. Yet we're carrying around all the baggage that's in there, because it's designed for humans to read. So XML is a remarkably inefficient encoding system. It's a remarkably difficult to use encoding system, considering what it does. And yet it's become the lingua franca for talking between applications, and that strikes me as crazy.
People think, "Once I've got my data in XML that's all I've got to do. I've now got self-describing data," but the reality is they don't. They're just assuming that the tags that are in there somehow give people all the information they need to be able to deal with the data. Now, for some things there are standards. For example, there are some standards like RSS and RDF, which give you very simple ways of describing web page content. But a random XML file, especially a machine-generated one, can be as obscure as binary data.
Ant is a really good example, because in that case you're using XML as a user-specified input language, which is really inappropriate in that context. I'd much rather have a genuine grammar. I want to be able to type something simple and easy for me. I don't care if it's easy for the tool to parse, that's the tool's problem. I want it to be easy for me to write. And in cases like that, it's really the case of the programmer saying, "Oh look, here's an XML parser. I can just take XML files. That's easier." So one programmer in one context puts a burden on the other 100,000 programmers trying to use it.
cpeterso
It's also "too hard" in a variety of circumstances where the reason it's too hard is that it's the wrong thing to use.
Good programmers can cope with XML just fine when it's just what they need to get the job done, and are smart enough to avoid it when it isn't.
Experience is a hard school, but fools will learn no other.
More often than not the data I work on don't fit naturally in a relational structure. A lot of data is more naturally structured in tree structures or graph structures than in a matrix. One of the reasons I like XML is because it fits my data much better than a matrix.
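To make that concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The document and its element names (org, team, member) are made up for illustration; the point is only that nesting of arbitrary depth is carried by the markup itself and can be walked with a short recursive function.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested data: the element names are illustrative only.
DOC = """
<org name="acme">
  <team name="platform">
    <member>ada</member>
    <team name="storage">
      <member>lin</member>
      <member>sam</member>
    </team>
  </team>
  <team name="apps">
    <member>joe</member>
  </team>
</org>
"""

def team_paths(elem, prefix=""):
    """Yield (path, members) for every team, walking the tree recursively."""
    # findall("team") matches direct children only, so each level is
    # visited exactly once.
    for team in elem.findall("team"):
        path = f"{prefix}/{team.get('name')}"
        members = [m.text for m in team.findall("member")]
        yield path, members
        # A nested team is just a child element of its parent team.
        yield from team_paths(team, path)

root = ET.fromstring(DOC)
paths = list(team_paths(root))
for path, members in paths:
    print(path, members)
```

Stored in flat tables, the same data would typically need something like a parent-id column plus a recursive query or repeated self-joins to rebuild the hierarchy, which is the mismatch I mean.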