An Overview of Modern XML Processing Techniques and APIs
Dare Obasanjo writes with a link to his article "A Survey of APIs and Techniques for Processing XML" on xml.net. It starts off "In recent times the landscape of APIs and techniques for processing XML has been in the process of reinventing itself as developers and API designers learn from their experiences and some past mistakes. APIs such as DOM and SAX which used to be the bread and butter of XML APIs are giving way to new models of examining and processing XML. However although some of these techniques have become widespread amongst developers who primarily work with XML they are still unknown to the general body of developers. Nothing highlights this better than a recent article by Tim Bray one of the co-inventors of XML entitled 'XML is too Hard for Programmers' and the subsequent responses on Slashdot." Read the entire article to learn more about the state of the XML art. Added in the missing link.
The article is actually on xml.com, not xml.net. Here is the URL: http://www.xml.com/pub/a/2003/07/09/xmlapis.html
An XML processor (note that the W3C XML Rec carefully avoids the term "parser", and for a reason) is more like a lexer than a parser in traditional terms. It tells you about the syntactic elements of an XML document, but nothing about their meaning or relation. In other words, XML is not a language; XML applications are. Yet languages are what people need.
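To make the lexer comparison concrete, here is a rough sketch in Python using the standard library's xml.etree.ElementTree. The sample document and its element and attribute names (order, item, qty) are made up for illustration. The processor only reports a flat stream of start/end events; deciding that "qty" is a count to be summed is knowledge the application has to bring.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the vocabulary is invented for this example.
DOC = b"<order><item qty='2'>widget</item><item qty='1'>gadget</item></order>"


def token_stream(data):
    """Yield (event, tag) pairs, much like a lexer yields tokens."""
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        yield event, elem.tag


def total_quantity(data):
    """Sum the qty attributes of all item elements.

    Treating "qty" as an integer count is application knowledge;
    nothing in the XML processor tells us that.
    """
    root = ET.fromstring(data)
    return sum(int(item.get("qty")) for item in root.iter("item"))


if __name__ == "__main__":
    for event, tag in token_stream(DOC):
        print(event, tag)
    print("total quantity:", total_quantity(DOC))
```

The first function works on any well-formed document. The second only makes sense for this particular vocabulary, which is more or less the point being argued.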
It turns out that you actually cannot do all that much with plain XML without knowledge of the vocabulary used. It is great that you can mix vocabularies, and in a sense embed DSLs in whatever it is that you really want to use (like XInclude, which is useful in lots of cases, but still an XML language of its own). Actually, I have yet to see anyone manipulate generic XML docs; it is always about specific XML applications. XML is not as general a solution as everybody (or at least most marketeers) seems to think.
That these applications are easier to deal with than a WordPerfect document embedded in an ASN.1 stream is true, however. But it tells more about WordPerfect and ASN.1 than about the inherently stupid idea of a generic exchange format if understood as anything beyond syntax.
Programming can be fun again. Film at 11.
Yeah, I was surprised too.
I disagree about the human readable/writable bit. It is easily human readable/writable if it's properly structured (if it's complex because the information is complex, that's inevitable; make the data model simpler if that's a problem for you). In terms of efficiency, sure, binary formats are more efficient, but they are much harder to debug when they go wrong.
I agree that XML documents are not necessarily self-documenting. That isn't surprising. XML is about syntax, not semantics. You can use XSD to provide basic (integer vs char) semantics, but anything more complicated comes back to human understanding and agreed specification. If you understand the objects in your schema, XML can provide a good presentation of those objects.
My team (myself and another guy) implemented a mapping framework in Java that I think is more useful than the other frameworks I've seen.
So when reading the comments about the weaknesses of object-mapping tools, keep in mind that some of us have overcome them. :)
Peace be with you,
-jimbo
XML Tools for Mac OS X
And those manipulations you do are just another form of parsing. There's really no difference between writing a grammar that parses data and manipulates it and using SAX or DOM to manipulate some XML data. In either case, you still have to know the semantics to do anything useful. SAX is a big pain in the neck when you need to interpret or manipulate the data hierarchy. DOM wastes a lot of memory making a tree out of your whole dataset.
However, it's ten million times easier for the end user of your data to create that data using a language made for that data, because then they only have to learn the mini language. Forcing them to use XML syntax on top of your data format means they have to worry about XML syntax and learn your data format as well.
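For anyone who hasn't felt the SAX versus DOM trade-off mentioned above, here is a small sketch using Python's standard xml.sax and xml.dom.minidom modules. The document and its element names (library, book, title) are hypothetical. Both functions pull out the same titles: the SAX version has to track its own position in the hierarchy by hand, while the DOM version is shorter but builds the whole tree in memory first.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample data; the vocabulary is invented for this example.
DOC = b"<library><book><title>A</title></book><book><title>B</title></book></library>"

TITLE_PATH = ["library", "book", "title"]


class TitleHandler(xml.sax.ContentHandler):
    """Collect title text, keeping track of the element stack manually."""

    def __init__(self):
        super().__init__()
        self.path = []    # current position in the hierarchy
        self.titles = []  # collected results
        self._buf = []    # text chunks for the title being read

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == TITLE_PATH:
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self.path == TITLE_PATH:
            self._buf.append(content)

    def endElement(self, name):
        if self.path == TITLE_PATH:
            self.titles.append("".join(self._buf))
        self.path.pop()


def titles_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles


def titles_dom(data):
    # The whole document is turned into a tree before we look at anything.
    dom = minidom.parseString(data)
    return [t.firstChild.data for t in dom.getElementsByTagName("title")]


if __name__ == "__main__":
    print(titles_sax(DOC))
    print(titles_dom(DOC))
```

Either way the code hard-wires what "title" means and where it lives, so neither API spares you from knowing the vocabulary.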
-the hermit