Slashdot Mirror


An Overview of Modern XML Processing Techniques and APIs

Dare Obasanjo writes with a link to his article "A Survey of APIs and Techniques for Processing XML" on xml.net. It starts off "In recent times the landscape of APIs and techniques for processing XML has been in the process of reinventing itself as developers and API designers learn from their experiences and some past mistakes. APIs such as DOM and SAX, which used to be the bread and butter of XML APIs, are giving way to new models of examining and processing XML. However, although some of these techniques have become widespread amongst developers who primarily work with XML, they are still unknown to the general body of developers. Nothing highlights this better than a recent article by Tim Bray, one of the co-inventors of XML, entitled 'XML is too Hard for Programmers', and the subsequent responses on Slashdot." Read the entire article to learn more about the state of the XML art. Added in the missing link.
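One of the newer models that has been displacing plain DOM and SAX is pull-style parsing, where the program asks the parser for the next event instead of receiving callbacks or a full tree. The sketch below is not from the article; it uses Python's standard-library `xml.etree.ElementTree.XMLPullParser` as a neutral illustration, and the `<feed>`/`<item>` document is made up.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def item_titles(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the <title> text of each <item> as soon as its end tag is parsed."""
    parser = ET.XMLPullParser(events=("end",))

    def drain() -> Iterator[str]:
        # read_events() returns whatever (event, element) pairs are ready right now.
        for _event, elem in parser.read_events():
            if elem.tag == "item":
                yield elem.findtext("title", default="")
                elem.clear()  # discard the finished subtree to keep memory bounded

    for chunk in chunks:
        parser.feed(chunk)  # input may arrive in arbitrary pieces
        yield from drain()
    parser.close()  # signal end of input; remaining events are still readable
    yield from drain()


doc = ["<feed><item><title>DOM</title></item>",
       "<item><title>SAX</title></item></feed>"]
print(list(item_titles(doc)))  # ['DOM', 'SAX']
```

The caller drives the loop, so it can stop early or interleave parsing with other work, while only one `<item>` subtree is held in memory at a time.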

13 of 40 comments

  1. Actually on xml.com by DeathBunny · Score: 5, Informative

    The article is actually on xml.com, not xml.net. Here is the url: http://www.xml.com/pub/a/2003/07/09/xmlapis.html

  2. No Link? by Snerdley · Score: 4, Insightful

    This is a horrible post!

    There is no link to the article, and the one link that comes close (to xml.net ) points to a site that says:

    xml.net will be online soon. Sign up now and we'll keep you posted on our progress.

    Timothy, how did you read this as the editor?

    I am interested in the topic: please fix the post so that we can read the article.
    1. Re:No Link? by Anonymous Coward · Score: 4, Funny

      I am interested in the topic: please fix the post so that we can read the article.

      The fix will be uploaded in a few days but subscribers can click now and beat the rush!

  3. Plain Text and XML by cpeterso · Score: 5, Insightful


    XML sucks because it's being used wrongly. It is being used by people who view it as an encapsulation of semantics and data, and it's not. XML is purely a way of structuring files, and as such, really doesn't add much to the overall picture. XML came from a document preparation tradition. First there was GML, a document preparation system, then SGML, a document preparation system, then HTML, a document preparation system, and now XML. All were designed as ways humans could structure documents. Now we've gotten to the point where XML has become so obscure and so complex to write that it can no longer be written by people. If you talk to people at Sun about their libraries that generate XML, they say humans cannot read this; it's not designed for human consumption. Yet we're carrying around all the baggage that's in there because it's designed for humans to read. So XML is a remarkably inefficient encoding system. It's a remarkably difficult encoding system to use, considering what it does. And yet it's become the lingua franca for talking between applications, and that strikes me as crazy.
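To put a rough number on the "inefficient encoding" complaint, the sketch below encodes the same three integers once as XML and once as packed binary and compares byte counts. The `<point>` vocabulary and the little-endian 32-bit layout are arbitrary choices for illustration; real overhead depends heavily on tag names and on the data itself.

```python
import struct
import xml.etree.ElementTree as ET


def as_xml(x: int, y: int, z: int) -> bytes:
    """Encode a point as <point><x>..</x><y>..</y><z>..</z></point>."""
    point = ET.Element("point")
    for name, value in (("x", x), ("y", y), ("z", z)):
        ET.SubElement(point, name).text = str(value)
    return ET.tostring(point)  # default us-ascii bytes, no XML declaration


def as_binary(x: int, y: int, z: int) -> bytes:
    """Encode the same point as three little-endian 32-bit signed integers."""
    return struct.pack("<3i", x, y, z)


xml_bytes = as_xml(1200, -7, 42)
bin_bytes = as_binary(1200, -7, 42)
print(xml_bytes)                      # b'<point><x>1200</x><y>-7</y><z>42</z></point>'
print(len(xml_bytes), len(bin_bytes))  # 44 12
```

The binary form is smaller, but only the XML form can be read in a text editor without knowing the field layout in advance.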

    People think, "Once I've got my data in XML, that's all I've got to do. I've now got self-describing data," but the reality is they don't. They're just assuming that the tags in there somehow give people all the information they need to deal with the data. Now, for some things there are standards: RSS and RDF, for example, give you very simple ways of describing web page content. But a random XML file, especially a machine-generated one, can be as obscure as binary data.

    Ant is a really good example, because in that case you're using XML as a user-specified input language, which is really inappropriate in that context. I'd much rather have a genuine grammar. I want to be able to type something simple and easy for me. I don't care if it's easy for the tool to parse, that's the tool's problem. I want it to be easy for me to write. And in cases like that, it's really the case of the programmer saying, "Oh look, here's an XML parser. I can just take XML files. That's easier." So one programmer in one context puts a burden on the other 100,000 programmers trying to use it.
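The contrast drawn here can be made concrete. Below, the same build targets are written once in Ant-style XML and once in a hypothetical one-line-per-target grammar (`name: dep1 dep2`, which is not a real Ant feature), and both are read into the same dictionary. The small hand-written parser is the kind of "tool's problem" the commenter is happy to leave to the tool author.

```python
import xml.etree.ElementTree as ET

ANT_STYLE = """
<project name="demo">
  <target name="compile"/>
  <target name="jar" depends="compile"/>
  <target name="dist" depends="jar,docs"/>
</project>
"""

# Hypothetical terse grammar: "target: dependency dependency ..."
TERSE_STYLE = """
compile:
jar: compile
dist: jar docs
"""


def targets_from_xml(text: str) -> dict[str, list[str]]:
    """Read target names and comma-separated dependencies from Ant-style XML."""
    root = ET.fromstring(text.strip())
    return {
        t.get("name"): [d.strip() for d in t.get("depends", "").split(",") if d.strip()]
        for t in root.iter("target")
    }


def targets_from_grammar(text: str) -> dict[str, list[str]]:
    """Read the same information from the terse line-oriented grammar."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, deps = line.partition(":")
        result[name.strip()] = deps.split()
    return result


print(targets_from_xml(ANT_STYLE) == targets_from_grammar(TERSE_STYLE))  # True
```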

    1. Re:Plain Text and XML by battjt · Score: 4, Insightful

      Once data is in XML I can manipulate it without having to write a parser. This is pretty handy in an enterprise setting where data is coming from all over and headed somewhere else. Efficiency of the overall business process is important, not the efficiency of my program. Joe

      --
      Joe Batt Solid Design
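Manipulating data "without having to write a parser" typically looks like the sketch below: an off-the-shelf parser (here Python's standard `xml.etree.ElementTree`) turns the text into a tree, the program edits it, and the same library writes it back out. The `<orders>` document and the 10% surcharge are invented for illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

incoming = """<orders>
  <order id="1"><amount>100.00</amount></order>
  <order id="2"><amount>250.50</amount></order>
</orders>"""

SURCHARGE = Decimal("1.10")  # hypothetical business rule, for illustration only

root = ET.fromstring(incoming)
for order in root.findall("order"):
    amount = order.find("amount")
    new_total = Decimal(amount.text) * SURCHARGE
    amount.text = str(new_total.quantize(Decimal("0.01")))  # keep two decimal places
    order.set("status", "billed")  # mark the record for the next system downstream

outgoing = ET.tostring(root, encoding="unicode")
print(outgoing)
```

No parsing code was written, but the loop still hard-codes the `order` and `amount` vocabulary, which is the point the next reply takes up.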
    2. Re:Plain Text and XML by __past__ · Score: 2, Informative
      Once data is in XML I can manipulate it without having to write a parser.
      Um, no. Or yes, but only in not-very-interesting ways.

      An XML processor (note that the W3C XML Recommendation carefully avoids the term "parser", and for a reason) is more like a lexer than a parser in the traditional sense. It tells you about the syntactic elements of an XML document, but nothing about their meaning or relationships. In other words, XML is not a language; XML applications are. Yet languages are what people need.
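The lexer analogy is easy to see in an event-based (SAX) API: the processor reports start tags, end tags and character data, and attaches no meaning to any of them. A minimal sketch with Python's standard `xml.sax` module follows; the two input documents are invented and deliberately belong to unrelated vocabularies.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record the raw syntactic events a SAX parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():  # skip whitespace-only runs between tags
            self.events.append(("text", content))


def events_of(doc: str):
    handler = EventRecorder()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.events


# Two documents with unrelated meanings produce the same kind of event stream;
# nothing in it says whether <t> is a temperature or a title.
print(events_of('<reading unit="C"><t>21</t></reading>'))
print(events_of('<book lang="en"><t>21</t></book>'))
```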

      It turns out that you cannot actually do all that much with plain XML without knowing the vocabulary used. It is great that you can mix vocabularies, and in a sense embed DSLs in whatever it is that you really want to use (like XInclude, which is useful in lots of cases, but is still an XML language of its own). Actually, I have yet to see anyone manipulate generic XML documents; it is always about specific XML applications. XML is not as general a solution as everybody (or at least most marketers) seems to think.

      It is true, however, that these applications are easier to deal with than a WordPerfect document embedded in an ASN.1 stream. But that says more about WordPerfect and ASN.1 than about the idea of a generic exchange format, which is inherently stupid if understood as anything beyond syntax.

    3. Re:Plain Text and XML by the hermit · Score: 2, Informative

      And those manipulations you do are just another form of parsing. There's really no difference between writing a grammar that parses data and manipulates it, and using SAX or DOM to manipulate some XML data. In either case, you still have to know the semantics to do anything useful. Using SAX to interpret or manipulate the data hierarchy is a big pain in the neck. Using DOM wastes a lot of memory building a tree out of your whole dataset.
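The SAX/DOM trade-off shows up even in a tiny task such as listing the employees of one department. With SAX the handler has to rebuild the hierarchy itself (here with an explicit stack of open elements); with DOM the hierarchy comes for free, at the cost of holding the whole tree in memory. Both halves use Python's standard library, and the `<company>` document is made up.

```python
import xml.sax
from xml.dom import minidom

DOC = """<company>
  <dept name="R&amp;D"><employee>Ada</employee><employee>Alan</employee></dept>
  <dept name="Sales"><employee>Bob</employee></dept>
</company>"""


class DeptEmployees(xml.sax.ContentHandler):
    """SAX: keep our own stack of open elements to know where we are."""

    def __init__(self, dept):
        super().__init__()
        self.dept = dept
        self.stack = []  # (element name, attributes) for each currently open element
        self.names = []
        self.buf = []

    def _inside_wanted_employee(self):
        return (len(self.stack) >= 2
                and self.stack[-1][0] == "employee"
                and self.stack[-2][0] == "dept"
                and self.stack[-2][1].get("name") == self.dept)

    def startElement(self, name, attrs):
        self.stack.append((name, dict(attrs)))  # copy attrs; don't keep the parser's object
        self.buf = []

    def characters(self, content):
        if self._inside_wanted_employee():
            self.buf.append(content)  # text may arrive in several pieces

    def endElement(self, name):
        if self._inside_wanted_employee():
            self.names.append("".join(self.buf))
        self.stack.pop()


def employees_sax(doc, dept):
    handler = DeptEmployees(dept)
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.names


def employees_dom(doc, dept):
    """DOM: the whole document becomes an in-memory tree we can walk freely."""
    tree = minidom.parseString(doc)
    for d in tree.getElementsByTagName("dept"):
        if d.getAttribute("name") == dept:
            return [e.firstChild.data for e in d.getElementsByTagName("employee")]
    return []


print(employees_sax(DOC, "R&D"), employees_dom(DOC, "R&D"))  # ['Ada', 'Alan'] ['Ada', 'Alan']
```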

      However, it's ten million times easier for the end users of your data to create that data in a language made for it. That way they only have to learn the mini-language, as opposed to being forced to use XML syntax on top of your data format, in which case they have to worry about XML syntax and learn your data format as well.

      -the hermit

  4. Re:You know... by ComputerSlicer23 · Score: 4, Funny
    Don't worry, it'll get fixed up on the duplicate post in about 4 hours...

    Kirby

  5. XML is too hard for *rubbish* programmers by vbweenie · Score: 4, Insightful

    It's also "too hard" in a variety of circumstances where the reason it's too hard is that it's the wrong thing to use.

    Good programmers can cope with XML just fine when it's just what they need to get the job done, and are smart enough to avoid it when it isn't.

    --
    Experience is a hard school, but fools will learn no other.
  6. Re:Plain Text and XML by jwdg · Score: 4, Informative
    Manipulating XML may be cheaper than you think. libxml2 is very fast (IME). I've used it with PostgreSQL to run XPath queries on database columns, and an XPath search across 1200 rows (which, for each row, involves building a DOM, parsing the XPath query, and then executing it) was fast enough to be useful. It took a fraction of a second IIRC, though that obviously depends on the nature of our XML docs.

    Yeah, I was surprised too.
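The per-row pattern described above (parse each stored document, then evaluate a path query against it) looks roughly like the sketch below. It substitutes Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset, for libxml2, and an in-memory list for the PostgreSQL column; the documents and the query are invented, and nothing here says anything about timing.

```python
import xml.etree.ElementTree as ET

# Stand-in for an XML-valued column: (row id, document text).
rows = [
    (1, "<doc><tag>xml</tag></doc>"),
    (2, "<doc><tag>api</tag><tag>xml</tag></doc>"),
    (3, "<doc><tag>xslt</tag></doc>"),
]


def rows_matching(rows, path, value):
    """Return ids of rows whose document has an element at `path` whose text is `value`."""
    hits = []
    for row_id, text in rows:
        tree = ET.fromstring(text)  # build a tree for this row
        if any(el.text == value for el in tree.iterfind(path)):  # evaluate the path query
            hits.append(row_id)
    return hits


print(rows_matching(rows, "./tag", "xml"))  # [1, 2]
```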

    I disagree about the human-readable/writable bit. It is easily human-readable and writable if it's properly structured. (If it's complex because the information is complex, that's inevitable; make the data model simpler if that's a problem for you.) In terms of efficiency, sure, binary formats are more efficient, but they are much harder to debug when they go wrong.

    I agree that XML documents are not necessarily self-documenting. That isn't surprising: XML is about syntax, not semantics. You can use XSD to provide basic type semantics (integer vs. string, for example), but anything more complicated comes back to human understanding and an agreed specification. If you understand the objects in your schema, XML can provide a good representation of those objects.

  7. Object Mapping / Marshalling techniques by jamesmrankinjr · Score: 2, Informative

    My team (myself and another guy) implemented a mapping framework in Java that I think is more useful than the other frameworks I've seen.

    1. Order of fields in mapping file specifies order of elements in generated XML.
    2. Formatting of String, Date, etc. classes determined by formatter string in the mapping file.
    3. Can use an XPath like path to specify the location in the XML, not just a key name. This lets you decouple the structure of the object and the structure of the XML.
    4. Likewise, object fields are specified with a "keypath". E.g., mapping "some string" to foo.bar.baz would result in getFoo().getBar().setBaz("some string").
    5. Constant mappings that let you set an XML node to whatever you want. Maybe you always want attribute "larry" of tag "bob" to be "junior". A simple thing, but very useful.
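The features listed above can be approximated in a few dozen lines. What follows is a hypothetical Python analogue, not the commenter's Java framework (whose code isn't shown): each hypothetical `MapEntry` pairs a simplified `a/b/@attr` path with either a dotted keypath or a constant, plus an optional format string. Keypaths are resolved with chained `getattr`, playing the role of `getFoo().getBar()`, and only the object-to-XML direction is sketched.

```python
import datetime
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass
class MapEntry:
    path: str                      # simplified XPath-like target, e.g. "order/bob/@larry"
    keypath: Optional[str] = None  # dotted object path, e.g. "customer.name"
    fmt: Optional[str] = None      # format string applied to the value
    const: Optional[str] = None    # fixed value used instead of a keypath


@dataclass
class Customer:
    name: str


@dataclass
class Order:
    customer: Customer
    placed: datetime.date


# Hypothetical mapping; list order decides element order in the output.
MAPPING = [
    MapEntry("order/customer/name", keypath="customer.name"),
    MapEntry("order/date", keypath="placed", fmt="%Y-%m-%d"),
    MapEntry("order/bob/@larry", const="junior"),
]


def resolve(obj, keypath):
    """'customer.name' -> obj.customer.name (cf. getCustomer().getName())."""
    return reduce(getattr, keypath.split("."), obj)


def element_at(root, steps):
    """Walk down `steps` from root, creating missing child elements on the way."""
    node = root
    for step in steps:
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    return node


def to_xml(obj, mapping):
    root = ET.Element(mapping[0].path.split("/")[0])
    for entry in mapping:
        steps = entry.path.split("/")[1:]  # first step names the root element itself
        if entry.const is not None:
            value = entry.const
        else:
            raw = resolve(obj, entry.keypath)
            value = format(raw, entry.fmt) if entry.fmt else str(raw)
        if steps[-1].startswith("@"):
            element_at(root, steps[:-1]).set(steps[-1][1:], value)
        else:
            element_at(root, steps).text = value
    return ET.tostring(root, encoding="unicode")


order = Order(Customer("Ada"), datetime.date(2003, 7, 9))
print(to_xml(order, MAPPING))
```

Because each target path is independent of its keypath, the XML layout can change without touching `Order` or `Customer`, which is the decoupling described in item 3.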

    So when reading the comments about the weaknesses of object-mapping tools, keep in mind that some of us have overcome them. :)

    Peace be with you,
    -jimbo

  8. Re:XML often violates relational rules by vidarh · Score: 2, Insightful

    More often than not, the data I work on doesn't fit naturally into a relational structure. A lot of data is more naturally structured as trees or graphs than as a matrix. One reason I like XML is that it fits my data much better than a matrix does.
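To see the mismatch in miniature: a tree stored relationally is usually flattened into (id, parent_id, name) rows and has to be reassembled by following parent links, whereas in XML the nesting itself is the structure. The sketch below uses Python's standard `xml.etree.ElementTree` to rebuild a tree from such rows; the rows are invented.

```python
import xml.etree.ElementTree as ET

# Adjacency-list table: (id, parent_id, name); parent_id None marks the root.
ROWS = [
    (1, None, "catalog"),
    (2, 1, "books"),
    (3, 2, "fiction"),
    (4, 2, "reference"),
    (5, 1, "music"),
]


def rows_to_xml(rows):
    """Rebuild the nesting that the relational form keeps only implicitly."""
    nodes = {row_id: ET.Element("node", name=name) for row_id, _parent, name in rows}
    root = None
    for row_id, parent_id, _name in rows:
        if parent_id is None:
            root = nodes[row_id]
        else:
            nodes[parent_id].append(nodes[row_id])  # attach child under its parent
    return root


tree = rows_to_xml(ROWS)
print(ET.tostring(tree, encoding="unicode"))
```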

  9. Attributes by Skeme · Score: 2, Funny

    XML sucks because of attributes. I can put the same data in an attribute or in a child element, and they are treated differently. How pointless is that?
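The asymmetry is easy to show: the same fact can be written as an attribute or as a child element, and a typical tree API reaches the two through different calls. A minimal sketch with Python's standard `xml.etree.ElementTree`; the `<thing>` markup is a made-up example.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<thing colour="red"/>')
as_element = ET.fromstring('<thing><colour>red</colour></thing>')

# Same information, two different access paths:
print(as_attribute.get("colour"))     # red   (attribute lookup)
print(as_element.findtext("colour"))  # red   (child-element lookup)

# ...and the "wrong" call on each quietly finds nothing.
print(as_attribute.findtext("colour"))  # None
print(as_element.get("colour"))         # None


def colour_of(thing):
    """Code that must accept both spellings has to look in both places."""
    return thing.get("colour") or thing.findtext("colour")
```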

    Plus any drooling idiot can come up with a way to represent a tree in a file. They did that 100 years ago with Lisp.