Slashdot Mirror


XML Co-Creator says XML Is Too Hard For Programmers

orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied lately with XML, and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always drawn a divided response from the technical community. The anti-XML community has several sites stating their positions."

4 of 562 comments

  1. It's about tools, libraries by Anonymous Coward · · Score: 5, Interesting

    Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.

    Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.
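
    To make the variability point concrete, here is a minimal sketch in Python rather than Perl. The `payment` element, its attributes, and the helper names are all made up for illustration; the regexp is deliberately naive, the kind of pattern that tends to get written first.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical documents carrying the same data; only the layout differs.
DOC_A = '<payment id="42" currency="EUR">100</payment>'
DOC_B = '<payment\n    currency="EUR"\n    id="42">100</payment>'

# Naive pattern: assumes "id" comes first and the start tag sits on one line.
NAIVE = re.compile(r'<payment id="(\d+)" currency="(\w+)">')

def id_by_regex(text):
    """Return the id attribute using the naive pattern, or None on no match."""
    match = NAIVE.search(text)
    return match.group(1) if match else None

def id_by_parser(text):
    """Return the id attribute using a real XML parser."""
    return ET.fromstring(text).get("id")

for doc in (DOC_A, DOC_B):
    print(id_by_regex(doc), id_by_parser(doc))
```

    The pattern can be patched to cope with `DOC_B`, but every new layout needs another patch, which is the "quickly gets messy" part.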

    There are a number of tools and libraries (for Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.

    1. Re:It's about tools, libraries by Sique · · Score: 5, Interesting

      No. It is not. It is about basic computer science.

      XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction. What he is complaining about is exactly this: Lots of parsing to get a simple datum.
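
      A rough sketch of the stack idea, in Python and using the standard library's `iterparse`. The message layout and the `first_text_at` helper are invented for illustration (this is not a real SWIFT format). The explicit stack holds only the names of the currently open elements, and the function stops reading once the datum is found. It makes no claim about speed relative to a regexp.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical message; the element names are made up for illustration.
MESSAGE = b"""<message>
  <header><sender>BANKAAAA</sender></header>
  <body>
    <transaction><amount>100.00</amount></transaction>
    <transaction><amount>250.50</amount></transaction>
  </body>
</message>"""

def first_text_at(stream, wanted_path):
    """Return the text of the first element whose path equals wanted_path."""
    stack = []  # names of the currently open elements
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            if stack == wanted_path:
                return elem.text  # stop reading as soon as the datum is found
            stack.pop()
            elem.clear()  # discard the finished element's text and children
    return None

amount = first_text_at(
    io.BytesIO(MESSAGE), ["message", "body", "transaction", "amount"]
)
print(amount)
```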

      With regexps your parsing is much faster, because you can concentrate on substrings, you can parse them without using a stack, and you can use them in a stream context. But regexps are regular expressions (a Chomsky Type 3 grammar), so they cover only a subset of what XML can express and are not able to parse XML completely.
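
      A small illustration of where the Type 3 limit bites, again in Python with invented data. Python's `re` module has no recursive patterns, so neither a lazy nor a greedy match can pair nested open and close tags correctly in general; a parser that tracks nesting can.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data: a <group> nested inside another <group>.
NESTED = "<group>a<group>b</group>c</group>"
# Hypothetical data: two sibling <group> elements.
SIBLINGS = "<r><group>x</group><group>y</group></r>"

# Lazy match stops at the first closing tag, cutting the outer element short.
lazy = re.search(r"<group>(.*?)</group>", NESTED).group(1)

# Greedy match runs to the last closing tag, swallowing both siblings.
greedy = re.search(r"<group>(.*)</group>", SIBLINGS).group(1)

# A parser keeps track of nesting and pairs the tags correctly.
outer = ET.fromstring(NESTED)
inner = outer.find("group")

print(repr(lazy))
print(repr(greedy))
print(outer.text, inner.text, inner.tail)
```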

      One of the links in the article points to another rant, where the author wants some regulations for a limited XML. Unfortunately, the ideas he is proposing are in fact context sensitive, and as such they are Chomsky Type 1 (context-sensitive grammar) and a superset of XML instead of a simplified subset. Does anyone remember the Earley algorithm, with something that can be described as a multi-dimensional stack?

      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

      --
      .sig: Sique *sigh*
  2. Oh please! by gwappo · · Score: 5, Interesting
    It's annoying when posters get presumptuous. The people complaining in the article are by all means elite programmers. Proclaiming that XML is okay because "programming *is* a hard task" is nonsense, and in the same league as "HLLs are for wussies, real men code in assembly" and other crap.

    The criticism of XML is accurate, correct, and valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing work and 10% business solution. That 90% plumbing work leaves opportunity for _a lot of bugs_ to be created and for any solution using XML to become a resource hog.

    Having a standard interchange format like XML is a fun-thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

  3. Stay on topic - problem isn't XML standard by cdthompso1 · · Score: 5, Interesting
    Tim Bray's article, if you didn't read it, is right on the money. The last paragraph basically states that XML is the best alternative to the data interchange problem because it provides a consistent format. Some of you guys who are rounding up the mob and lighting buildings on fire calling for book burnings and the downfall of all XML have to read the article! You're not in agreement with Tim when you say, "Sure, I think XML sucks, too."

    So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry!) Just imagine the chaos that will ensue once MS Office saves all documents in true XML.

    My take on the problem Tim's really talking about: inconsistency and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a language domain, because vendors want to put their own spin on everything. The alphabet soup that results confuses the hell out of people. This has even happened in the open source world, where I can do a Google search on "php xml parsing" and read articles on no fewer than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract and store the data in our database," you need a standard approach. This is not to stifle thought and innovation; yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each author offering his opinion with a slight variation of how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about, and not necessarily the one that time has proven out to be the most efficient.
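
    For anyone weighing the two approaches, here is a side-by-side sketch of the same extraction done both ways. It is in Python rather than PHP, using the standard library's `xml.sax` and `xml.dom.minidom`; the `orders` file layout and the function names are invented for illustration, and this is one way to do it, not a recommended best practice.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical partner file; element and attribute names are invented.
ORDERS = b"""<orders>
  <order id="1"><customer>Ann</customer></order>
  <order id="2"><customer>Bob</customer></order>
</orders>"""

class CustomerHandler(xml.sax.ContentHandler):
    """Event-driven: collect <customer> text as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.customers = []
        self._buffer = None  # list of text chunks while inside <customer>

    def startElement(self, name, attrs):
        if name == "customer":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "customer":
            self.customers.append("".join(self._buffer))
            self._buffer = None

def customers_sax(data):
    """SAX: handle events one at a time, keep only what was asked for."""
    handler = CustomerHandler()
    xml.sax.parseString(data, handler)
    return handler.customers

def customers_dom(data):
    """DOM: build the whole tree in memory, then query it."""
    doc = minidom.parseString(data)
    return [node.firstChild.data for node in doc.getElementsByTagName("customer")]

print(customers_sax(ORDERS))
print(customers_dom(ORDERS))
```

    The DOM version is shorter but holds the whole document in memory; the SAX version needs more plumbing but keeps only the data it was asked to collect.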

    Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?