Slashdot Mirror


XML Co-Creator says XML Is Too Hard For Programmers

orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification has a new entry on his website explaining why he's been feeling unsatisified lately with XML and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always a divided response among the technical community. The anti-XML community has several sites stating their positions."

9 of 562 comments (clear)

  1. It's about tools, libraries by Anonymous Coward · · Score: 5, Interesting

    Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.

    Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.

    There's a number of tools and libraries (with Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.

    1. Re:It's about tools, libraries by Sique · · Score: 5, Interesting

      No. It is not. It is about basic computer science.

      XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction. What he is complaining about is exactly this: Lots of parsing to get a simple datum.

      With regexp your parsing is much faster, because you can concentrate on substrings, you can parse them without using a stack, you can use them in stream context. But regexp are Regular Expressions (Chomsky Type 3 grammar), so they are in fact just a subset of XML and not able to parse XML completely.

      One of the links in the article points to another rant, where the author wants some regulations for a limited XML. Badly enough the ideas he is proposing are in fact context sensitive and such they are Chomsky Type 1 (context sensitive grammar) and a superset of XML instead of a simplified subset. Someone remembers the Early algorithm with something that can be described as a multi dimensional stack?

      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

      --
      .sig: Sique *sigh*
    2. Re:It's about tools, libraries by Sique · · Score: 4, Interesting

      No, I am suggesting, that in general you have to use a stack machine. Surely you can use degenerated trees instead of fully balanced trees to store your data. And a concatenation of elements is a regular expression (and a degenerated tree). But then you are already making assumptions about the data you get. But with such limiting assumptions you can easily streamline your code. But you are loosing the full power of XML on the way. And you need a grammar that makes sure you don't mix terminals and nonterminals.

      It starts out already if you are using escape characters to mark nonterminals and escape those characters with itself to mark them terminal. Those markings are still regular, but you loose already some speed ups. For instance \\ matches \\" and \\\", but one means just \ and the end of the string, and the other one means \" and the string continues. The only way to stay out of the mess is to make sure you are using an only left bound parser, first parse for all escape characters and then for the nonterminals, which makes your parser already a (local) 2-pass-parser.

      --
      .sig: Sique *sigh*
  2. XML is good by Ender+Ryan · · Score: 4, Interesting
    I don't understand why so many people complain about XML so much. It's really quite useful for storing arbitrary data. We have several hundred thousand text-based documents where I work, and it has been a total nightmare, until I converted the whole thing(well, I'm not done yet...) to XML.

    The documents are generally displayed as HTML on the web, but they're also read by a couple different programs for different purposes. When I first started here, it was mostly a mess of poorly hand-written HTML, but thankfully there were *only* about 20k documents at the time.

    I was charged with the task of writing said programs to read these damn files. Unfortuneately, they weren't all marked up the same...

    Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.

    Yay for XML! :)

    So, to sum up, XML is doing what it was meant to do, no less. Unfortuneately, it's also probably doing a bit more as well, XSL anyone? Yeck, why not just have a stand XML scripting language, why the need for the language to be valid XML itself?

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  3. XML: bad implementation of a good idea by g4dget · · Score: 4, Interesting
    I have to agree that XML has serious problems.

    Now, I have to say: a universal syntax for tree-structured data is very useful: experience since the 1970s with one such universal syntax, Lisp, has shown that. It is unfortunate that XML is about the worst imaginable implementation of that idea. XML combines being a nuisance to type with having comparatively complex semantics and lots of redundant features.

    What is ironic is that the same "real world programmers" who wax ecstatic about XML also condemn Lisp as too complicated and too difficult to read. The universal syntax that XML aspires to, Lisp syntax delivered many decades ago. It's just that prejudice and ignorance caused people to re-invent the wheel (and in square form, too) in the form of XML.

    I am pretty torn between whether XML is a blessing or a curse. We really need something like it, but XML is so bad that it may not even live up to the level of "poorly designed industry standard but better than nothing".

  4. Oh please! by gwappo · · Score: 5, Interesting
    It's annoying when posters get presumptious. The people complaining in the article are by all means elite programmers, proclaiming xml is okay because "programming *is* a hard task" is non-sense and in the same league as "HLL's are for wussies, real men code in assembly" and other crap.

    The criticism on XML is accurate, correct, valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing-work and 10% business-solution. That 90% plumbing-work leaves oppertunity for _a lot of bugs_ to be created and for any solution using XML to become a resource-hog.

    Having a standard interchange format like XML is a fun-thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

  5. Hahahah finallly something I know a lot about. by BeerSlurpy · · Score: 4, Interesting

    We use XML heavily in a project I'm working on at my company. Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation. Naturally we also make heavy use of DTD and SAX. Lots of XML related technologies.

    I can tell you now that XML is a Bad Thing. It strives to excel at too many things at once, and becomes inefficient and complex as a result.

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with. Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form. XML document tree traversal = 10000x more complex than getting column data out of a ResultSet... Unfortunately it is also a billion times slower to parse XML than it is to perform a medium compexity database query.

    The real problem is that XML only partly addresses the problems that relational database solved years ago (organizing and data accessable), but it does it without any of the efficiency benefits of a well designed database server. In my opinion, 90+ percent of the places where XML is being used today would be better served by using columns in a relational database table to store object fields. You get indexing, you get universal, simple and efficient searching, and you get speed.

    XML has too many faults to really list in one short post. The truth of the matter is that it tries to do too many things and DOESNT DO ANYTHING WELL. Sort of like if someone tries to be skilled in all musical instruments but ends up being, at best, mediocre in a few of them.

  6. Re:But XML is great for computers... by Anonymous Coward · · Score: 4, Interesting

    Right, so instead of using one regexp for /etc/hosts and another regexp for /etc/passwd, I'd have to use ten pages of getTheGodDOMObjectFromTheGodDOMXMLFile crap for /etc/hosts.xml and another ten pages for /etc/passwd.xml.

    How, exactly, has XML simplified *anything*?

  7. Stay on topic - problem isn't XML standard by cdthompso1 · · Score: 5, Interesting
    Tim Bray's article, if you didn't read it, is right on the money. The last paragraph basically states that XML is the best alternative to the data interchange problem because it provides a consistent format. Some of you guys who are rounding up the mob and lighting buildings on fire calling for book burnings and the downfall of all XML have to read the article! You're not in agreement with Tim when you say, "Sure, I think XML sucks, too."

    So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry!) Just imagine the chaos that will insue once MS Office saves all documents in true XML.

    My take on the problem Tim's really talking about: inconsistency and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a language domain, because vendors want to put their own spin on everything. The alphabet soup that results confuses the hell out of people. This has even happened in the open source world, where I can do a Google search on "php xml parsing" and read articles on no less than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract and store the data in our database," you need a standard approach. Not to stifle thought and innovation, yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each other offering his opinion with a slight variation of how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about, and not necessarily the one that time has proven out to be the most efficient.

    Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?