XML Co-Creator says XML Is Too Hard For Programmers
orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied lately with XML and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always had a divided response among the technical community. The anti-XML community has several sites stating their positions."
XML parsing (just like the TeX language) has a problem: when there are errors in the input files, the situation diverges into two different cases, one requiring infinite memory and the other infinite time to deal gracefully with the errors.
None of this would have ever been needed had CS been taught properly. There are other concepts to describe how files are to be organized. Some of the systems date from the 1950s. BNF (which seems to work very well for programmers to describe file formats to other programmers) dates from the early 1960s. What was needed is a BNF-type grammar that is machine readable.
Would XML have ever taken off if the web used something sane and not a hacked version of a nasty text formatting system from decades ago?
You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.
My point: XML is over-used for a lot of things. In some places it makes sense, but in many places it doesn't.
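To put the "easily parsed" claim in concrete terms, here is a minimal sketch for a hosts-style file. The function name `parse_hosts` and the sample data are made up for illustration, and it assumes the usual layout of an address followed by one or more names, with `#` starting a comment.

```python
def parse_hosts(text):
    """Parse hosts-style text into {hostname: address}. Illustrative only."""
    table = {}
    for line in text.splitlines():
        # Everything after '#' is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name] = address
    return table


# Hypothetical sample data, not taken from any real machine.
sample = """\
# local entries
127.0.0.1   localhost
192.0.2.10  build.example.com build   # trailing comment
"""
hosts = parse_hosts(sample)
```

A dozen lines is enough here, but only because the format has no quoting, no nesting and no escapes to worry about.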
Did you actually read the article?
I can sum it up very easily:
He's looking for a nicer API for processing XML; he's not looking to replace XML entirely.
He's stating that he'd basically like other coders to write more code the way he sees fit.
[quote]
while () {
next if (XX);
if (X|||X)
{ $divert = 'head'; }
elsif (XX)
{ &proc_jpeg($1); }
# and so on...
}
[/quote]
Repeat after me: I will never leave parsing XML up to a regexp, especially if my XML may contain CDATA and comment sections. I will never...
Unless you are 100% certain the file you are parsing is directly under your control, i.e. no comments, no CDATA sections, params always in the same order, same indentation, same bloody encoding [pardon my french], well, you will just have to access the data using some kind of DOM or abstract tree representation.
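A small sketch of the failure mode, assuming Python's standard `re` and `xml.etree.ElementTree` modules. The document and the pattern are invented for the example; the pattern is only in the spirit of the quoted loop, not a reconstruction of it.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: one real <img>, one inside a comment, one inside CDATA.
doc = """<page>
  <!-- <img src="old.jpg"/> -->
  <note><![CDATA[literal text: <img src="fake.jpg"/>]]></note>
  <img src="real.jpg"/>
</page>"""

# Naive pattern matching: it cannot tell markup from commented-out or literal text.
regex_hits = re.findall(r'<img src="([^"]+)"', doc)

# A real parser drops the comment and treats the CDATA content as plain text.
root = ET.fromstring(doc)
parsed_hits = [img.get("src") for img in root.iter("img")]
```

The regexp reports three images; the tree contains one. If you control the input and know it never has comments or CDATA, the regexp is fine, which is more or less the point being argued.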
I don't think he thinks no one uses XML, he seems to deplore the fact that some people don't get it at all and resort to heavy duty tools for trivial tasks [thus justifying his example above].
Basically XML is quite simple, but that's not the matter, the problem is that XML bundles ACTUAL DATA, it's all about the complexity of those data, not the API used to access it [although writing a DOM implementation is a real pain]
You just gave the best argument for adopting XML as widely as possible. Yes, all these can be parsed (with the possible exception of sendmail's config files which may be Turing-complete) but they all require *different* code for each config file. If they were in XML you'd still need different semantic code, of course, but a whole wodge of syntax issues (how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc) would be dealt with. Maybe not in the way that you or I think is perfect - IMHO XML is a little bit verbose compared to say Lisp- or Tcl-style encodings. But they would be dealt with *once*. No need to learn a new or almost-the-same-but-slightly-different set of syntactic conventions for every single config file.
Maybe XML is over-used for a lot of things, but making up your own file format is definitely over-used a lot more. Simple line-oriented files are reasonable to have as plain text, for everything else please avoid the temptation to reinvent the wheel by devising a new syntax and block structure.
-- Ed Avis ed@membled.com
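A sketch of the "dealt with once" argument, using Python's standard `xml.etree.ElementTree`. The element and attribute names are hypothetical; the point is only that the serializer and parser agree on quoting, so the application never has to invent its own rules.

```python
import xml.etree.ElementTree as ET

# A value containing the characters that usually break ad hoc formats.
awkward = 'say "hi" & <bye>\nsecond line'

item = ET.Element("setting", name="greeting")
item.text = awkward
serialized = ET.tostring(item, encoding="unicode")

# Round trip: the parser undoes whatever escaping the serializer applied.
restored = ET.fromstring(serialized).text
```

The semantic code (what a `setting` means) is still yours to write, but none of it deals with delimiters or escapes.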
It might be too late to correct some things in XML.
The good thing about XML is that whatever emerges in the future,
it will always be possible to convert old documents into any
new form, using simple tools.
The critics do have a point: unlike LaTeX or HTML, which
can be written easily by hand, XML can become too bloated to
be authored directly by humans.
MathML has a similar problem:
LaTeX: $x^5+3x-9=0$
MathML:
<mrow>
<mrow>
<msup>
<mi>x</mi>
<mn>5</mn>
</msup>
<mo>+</mo>
<mrow>
<mn>3</mn>
<mo>⁢</mo>
<mi>x</mi>
</mrow>
<mo>-</mo>
<mn>9</mn>
</mrow>
<mo>=</mo>
<mn>0</mn>
</mrow>
You can write complicated formulas in LaTeX directly but it is
almost impossible to do so in MathML, where one has to rely
on tools to generate it (e.g. export it with Mathematica or
TeX -> MathML converters). Wouldn't it be nice if browsers
would understand a basic version of LaTeX? (That it is possible
has been shown with IBM's techexplorer plugin).
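To illustrate what "rely on tools to generate it" means in practice, here is a rough sketch that builds MathML for the same equation from a tiny expression tree. The tuple format and the names `leaf` and `to_mathml` are invented for this example; it is nowhere near a real TeX -> MathML converter, and it nests every binary operator in its own mrow, so the grouping differs slightly from the hand-written markup above.

```python
import xml.etree.ElementTree as ET


def leaf(tag, text):
    """Create an element such as <mi>x</mi> or <mn>5</mn>."""
    el = ET.Element(tag)
    el.text = text
    return el


def to_mathml(node):
    """Convert a nested (op, left, right) tuple into MathML elements."""
    if isinstance(node, str):
        return leaf("mi", node)            # identifier
    if isinstance(node, (int, float)):
        return leaf("mn", str(node))       # number
    op, left, right = node
    if op == "^":
        parent = ET.Element("msup")
        parent.extend([to_mathml(left), to_mathml(right)])
        return parent
    parent = ET.Element("mrow")
    # "*" becomes the invisible-times character U+2062 (an assumption here).
    symbol = "\u2062" if op == "*" else op
    parent.extend([to_mathml(left), leaf("mo", symbol), to_mathml(right)])
    return parent


# x^5 + 3x - 9 = 0
expr = ("=", ("-", ("+", ("^", "x", 5), ("*", 3, "x")), 9), 0)
markup = ET.tostring(to_mathml(expr), encoding="unicode")
```

One line of input data, a couple hundred characters of output. Nobody is going to type the output by hand.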
1. Doctype is necessary. Perhaps you've never tried handling a very complex text (a big DocBook text or a big TEI text). You need to know what kind of text you're dealing with, and there's no way to come up with one universal solution for all kinds of texts. The only character entities needed are the handful of named entities that are part of the standard: &lt; &gt; &amp; etc. The rest can be handled by Unicode (including the PUA) and transcoding (if you are using an ISO 8859 encoding and you need a character outside that encoding, then you need to rethink the encoding you've chosen to use. UTF-8 is your friend). Entities really are good for more complex units (strings, etc.), rather than single characters. What character entities have to do with DOCTYPE is beyond me.
2. True
3. Standardize element IDs? Element IDs are part of the text, not part of the structure. They're simply a way of simplifying the difficulty of accessing random parts of text.
I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.
So you're saying we need a meta-meta-language? The *MLs are a standard for arbitrary abstract data (and text) models (because not all texts are hierarchical like DBs).
I think the problem here is that DB programmers (I'm excepting Bray from this) are overusing XML for very simple DB tasks that it wasn't intended for. If you're just doing a 40 field, 30,000 record flat DB, XML is NOT the solution. But it is the best solution for complex non-hierarchical data (e.g., books).
As for Bray, I don't think he's saying XML itself (the markup standard alone) is too hard, or that it should be abandoned. I think he's saying we haven't come up with simple enough ways of accessing XML data through APIs. But of course that wouldn't be a spicy enough meatball for the Taco.
If you're working with data that can be meaningfully represented with columns, you're using the wrong damned tool. XML is for complex structured data, which it does fine. It is not for tables. Don't blame the tool, blame the idiot who thought that XML was a good way to do DBs.
XML got one thing right over unadorned S-expressions - document packaging, specifically versioning and character-set labeling. XML inherited this from SGML, and it's one of the few things it took from there that was actually worth keeping.
For a good laugh, read the Origin and Goals section of the XML spec. Of the ten goals for XML listed there:
XML shall be straightforwardly usable over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML documents.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear.
The XML design should be prepared quickly.
The design of XML shall be formal and concise.
XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.
I'd say two of them were met, but were bad ideas (SGML compatibility, terseness unimportant), and five of them were completely missed (ease of use, human legibility, quickly designed, formal and concise, ease of creation).
Thirty per cent is a failing grade, folks...
To a Lisp hacker, XML is S-expressions in drag.
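For anyone who hasn't seen the resemblance, a quick sketch that prints an element tree as an S-expression. The `(@name "value")` notation for attributes is just one convention I picked for the example, and the function ignores tail text and doesn't escape quotes, so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET


def to_sexpr(el):
    """Render an element as a Lisp-style S-expression string (illustrative)."""
    parts = [el.tag]
    for key, value in el.attrib.items():
        parts.append(f'(@{key} "{value}")')
    if el.text and el.text.strip():
        parts.append(f'"{el.text.strip()}"')
    parts.extend(to_sexpr(child) for child in el)
    return "(" + " ".join(parts) + ")"


# Hypothetical input; the "kind" attribute is made up.
doc = '<msup kind="power"><mi>x</mi><mn>5</mn></msup>'
sexpr = to_sexpr(ET.fromstring(doc))
# sexpr is now: (msup (@kind "power") (mi "x") (mn "5"))
```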
Yeah, the world needs more half-assed barely functioning and noncompliant XML parsers.
Seriously I think it's much more robust to just use a normal XML parser. You get all the character set support. If someone hacked up their own parser at work I would reject it in a code review. There's no sense in maintaining your own XML parser these days; they are a commodity.
-Kevin
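On the character set point, a short sketch with Python's standard `xml.etree.ElementTree` (the sample name is made up). The same document is stored in two encodings, and the parser sorts it out from the XML declaration or byte-order mark without any help from the caller.

```python
import xml.etree.ElementTree as ET

# The same hypothetical document, serialized in two different encodings.
template = '<?xml version="1.0" encoding="{enc}"?><name>Zo\u00eb Fran\u00e7ois</name>'
as_latin1 = template.format(enc="ISO-8859-1").encode("iso-8859-1")
as_utf16 = template.format(enc="UTF-16").encode("utf-16")  # includes a BOM

# Feed raw bytes to the parser; it reads the declaration or BOM itself.
names = [ET.fromstring(raw).text for raw in (as_latin1, as_utf16)]
```

A hand-rolled parser would need to get all of that right on its own before it even saw the first tag.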
XML is bad like Democracy is bad. It's just better than the alternatives.
I had a problem at work when we switched from AutoCAD to SolidWorks. Our manufacturing software couldn't read the new BOM files, which were Excel's .xls. Without ever looking at our system's BOM files before, I wrote a program that read the .xls and built a proper XML BOM file our system could read. If our system wasn't using XML, who knows how long it would have taken me to figure out the intricacies of a proprietary file format.
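A rough sketch of that kind of conversion, under loud assumptions: the spreadsheet is taken to have been exported as CSV first (reading .xls directly needs a third-party library), and the column names and the `bom`/`item` element names are invented, since the actual BOM schema isn't described.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export of the spreadsheet; column names are invented.
sheet = io.StringIO(
    "part_number,description,quantity\n"
    "100-A,Bracket,4\n"
    '200-B,"Bolt, M6 & washer",16\n'
)

# Build one <item> per row under a <bom> root (hypothetical schema).
bom = ET.Element("bom")
for row in csv.DictReader(sheet):
    item = ET.SubElement(
        bom, "item", number=row["part_number"], quantity=row["quantity"]
    )
    item.text = row["description"]

bom_xml = ET.tostring(bom, encoding="unicode")
```

The writing side is the easy half because the target format's syntax is already handled by the library; only the mapping from columns to elements is application-specific.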
OddManIn: A Game of guns and game theory.
You know, using VB is just code reuse. It's just reusing more code than you're used to. It's got some serious strengths. The app you write in a couple days the VB programmer can toss out after lunch. How about data aware controls? Those are a pain in the ass in C/C++, although you can make it easier by using third party components. Like ActiveX controls. Which are a pain in C/C++, but are painless in VB. On the other hand, your code won't be small, and you'll be linking to a massive runtime, and you're using a language whose syntax makes me feel dirty.
Oh, and if you're making web-based apps, wtf are you using C for?
The market for *real* programmers has been destroyed by corporate America.
I think that the *real* programmers that you have talked about all write libraries now. These guys all have jobs at the tool makers like MS, Apple, etc...
Businesses in general don't want (and generally don't need) *real* programmers, they want software engineers. They want someone who can sit down, work out some requirements and provide a timely, cost effective solution. It has taken me some time to fully realize this, but the right technical solution is not always the right business solution. The PHB could really care less if the app is written in VB, C, Java, as long as the application works to within their parameters. It is those parameters that are specified by the people paying for the software that will direct the language/technology you ultimately use.