Tim Bray on the Birth of XML, 10 Years Later

Posted by ryuzaki0 on Monday February 18, 2008 @04:34AM from the all-bloated-and-grown-up dept.

lazyguyuk writes "Tim Bray posts a lengthy blog on the birth of XML, formalized as 1.0 in Feb 1998. 'XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It's really long. The title was originally Good Luck and Internet Plumbing but the filename was "XML-People" and I decided I liked that better. I never got around to publishing it, so why not now?'"

Re:10 Years and still waiting by CRCulver · 2008-02-18 04:50 · Score: 4, Informative

Do you maintain a website? XML has been a godsend for those who want to maintain web and print output side by side. By keeping your data in an XML format, you can use simple XSL stylesheets to generate multiple types of output. See e.g. O'Reilly's XSLT Cookbook for dozens of very real-world examples (it's probably in your library).
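
To make the single-source idea concrete, here is a minimal sketch. It is not XSLT: Python's standard library has no XSLT processor, so this swaps in plain `xml.etree.ElementTree` code to show one XML document feeding two outputs. The element names (`article`, `title`, `para`) and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the element names are invented for this sketch.
SOURCE = (
    "<article>"
    "<title>XML at Ten</title>"
    "<para>Data &amp; markup stay in one place.</para>"
    "<para>Each output is generated from it.</para>"
    "</article>"
)

def to_html(root):
    """Render the document as an HTML fragment (web output)."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f"<p>{escape(p.text)}</p>" for p in root.findall("para")]
    return "\n".join(parts)

def to_text(root):
    """Render the same document as plain text (stand-in for print output)."""
    title = root.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)

root = ET.fromstring(SOURCE)
html_output = to_html(root)
text_output = to_text(root)
```

A real XSLT setup would keep the two renderings in separate stylesheets rather than in functions, but the division of labour is the same: the content lives once in the XML, and each output format is a transformation of it.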

That's just one example of how XML technology has made coding easier. I'm sure others will point out more.

If you aren't a developer, then I'm not sure XML was supposed to directly revolutionize your end-user experience.
Re:XML and Interfaces by MBCook · 2008-02-18 05:02 · Score: 4, Informative
Here are some of the "fun" things I have run across in other people's (almost certainly custom) XML interpreters/producers:
- Tags must be upper case
- Tags can't be upper case
- You must put line breaks between elements
- There can't be any whitespace between elements
- It's important to URL encode the XML before it gets sent from them to me
- You don't need CDATA blocks, just put the ampersands and >s right in there, it'll be OK
- Your XML should all be inside a CDATA block in container XML
- No tags can self-close
- Self closed tags need a space between the slash and bracket
- Self closed tags can't have a space between the slash and bracket
That's just what I can think of off the top of my head. We've seen quite a bit of crazy stuff. If everyone would just use one of the already written XML producers or parsers (the big ones, the ones that work) life would be much easier around here from time to time.
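
As a rough illustration of why the existing parsers make those rules unnecessary, the sketch below feeds three differently formatted but equivalent documents to Python's standard `xml.etree.ElementTree` parser. The `order`/`item`/`note` document is hypothetical; only the parser calls are real.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same (made-up) document: compact with entities,
# indented with a CDATA block and a spaced self-closing tag, and one with an
# explicit empty element and a raw ">" in the text.
VARIANTS = [
    "<order><item/><note>Fish &amp; chips &gt; salad</note></order>",
    "<order>\n  <item />\n  <note><![CDATA[Fish & chips > salad]]></note>\n</order>",
    "<order><item></item><note>Fish &amp; chips > salad</note></order>",
]

def summarize(xml_text):
    """Return the parts of the document an application would care about."""
    root = ET.fromstring(xml_text)
    return (root.tag, tuple(child.tag for child in root), root.findtext("note"))

def is_well_formed(xml_text):
    """True if a conforming parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

summaries = [summarize(v) for v in VARIANTS]
case_mismatch_ok = is_well_formed("<Order></order>")            # tags are case sensitive
bare_ampersand_ok = is_well_formed("<note>Fish & chips</note>")  # "&" must be escaped
```

All three variants come out identical after parsing, while mismatched tag case and a bare ampersand are rejected. Whitespace, self-closing style, and CDATA versus entities are the parser's problem, not something each partner needs its own rule for.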
--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:10 Years and still waiting by CRCulver · 2008-02-18 05:22 · Score: 3, Informative

Just like LaTeX! Reinvention is a wonderful thing.

LaTeX is restricted to certain types of print output. It emphatically cannot output HTML easily. Just look at the umpteen thousand threads on comp.text.tex where someone complains that latex2html can't handle anything more than a handful of quasi-default LaTeX packages. Plus, Unicode support in LaTeX has been shoehorned in and is still incomplete (though xetex is making strides), while at least XML was designed around Unicode. And then there is the fact that XML encourages semantic markup, while LaTeX contains non-semantic tags like \textit.
Re:10 Years and still waiting by EMN13 · 2008-02-18 05:36 · Score: 3, Informative

I use it in web development constantly, and have for about 8 years. It's great mostly for documents, since it's much easier to process than a home-grown setup.

If you want to transform the document, you can use any of a number of techniques and trivially guarantee that the resulting document is at least syntactically valid. If you use a home-grown format (or HTML), you'll need to resort to regular expressions or a custom parser, which works fine up to a point. Regexes are error-prone (it's quite difficult, for instance, to make an untrusted HTML document safe with regexes), and parsing is difficult and doesn't solve the transformation step very elegantly, whereas XPath and others are absolutely brilliant for quickly distilling the stuff you need from a document.
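
A small sketch of the kind of extraction meant here, using the XPath subset built into Python's `xml.etree.ElementTree` (full XPath needs a third-party library). The `library` document and its titles are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, made up for illustration.
DOC = """<library>
  <book year="1998"><title>XML Basics</title><author>A. Writer</author></book>
  <book year="2005"><title>XSLT Recipes</title><author>B. Coder</author></book>
  <magazine year="1998"><title>Markup Monthly</title></magazine>
</library>"""

root = ET.fromstring(DOC)

# Titles of all books, in document order.
book_titles = [t.text for t in root.findall("./book/title")]

# Titles of anything published in 1998, whatever the element type.
titles_1998 = [t.text for t in root.findall(".//*[@year='1998']/title")]

# Transform through the tree API, then serialize: the serializer, not
# hand-written string handling, is responsible for producing well-formed output.
for book in root.findall("book"):
    book.set("reviewed", "yes")
serialized = ET.tostring(root, encoding="unicode")
reparsed = ET.fromstring(serialized)
```

Each query is one line and does not care about attribute order or whitespace, which is where a regex over the raw text tends to go wrong.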

But on the parsing side... take a look at ANTLR, it's just great :-).
Re:YAML and JSON by tjansen · 2008-02-18 05:40 · Score: 2, Informative

As you say, YAML is a specialized markup-language (data-centric, almost human-readable) and not a good choice for many use-cases (document-centric languages like XHTML and DocBook, combining languages with XML namespaces). In other words, it can not replace XML, it's just another syntax to learn. It needs a completely new infrastructure: new parsers, new editors, new schema description language, new translation languages and so on. Is that really worth it, only to make editing files with a simple text editor easier?
Re:Java and XML, bad tastes that are worse togethe by fartrader · 2008-02-18 05:54 · Score: 2, Informative

Java is clearly moving away from the massive over-use of XML in everything from configuration to messaging. From Java 5 onwards, annotations are rapidly becoming the configuration mechanism of choice, where infrastructure configuration is placed in the source code directly, in a way that's significantly less obtrusive than writing code to manage things like persistence and transactions yourself, and significantly easier to follow than placing it in many XML files. Anyone who has migrated from EJB 2.1 to 3.0, for example, should be much happier now that the various XML files needed to get it to run are going the way of the dodo. This use of annotations to replace XML is an emerging trend popular in many frameworks, from EE 5 through to Hibernate and Spring. On the messaging side there are a slew of code generation tools and XML-to-POJO (annotation-based) mappings that keep you away from raw XML. Yes, it's another layer of abstraction, but it keeps you away from the coding horrors of SAX, DOM, and yes, even the comparative simplicity of JDOM.
Re:Classic by shutdown+-p+now · 2008-02-18 06:34 · Score: 2, Informative

To me that says that XML handles a problem that wasn't there. Parsing problem for pretty much everything is almost universally solved by regex...
God, no... another Perl hacker...
Regex are not a solution to everything, and most certainly not to writing fast parsers!
(Not that XML is easy to parse fast, but that's another story. You still don't write a JSON parser using regex.)
Re:10 Years and still waiting by TheRaven64 · 2008-02-18 06:44 · Score: 4, Informative

Does anyone still use latex2html? All of the TeX users I know who care about HTML output switched to tex4ht years ago. It produces a variety of XML formats, including XHTML (with MathML) and OpenDocument.

--
I am TheRaven on Soylent News
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 06:48 · Score: 2, Informative

"Ever tried parsing CSV?"
All the time. It's not that hard. Also, if you're worried about such things as quoting, etc., you can always use fixed-width fields - makes indexing, looking up, and modifying values REAL FAST. Compare that to the mess of xml.

--
Kevin Smith on Prince
Re:Regex by TheRaven64 · 2008-02-18 07:02 · Score: 5, Informative

You fail Computer Science 101. Regular expressions are exactly as expressive as finite automata. A finite automaton is incapable of solving the matching brackets problem, since that requires a potentially infinite number of states in order to keep track of the number of open brackets in an input stream. Because of this, a regular expression cannot be used to parse any XML schema that allows an arbitrary depth of nesting, since parsing such a form would require counting the open and close tags to make sure they match, which is not possible with a regular expression.
This is why regular expressions are typically used for lexical analysis (tokenisation) not syntactic analysis (parsing).
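
The split between the two jobs can be sketched like this: a regular expression does the tokenising, and an explicit stack (the unbounded memory a finite automaton lacks) does the matching. The tag pattern is a deliberately simplified assumption that ignores attributes, comments, and CDATA; it is nowhere near a real XML parser.

```python
import re

# Lexical analysis: a regex is enough to recognise individual tags.
# Simplified pattern (an assumption for this sketch): no attributes.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*(/?)>")

def tags_balanced(text):
    """Syntactic analysis: check nesting with a stack of open tag names."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                      # <br/> opens and closes at once
        if closing:
            if not stack or stack.pop() != name:
                return False              # close tag without a matching open
        else:
            stack.append(name)
    return not stack                      # anything left open is an error
```

The stack can grow as deep as the input nests, which is exactly the part a pure regular expression has no way to express.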

--
I am TheRaven on Soylent News
Re:Regex by WilliamSChips · 2008-02-18 07:04 · Score: 2, Informative

No, you cannot with a regex. If you can, it's not really a regex, it's something different.

--
Please, for the good of Humanity, vote Obama.
Re:Java and XML, bad tastes that are worse togethe by CoughDropAddict · 2008-02-18 08:14 · Score: 3, Informative

So you're the guy who shits tabs in random places in source files, because you haven't figured out how to set up your editor to show you the difference. Please stop doing that. Tabs and spaces are different characters, even if the language you're using today treats them the same. If you're a VIM user, please learn to use "list" and "listchars."
Re:Your comments seem tainted with inexperience. by argent · 2008-02-18 09:22 · Score: 2, Informative

If you think XML a poor choice, then could you suggest an alternative?

Depends on the problem you're trying to solve.

A hell of a lot of the stuff I'm seeing in XML these days would be better off as token-separated self-describing tables (tables where the column names are the first row), or a modestly extended token-separated format like CSV.
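
For what it's worth, a self-describing table of that kind takes very little code to consume with a stock CSV reader. This is a minimal sketch using Python's `csv` module; the column names and rows are invented.

```python
import csv
import io

# Hypothetical table: the first row names the columns, and one field contains
# a quoted comma to show the reader handles that case.
TABLE = (
    "name,city,comment\n"
    'Joe Blow,Springfield,"likes commas, apparently"\n'
    "Mary Doe,Shelbyville,\n"
)

def read_table(text):
    """Return one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_table(TABLE)
cities = [row["city"] for row in rows]
```

The header row is the whole schema, so adding a column later does not break code that looks fields up by name.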

For binary data, something derived from Electronic Arts' semi-self-describing Interchange File Format (IFF) is good; examples in current use are the MIDI file format and Portable Network Graphics...

For arbitrary self-describing data there's always ASN.1.

For tagged arbitrary chunks of data descendants of RFC-822 are common.

For shallow-nested keyword-value data there's Microsoft's INI files.

And, of course, Lisp S-Expressions do absolutely everything XML does, more compactly, and are easier to parse.
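
To give a feel for the "easier to parse" part, here is a toy S-expression reader. It is a sketch under heavy assumptions: atoms are bare whitespace-separated strings, with no quoting, escaping, or comments, so it shows the shape of the parser rather than a full one.

```python
def parse_sexpr(text):
    """Parse a minimal S-expression into nested lists of strings."""
    # Tokenise: pad parentheses with spaces, then split on whitespace.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        """Read one expression starting at tokens[pos]; return (value, next_pos)."""
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        if token == ")":
            raise ValueError("unexpected ')'")
        if token != "(":
            return token, pos + 1          # an atom
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = read(pos)
            items.append(item)
        if pos >= len(tokens):
            raise ValueError("missing ')'")
        return items, pos + 1              # skip the closing parenthesis

    tree, end = read(0)
    if end != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree

person = parse_sexpr("(person (name Joe) (age 42))")
```

Anything beyond bare atoms (strings with spaces, numbers as numbers) adds code, but the recursive core stays about this size.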

Incidentally, that suggestion should not imply that everyone reinvent their own formats (again).

But XML doesn't solve that problem. I've found that the amount of code it takes to extract data from an arbitrary XML file even with an XML parser at hand is not significantly less than the amount of code it takes to parse and extract data from any other self-describing format.
Re:Here, let me fix that for you ... by trolltalk.com · 2008-02-18 11:20 · Score: 2, Informative

If you know how many fields there are in each record, then why did you need a special record delimiter to begin with? Sounds like a design mistake, which isn't surprising since it was ad-hoc...

Wrong - the special null delimiter is needed only for variable-length (and zero-length) fields and records. For fixed-length fields and records, no delimiter is needed.
For example:
First Name\0x00Last Name\0x00Age\0x00\0x00
Joe\0x00Blow\0x0042\0x00\0x00
Mary\0x00Doe\0x0024\0x00\0x00
\0x00Cowboyneal\0x00\0x00\0x00
In the above example, Cowboyneal has no first name and no age.
What's so hard to understand about that? For a fixed-length field/recordset, just include a header ...
FirstName:10:LastName:10:Age:3\n
Joe_______Blow_______42\n
Mary______Doe________24\n
__________Cowboyneal___\n
Both are human-readable, both are easy and intuitive to parse out, the second one is self-documenting and fully supports random access, etc (and neither one is new - the first is used on most *nixes, with either a : or | instead of a null, databases have been using the latter format for decades).
By contrast, xml is an abortion. Heck, I'll go further - xml is the ultimate triumph of navel-gazing over real-world experience.
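
Both layouts described above can be sketched in a few lines. Assumptions in this sketch, none of which are spelled out in the post: every field ends with a NUL and every record with one extra NUL; the reader is told the field count (empty fields make a double NUL ambiguous otherwise); and underscores are padding, so real values cannot start or end with one.

```python
NUL = "\x00"

def parse_null_delimited(data, n_fields):
    """Split NUL-terminated fields into records of n_fields each."""
    tokens = data.split(NUL)
    step = n_fields + 1                    # fields plus the record terminator
    if len(tokens) % step != 1 or tokens[-1] != "":
        raise ValueError("truncated or malformed data")
    records = []
    for i in range(0, len(tokens) - 1, step):
        *fields, terminator = tokens[i:i + step]
        if terminator != "":
            raise ValueError("missing record terminator")
        records.append(fields)
    return records

def parse_fixed_width(text, pad="_"):
    """Parse rows using a 'Name:width:Name:width' header line."""
    header, *rows = text.splitlines()
    parts = header.split(":")
    names, widths = parts[0::2], [int(w) for w in parts[1::2]]
    records = []
    for row in rows:
        record, pos = {}, 0
        for name, width in zip(names, widths):
            record[name] = row[pos:pos + width].strip(pad)
            pos += width
        records.append(record)
    return records

def read_row(text, index):
    """Random access: compute the row's offset instead of scanning for it."""
    header_end = text.index("\n") + 1
    widths = [int(w) for w in text[:header_end - 1].split(":")[1::2]]
    row_len = sum(widths) + 1              # +1 for the newline
    start = header_end + index * row_len
    return text[start:start + row_len - 1]

# Sample data mirroring the post's examples.
PEOPLE = [("Joe", "Blow", "42"), ("Mary", "Doe", "24"), ("", "Cowboyneal", "")]
NULL_DATA = "".join(NUL.join(p) + NUL + NUL for p in PEOPLE)
FIXED = "FirstName:10:LastName:10:Age:3\n" + "".join(
    f.ljust(10, "_") + l.ljust(10, "_") + a.rjust(3, "_") + "\n" for f, l, a in PEOPLE
)
```

The fixed-width reader gets random access for free because every row has the same length; the NUL-delimited one has to walk the data from the start.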

--
Kevin Smith on Prince