The Future of XML
An anonymous reader writes "How will you use XML in years to come? The wheels of progress turn slowly, but turn they do. The outline of XML's future is becoming clear. The exact timeline is a tad uncertain, but where XML is going isn't. XML's future lies with the Web, and more specifically with Web publishing. 'Word processors, spreadsheets, games, diagramming tools, and more are all migrating into the browser. This trend will only accelerate in the coming year as local storage in Web browsers makes it increasingly possible to work offline. But XML is still firmly grounded in Web 1.0 publishing, and that's still very important.'"
Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.
XML is like violence. If it doesn't solve your problem, you're not using enough of it.
I don't get it. We can argue the merits of data exchange formats 'till we're blue in the face; yet I cannot see why XML is so popular. For the majority of applications that use it, it's overboard. Yes, it's easier on the eye, but ultimately how often do you have to play with the XML your CAD software uses?
I'm a programmer, just like the rest of you here, so I'm quite used to having to write a parser here or there, or fixing an issue or two in an ant script. The thing that puzzles me, is why it's used so much on the web. XML is bulky, and when designed badly it can be far too complex; this all adds to bandwidth and processing on the client (think AJAX), so I'm not seeing why anyone would want to use it. Formats like JSON are just as usable, and not to mention more lightweight. Where's the gain?
ilovegeorgebush
I have had far too many 'this stuff sucks' moments with XML to ever consider using it in any capacity where it is not forced upon me (which unfortunately, it is, with great frequency).
I first heard about XML years ago when it was new, and just the concept sucked to me. A markup language based on the ugly and unwieldy syntax of SGML (from which HTML derives)? We don't need more SGML-alikes, we need fewer, was my thought. This stuff sucks.
Then a while later I actually had to use XML. I read up on its design and features and thought, OK well at least the cool bit is that it has DTDs to describe the specifics of a domain of XML. But then I found out that DTDs are incomplete to the extreme, unable to properly specify large parts of what one should be able to specify with it. And on top of that, DTDs don't even use XML syntax - what the hell? This stuff sucks.
I then found that there were several competing specifications for XML-based replacements for the DTD syntax, and none were well-accepted enough to be considered the standard. So I realized that there was going to be no way to avoid fragmentation and incompatibility in XML schemas. This stuff sucks.
I spent some time reading through supposedly 'human readable' XML documents, and writing some. Both reading and writing XML is incredibly nonsuccinct, error-prone, and time consuming. This stuff sucks.
Finally I had to write some code to read in XML documents and operate on them. I searched around for freely available software libraries that would take care of parsing the XML documents for me. I had to read up on the 'SAX' and 'DOM' models of XML parsing. Both are ridiculously primitive and difficult to work with. This stuff sucks.
Of course I found the most widely distributed, and probably widely used, free XML parser (using the SAX style), expat. It is not re-entrant, because XML syntax is so ridiculously and overly complex that people don't even bother to write re-entrant parsers for it. So you have to dedicate a thread to keeping the stack state for the parser, or read the whole document in one big buffer and pass it to the parser. XML is so unwieldy and stupid that even the best freely available implementations of parsers are lame. This stuff sucks.
Then I got bitten by numerous bugs that occurred because XML has such weak syntax; you can't easily limit the size of elements in a document, for example, either in the DTD (or XML schema replacement) or expat. You just gotta accept that the parser could blow up your program if someone feeds it bad data, because the parser writers couldn't be bothered to put any kind of controls in on this, probably because they were 'thinking XML style', which basically means, not thinking much at all. This stuff sucks.
Finally, my application had poor performance because XML is so slow and bloated to read in as a wire protocol. This stuff sucks.
XML sucks in so many different ways, it's amazing. In fact I cannot think of a single thing that XML does well, or a single aspect of it that couldn't have been better planned from the beginning. I blame the creators of XML, who obviously didn't really have much of a clue.
In summary - XML sucks, and I refuse to use it, and actively fight against it every opportunity I get.
The future of XML?
Probably a long, healthy life in a big house on the top of buzzword hill, funded by many glowing articles in magazines like InformationWeek and CIO, and 'research papers' by Gartner. Sitting on the porch yelling, "Get off my lawn!" to upstarts like SOA, AJAX, and VOIP. Hanging out watching tube with cousin HTML and poor cousin SGML. Trying to keep JSON and YAML from breaking in and swiping his stuff. Then fading into that same retirement community that housed such oldsters as EDI, VMS, SNA, CICS, RISC, etc.
Ok. I've once again seen the full range of XML comments here. From 'cool super technology modern java' to 'OMFG it sucks' right over to 'XML has bad security' - I mean ... WTF? XML is a Data Format Standard. It has about as much to do with IT security as the color of your keyboard.
And for those of you out there who haven't yet noticed: XML sucks because data structure serialisation sucks. It allways will. You can't cut open, unravel and string out an n-dimensional net of relations into a 1-dimensional string of bits and bytes without it sucking in one way or the other. It's a, if not THE classic hard problem in IT. Get over it. It's with XML that we've finally agreed upon in which way it's supposed to suck. Halle-flippin'-luja! XML is the unified successor to the late sixties way of badly delimited literals, indifference between variables and values and flatfile constructs of obscure standards nobody wants. And which are so arcane by todays standards that they are beyond useless (Check out AICC if you don't know what I mean). Crappy PLs and config schemas from the dawn of computing.
That's all there is to XML: a universal n-to-1 serialisation standard. Nothing more and nothing less. Calm down.
And as for the headline: Of-f*cking-course it's here to stay. What do you want to change about it (much less 'enhance'). Do you want to start color-coding your data? Talking about the future of XML is allmost like talking about the future of the wheel ("Scientist ask: Will it ever get any rounder?"). Give me a break. I'm glad we got it and I'm actually - for once - gratefull to the academic IT community doing something usefull and pushing it. It's universal, can be handled by any class and style of data processing and when things get rough it's even human readable. What more do you want?
Now if only someone could come up with a replacement for SQL and enforce universal utf-8 everywhere we could finally leave the 1960s behind us and shed the last pieces of vintage computing we have to deal with on a daily basis. Thats what discussions like these should actually be about.
We suffer more in our imagination than in reality. - Seneca
XML tries to make everything fit into a single hierarchy. Most real-world information is comprised of graphs of data. ISO STEP provides better readability compared to XML, a more strongly typed schema mechanism, and a more compact size. Best of all, programs can process and present results of STEP incrementally instead of requiring closing tags so you can hold gigabytes of information in the same file and seek randomly.
Example:
#10=ORGANIZATION('O0001','LKSoft','company');
#11=PRODUCT_DEFINITION_CONTEXT('part definition',#12,'manufacturing');
#12=APPLICATION_CONTEXT('mechanical design');
#13=APPLICATION_PROTOCOL_DEFINITION('','automotive_design',2003,#12);
#14=PRODUCT_DEFINITION('0',$,#15,#11);
#15=PRODUCT_DEFINITION_FORMATION('1',$,#16);
#16=PRODUCT('A0001','Test Part 1','',(#18));
#17=PRODUCT_RELATED_PRODUCT_CATEGORY('part',$,(#16));
#18=PRODUCT_CONTEXT('',#12,'');
#19=APPLIED_ORGANIZATION_ASSIGNMENT(#10,#20,(#16));
#20=ORGANIZATION_ROLE('id owner');
I'll save the discussion for XML on the web for others - I'm a game programmer, so I deal with XML as file-based data sources.
.NET's reflection capabilities, it's absolutely brainless to easily serialize any data structure. We've decided to use XAML (an XML-based object declaration format) / WPF as well. The artists love the flexibility in the tools, and can even participate in helping to design the interfaces by creating styles.
At the game studio where I work, all our newest tools are written in C#, and use XML as a data source (typically indirectly though serialized objects). Heavyweight objects (textures, models, audio) are naturally stored in a binary format, which is optimized for the task at hand. The XML-based formats are essentially our game data's source files, and tends to function in a metadata-type capacity. As a simple example, our audio scripts store a lot of parameters about how to play a sound (pitch and volume variations, choosing among multiple variants, category and volume data, etc), and this metadata simply references external binary audio files, typically stored in a standard format like Ogg Vorbis or ADPCM compressed wave files. This metadata is compiled into a binary run-time version using a proprietary format designed to allow us to easily filter versions. These binary formats are then packed into larger containers for simpler management. Since I work on an MMO, we have to think about versioning our binary data, which tends to be challenging.
XML is a great format for us, being so widely supported, since we use both native parsing libraries as well as a lightweight custom parser for our C++ tools (or if we need to support in-game loading for the in-house version of the game). It's easy to look into a file format to see what might be going wrong using just a text editor, and with
I don't know what the argument about not knowing what every tag means, like in HTML. The entire point of XML is to be extensible, meaning that it's the client application that determines what the tags ultimately mean. And using SweetXML, btw, misses one of the great benefits of using XML, which is that's it's a standard for which you're likely never going to have to write parsing libraries. It's fine if you want to go that route, but just be aware that you may not have the choice of libraries that you would have by using standard XML.
XML does tend to suffer from the "golden hammer" syndrome. Honestly, I'm not a huge fan of it's verbosity or general readability either, but if you take it for what it is, and use it sensibly, it's just another nifty tool you as a programmer can make good use of. After all, wouldn't you rather be working on more important parts of your project than fiddling with a text parser?
Irony: Agile development has too much intertia to be abandoned now.
XML does suck if you stick with some of the W3C standards and common tools. Suggestions to make it less painful:
W3C Schema is painful; it forces object-oriented design concepts onto a hierarchical data model. Consider RELAX NG (an Oasis-approved standard) instead; it's delightful in comparison. Use the verbose XML syntax when communicating with the less technical - if you've seen XML before, it's pretty easy to comprehend:
Switch to the compact syntax when you're among geeks:
There's validation support on major platforms, and even a tool (Trang) to convert between verbose/compact formats, and output to DTD and W3C Schemas. And, if you need to specify data types, it borrows the one technology W3C Schema got right: the Datatypes library.
The W3C DOM attempts to be a universal API, which means it must conform to the lowest common denominator in the programming languages it targets. Consider the NodeList interface:
While similar to the native list/collection/array interfaces most languages provide, it's not an exact match. So, DOM implementers create an object that doesn't work quite like any other collection on the platform. In Java, this means writing:
Instead of:
Dynamic languages allow an even more concise syntax. Consider this Ruby builder code to build a trivial XML document:
I thought about writing the W3C DOM equivalent of the above, but I'm not feeling masochistic tonight. Sorry.
The alternatives depend on your programming language, but plenty of choices exist for DOM-style traversal/manipulation.
In-memory object models of large XML document can consume a lot of resources, but often, you only need part of the data. Consider using an XMLPull or StAX parser instead. Pull means you control the document traversal, only descending into (and fully parsing) sections of the XML that are of interest. SAX based parsers have equivalent capabilities, but the programming model is uncomfortable for many developers.
Even better, some Pull processors are wicked fast, even when using them to construct a DOM. In Winter 2006, I benchmarked an XML-heavy application, and found WoodStox to be an order of magnitude faster at constructing thousands of small DOM4J documents
Scott Severtson
Senior Architect, Digital Measures