The Future of XML
An anonymous reader writes "How will you use XML in years to come? The wheels of progress turn slowly, but turn they do. The outline of XML's future is becoming clear. The exact timeline is a tad uncertain, but where XML is going isn't. XML's future lies with the Web, and more specifically with Web publishing. 'Word processors, spreadsheets, games, diagramming tools, and more are all migrating into the browser. This trend will only accelerate in the coming year as local storage in Web browsers makes it increasingly possible to work offline. But XML is still firmly grounded in Web 1.0 publishing, and that's still very important.'"
Netscape's dream of replacing the operating system with a browser is also coming true this year.
They've been saying that for years, and frankly it won't happen. A vast number of users relish the control that having software stored and run locally provides. Of course there will always be exceptions, as web-based e-mail has shown us.
As far as the future of XML... I can't seem to find anything in this article that states anything more than the obvious, it's on the same path it's been on for quite some time.
FTA:
Success or failure, XML was intended for publishing: books, manuals, and--most important--Web pages.
Is that news to anyone? My understanding of XML is that its intended use is to provide information about the information.
I'm sick of following my dreams. I'm just going to ask where they're goin' and hook up with 'em later.
The thing with XML is that it so easily supports whatever design the developer can think of. Even the really bad ones. Now that it is such a buzzword, the problem gets worse.
I had someone call me up to design them a simple web app. But he wanted it coded in XML because he thought that was the technology he wanted. His Access database was not web-friendly enough.
I did correct him a little to put him in check and at least gave him the right buzzwords to use with the next guy.
I think XML is dead simple to use if used correctly. I do like it much better than ini files. That is about all I use it for now: easy-to-use config files that others have to use.
Im a gamer, not a grammer major. This post is full of spelling and grammer mistakes.
JSON and YAML are better for lots of things, mostly the kinds of things TFA notes XML wasn't designed for and often isn't the best choice for: things that aren't marked-up text. (That's not considering the variety and maturity of available tools, of course; but then, you don't always need most of what is out there in XML tools, either.) Where you actually want an extensible language for text-centric markup, rather than a structured format for interchange of something that isn't marked-up text, XML seems to be a pretty good choice. Of course, for some reason, that seems to be a minority of the uses of XML.
I have had far too many 'this stuff sucks' moments with XML to ever consider using it in any capacity where it is not forced upon me (which unfortunately, it is, with great frequency).
I first heard about XML years ago when it was new, and just the concept sucked to me. A markup language based on the ugly and unwieldy syntax of SGML (from which HTML derives)? We don't need more SGML-alikes, we need fewer, was my thought. This stuff sucks.
Then a while later I actually had to use XML. I read up on its design and features and thought, OK well at least the cool bit is that it has DTDs to describe the specifics of a domain of XML. But then I found out that DTDs are incomplete to the extreme, unable to properly specify large parts of what one should be able to specify with it. And on top of that, DTDs don't even use XML syntax - what the hell? This stuff sucks.
I then found that there were several competing specifications for XML-based replacements for the DTD syntax, and none were well-accepted enough to be considered the standard. So I realized that there was going to be no way to avoid fragmentation and incompatibility in XML schemas. This stuff sucks.
I spent some time reading through supposedly 'human readable' XML documents, and writing some. Both reading and writing XML are incredibly nonsuccinct, error-prone, and time-consuming. This stuff sucks.
Finally I had to write some code to read in XML documents and operate on them. I searched around for freely available software libraries that would take care of parsing the XML documents for me. I had to read up on the 'SAX' and 'DOM' models of XML parsing. Both are ridiculously primitive and difficult to work with. This stuff sucks.
Of course I found the most widely distributed, and probably widely used, free XML parser (using the SAX style), expat. It is not re-entrant, because XML syntax is so ridiculously and overly complex that people don't even bother to write re-entrant parsers for it. So you have to dedicate a thread to keeping the stack state for the parser, or read the whole document in one big buffer and pass it to the parser. XML is so unwieldy and stupid that even the best freely available implementations of parsers are lame. This stuff sucks.
Then I got bitten by numerous bugs that occurred because XML has such weak syntax; you can't easily limit the size of elements in a document, for example, either in the DTD (or XML schema replacement) or expat. You just gotta accept that the parser could blow up your program if someone feeds it bad data, because the parser writers couldn't be bothered to put any kind of controls in on this, probably because they were 'thinking XML style', which basically means, not thinking much at all. This stuff sucks.
Finally, my application had poor performance because XML is so slow and bloated to read in as a wire protocol. This stuff sucks.
XML sucks in so many different ways, it's amazing. In fact I cannot think of a single thing that XML does well, or a single aspect of it that couldn't have been better planned from the beginning. I blame the creators of XML, who obviously didn't really have much of a clue.
In summary - XML sucks, and I refuse to use it, and actively fight against it every opportunity I get.
XML is easy to understand because of the prevalence of HTML knowledge. XML is easy because it's text. XML is easy because, like perl, you can store the same thing in 15 ways. XML is easy because there is only one data type: text. XML is flexible because you can nest to your heart's content.
All these things are why people use it.
All these things are why people abuse it.
All these things are why we won't be able to get rid of it soon.
TFA has nothing to say about the future of XML but the tools to use XML. XQuery and XML databases. Whoopity do. The threshold for getting posted on /. steps down yet another notch. IMHO: if you loathe/hate XML then you should think about a change in career because it's not going away any time soon...
:wq
The "bulkiness" of XML is also its strength: XML can be used to mark up almost any data imaginable. Now it's true that for most simple two-party exchanges, a simpler format (like comma separated values or YAML or something) would require fewer characters, and would thus save disk space, transmit faster, etc.
However, the modern programming age is all about sacrificing performance for convenience (this is why virtually no one is using C or C++ to make web apps, and almost everyone is using a significantly poorer performing language like Python or Ruby). We've got powerful computers with tons of RAM and hard drive space, and high-speed internet connections that can transmit vast amounts of data in mere seconds; why waste (valuable programmer) time and energy over-optimizing everything?
Instead, developers choose the option that will make their lives easier. XML is widely known, easily understood, and is human readable. I can send an XML document, without any schema or documentation, to another developer and they'll be able to "grok it". There's also a ton of tools out there for working with XML; if someone sends me a random XML document, I can see it syntax colored in Eclipse or my browser. If someone sends me an XML schema, I can use JAXB to generate Java classes to interact with it. If I need to reformat/convert ANY XML document, I can just whip up an XSLT for it and I'm done.
So yes, other formats offer some benefits. But XML's universality (which does require a bit of bulkiness) makes it a great choice for most types of data one would like to mark up and/or transmit.
P.S. JSON is just as usable? Try writing a schema to validate it ... OK, I admit, that wasn't so hard, just some JavaScript, right? But now you have to write a new batch of code to validate the next type of JSON you use. And another for the next, and so on. With XML, you have a choice of not one but four different schema formats; once you learn to use one of them, you can describe a validation schema far more quickly than you ever could in JavaScript.
Same deal with transformations: if you want to alter your JSON data in a consistent way, you again have to write custom code every time. Sure, XSLT has a learning curve, but once you master it you can accomplish in a few lines of code what any other language would need tens or even hundreds of lines to do.
Like a lot of things, XML is popular because it's popular. Parsing is done with libraries, so programmers don't have to see or care how much overhead is involved, and it's well-known and well-understood, so it's easy to find people who are familiar with it. Every programmer and his dog knows the basics. It's easy to cobble up some in a text editor for testing purposes. You can hand it off to some guy in a completely separate division without worrying that he's going to find it particularly confusing. And you can work with it in pretty much any modern programming language without having to worry about the messy details. It's the path of least resistance. It may not be good, but it's frequently good enough, and that's usually the bottom line.
:)
I mean, yeah, when I was a kid, we all worked in hand-optimized C and assembler, and tried to pack useful information into each bit of storage, but systems were a lot smaller and a lot more expensive back then. These days, I write perl or python scripts that spit out forty bytes of XML to encode a single boolean flag, and it doesn't even faze me. Welcome to the 21st century.
OpenStep property lists kick json's ass 7 ways to sunday.
Do you even lift?
These aren't the 'roids you're looking for.
S-expressions (think the lisp format) are much nicer, more compact, and easier to use than XML, while sharing almost all of the same properties otherwise.
For example:
<tag1>
  <tag2>
    <tag3/>
  </tag2>
</tag1>
becomes:
(tag1
  (tag2
    (tag3)
  )
)
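For anyone who wants to play with the mapping above, here is a rough Python sketch that walks a parsed XML tree and prints the S-expression form. It uses only the standard library; the helper name to_sexpr and the way attributes and text are written out are my own assumptions, not any standard convention, and it ignores mixed-content tail text and quote escaping.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element and its children as a nested S-expression string.

    Hypothetical helper: attributes become (name "value") pairs and leading
    text becomes a quoted string. Tail text and quote escaping are ignored.
    """
    parts = [elem.tag]
    for name, value in elem.attrib.items():
        parts.append('(%s "%s")' % (name, value))
    text = (elem.text or "").strip()
    if text:
        parts.append('"%s"' % text)
    # Recurse into child elements in document order.
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

doc = "<tag1><tag2><tag3/></tag2></tag1>"
sexpr = to_sexpr(ET.fromstring(doc))
print(sexpr)  # (tag1 (tag2 (tag3)))
```

Note that the output puts everything on one line; the indented layout in the post is purely cosmetic.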
If I have nothing to hide, don't search me
If I use XML, I can embed documents in other documents of different types, and they share a DOM. I can serve an XHTML document with MathML and SVG inside it, and use one CSS file to style everything, and my Javascript file can play with all of the above.
JSON is neat, and it's great for some things, but I haven't seen anything from the JSON people that even approaches what I can do with XML in a browser.
XML tries to make everything fit into a single hierarchy. Most real-world information is comprised of graphs of data. ISO STEP provides better readability compared to XML, a more strongly typed schema mechanism, and a more compact size. Best of all, programs can process and present results of STEP incrementally instead of requiring closing tags so you can hold gigabytes of information in the same file and seek randomly.
Example:
#10=ORGANIZATION('O0001','LKSoft','company');
#11=PRODUCT_DEFINITION_CONTEXT('part definition',#12,'manufacturing');
#12=APPLICATION_CONTEXT('mechanical design');
#13=APPLICATION_PROTOCOL_DEFINITION('','automotive_design',2003,#12);
#14=PRODUCT_DEFINITION('0',$,#15,#11);
#15=PRODUCT_DEFINITION_FORMATION('1',$,#16);
#16=PRODUCT('A0001','Test Part 1','',(#18));
#17=PRODUCT_RELATED_PRODUCT_CATEGORY('part',$,(#16));
#18=PRODUCT_CONTEXT('',#12,'');
#19=APPLIED_ORGANIZATION_ASSIGNMENT(#10,#20,(#16));
#20=ORGANIZATION_ROLE('id owner');
You forgot XSLT.
It is an extremely powerful tool; I once (ages ago) made a pure XSLT implementation to convert XML into C. With CSS, the XSLT was even browser/human viewable (the output was somewhat similar to the C program output).
I do not think JSON can do that.
I'll save the discussion for XML on the web for others - I'm a game programmer, so I deal with XML as file-based data sources.
At the game studio where I work, all our newest tools are written in C#, and use XML as a data source (typically indirectly through serialized objects). Heavyweight objects (textures, models, audio) are naturally stored in a binary format, which is optimized for the task at hand. The XML-based formats are essentially our game data's source files, and tend to function in a metadata-type capacity. As a simple example, our audio scripts store a lot of parameters about how to play a sound (pitch and volume variations, choosing among multiple variants, category and volume data, etc), and this metadata simply references external binary audio files, typically stored in a standard format like Ogg Vorbis or ADPCM compressed wave files. This metadata is compiled into a binary run-time version using a proprietary format designed to allow us to easily filter versions. These binary formats are then packed into larger containers for simpler management. Since I work on an MMO, we have to think about versioning our binary data, which tends to be challenging.
XML is a great format for us, being so widely supported, since we use both native parsing libraries as well as a lightweight custom parser for our C++ tools (or if we need to support in-game loading for the in-house version of the game). It's easy to look into a file format to see what might be going wrong using just a text editor, and with .NET's reflection capabilities, it's absolutely brainless to serialize any data structure. We've decided to use XAML (an XML-based object declaration format) / WPF as well. The artists love the flexibility in the tools, and can even participate in helping to design the interfaces by creating styles.
I don't understand the argument about not knowing what every tag means, as you do in HTML. The entire point of XML is to be extensible, meaning that it's the client application that determines what the tags ultimately mean. And using SweetXML, btw, misses one of the great benefits of using XML, which is that it's a standard for which you're likely never going to have to write parsing libraries. It's fine if you want to go that route, but just be aware that you may not have the choice of libraries that you would have by using standard XML.
XML does tend to suffer from the "golden hammer" syndrome. Honestly, I'm not a huge fan of its verbosity or general readability either, but if you take it for what it is, and use it sensibly, it's just another nifty tool you as a programmer can make good use of. After all, wouldn't you rather be working on more important parts of your project than fiddling with a text parser?
Irony: Agile development has too much inertia to be abandoned now.
XML does suck if you stick with some of the W3C standards and common tools. Suggestions to make it less painful:
W3C Schema is painful; it forces object-oriented design concepts onto a hierarchical data model. Consider RELAX NG (an OASIS-approved standard) instead; it's delightful in comparison. Use the verbose XML syntax when communicating with the less technical - if you've seen XML before, it's pretty easy to comprehend:
Switch to the compact syntax when you're among geeks:
There's validation support on major platforms, and even a tool (Trang) to convert between verbose/compact formats, and output to DTD and W3C Schemas. And, if you need to specify data types, it borrows the one technology W3C Schema got right: the Datatypes library.
The W3C DOM attempts to be a universal API, which means it must conform to the lowest common denominator in the programming languages it targets. Consider the NodeList interface:
While similar to the native list/collection/array interfaces most languages provide, it's not an exact match. So, DOM implementers create an object that doesn't work quite like any other collection on the platform. In Java, this means writing:
Instead of:
Dynamic languages allow an even more concise syntax. Consider this Ruby builder code to build a trivial XML document:
I thought about writing the W3C DOM equivalent of the above, but I'm not feeling masochistic tonight. Sorry.
The alternatives depend on your programming language, but plenty of choices exist for DOM-style traversal/manipulation.
In-memory object models of large XML documents can consume a lot of resources, but often, you only need part of the data. Consider using an XMLPull or StAX parser instead. Pull means you control the document traversal, only descending into (and fully parsing) sections of the XML that are of interest. SAX-based parsers have equivalent capabilities, but the programming model is uncomfortable for many developers.
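XMLPull and StAX are Java APIs, but the same pull idea can be sketched with Python's standard xml.dom.pulldom module, which is what the example below uses as a stand-in. The catalog document, the element names, and the helper expensive_items are invented for illustration. Only the item subtrees are expanded into DOM nodes; everything else is stepped over as events.

```python
from xml.dom import pulldom

# Made-up sample data, just for this sketch.
CATALOG = """<catalog>
  <item id="1"><name>widget</name><price>5</price></item>
  <item id="2"><name>gadget</name><price>50</price></item>
  <item id="3"><name>gizmo</name><price>7</price></item>
</catalog>"""

def expensive_items(xml_text, threshold):
    """Return the names of items priced above threshold."""
    names = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # Pull in just this subtree as a small DOM fragment.
            events.expandNode(node)
            price = int(node.getElementsByTagName("price")[0].firstChild.data)
            if price > threshold:
                names.append(node.getElementsByTagName("name")[0].firstChild.data)
    return names

print(expensive_items(CATALOG, 6))  # ['gadget', 'gizmo']
```

The caller drives the loop and decides when to descend, which is the difference from SAX, where the parser calls back into your handlers.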
Even better, some Pull processors are wicked fast, even when using them to construct a DOM. In Winter 2006, I benchmarked an XML-heavy application, and found Woodstox to be an order of magnitude faster at constructing thousands of small DOM4J documents
Scott Severtson
Senior Architect, Digital Measures
XML is so popular because business people don't understand it and think it can magically do a lot of things it can't, so they choose software that uses XML when it really doesn't matter.
I have a lot of experience consulting with various organizations - some Fortune 500, some nonprofit, some educational - about their software selection process. I've watched many times as a vendor gives a presentation to my employer or client talking about how wondrous it is that their software saves all its data in XML so you'll be able to openly interchange it with everything. The business people's eyes glaze over and a happy glow suffuses their faces, as they imagine that any software's data that's saved in XML can be magically opened and understood by any other software that uses XML. They don't get the idea that there's a specific schema involved and that your word processor won't be able to automatically make sense of data generated by your database just because there's XML involved.
When I talk to my client after the vendor presentation I invariably learn that they think that XML is a sort of universal translator. I've had to explain why it isn't to so many clients I finally just wrote a white paper on the subject so when it comes up again I can just print it out and hand it to them. If I don't, I find my clients will reliably choose to buy the software that uses XML instead of the software that best meets their needs.
Believe me, the vendors knew that their prospective customers would act this way, and did everything they could to play up the idea that XML is magically interchangeable with all software, sometimes to the point of telling outright lies about it.
I've been working with XML ever since it first came out, and the whole XML-on-the-front-end thing is a fad that comes and goes periodically.
The pros of XML
Cons of XML
The pros and cons mean that the best place to use XML is for interoperability between systems/applications developed by different teams/vendors where not much data is sent around and processing is not time sensitive. This does cover some front-end applications where the data can be generated by a program done by one vendor and read by a program done by a different vendor. It does, however, not cover files which are meant to be written and read by the same application.
The second best place is to quickly add support for a tree-structured storage format for data to an application (for example, for a config file), since you can just pick up one of the XML libraries out there and half your file format problems will be solved (you still have to figure out and develop the "what to put in there" and "where to put it" part, but need not worry about most of the mechanics of generating, parsing and validating the file).
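As a rough illustration of the config-file case, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The config layout (window, recent, file) and the helper load_config are hypothetical; the point is only that the library handles the parsing mechanics while the application still decides what goes where.

```python
import xml.etree.ElementTree as ET

# Hypothetical config layout, invented for this sketch.
CONFIG_XML = """<config>
  <window width="800" height="600"/>
  <recent>
    <file>a.txt</file>
    <file>b.txt</file>
  </recent>
</config>"""

def load_config(xml_text):
    """Turn the XML config into a plain dict the application can use."""
    root = ET.fromstring(xml_text)
    window = root.find("window")
    return {
        # Attribute values are always strings, so convert explicitly.
        "width": int(window.get("width")),
        "height": int(window.get("height")),
        "recent": [f.text for f in root.findall("recent/file")],
    }

config = load_config(CONFIG_XML)
print(config)
```

Validation is not covered here; ElementTree only checks well-formedness, so anything like required fields would still be application code or a separate schema tool.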
It is an extremely powerful tool; I once (ages ago) made a pure XSLT implementation to convert XML into C. With CSS, the XSLT was even browser/human viewable (the output was somewhat similar to the C program output).
I do not think JSON can do that.
XSLT is a nice backwards chaining theorem prover, very similar to Prolog. I like it and use it a lot - currently for me it generates SQL, Hibernate mappings, C# code and Velocity macros from a single source XML document. But there's nothing magic about it, and if we didn't have XSLT it would be very easy to do the same sort of thing in LISP or Prolog, or (slightly more awkwardly) in conventional programming languages.
I'm old enough to remember when discussions on Slashdot were well informed.
Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.
While I understand your pain, XML is still a very nice *markup* language, for marking up documents and simple content trees.
Can you imagine HTML / XHTML implemented as JSON? I doubt that.
The fault here lies in XML abuse, namely SOAP-like XML APIs and using XML for everything, where binary formats, or more compact and simpler formats like JSON, do better.
>> Test question: Which is quicker?
>> 1. Spending a few hours coding your formats in some binary format making maximum use of all the bits.
>> 2. Spending a few minutes writing code to send your internal data structure to a library that will serialize it into XML and then running the XML through a generic compression routine (if space/speed actually makes any difference to your particular application).
A while back (before XML parsers were common) I built a kinda cool system whereby a mainframe programmer built a system that read 3270 page descriptions and converted them to XML. At the other end, I wrote a generator that built huge amounts of VB (hey - it worked .. it had to build corresponding page objects) to handle it at the other end.
This replaced a complex and incredibly expensive system using a proprietary binary format.
I was amazed (and delighted) to find the XML system was actually faster - even before we put compression on the data.
And it was a damned sight easier to handle, upgrade, extend - and pay for!
Go XML, I say.
(And it was soo cool to generate, say, 100,000 lines of code and have it compile and work straight off the bat).
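Option 2 in the quoted test question (hand the data to a library that serializes it to XML, then run the result through generic compression) might look roughly like this in Python. This is a minimal sketch using only the standard library; the record fields and the helper names record_to_xml and xml_to_record are made up for illustration and have nothing to do with the 3270 system described above.

```python
import xml.etree.ElementTree as ET
import zlib

def record_to_xml(record):
    """Serialize a flat dict to XML bytes; keys must be valid element names."""
    root = ET.Element("record")
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8")

def xml_to_record(xml_bytes):
    """Parse the XML back into a dict; every value comes back as a string."""
    return {child.tag: child.text for child in ET.fromstring(xml_bytes)}

record = {"screen": "LOGIN01", "row": 24, "col": 80}
xml_bytes = record_to_xml(record)
packed = zlib.compress(xml_bytes)              # generic compression step
restored = xml_to_record(zlib.decompress(packed))
print(restored)
```

The numbers come back as strings because XML text carries no type information, so the receiving end has to know which fields to convert. For a payload this small, compression may not save anything; it only starts to pay off on larger, repetitive documents.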
"Cats like plain crisps"
"You might feel better..." -> "No, it wouldn't..."? WTF is that supposed to mean? How is that even a response to what precedes it?
"JSON is..." -> "XML does much more than that." Again, this is incoherent. XML is simply tree-structured markup. It has less inherent semantic content than JSON/YAML. OTOH, JSON/YAML can do a lot more than what it is, just as XML can. Both JSON/YAML and XML can present pretty arbitrary data in a manner which is fairly easy to parse automatically and fairly readable. XML's strengths, IMO, are that it's a more natural format for text-centric markup, and that it has a lot more maturity in the available tools and applications.
The point is, though, JSON doesn't force JavaScript on anyone.
That's kind of a dumb point to make without any discussion about which those areas are; as discussed above, the areas I see where XML is intrinsically superior are fairly narrow, though it's currently probably better for lots of projects simply because it's been established longer and has better tool support.
You missed the point again. Sure, there are other programming languages out there. The fact that there are JSON libraries for many of them underlines that the argument "JSON is forcing JavaScript on everyone" is false. JSON is an interchange format. It may be inspired by JavaScript, but it isn't forcing JavaScript on anyone.
But, yeah, S-expressions are great.
Since I didn't say "there are libraries to use JSON, so why use XML", but "there are libraries to use JSON from other languages, so JSON isn't forcing JavaScript on anyone", I think you miss the point. Anyway, real Lispers are more likely to say that "there are libraries to read/write S-expressions, so why use XML, which is essentially a more verbose but no more expressive or clear variant of S-expressions?*"
S-expressions are not "nesting lists of strings".
And, very often, a markup language inherently isn't a natural fit to data interchange requirements. JSON/YAML are both motivated by that specific fact. In fact, but for XHTML and office document formats, I've rarely seen an application where XML was used where a markup language was a really natural tool to start with. XML is popular because it's a hammer that everyone's gotten really comfortable using, so every data interchange need is treated as if it were a nail, whether or not it's actually a nail, a screw, or glue-on wallpaper.
This is certainly a limitation in JSON/YAML tool support. I don't think it's an inherent limitation in the formats: either has
Wow, dude. Did you stop to think that there are many markup languages built on top of XML that can represent such things?
Yes in fact I did. That's what I was referring to when I talked about the "mountains of horribly complex software" on top of XML.
[0.5k of RDF that expresses 100, 200 as integer coordinates]
Simple enough.
Thank you for expressing so succinctly exactly why I am so depressed. How did you XML people come to have such low standards? How can you call "simple enough" a fragment of code that takes 0.5k of text to express a pair of coordinates, and takes 3 hefty specifications worth of software to make any sense of it (XML, XML Namespaces, and RDF)? Do you realize that the comparable JSON is {"coords": {"x": 100, "y": 200}} which can be fully parsed with well under 500 lines of C?
What words did XML seduce you with that you're willing to put up with this kind of abuse? How can you be convinced that this relationship is good for you?
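To make the comparison concrete, here is a small Python sketch that parses the JSON fragment quoted above with the standard json module. The plain-XML version next to it is my own hypothetical stand-in (it is not the RDF from the earlier post, which isn't reproduced in this thread); it is only there to show that attribute values arrive as strings and need an explicit conversion.

```python
import json
import xml.etree.ElementTree as ET

# The JSON from the post: numbers are typed by the format itself.
json_text = '{"coords": {"x": 100, "y": 200}}'
coords = json.loads(json_text)["coords"]
print(coords["x"] + coords["y"])  # 300

# Hypothetical plain-XML equivalent (not the RDF under discussion).
xml_text = '<coords x="100" y="200"/>'
elem = ET.fromstring(xml_text)
# Attribute values are strings until the application converts them.
xml_coords = {name: int(value) for name, value in elem.attrib.items()}
print(xml_coords == coords)  # True
```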
You're right though, it is just a tree of nodes until you parse it. In fact, it's just one long string until you parse it. Imagine that....
Um, no. It's a tree of nodes after you parse it with an XML parser -- that's the problem.