Slashdot Mirror


The Future of XML

An anonymous reader writes "How will you use XML in years to come? The wheels of progress turn slowly, but turn they do. The outline of XML's future is becoming clear. The exact timeline is a tad uncertain, but where XML is going isn't. XML's future lies with the Web, and more specifically with Web publishing. 'Word processors, spreadsheets, games, diagramming tools, and more are all migrating into the browser. This trend will only accelerate in the coming year as local storage in Web browsers makes it increasingly possible to work offline. But XML is still firmly grounded in Web 1.0 publishing, and that's still very important.'"

13 of 273 comments (clear)

  1. "How will you use XML in years to come?" by Ant+P. · · Score: 5, Insightful

    Sparingly. JSON is just plain better, and doesn't inflict an enterprisey mindset on anyone that tries to use it.

    1. Re:"How will you use XML in years to come?" by MagikSlinger · · Score: 5, Insightful

      JSON is inflicting Javascript on everyone.

      No, it really doesn't, but if "JavaScript" in the name bothers you, you might feel better with YAML.

      No, it wouldn't because JSON is bare bones data. It's simply nested hash tables, arrays and strings. XML does much more than that. XML can represent a lot of information in a simple, easy-to-understand format. JSON strips it out for speed & efficiency. Which sort of gets into the point I did want to make but was too impatient to explain: JSON is good where JSON is best, and XML is good where XML is best. I dislike the one-uber-alles arguments because it's ignoring other situations and their needs.

      There are other programming languages out there.

      And there are JSON and/or YAML libraries for quite a lot of them. So what?

      Would you like to live in a world of S-expressions? The LISP people would point out there are libraries to read/write S-expressions, so why use JSON? The answer of course is that we want more than simply nesting lists of strings. We want our markup languages to fit our requirements, not the other way around. And saying "JSON for Everything", which the original poster did was... silly.

      My problems with JSON are:

      • No schema: XML Schema not only makes it easier to unit test, but it can be fed into tools that can do useful things like automatic creation of Java classes and code to read/write. Does JSON have anything like that? Of course not, because it would defeat JSON's purpose: easy Javascript data transmission.
      • Expressability: With XML, I can create a model that fits my logical model of the data where I use attributes to augment the data in the child elements. Doing that in JSON is a kludge with a hash-table to represent an element which can't be easily converted into a graph for easy understanding.
      • Diversity: I use GML in my day job. A lot. I can easily set up an object conversion rule with Jakarta Digester that I can painlessly drop into future projects without modification. That's the power of namespaces. I can build an XML document using tags from a dozen different schema, and then feed it to another application that only looks for the tags it cares about.
      • XPath. 'Nuff said. Ok, one thing: this should have replaced SAX/DOM years ago.

      JSON is great for AJAX where XML is clunky and a little bit slower (my own speed tests hasn't shown there's a huge hit, but it is significant). XML is great for document-type data like formatted documents or electronic data interchange between heavy-weight processes. My point was that the original poster's JSON is everything was narrow-minded, and that XML answers a very specific need. There are tonnes of mark-up languages out there, and I think XML is a great machine-based language. I hate it when humans have to write XML to configure something though. That really ticks me off. But that's the point: there should not be one mark-up language to rule them all. A mark-up language for every purpose.

      --
      The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
    2. Re:"How will you use XML in years to come?" by CoughDropAddict · · Score: 5, Insightful
      I'm so depressed. You represent an entire generation of programmers who can't figure out the difference between marked-up text and data, and why mark-up languages suck so bad for data interchange.

      Pop quiz. Here's an excerpt of GML from that page you linked to.

      <gml:coordinates>100,200</gml:coordinates>
      Do the contents of this node represent:
      1. the text string "100,200"
      2. the number 100200 (with a customary comma for nice formatting)
      3. the number 100.2 (hey, that's the way that the crazy Europeans do it)
      4. a tuple of two numbers: 100 and 200
      "Obviously it's two numbers, they're coordinates" you may say. But such things are not "obvious" to an XML parser. If you're an XML parser the answer is (1): it's a simple text string. So to get to the real data you have to parse that text string again to split on a comma, and to turn the two resulting text strings into numbers. Note this is a completely separate parser and is completely outside the XML data model, so all your fancy schema validation, xpath, etc. are useless to access data at this level.

      Why all this pain? Because XML simply has no way to say "this is a list of things" or "this is a number."

      Sure, you can approximate such things. You could write something like:

      <gml:coordinates>
          <gml:coordinateX>100</gml:coordinateX>
          <gml:coordinateY>200</gml:coordinateY>
      </gml:coordinates>
      But the fact remains that even though you may intuitively understand this to be two coordinates when you look at it (and at least you can select the coordinates individually with xpath in this example, but they're still strings, not numbers) to XML this is still nothing but a tree of nodes.

      Did you catch that? A tree of nodes. You're taking a concept which is logically a pair of integers, and encoding it in a format that's representing it in a tree of nodes. Specifically, that tree looks something like this:

      elementNode name=gml:coordinates
      \-> textNode, text="\n " *
      \-> elementNode name=gml:coordinateX
          \-> textNode text="100"
      \-> textNode, text="\n " *
      \-> elementNode name=gml:coordinateY
          \-> textNode, text="200"
      \-> textNode, text="\n" *


      (*: yep, it keeps all that whitespace that you only intended for formatting. XML is a text markup language, so all text anywhere in the document is significant to the parser.)

      So let's recap. Using XML, we've taken a structure which is logically just a pair of integers and encoded it as a tree of 7 nodes, three of which are meaningless whitespace that was only put there for formatting, and even after all this XML has no clue that what we're dealing with is a pair of integers.

      Now let's try this example in JSON:

      {"coordinates": [100, 200]}
      JSON knows two things that your fancy shmancy XML parser will never know: that 100 and 200 are numbers, and that they are two elements of an array (which might be more appropriately thought of as a "tuple" in this context). It's smart enough to know that the whitespace is not significant, it doesn't build this complex and meaningless node tree; it just lets you express, directly and succinctly, the data you are trying to encode.

      That's because JSON is a data format, and XML is a marked up text format. But we're suffering from the fact that no one realized this ten years ago, and compensated for the parity mismatch by layering mountains of horribly complex software on top of XML instead of just using something that is actually good at data interchange.
  2. You know the saying by Anonymous Coward · · Score: 5, Funny

    XML is like violence. If it doesn't solve your problem, you're not using enough of it.

  3. I don't understand... by ilovegeorgebush · · Score: 5, Insightful

    I don't get it. We can argue the merits of data exchange formats 'till we're blue in the face; yet I cannot see why XML is so popular. For the majority of applications that use it, it's overboard. Yes, it's easier on the eye, but ultimately how often do you have to play with the XML your CAD software uses?

    I'm a programmer, just like the rest of you here, so I'm quite used to having to write a parser here or there, or fixing an issue or two in an ant script. The thing that puzzles me, is why it's used so much on the web. XML is bulky, and when designed badly it can be far too complex; this all adds to bandwidth and processing on the client (think AJAX), so I'm not seeing why anyone would want to use it. Formats like JSON are just as usable, and not to mention more lightweight. Where's the gain?

    1. Re:I don't understand... by SpaceHamster · · Score: 5, Insightful

      My best stab at the popularity:

      1. Looks a lot like HTML. "Oh, it has angle brackets, I know this!"
      2. Inertia.
      3. Has features that make it a good choice for business: schemas and validation, transforms, namespaces, a type system.
      4. Inertia.

      There just isn't that much need to switch. Modern parsers/hardware make the slowness argument moot, and everyone knows how to work with it.

      As an interchange format with javascript (and other dynamically typed languages) it is sub-optimal for a number of reasons, and so an alternative, JSON has developed which fills that particular niche. But when I sit down to right yet another line of business app, my default format is going to be XML, and will be for the foreseeable future.

      --
      "BeOS is a great operating system" -Doug Miller, Microsoft
    2. Re:I don't understand... by GodfatherofSoul · · Score: 5, Insightful

      XML gives you a parsable standard on two levels; generic XML syntax and specific to your protocol via schemas. It's verbose enough to allow by-hand manual editing while the syntax will catch any errors save semantic errors you'll likely have. It's also a little more versatile as far as the syntax goes. Yes, there are less verbose parsing syntaxes out there, but you always seem to lose something when it comes to manual viewing or editing.

      Plus, as far as writing parsers, why burn the time when there are so many tools for XML out there? It's a design choice I suppose like every other one; i.e. what are you losing/gaining by DIYing? Personally, I love XML and regret that it hasn't taken off more. Especially in the area of network protocols. People have been trying to shove everything into an HTML pipe, when XML over the much underrated BEEP is a far more versatile. There are costs, though as you've already mentioned.

      --
      I swear to God...I swear to God! That is NOT how you treat your human!
    3. Re:I don't understand... by machineghost · · Score: 5, Interesting

      The "bulkiness" of XML is also it's strength: XML can be used to markup almost any data imaginable. Now it's true that for most simple two-party exchanges, a simpler format (like comma separated values or YAML or something) would require less characters, and would thus save disk space, transmit faster, etc.

      However, the modern programming age is all about sacrificing performance for convenience (this is why virtually no one is using C or C++ to make web apps, and almost everyone is using a significantly poorer performing language like Python or Ruby). We've got powerful computers with tons of RAM and hard drive space, and high-speed internet connections that can transmit vast amounts of data in mere seconds; why waste (valuable programmer) time and energy over-optimizing everything?

      Instead, developers choose the option that will make their lives easier. XML is widely known, easily understood, and is human readable. I can send an XML document, without any schema or documentation, to another developer and they'll be able to "grok it". There's also a ton of tools out there for working with XML; if someone sends me a random XML document, I can see it syntax colored in Eclipse or my browser. If someone sends me an XML schema, I can use JAXB to generate Java classes to interact with it. If I need to reformat/convert ANY XML document, I can just whip up an XSLT for it and I'm done.

      So yes, other formats offer some benefits. But XML's universality (which does require a bit of bulkiness) makes it a great choice for most types of data one would like to markup and/or transmit.

      P.S. JSON is just as usable? Try writing a schema to validate it ... ok I admit, that wasn't so hard, just some Javascript right? But now you have to write a new batch of code to validate the next type of JSON you use. And another for the next, and so on. With XML, you have a choice of not one but four different schema formats; once you learn to use one of them, you can describe a validation schema far more quickly than you ever could in Javascript.

      Same deal with transformations: if you want to alter your JSON data in a consistent way, you have to again write custom code every time. Sure XSLT has a learning curve, but once you master it you can accomplish in a few lines of code what any other language would need tens or even hundreds of lines to do.

    4. Re:I don't understand... by Anonymous Coward · · Score: 5, Insightful

      I don't get it. We can argue the merits of data exchange formats 'till we're blue in the face; yet I cannot see why XML is so popular.

      Because it's a standard that everyone (even reluctantly) can agree on.

      Because there are well-debugged libraries for reading, writing and manipulating it.

      Because (as a last resort) text is easy to manipulate with scripting languages like perl and python.

      Because if verbosity is a problem, text compresses very well.

    5. Re:I don't understand... by batkiwi · · Score: 5, Insightful

      XML IS:
      -Easily validated
      -Easily parsed
      -Easily compressed (in transit or stored)
      -Human readable in case of emergency
      -Easily extendable

  4. Too many 'this stuff sucks' moments by Bryan+Ischo · · Score: 5, Interesting

    I have had far too many 'this stuff sucks' moments with XML to ever consider using it in any capacity where it is not forced upon me (which unfortunately, it is, with great frequency).

    I first heard about XML years ago when it was new, and just the concept sucked to me. A markup language based on the ugly and unwieldy syntax of SGML (from which HTML derives)? We don't need more SGML-alikes, we need fewer, was my thought. This stuff sucks.

    Then a while later I actually had to use XML. I read up on its design and features and thought, OK well at least the cool bit is that it has DTDs to describe the specifics of a domain of XML. But then I found out that DTDs are incomplete to the extreme, unable to properly specify large parts of what one should be able to specify with it. And on top of that, DTDs don't even use XML syntax - what the hell? This stuff sucks.

    I then found that there were several competing specifications for XML-based replacements for the DTD syntax, and none were well-accepted enough to be considered the standard. So I realized that there was going to be no way to avoid fragmentation and incompatibility in XML schemas. This stuff sucks.

    I spent some time reading through supposedly 'human readable' XML documents, and writing some. Both reading and writing XML is incredibly nonsuccinct, error-prone, and time consuming. This stuff sucks.

    Finally I had to write some code to read in XML documents and operate on them. I searched around for freely available software libraries that would take care of parsing the XML documents for me. I had to read up on the 'SAX' and 'DOM' models of XML parsing. Both are ridiculously primitive and difficult to work with. This stuff sucks.

    Of course I found the most widely distributed, and probably widely used, free XML parser (using the SAX style), expat. It is not re-entrant, because XML syntax is so ridiculously and overly complex that people don't even bother to write re-entrant parsers for it. So you have to dedicate a thread to keeping the stack state for the parser, or read the whole document in one big buffer and pass it to the parser. XML is so unwieldy and stupid that even the best freely available implementations of parsers are lame. This stuff sucks.

    Then I got bitten by numerous bugs that occurred because XML has such weak syntax; you can't easily limit the size of elements in a document, for example, either in the DTD (or XML schema replacement) or expat. You just gotta accept that the parser could blow up your program if someone feeds it bad data, because the parser writers couldn't be bothered to put any kind of controls in on this, probably because they were 'thinking XML style', which basically means, not thinking much at all. This stuff sucks.

    Finally, my application had poor performance because XML is so slow and bloated to read in as a wire protocol. This stuff sucks.

    XML sucks in so many different ways, it's amazing. In fact I cannot think of a single thing that XML does well, or a single aspect of it that couldn't have been better planned from the beginning. I blame the creators of XML, who obviously didn't really have much of a clue.

    In summary - XML sucks, and I refuse to use it, and actively fight against it every opportunity I get.

  5. Errrm, folks, what's the big fat hairy deal? by Qbertino · · Score: 5, Informative

    Ok. I've once again seen the full range of XML comments here. From 'cool super technology modern java' to 'OMFG it sucks' right over to 'XML has bad security' - I mean ... WTF? XML is a Data Format Standard. It has about as much to do with IT security as the color of your keyboard.

    And for those of you out there who haven't yet noticed: XML sucks because data structure serialisation sucks. It allways will. You can't cut open, unravel and string out an n-dimensional net of relations into a 1-dimensional string of bits and bytes without it sucking in one way or the other. It's a, if not THE classic hard problem in IT. Get over it. It's with XML that we've finally agreed upon in which way it's supposed to suck. Halle-flippin'-luja! XML is the unified successor to the late sixties way of badly delimited literals, indifference between variables and values and flatfile constructs of obscure standards nobody wants. And which are so arcane by todays standards that they are beyond useless (Check out AICC if you don't know what I mean). Crappy PLs and config schemas from the dawn of computing.

    That's all there is to XML: a universal n-to-1 serialisation standard. Nothing more and nothing less. Calm down.

    And as for the headline: Of-f*cking-course it's here to stay. What do you want to change about it (much less 'enhance'). Do you want to start color-coding your data? Talking about the future of XML is allmost like talking about the future of the wheel ("Scientist ask: Will it ever get any rounder?"). Give me a break. I'm glad we got it and I'm actually - for once - gratefull to the academic IT community doing something usefull and pushing it. It's universal, can be handled by any class and style of data processing and when things get rough it's even human readable. What more do you want?

    Now if only someone could come up with a replacement for SQL and enforce universal utf-8 everywhere we could finally leave the 1960s behind us and shed the last pieces of vintage computing we have to deal with on a daily basis. Thats what discussions like these should actually be about.

    --
    We suffer more in our imagination than in reality. - Seneca
  6. XML is a fad, STEP is the future by chip2004 · · Score: 5, Interesting

    XML tries to make everything fit into a single hierarchy. Most real-world information is comprised of graphs of data. ISO STEP provides better readability compared to XML, a more strongly typed schema mechanism, and a more compact size. Best of all, programs can process and present results of STEP incrementally instead of requiring closing tags so you can hold gigabytes of information in the same file and seek randomly.

    Example:
    #10=ORGANIZATION('O0001','LKSoft','company');
    #11=PRODUCT_DEFINITION_CONTEXT('part definition',#12,'manufacturing');
    #12=APPLICATION_CONTEXT('mechanical design');
    #13=APPLICATION_PROTOCOL_DEFINITION('','automotive_design',2003,#12);
    #14=PRODUCT_DEFINITION('0',$,#15,#11);
    #15=PRODUCT_DEFINITION_FORMATION('1',$,#16);
    #16=PRODUCT('A0001','Test Part 1','',(#18));
    #17=PRODUCT_RELATED_PRODUCT_CATEGORY('part',$,(#16));
    #18=PRODUCT_CONTEXT('',#12,'');
    #19=APPLIED_ORGANIZATION_ASSIGNMENT(#10,#20,(#16));
    #20=ORGANIZATION_ROLE('id owner');