Slashdot Mirror


XML Co-Creator says XML Is Too Hard For Programmers

orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied lately with XML, and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always drawn a divided response from the technical community. The anti-XML community has several sites stating their positions."

47 of 562 comments

  1. But XML is great for computers... by Max+Romantschuk · · Score: 1, Insightful

    First of all IDNRTA (I Did Not Read The Article).

    Writing XML by hand sure is no picnic. But I don't see writing XML by hand as something we should strive to do.

    XML is great for file formats. It's waaay better than binary formats. It's not as compact, but that is rarely an issue these days. Having a standard, structured, text-based, and editable-by-hand-when-necessary format is a godsend. Period.

    --
    .: Max Romantschuk :: http://max.romantschuk.fi/
    1. Re:But XML is great for computers... by CoolVibe · · Score: 5, Insightful
      Having a standard, structured, text-based, and editable-by-hand-when-necessary format is a godsend. Period.

      You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.

      My point: XML is over-used for a lot of things. In some places it makes sense, but in many places it doesn't.

    2. Re:But XML is great for computers... by Ed+Avis · · Score: 5, Insightful
      You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.

      You just gave the best argument for adopting XML as widely as possible. Yes, all these can be parsed (with the possible exception of sendmail's config files which may be Turing-complete) but they all require *different* code for each config file. If they were in XML you'd still need different semantic code, of course, but a whole wodge of syntax issues (how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc) would be dealt with. Maybe not in the way that you or I think is perfect - IMHO XML is a little bit verbose compared to say Lisp- or Tcl-style encodings. But they would be dealt with *once*. No need to learn a new or almost-the-same-but-slightly-different set of syntactic conventions for every single config file.
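
      A minimal sketch of the "dealt with once" point, assuming a made-up hosts-style config (the element and attribute names below are invented for illustration, not any real format). Comments, quoting, and escaping are all handled by the stock parser; only the semantic part is written by hand:

```python
import xml.etree.ElementTree as ET

# Hypothetical config: every element and attribute name here is an assumption.
CONFIG = """<hosts>
  <!-- comments are skipped by the parser; no per-format rule needed -->
  <host name="gateway" address="192.168.0.1">
    <alias>gw</alias>
    <note>quotes like "this" &amp; ampersands are escaped one standard way</note>
  </host>
</hosts>"""

def load_hosts(text):
    """Turn the config into a list of plain dicts (the 'semantic code')."""
    hosts = []
    for host in ET.fromstring(text).iter("host"):
        hosts.append({
            "name": host.get("name"),
            "address": host.get("address"),
            "aliases": [alias.text for alias in host.findall("alias")],
            "note": host.findtext("note"),
        })
    return hosts

print(load_hosts(CONFIG))
```

      The same dozen lines of parsing boilerplate would serve any other config file written this way; only the dictionary-building loop changes.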

      Maybe XML is over-used for a lot of things, but making up your own file format is definitely over-used a lot more. Simple line-oriented files are reasonable to have as plain text, for everything else please avoid the temptation to reinvent the wheel by devising a new syntax and block structure.

      --
      -- Ed Avis ed@membled.com
    3. Re:But XML is great for computers... by Zaiff+Urgulbunger · · Score: 2, Insightful

      Indeedy.

      And I've said it before, but I'll say it again -- XML as most people see it is *just* the serialised form of an XML structure, in the same way that databases don't actually have to store data in the order that you read it in.

      But as you quite rightly point out, having a standard, very accessible (if slightly verbose) method to create and edit data structures is indeed a godsend!

      Here's an idea (which I've also said before!) - imagine if all those config files were XML based. So you could edit them using a text editor - same as now except slightly more cumbersome to edit.
      But we're agreed that being able to use a basic tool such as a text editor is a good thing right?

      Okay, so next up from that would be an XML editor so you can navigate the structure to find the element you want to tweak. The nice thing here is that you've got a standard tool that works with any XML file and therefore any config file.

      You can also build standard tools to work with these standard files so automating the update of a number of config files would be easy.

      Now let's go back to the whole thing about serialisation -- we're just manipulating data structures. The text-based, serialised form of these structures is called XML. The good thing is being able to edit with a text editor -- available on *any* platform, including non-current platforms where no active development is occurring.

      But we're not limited, and we can build tools to work with these structures more efficiently. And we don't *have* to use the serialised form if we don't want to -- it just happens that at this point in time, where the tools are not as evolved as they will be, it makes sense to use the text-based form.

      In the future we could for example have a file system that is structured like an XML file? So then all those separate config files become part of the one structure, and thus even easier to manage.

      I'm rambling, so I'll stop now! My point is simply that, yep, XML isn't perfect, but don't get too hung up on its being large, verbose text files, because it isn't -- that's just how it is currently being presented. Instead look at how it bridges the divide between old-school proprietary, closed, binary formats and the accessibility of text files.

    4. Re:But XML is great for computers... by Smallpond · · Score: 2, Insightful


      Had you read the article, you'd know his point was that you shouldn't have to slurp in the whole file just to read one field. In fact, he's using Perl and regexps to avoid having to do things like Doc.Load.

      The author claims that existing tools are oriented toward either converting to a big internal data structure, or to processing gradually using callbacks, neither of which is optimal for small fast code or simple programming.
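
      For what it's worth, a third style does exist in some libraries: the caller pulls parse events in an ordinary loop, with no callbacks and no full tree. Below is a rough sketch using Python's ElementTree `iterparse`; the sample document and the `first_text` helper are made up for illustration, and this is not the API the article itself proposes.

```python
import io
import xml.etree.ElementTree as ET

def first_text(stream, tag):
    """Return the text of the first <tag> element, stopping as soon as it is seen."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            return elem.text
        elem.clear()  # discard the content of elements we have already passed
    return None

# Hypothetical input; a real caller would pass an open file instead.
doc = io.BytesIO(
    b"<log><entry><id>1</id><msg>boot</msg></entry>"
    b"<entry><id>2</id><msg>halt</msg></entry></log>"
)
print(first_text(doc, "msg"))  # prints "boot"
```

      The parser still reads the input in chunks, so "only as far as needed" is approximate, but the loop returns without building the rest of the document.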

  2. A good point by shish · · Score: 3, Insightful

    Sure it sucks, but it's a *standard* that everyone can use, and there are many libraries for it so you don't need to write your own parsing code

    --
    I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
    1. Re:A good point by jilles · · Score: 3, Insightful

      Not only is it a standard, it appears to be the only widely accepted standard. Not using it currently boils down to going back to the hacked together, generally incompatible data formats of the past. Reinventing the wheel still is a popular way of passing time but it has never been very productive.

      People often fail to see the point of widely adopted standards but the bottom line is that it makes it easier to reuse functionality that conforms to the standard. There are now both SAX and DOM based parsers for most common programming languages. Basically if you spend some time figuring out how these APIs work you can work with XML from almost any language.

      That is not the problem. What is a problem is that everybody is introducing their own xml based languages and in many cases forget to publish the appropriate xml schema/dtd.

      Now the guy who is complaining here is a Perl programmer who has to process data that is passed to him in XML form. His point is that it is easier for him to throw together a bunch of regular expressions to do his thing than it is to use some off-the-shelf validating parser with a generic DOM/SAX based API. Good for him that his job is so simple that a bunch of regular expressions do the trick for him. I'd hate to maintain his code though and I suspect he doesn't have much reuse beyond the odd copy paste.

      --

      Jilles
    2. Re:A good point by EvilTwinSkippy · · Score: 3, Insightful
      Amen, and amen.

      Yes, standards suck. But they suck in a way that is consistent and allows sucky things to talk to other sucky things.

      I'll bet 802.11b is a really crappy standard. But as long as I can pick up interchangeable devices for $50 at the local computer store I'll live in ignorant bliss.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
  3. Maybe he should have read Knuth by thogard · · Score: 4, Insightful

    XML parsing (just like the TeX language) has a problem: when there are errors in the input files, the situation diverges into two different cases, one requiring infinite memory and the other infinite time to deal gracefully with the errors.

    None of this would have ever been needed had CS been taught properly. There are other concepts to describe how files are to be organized. Some of the systems date from the 1950's. BNF (which seems to work very well for programmers to describe file formats to other programmers) dates from the early 1960's. What was needed is a BNF type grammar that is machine readable.

    Would XML have ever taken off if the web used something sane and not a hacked version of a nasty text formatting system from decades ago?

    1. Re:Maybe he should have read Knuth by Minna+Kirai · · Score: 3, Insightful

      I think the root of that difficulty comes from using XML to solve two different problems. One problem is data transmission between systems, which XML was designed for, and handles adequately. When receiving a data chunk from an external source that might not be trustworthy, a safety-conscious program really has to read the whole thing and verify it complies with the format. Skipping over some sections to reach the part you're interested in isn't allowed.

      But, for data storage within an application (or a set of tightly coupled systems that trust each other to function correctly), XML is less advisable. Traditional (SQL) databases, or hand-rolled file formats, may be a better solution when high speed and scalability are needed.

      JoelOnSoftware has a long article on why XML is suboptimal for the latter use.

  4. This does not bode well by fudgefactor7 · · Score: 1, Insightful

    When an author says his work was not well done, that should be a sure fire red-flag that perhaps the whole thing should be aborted like an unviable fetus.

    1. Re:This does not bode well by JimDabell · · Score: 5, Insightful

      Did you actually read the article?

      I can sum it up very easily:

      • Callbacks irritate him.
      • It's not always practical to build a tree in-memory.

      He's looking for a nicer api for processing XML, he's not looking to replace XML entirely.

    2. Re:This does not bode well by ChimChim · · Score: 2, Insightful

      He never said his work, XML, is not well done. What he said was that the programming languages, APIs, and Environments haven't made the task of processing XML easy enough. XML itself is sound, or as sound as many alternatives.

      The thing is, back in the day when people wore onions on their belts, programmers had to be convinced that UNIX's "file is a bag of bytes" form of data access was better than the more direct/powerful/convenient methods they'd been used to, like raw access to the drive. But programmers aren't users, and what's great for users, or has benefits beyond the realm of CS, will always complicate things for the programmer. However, the more complicated things are for programmers, the longer it will take to build systems and get usable products. So Tim Bray is basically saying that XML has succeeded in the data-interchange model, but is failing to also make programmers' lives easier, which is also important.

    3. Re:This does not bode well by Random+Walk · · Score: 3, Insightful
      After reading the article, I would say he tries to use XML for something it is not very suitable for, and argues that in this case the available libraries are not useful (surprise ...).

      XML is not a stream - it has a hierarchical tree structure, and IMHO is not useful for anything that (a) by its very nature is a continuous stream of data (say, a log file), or (b) wants to be processed as a stream (because it's big, and would require too much memory to be handled as a single data structure).

      The problem seems to be that XML is good for portability and standardization, and therefore is abused for things it's not well suited for (the well-known 'if all you have is a hammer, every problem looks like a nail' syndrome).

  5. Re:xml by Uller-RM · · Score: 2, Insightful

    Since you apparently know nothing about XML, try reading the article. You'll learn something new, and you won't have to talk out your ass on this topic.

    XML's not a language -- it's a grammar, a guide of sorts, for hierarchical data storage. You design file formats that conform to XML. The goal is that it's easy to read that file format in any language or platform (given an XML processor/parser for that platform), since your data is stored in plain human-readable UTF8-encoded text.

    Might as well poke fun at the rest of your idiocy -- as it happens, HTML 4 is pretty close to being XML-conformant, and the W3C's now pushing XHTML which is fully conformant.

    Granted, a lot of people treat XML as another buzzword, the way that OOP once was. It's not a magic bullet -- it's just a guide to making cross-platform file formats, and it works pretty well for that.

  6. Re:Really? by phrantic · · Score: 2, Insightful

    If programming was easy everyone could/would do it.

    Yeah, I am sure that someone can make a compiler that allows you to feed in pseudo code in clear English, written with crayons on the back of a cereal packet, but you are robbing Peter to pay Paul; you will have to take the hit somewhere....

    --
    --My sig is bigger than your sig--
  7. Re:Alas, XML by 6hill · · Score: 2, Insightful
    I have yet to see the xml come out of the dark ages, and until it decides to define exactly what it is or what it wants to be, I don't think it will.

    D'oh. What is the nature of the alphabet? To provide a common set of basic symbols from which to build the contents of a natural language.

    XML is a meta-language; it is specifically designed so that you, the user/code monkey/designer can define exactly what it is in terms of your projects. Unlike Java or other programming languages, XML is as free from in-built semantics as possible (i.e. "formless" as you put it) because it was meant to be that way! It's not a programming language, it's an alphabet.

    As for the uses of XML, I see a few things where it would be and is of great use:

    • storing representation-free data (i.e. same data could be imported into several programs that would then draw a graph, present a table, or devise a representational dance based on it)
    • an easily interpreted configuration/etc. language building blocks; readable by humans, operatable by machines, structured by definition
    • protocol languages in the lieu of SOAP

    And then there's the usual suspects: multichannel publishing, information sharing a la Amazon Associates, etc. XML bends to all these shapes, that's what makes it so beautiful.

  8. His idiom. by palad1 · · Score: 5, Insightful

    He's stating that he'd basically like other coders to write more code the way he sees fit.
    [quote]
    while () {
    next if (XX);
    if (X|||X)
    { $divert = 'head'; }
    elsif (XX)
    { &proc_jpeg($1); }
    # and so on...
    }
    [/quote]

    Repeat after me: I will never leave parsing XML up to a regexp especially if my xml may contain CDATA and Comment sections. I will never...

    Unless you are 100% certain the file you are parsing is directly under your control, ie: no comments, no CDATA sections, params always in the same order, same indentation, same bloody encoding [pardon my french], well, you will just have to access the data using some kind of DOM or abstract tree representation.
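
    To make the CDATA/comment trap concrete, here is a small sketch on an invented document (the gallery markup and both helper functions are assumptions for illustration, not the article's code). The regexp happily reports images that exist only inside a comment and a CDATA section; a real parser does not:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document with one real <img> and two decoys.
DOC = """<gallery>
  <!-- <img src="old.jpg"/> was removed last week -->
  <caption><![CDATA[Use <img src="fake.jpg"/> to embed a picture]]></caption>
  <img src="real.jpg"/>
</gallery>"""

def images_by_regex(text):
    """Naive approach: match anything that looks like an img tag."""
    return re.findall(r'<img src="([^"]+)"', text)

def images_by_parser(text):
    """Parser approach: only actual img elements are returned."""
    return [img.get("src") for img in ET.fromstring(text).iter("img")]

print(images_by_regex(DOC))   # ['old.jpg', 'fake.jpg', 'real.jpg']
print(images_by_parser(DOC))  # ['real.jpg']
```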

    I don't think he thinks no one uses XML, he seems to deplore the fact that some people don't get it at all and resort to heavy duty tools for trivial tasks [thus justifying his example above].

    Basically XML is quite simple, but that's not the matter, the problem is that XML bundles ACTUAL DATA, it's all about the complexity of those data, not the API used to access it [although writing a DOM implementation is a real pain]

  9. XML is not a programming language... by borgdows · · Score: 2, Insightful

    ... it's a convenient format to store and retrieve hierarchical information, that's all.

  10. XML is a MARKUP language by kahei · · Score: 3, Insightful

    ...and for doing generic markup in a relatively simple way, it's good.

    For storing arbitrary data, and use as a message format (as in SOAP), it's not so good because it has markup-like features, such as the distinction between attributes and elements and the distinction between text and element nodes. (The latter in particular is a huge pain, I wish people would agree to only use text nodes in leaf elements.)

    This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements.

    However, it's the standard, so we might as well just shut up and use it.

    My opinions have no special importance but it *is* important to remember that XML is a markup format that is being used mostly for things other than markup.

    --
    Whence? Hence. Whither? Thither.
  11. Re:It's about tools, libraries by Anonymous Coward · · Score: 2, Insightful

    I don't buy it.

    There's two ways: DOM-like, where you read the file and have tree-like access. It's simple, and here the inefficiency complaint holds, very much so for large files.

    There's SAX-like, where you process events. Plain SAX is fast. It's somewhat inconvenient, but not that much worse than regexps. I've co-developed a large open source app using SAX: it works, it's efficient for large files, so SAX is certainly doable.
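
    For anyone who hasn't seen the event style, a minimal sketch using Python's standard `xml.sax` module. The `TitleCollector` class, the `collect_titles` helper, and the sample markup are invented for illustration; this is not the Perl API mentioned below.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer it until the end tag.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False

def collect_titles(xml_bytes):
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles

print(collect_titles(b"<lib><book><title>A</title></book><book><title>B &amp; C</title></book></lib>"))
```

    The bookkeeping flags are the "somewhat inconvenient" part; in exchange, memory use does not grow with the size of the document.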

    But there's more: Tim Bray's blog message has created attention elsewhere, and on xml-dev one person introduced a Perl API based on SAX which lets you easily extract information from the stream. See:
    http://lists.xml.org/archives/xml-dev/200303/msg00536.html

    So... I still say: Proper tools exist. Use them, be happy!

  12. similar problem with MathML by e**(i+pi)-1 · · Score: 5, Insightful

    It might be too late to correct some things in XML.
    The good thing about XML is that, whatever emerges in the future,
    it will always be possible to convert old documents into any
    new form, using simple tools.

    The critics do have a point: unlike LaTeX or HTML, which
    can be written easily by hand, XML can become too bloated to
    be authored directly by humans.

    Similar problem with MathML:

    Latex: $x^5+3x-9=0$

    MathML:

    <mrow>
    <mrow>
    <msup>
    <mi>x</mi>
    <mn>5</mn>
    </msup>
    <mo>+</mo>
    <mrow>
    <mn>3</mn>
    <mo>&InvisibleTimes;</mo>
    <mi>x</mi>
    </mrow>
    <mo>-</mo>
    <mn>9</mn>
    </mrow>
    <mo>=</mo>
    <mn>0</mn>
    </mrow>

    You can write complicated formulas in Latex directly but it is
    almost impossible to do so in MathML, where one has to rely
    on tools to generate it (i.e. export it with Mathematica or
    TeX -> MathML converters). Wouldn't it be nice if browsers
    would understand a basic version of LaTeX? (That it is possible
    has been shown with IBM's techexplorer plugin).

    1. Re:similar problem with MathML by metasyntactic · · Score: 2, Insightful

      One thing that you seem to forget is that XML is useful for putting down the structure of the object in question, while leaving the presentation up to some third-party app.

      The XML snippet is indeed more verbose, however it carries much more semantic meaning than your LaTeX snippet, which is just pure text.

      How is this useful? Well assume that I'm blind and I use applications that speak text to me. I'll end up with:

      "dollar-sign x caret 5 ..."

      Whereas with MathML my text-to-speech agent can actually say:

      "x to the fifth plus 3 x minus nine equals zero".
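
      A toy sketch of that idea, assuming only the handful of MathML elements used in the parent's example (`mrow`, `msup`, `mi`, `mn`, `mo`). The `speak` function and its wording table are invented for illustration, and the `&InvisibleTimes;` entity is written as the character U+2062 so the snippet parses without a DTD:

```python
import xml.etree.ElementTree as ET

MATHML = """<mrow>
  <mrow>
    <msup><mi>x</mi><mn>5</mn></msup>
    <mo>+</mo>
    <mrow><mn>3</mn><mo>\u2062</mo><mi>x</mi></mrow>
    <mo>-</mo>
    <mn>9</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>"""

# Spoken forms for operators; invisible times is simply not pronounced.
OPERATORS = {"+": "plus", "-": "minus", "=": "equals", "\u2062": ""}

def speak(node):
    """Recursively turn a tiny subset of presentation MathML into words."""
    if node.tag == "msup":
        base, exponent = list(node)
        return f"{speak(base)} to the power {speak(exponent)}"
    if node.tag == "mrow":
        words = [speak(child) for child in node]
        return " ".join(word for word in words if word)
    text = (node.text or "").strip()
    if node.tag == "mo":
        return OPERATORS.get(text, text)
    return text

print(speak(ET.fromstring(MATHML)))
# prints: x to the power 5 plus 3 x minus 9 equals 0
```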

      I write latex a lot, and it's a joy to write expressions that will end up looking great. However, I know that when I do so, I'm leaving the mathematical world for the one of fascinating typesetting.

      You say that XML can become too bloated to be edited by humans. On that point you are 100% correct. However, remember that one of the tenets of XML is that it should be possible, but not necessarily fun or easy, to hand-code input, as stated by the W3C. All that's required is that the format be human-legible and reasonably clear. If you find writing MathML too difficult (something which would not surprise me at all), then I suggest you work on a tool that converts LaTeX to MathML. Hell, I'll even help you with it. But given my experience with LaTeX I am extremely wary as I have no idea how that complicated beast works and I would imagine it would be quite difficult to infer a lot of the mathematical semantics from most LaTeX snippets.

  13. Re:He is right, I think. by kalidasa · · Score: 4, Insightful

    1. Doctype is necessary. Perhaps you've never tried handling a very complex text (a big DOCBOOK text or a big TEI text). You need to know what kind of text you're dealing with, and there's no way to come up with one universal solution for all kinds of texts. The only character entities needed are the handful of named entities that are part of the standard: &lt; &gt; &amp; etc. The rest can be handled by Unicode (including the PUA) and transcoding (if you are using a ISO 8859 encoding and you need a character outside that encoding, then you need to rethink the encoding you've chosen to use. UTF-8 is your friend). Entities really are good for more complex units (strings, etc.), rather than single characters. What character entities have to do with DOCTYPE is beyond me.

    2. True

    3. Standardize element IDs? Element IDs are part of the text, not part of the structure. They're simply a way of simplifying the difficulty of accessing random parts of text.

    I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

    So you're saying we need a meta-meta-language? The *MLs are a standard for arbitrary abstract data (and text) models (because not all texts are hierarchical like DBs).

    I think the problem here is that DB programmers (I'm excepting Bray from this) are overusing XML for very simple DB tasks that it wasn't intended for. If you're just doing a 40 field, 30,000 record flat DB, XML is NOT the solution. But it is the best solution for complex non-hierarchical data (i.e., books, etc.).

    As for Bray, I don't think he's saying XML itself (the markup standard alone) is too hard, that it should be abandoned. I think he's saying we haven't come up with simple enough ways of accessing XML data through APIs. But of course that wouldn't be a spicy enough meatball for the Taco.

  14. Re:Hahahah finallly something I know a lot about. by kalidasa · · Score: 5, Insightful

    If you're working with data that can be meaningfully represented with columns, you're using the wrong damned tool. XML is for complex structured data, which it does fine. It is not for tables. Don't blame the tool, blame the idiot who thought that XML was a good way to do DBs.

  15. Re:Too hard? by WPIDalamar · · Score: 3, Insightful

    If the full set of XML is too hard to use, then don't use the full set of features! I regularly write programs that read/write xml style documents, but with only the most basic xml functionality. The main benefit is so that other programs can also read & write these files. It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.

  16. reusable loop structure by Anonymous Coward · · Score: 1, Insightful

    I see the article's gripe as another instance of an increasingly common problem: in all common languages, complicated loop structures aren't reusable. In the article, he wants to have a library (the XML parser) provide an efficient method for iteration over the tree structure in his XML file, and he rightly notices that the language doesn't support that very well.

    There are 2 basic ways to reuse a loop in languages such as Perl or Java or C. Way number one is to use callbacks: package up the loop body in a function and pass it into the library. As the author notes, this is syntactically annoying. It can also be inefficient: compilers usually can't optimize out the function call, so if the amount of work per iteration is small there can be a lot of overhead.

    Way number two is to use iterator-like syntax (a la Java iterators): provide a function which returns you the next object in line and then write a simple for-style loop. This is syntactically somewhat less annoying, but still subjects you to some overhead.

    The closest I've seen to a solution to this problem is compile-time computation such as templates in C++ or macros in LISP. These have not been particularly popular for people to use (probably because they're hard to use), and they're not available in many common languages. Does anyone know any better answers?
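
    One partial answer, offered as a sketch rather than a definitive one, is generators (coroutine-style iteration): the library writes its traversal as a plain recursive loop, and the caller still gets an ordinary for loop. The tree shape and both function names below are made up for illustration, and nothing here addresses the per-item overhead concern.

```python
def walk_with_callback(tree, visit):
    """Callback style: the library owns the loop and calls visit(label)."""
    label, children = tree
    visit(label)
    for child in children:
        walk_with_callback(child, visit)

def walk(tree):
    """Generator style: same traversal, but the caller drives an ordinary loop."""
    label, children = tree
    yield label
    for child in children:
        yield from walk(child)

# Hypothetical tree as nested (label, children) tuples.
TREE = ("doc", [("head", [("title", [])]), ("body", [("p", []), ("p", [])])])

seen = []
walk_with_callback(TREE, seen.append)   # loop body packaged as a function

for label in walk(TREE):                # loop body written inline
    if label == "body":
        break                           # early exit is trivial here
```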

  17. Perl suggestion by skillet-thief · · Score: 2, Insightful

    I don't know what's going on in Perl 6, but it seems like Perl needs some kind of built-in way of running through an xml file by tags, in a way similar to the standard line by line file reading operator. Rather than grabbing a single line at a time, or having to slurp in the whole file before whacking it up, you should be able to pass a regex to the input operator so that it will stop when it gets to the end of a chunk of text defined by an end tag.

    Obviously, there are ways of getting around this by using a line-by-line approach, but I'm pretty sure that if such a thing existed and was easy to implement, it would get used a lot and would make Perl far more xml friendly.

    --

    Congratulations! Now we are the Evil Empire

  18. I agree, of course... by alispguru · · Score: 4, Insightful
    Given my .sig, how could I disagree?

    XML got one thing right over unadorned S-expressions - document packaging, specifically versioning and character-set labeling. XML inherited this from SGML, and it's one of the few things it took from there that was actually worth keeping.

    For a good laugh, read the Origin and Goals section of the XML spec. Of the ten goals for XML listed there:

    XML shall be straightforwardly usable over the Internet.

    XML shall support a wide variety of applications.

    XML shall be compatible with SGML.

    It shall be easy to write programs which process XML documents.

    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

    XML documents should be human-legible and reasonably clear.

    The XML design should be prepared quickly.

    The design of XML shall be formal and concise.

    XML documents shall be easy to create.

    Terseness in XML markup is of minimal importance.

    I'd say two of them were met, but were bad ideas (SGML compatibility, terseness unimportant), and five of them were completely missed (ease of use, human legibility, quickly designed, formal and concise, ease of creation).

    Thirty per cent is a failing grade, folks...

    --

    To a Lisp hacker, XML is S-expressions in drag.
    1. Re:I agree, of course... by g4dget · · Score: 2, Insightful
      One other nice thing about XML is that opening tags are matched with closing tags. If you leave off a closing paren in Lisp, the parser will give you an error but it can't pinpoint where you screwed up. But an XML parser can spot which closing tag is missing, which means you don't have to hunt for it yourself.

      That would be a valid argument if XML were designed to be regularly input by humans. But XML is so cumbersome otherwise that almost all of it will be either machine generated or edited in special editors. And balancing closing tags is easy in Lisp if you use a special editor.
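
      For reference, the parser behaviour the parent describes looks roughly like this with Python's standard ElementTree (the broken document and the `locate_error` helper are made up for illustration). The parser reports where it noticed the mismatch, which is the point at which an unexpected closing tag appears rather than the spot where the missing one should have been:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <stores> is never closed.
BROKEN = "<item>\n  <part-no>123456</part-no>\n  <stores><store>3</store>\n</item>"

def locate_error(text):
    """Return ((line, column), message) for malformed XML, or None if it parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position, str(err)
    return None

print(locate_error(BROKEN))
```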

      Also, most versions of Lisp give you two separate, equivalent pairs of parens that you can use for checking. So, you write:

      [item (part-no 123456) (available 5) (stores 3 7 9)]

      And checks can be incorporated into the definition of specific constructs. So, you could have:

      (item (part-no 123456) (available 5) (stores 3 7 9) enditem)

      Or, you could make this an optional part of the syntax, allowing people to close a list starting with "x" with "/x", but not requiring it:

      (item (part-no 123456) (available 5) (stores 3 7 9) /item)

      Also, one of the major ideas of XML is to separate code from data, as opposed to Lisp where code and data are the same thing. Similar syntax, different philosophy, I guess.

      Lisp programs separate code from data all the time, just like well-written programs in any other language. It's just that on those occasions when you do have to deal with code, you can do so using the same syntax as you use for data. In different words, separating code from data does not require for code and data to have different syntax.

      The fact that several web standards use incompatible syntax (DTD, CSS, etc.) is actually a big problem. And the fact that almost no web code is written in XML syntax means that all those scripts are inaccessible to XML parsers and easy automatic analysis. Just imagine how nice it would be if the stuff inside the JavaScript tags could be analyzed and indexed with a bit more confidence.

  19. Re:Too hard? by khuber · · Score: 5, Insightful
    It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.

    Yeah, the world needs more half-assed barely functioning and noncompliant XML parsers.

    Seriously I think it's much more robust to just use a normal XML parser. You get all the character set support. If someone hacked up their own parser at work I would reject it in a code review. There's no sense in maintaining your own XML parser these days; they are a commodity.

    -Kevin

  20. what the hell are you talkin` about? by Ender+Ryan · · Score: 2, Insightful
    It strives to excel at too many things at once, and becomes inefficient and complex as a result.

    I agree with this, to an extent. If you don't like/need all the fluff, don't use it. XML is only as complicated and inefficient as you want it to be.

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.

    It's not just about writing parsers for a single program. What happens when you have several programs that read the same type of file? What if said file-type is somewhat complex. XML keeps things simpler and easier for these cases.

    Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form.

    What on earth are you talking about? YOU define the format of your XML data. If it doesn't need to be complicated, don't complicate it!

    XML document tree traversal = 10000x more complex than getting column data out of a ResultSet...

    Again, what? Keep the XML simple, and it will be just as easy.

    Unfortunately it is also a billion times slower to parse XML than it is to perform a medium complexity database query.

    Then XML isn't the proper solution for your problem. Just because some dipshit tries to force XML to do things it isn't optimized for doesn't make XML any less useful.

    *snip* the rest of your comments comparing XML to relational databases.

    XML files are not high performance databases... Use the right tool for the job, and you will be much happier.

    It sounds to me like XML isn't your problem. Your problem is the "genius" at your company that needs to be beat over the head with a clue stick. If I were you, I'd be sure to beat him hard.

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  21. C doesn't have it. by torpor · · Score: 2, Insightful

    Really.

    There's *still* nothing out there that can take my structs, parse them out to XML, then load them back again when needed, seamlessly.

    The embedded sphere - where XML is *USEFUL*, and where *C* is *ALSO USEFUL* - has no chance with XML right now.

    It's either libexpat and a monster callback module, or bust.
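
    To show the shape of the callback approach, here is a rough sketch of the round trip in Python rather than C, using the standard xml.parsers.expat module (a binding to the same Expat library). The Sensor record and its three fields are hypothetical stand-ins for a C struct; this only illustrates how much handler code even a tiny record needs.

```python
import xml.parsers.expat
from dataclasses import dataclass, fields
from xml.sax.saxutils import escape


@dataclass
class Sensor:
    """Hypothetical record standing in for a C struct."""
    channel: int
    gain: float
    label: str


def to_xml(sensor):
    """Write each field as a child element of <sensor>."""
    body = "".join(
        f"<{f.name}>{escape(str(getattr(sensor, f.name)))}</{f.name}>"
        for f in fields(sensor)
    )
    return f"<sensor>{body}</sensor>"


def from_xml(xml_text):
    """Rebuild a Sensor with Expat callbacks: one handler per event type."""
    record = {}
    open_elements = []  # stack of element names currently open
    text = []           # character data collected for the current element

    def start(name, attrs):
        open_elements.append(name)
        text.clear()

    def chars(data):
        text.append(data)

    def end(name):
        open_elements.pop()
        if open_elements == ["sensor"]:  # a direct child of <sensor> just closed
            record[name] = "".join(text).strip()
        text.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return Sensor(
        channel=int(record["channel"]),
        gain=float(record["gain"]),
        label=record["label"],
    )


if __name__ == "__main__":
    original = Sensor(channel=3, gain=1.5, label="a<b")
    print(from_xml(to_xml(original)) == original)
```

    The type conversions at the end are written by hand for each field, which is exactly the per-struct boilerplate a seamless tool would have to generate.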

    --
    ; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
  22. XML is bad like Democracy is bad by Washizu · · Score: 4, Insightful

    XML is bad like Democracy is bad. It's just better than the alternatives.

    I had a problem at work when we switched from AutoCAD to Solidworks. Our manufacturing software couldn't read the new BOM files, which were Excel's .xls. Without ever having looked at our system's BOM files before, I wrote a program that read the .xls and built a proper XML BOM file our system could read. If our system wasn't using XML, who knows how long it would have taken me to figure out the intricacies of a proprietary file format.

    --
    OddManIn: A Game of guns and game theory.
  23. Re:It's about tools, libraries by Sique · · Score: 3, Insightful

    It is not about the number of elements. It is about the depth to which you can nest them. Think about normal algebraic terms such as (a+b*5-(3*(7-4))). It's often very reasonable to have such terms in XML. But they are unparseable via regexp, because regexp doesn't have a stack and can't count parentheses. And don't reply with RPN (reverse Polish notation) and argue that it is parentheses-free. It replaces the parentheses with a fixed number of arguments per operator, and regexp can't count arguments either. Regexp in fact can't count at all (or only up to a predefined limit, which is mathematically equivalent).
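
    A small sketch of the difference, in Python: the regular expression below only splits the input into flat tokens, while the nesting is handled by recursive function calls, i.e. by a stack. The grammar is an assumption made for brevity (integers, +, -, * and parentheses only, so the variables a and b from the term above are replaced by numbers).

```python
import re

# The regexp is only asked to recognise flat tokens, never nesting.
TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split an arithmetic expression into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Evaluate the expression; each '(' adds a frame to the call stack."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():  # expr := term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term := factor ('*' factor)*
        value = factor()
        while peek() == "*":
            take()
            value *= factor()
        return value

    def factor():  # factor := NUMBER | '(' expr ')'
        token = take()
        if token == "(":
            value = expr()  # recursion is what tracks the nesting depth
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if token is None or not token.isdigit():
            raise ValueError(f"unexpected token {token!r}")
        return int(token)

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


if __name__ == "__main__":
    print(evaluate("2+3*5-(3*(7-4))"))  # 2 + 15 - 9 = 8
```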

    --
    .sig: Sique *sigh*
  24. Re:Hahahah finallly something I know a lot about. by Eric+Savage · · Score: 2, Insightful

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.

    This is true if you are parsing your own data, but what about parsing third party data? I did that for years and every day was full of dealing with corruption, misformatted files, or formats that varied from the documentation because some new guy was making them on the other end.

    True, these problems can happen with XML but they are much easier to spot. Send me a file and a DTD/Schema and I can tell you in a second if any future files are bad.
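
    As a rough illustration of "easier to spot", here is a sketch in Python using only the standard library. It is not DTD or Schema validation (the standard xml.etree module does not do that); it checks well-formedness and then a hand-written list of required child elements as a stand-in. The feed layout and field names are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout: <feed> holds <item> elements with these children.
REQUIRED_FIELDS = ("sku", "quantity")


def check_feed(xml_text):
    """Return a list of problems; an empty list means the file passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        # Truncated or corrupted files fail here, before any field is read.
        return [f"not well-formed: {error}"]
    problems = []
    for index, item in enumerate(root.findall("item"), start=1):
        for field in REQUIRED_FIELDS:
            if item.find(field) is None:
                problems.append(f"item {index}: missing <{field}>")
    return problems


if __name__ == "__main__":
    good = "<feed><item><sku>A1</sku><quantity>3</quantity></item></feed>"
    truncated = "<feed><item><sku>A1</sku><quantity>3</quan"
    print(check_feed(good))
    print(check_feed(truncated))
```

    A real DTD or Schema would replace the hand-written field list, and a validating parser would report the same kinds of problems without any per-format code.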

    My view of XML is that what it does really well is transfer data. As far as storing data, well I only consider it when a database isn't available.

    --

    This is not the greatest sig in the world, this is just a tribute.
  25. RFC822 by semanticgap · · Score: 2, Insightful

    Before XML there was (and still is) RFC822 which describes how headers are formatted in e-mail, HTTP and a slew of other protocols.

    I've been down the route where I tried to use XML where something as simple as "key: value" would do, and before I knew it, my program had become bloated and reliant on third-party XML libs, the config files were only marginally human-readable, and a lot of time was wasted thinking about the virtues of DOM vs SAX. In the end I learned that using XML for the sake of XML isn't worth it.
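
    For comparison, a minimal sketch of reading a "key: value" config in Python with the standard email.parser module, which understands RFC822-style headers including folded continuation lines. The config keys and values are invented for the example.

```python
from email.parser import HeaderParser

# Hypothetical config in RFC822 header style; the indented line continues "Motd".
CONFIG_TEXT = """\
Host: example.org
Port: 8080
Motd: welcome to the
 test server
"""


def load_config(text):
    """Parse 'key: value' lines into a dict with lower-cased keys."""
    message = HeaderParser().parsestr(text)
    # Collapse the whitespace left over from folded (continued) lines.
    return {key.lower(): " ".join(value.split()) for key, value in message.items()}


if __name__ == "__main__":
    config = load_config(CONFIG_TEXT)
    print(config["host"], int(config["port"]))
    print(config["motd"])
```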

    I think XML is OK if used appropriately - for example I think XML is perfect for something like storing word processing documents. But the idea that every config file and every bit of network traffic should be XML is stupid IMHO.

  26. Re:Too hard? by arkanes · · Score: 5, Insightful

    You know, using VB is just code reuse. It's just reusing more code than you're used to. It's got some serious strengths. The app you write in a couple days the VB programmer can toss out after lunch. How about data aware controls? Those are a pain in the ass in C/C++, although you can make it easier by using third party components. Like ActiveX controls. Which are a pain in C/C++, but are painless in VB. On the other hand, your code won't be small, and you'll be linking to a massive runtime, and you're using a language whose syntax makes me feel dirty.
    Oh, and if you're making web-based apps, wtf are you using C for?

  27. Maybe it's just me... by Gibble · · Score: 1, Insightful

    But wasn't the entire point of XML data exchange? You use XSLT to transform incoming data into the format your software wants. Your software doesn't NEED to be able to read an XML format, but it's a lot easier to knock off an XSLT file that transforms incoming data to work with your app than to code your app to handle more than one file type.

    You create another XSLT for outbound data to transform your proprietary format to XML so it can be consumed by another application, company, etc.

    XML isn't made to be used as the be all, end all of file formats, it's made to be a simple, yet robust, generic format for transporting data between disparate systems running on any OS, in any programming language.

    The other advantage is that XML is self-describing: I can glance at an XML file, see what all the data is, and write an XSLT to get what I need out of it for my application a lot more easily than I could by glancing at a flat text file for the same information.

    And considering there is an XML implementation for nearly every language out there that can be had for free, why are people bothering to write their own parsers? What a waste of time.

    --
    Gibble: Descriptive of an emotional state in which one's mind is scrabbling for some purchase on reality
  28. Re:Too hard? by Billly+Gates · · Score: 2, Insightful

    Sounds like a similar argument I hear for C++.

    I do not know any programmer who uses all of the features of ANSI C++. This may have something to do with the fact that no C++ compiler is actually 100% ANSI compliant. There are just so many different kinds of templates that most programmers do not use most of them because less experienced programmers will not be able to read the code.

    I never got into the XML hype. SOAP is cool but XML otherwise is just an ASCII text file with tags. I have not written a lot of XML programs but SGML is fine for documents and is easier to read. Websites that need to display a lot of information can gather it from a database.

  29. Re:Too hard? by EriondII · · Score: 2, Insightful

    Signing up for unemployment? Hardly! I know of many industries that rely exclusively on VB, including Fortune 500 companies like the one I work for. We are currently in the process of writing an ERP in VB, and with phase 1 rolled out, no such issues exist. This is a complete Sales Order Entry system that connects with and replaces old COBOL and Progress legacy systems. Speed is not even an issue and I would wager our code base including COM+ components and XML/XSL Views is more robust and useful than some shops' C libraries.

    And VB is not the only language I know or program in. I use Java, C, COBOL, and Progress (ever heard of it? Thought not.) for many other tasks within the organization. It's just a matter of using the best tool for the job. I try not to be too tunnel-visioned on one language and to figure out how to make the best use of each.

  30. It takes more than a set of tools by apankrat · · Score: 3, Insightful

    > However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

    It takes more than a set of good tools for a technology to become 'a widespread success'. A clear justification why XML is better than existing standard marshalling techniques (ASN.1 DER, simple container LSB serialization and others) would be a good starting point.

    I'm probably beating the dead horse here but XML has at least two properties rendering it useless for any performance-aware application:

    (a) unlike, say, TLV, it does not allow efficiently skipping parts of the data you don't need or aren't aware of. I.e. in order to skip a section, you need to read and parse it first.

    (b) XML is a lazy man's ASN.1 DER. Everything is already there in DER, in a much more compact and elegant form. The only 'drawback' in the eyes of the XML crowd is that it's binary. Sure, everyone knows that encoding numbers as strings is a definite way to improve upon the performance and scalability of everything from network protocols (SOAP, BXXP, UPNP) to basic document processing. Right on.
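
    Point (a) can be sketched in a few lines of Python with the standard struct module. The record layout here is an assumption chosen for the example (1-byte tag, 2-byte big-endian length, then the value); it is not ASN.1 DER or any particular wire format. The reader jumps over unwanted records using the length field alone, without looking inside them.

```python
import struct

# Assumed layout for this sketch: 1-byte tag, 2-byte big-endian length, value bytes.
HEADER = struct.Struct(">BH")


def encode(records):
    """Serialise (tag, value_bytes) pairs into one TLV byte string."""
    out = bytearray()
    for tag, value in records:
        out += HEADER.pack(tag, len(value)) + value
    return bytes(out)


def find(buffer, wanted_tag):
    """Return the value for wanted_tag, skipping other records by length."""
    offset = 0
    while offset + HEADER.size <= len(buffer):
        tag, length = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        if tag == wanted_tag:
            return buffer[offset:offset + length]
        offset += length  # jump over the value without inspecting it
    return None


if __name__ == "__main__":
    data = encode([(1, b"x" * 1000), (2, b"hello")])
    print(find(data, 2))  # the 1000-byte record is skipped in one step
```

    Skipping the first record costs one header read and one addition regardless of its size, whereas an XML reader has to scan every byte of an element to find its end tag.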

    The bottom line is that XML has probably reached its acceptance limits. Whoever took XML for granted, is stuck with it, or is not willing to learn about alternatives will keep on whining about tools being sucky. That's life, but OTOH it's only a small part of it.

    --
    3.243F6A8885A308D313
  31. Re:Too hard? by EastCoastSurfer · · Score: 4, Insightful

    The market for *real* programmers has been destroyed by corporate America.

    I think that the *real* programmers that you have talked about all write libraries now. These guys all have jobs at the tool makers like MS, Apple, etc...

    Businesses in general don't want (and generally don't need) *real* programmers, they want software engineers. They want someone who can sit down, work out some requirements and provide a timely, cost effective solution. It has taken me some time to fully realize this, but the right technical solution is not always the right business solution. The PHB could really care less if the app is written in VB, C, Java, as long as the application works to within their parameters. It is those parameters that are specified by the people paying for the software that will direct the language/technology you ultimately use.

  32. Re:Hahahah finallly something I know a lot about. by Arandir · · Score: 2, Insightful

    Executive Summary: XML is not RDBMS which makes it damn hard using this XML screwdriver to hammer in RDBMS nails.

    Your main problem is that you think a tree should be a table. I think you need to get off of your RDBMS religion and realize that there's a whole world of data out there that doesn't need to be shoved into a table before it can be used.

    --
    A Government Is a Body of People, Usually Notably Ungoverned
  33. plagiarism? by pwarf · · Score: 2, Insightful

    I am not the author of the post you responded to, but I felt compelled to comment.

    Plagiarism, in the most commonly used sense, is taking credit for someone else's words or ideas. Since he posted as an anonymous coward, he is unable to take credit. Therefore, he didn't commit plagiarism in the usual sense.

    He deserves the lesser charge of failure to cite. As long as we are throwing out accusations, I would accuse you of libel (http://dictionary.reference.com/search?q=libel), but since he's an AC, I can't claim that it damages his reputation. Hmm, never mind. :)

  34. Re:Too hard? by dwsauder · · Score: 2, Insightful
    This is the lamest story I've ever heard on Slashdot. I almost left for good after reading this. If the next week's worth of news doesn't get any less lame, I probably will.

    Slashdot, don't be fucking lame. This is news for *nerds*, not for simps and wannabees. XML too hard? Then you shouldn't be a programmer cause that's about as easy as it gets unless you're just a hobbyist.

    Somehow, I think you don't understand what the story is about. Something can be easy, but for lazy programmers (and if you understand Larry Wall's Perl culture, then you know that laziness in a programmer is a virtue) it ought to be simpler so that we can enjoy our work more. There are some programming techniques that are just too repetitive, and doing them over and over and over can make a programmer go crazy, no matter how easy it is. Well, that's the way it is with XML. Sure, XML is as easy as it gets. But if you have to write so much repetitive code, you look for ways to automate it all. A major point of Tim's complaint about XML is that apparently no one has done anything to make programming with XML less boring and repetitive.

  35. Re:xml by cicho · · Score: 2, Insightful

    Moderators on crack, the parent is not a troll, he's just about right.

    Read any introductory article on XML, or the first chapter of a book - it's so plain and simple and inviting and looks like a great idea. By page 50 of the book you're crawling through a dense pile of industrial trash. A book on XML I bought lists over thirty classes in the OpenXML implementation - over THIRTY classes, that's hundreds of methods; do I want to dig into this just to read and write a simple file of records? Where simple and robust alternatives exist? Hell, no.

    --
    "Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan