Slashdot Mirror


XML Co-Creator says XML Is Too Hard For Programmers

orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied lately with XML, and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always had a divided response among the technical community. The anti-XML community has several sites stating their positions."

108 of 562 comments

  1. Too hard? by Ledskof · · Score: 5, Funny

    Sounds like visual basic programmers are complaining or something.

    --
    This is my sig. The post is over.
    1. Re:Too hard? by Omkar · · Score: 2, Informative

      blah blah blah...right tool for the right task...blah blah blah.
      Seriously, don't knock VB until you need to code a quick dbaccess (or other simple) app in a couple of days for internal use. Easy languages have their places!

    2. Re:Too hard? by WPIDalamar · · Score: 3, Insightful

      If the full set of XML is too hard to use, then don't use the full set of features! I regularly write programs that read/write xml style documents, but with only the most basic xml functionality. The main benefit is so that other programs can also read & write these files. It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.

    3. Re:Too hard? by khuber · · Score: 5, Insightful
      It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.

      Yeah, the world needs more half-assed barely functioning and noncompliant XML parsers.

      Seriously I think it's much more robust to just use a normal XML parser. You get all the character set support. If someone hacked up their own parser at work I would reject it in a code review. There's no sense in maintaining your own XML parser these days; they are a commodity.

      -Kevin

    4. Re:Too hard? by Pxtl · · Score: 2, Interesting

      Whoever modded this troll is a jingoistic zealot. The poster is just saying that VB, for all its faults, is good for database RAD. Which many people would agree with.

    5. Re:Too hard? by arkanes · · Score: 5, Insightful

      You know, using VB is just code reuse. It's just reusing more code than you're used to. It's got some serious strengths. The app you write in a couple days the VB programmer can toss out after lunch. How about data aware controls? Those are a pain in the ass in C/C++, although you can make it easier by using third party components. Like ActiveX controls. Which are a pain in C/C++, but are painless in VB. On the other hand, your code won't be small, and you'll be linking to a massive runtime, and you're using a language whose syntax makes me feel dirty.
      Oh, and if you're making web-based apps, wtf are you using C for?

    6. Re:Too hard? by Billly Gates · · Score: 2, Insightful

      Sounds like a similar argument I hear for c++.

      I do not know any programmer who uses all of the features of ansi. This may have something to do with the fact that no c++ compiler is actually 100% ansi compliant. There are just so many different kinds of templates that most programmers do not use most of them because less experienced programmers will not be able to read the code.

      I never got into the xml hype. Soap is cool but xml otherwise is just an ascii text file with tags. I have not written a lot of xml programs but sgml is fine for documents and is easier to read. Websites that need to display a lot of information can pull it from a database.

    7. Re:Too hard? by Evil Grinn · · Score: 2, Flamebait

      My shop codes exclusively in C and I can even create rather complex apps in a few days because:

      #1 I know what I'm doing, and..

      #2 It's called libraries....be it STL, MFC, MyStack.h or whatever.


      STL and MFC are C++, not C. Presumably you know the difference between C and C++, since you "know what you're doing". I must assume then that you are trying to gloss over the distinction between C and C++ so as not to further confuse the VB programmers among us.

    8. Re:Too hard? by EriondII · · Score: 2, Insightful

      Signing up for unemployment? Hardly! I know of many industries that rely exclusively on VB. Fortune 500 companies including the one I work for. We are currently in the process of writing an ERP in VB, and with phase 1 rolled out, no such issues exist. This is a complete Sales Order Entry system that connects with and replaces old COBOL and Progress legacy systems. Speed is not even an issue and I would wager our code base including COM+ components and XML/XSL Views is more robust and useful than some shops' C libraries.

      And VB is not the only language I know or program in. I use Java, C, COBOL, and Progress (ever heard of it? Thought not.) for many other tasks within the organization. It's just a matter of using the best tool for the job. I try not to be too tunnel-visioned on one language and figure out how to make the best use of each.

    9. Re:Too hard? by EastCoastSurfer · · Score: 4, Insightful

      The market for *real* programmers has been destroyed by corporate America.

      I think that the *real* programmers that you have talked about all write libraries now. These guys all have jobs at the tool makers like MS, Apple, etc...

      Businesses in general don't want (and generally don't need) *real* programmers, they want software engineers. They want someone who can sit down, work out some requirements and provide a timely, cost effective solution. It has taken me some time to fully realize this, but the right technical solution is not always the right business solution. The PHB couldn't care less if the app is written in VB, C, Java, as long as the application works to within their parameters. It is those parameters that are specified by the people paying for the software that will direct the language/technology you ultimately use.

    10. Re:Too hard? by whereiswaldo · · Score: 2, Interesting


      This is the lamest story I've ever heard on Slashdot. I almost left for good after reading this. If the next week's worth of news doesn't get any less lame, I probably will.

      Slashdot, don't be fucking lame. This is news for *nerds*, not for simps and wannabees. XML too hard? Then you shouldn't be a programmer cause that's about as easy as it gets unless you're just a hobbyist.

    11. Re:Too hard? by kryonD · · Score: 2, Interesting

      Obviously all you know is C. It must be some kind of "geek pride" thing.

      I've been programming for 16 years...here is a short list of the languages I have used in real-world (i.e. I got paid) applications:

      C, C++, COBOL, VB (eventually rewritten in C when it hit the scalability wall), Intel x86 ASM, Motorola 6809 ASM, and Motorola 6502 ASM.

      The list of languages I have worked with either in private, or an academic setting is quite large and are not listed above because I either wouldn't trust them for real work, or my employer wouldn't trust them.

      ADO and OLEDB...Oracle

      Proprietary. Proprietary. Proprietary, but at least somewhat portable; however, waaayyy too expensive unless you are dealing with massive amounts of data/users or are coding for government/businesses that require namebrand stuff.

      some people ... write virii ... the REST ... write groupware.

      This is true. However, I have yet to run into anything that I couldn't replicate in C/C++ using RFC standards. Some of the more nifty features of Exchange would need some reverse engineering, but I've never had the need to provide them.

      why the hell are you still writing in C? I thought Perl, Java, and PHP4 were the gold standard for web apps... Aren't you afraid of buffer overruns??? Lord knows half the system calls in C are vulnerable...

      Don't get me started on the gross mis-management job Sun has done on JAVA. It has never lived up to Sun's promise of being platform independent. Security is another problem depending on whether you are talking about client side, or server side. What happens if you have a customer whose security policy disables JAVA on the browsers? For server side, I challenge you to name something you can do in JAVA that you can't do just as easily in C/C++. The language has its advantages, but most of them can be reproduced in other languages with minimal effort.

      Perl and PHP are very nice for simple straightforward page production. However, I code for US DOD and the security issues with both of those as well as a general distrust of anything open source has prevented their use on a general basis. I have seen some stuff done for DOD in those languages, but it was either in violation of policy, or contracted out and not on a .mil server. Additionally, they are interpreted languages. If you need to pull 4 million items into memory, consolidate the duplicates, calculate usage stats over multiple time periods, then filter out those that don't meet a usage to property hit list, scripted languages are either way too slow, or simply incapable of doing that kind of complex filtering on a large quantity of data. The above process can be done in about 400 lines of C code, most of which is copy and pasted loops and if statements and it's fast.

      Buffer overruns are easy....don't rely on the server to feed your script data. Write the code to pull the data from the server and set a cutoff limit where extra data is ignored. Write a simple filter command to break attempts at embedding malicious SQL commands in data and you're done. You can do this in any language, but yet you still occasionally see AIVAs about buffer overflow vulnerabilities in everything under the sun.

      System calls? Don't know what to tell you there. Been coding web based stuff for two years in C and never had to make one. Or are you referring to anything that handles I/O as a system call? If so, read your input one character at a time and COUNT them...stop when you hit your buffer's pre-defined limit. If you do hit a limit, have the app make a log entry. Either your code has failed to expect a weird user need that requires sending large amounts of data, or someone is trying to attack your script....the latter is far more likely. I'd rather have a random user complaint once in a blue moon for lack of flexibility, than all my users pissed because someone rooted the box and defaced the web site.
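
      A minimal Python sketch of the count-and-cut-off idea described above. It is an illustration only: the names `read_bounded` and `MAX_INPUT_BYTES` and the 64-byte limit are made up for the example, and it asks for the bounded amount in one `read` call rather than looping one character at a time.

```python
import io
import logging

MAX_INPUT_BYTES = 64  # hypothetical limit, chosen only for this example


def read_bounded(stream, limit=MAX_INPUT_BYTES):
    """Read at most `limit` bytes from a binary stream.

    Returns (data, truncated). One extra byte is requested so that
    oversize input can be detected, logged, and then ignored.
    Note: on sockets or pipes a single read may return fewer bytes
    than requested; this sketch assumes a fully buffered stream.
    """
    data = stream.read(limit + 1)
    truncated = len(data) > limit
    if truncated:
        logging.warning("input exceeded %d bytes; extra data ignored", limit)
        data = data[:limit]
    return data, truncated
```

      Calling `read_bounded(io.BytesIO(payload))` returns the accepted bytes plus a flag saying whether anything was dropped, which is the point where the log entry mentioned above would be written.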

      --
      I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
    12. Re:Too hard? by dwsauder · · Score: 2, Insightful
      This is the lamest story I've ever heard on Slashdot. I almost left for good after reading this. If the next week's worth of news doesn't get any less lame, I probably will.

      Slashdot, don't be fucking lame. This is news for *nerds*, not for simps and wannabees. XML too hard? Then you shouldn't be a programmer cause that's about as easy as it gets unless you're just a hobbyist.

      Somehow, I think you don't understand what the story is about. Something can be easy, but for lazy programmers (and if you understand Larry Wall's Perl culture, then you know that laziness in a programmer is a virtue) it ought to be simpler so that we can enjoy our work more. There are some programming techniques that are just too repetitive, and doing them over and over and over can make a programmer go crazy, no matter how easy it is. Well, that's the way it is with XML. Sure, XML is as easy as it gets. But if you have to write so much repetitive code, you look for ways to automate it all. A major point of Tim's complaint about XML is that apparently no one has done anything to make programming with XML less boring and repetitive.

    13. Re:Too hard? by Tet · · Score: 2, Informative
      Motorola 6809 ASM, and Motorola 6502 ASM.

      Of course, while the 6809 was indeed a Motorola chip, the 6502 was made by MOS (a company started by former Motorola employees). The initial 6501 was pin compatible with the 6800, and Motorola sued, resulting in the 6502, which had a different pin layout.

      Other than that, I agree with your comments.

      --
      "The invisible and the non-existent look very much alike." -- Delos B. McKown
  2. Hah. by termos · · Score: 3, Funny

    They should only be glad not to be coding cobol, intercal or befunge!

    --
    Note to self: get smarter troll to guard door.
    1. Re:Hah. by Surak · · Score: 3, Funny

      It's a *bit* wordy?

      Son, there are professional Cobol programmers who HAVE NO FINGERS LEFT.

      Join the Cure. We're trying to raise $5 billion to cure Cobol Fingers through transplants.

      Call 1-800-I-REALLY-REALLY-USED-TO-BE-A-COBOL-PROGRAMMER

      Today!

  3. Really? by leecho · · Score: 3, Interesting

    Well, programming *is* a hard task, and simplifying it is about building layers and layers of better abstractions to machine code and binary data.

    Without XML, what would you normally do? Create a flat text file and read it using whatever syntax you'll like that day. I agree XML is ugly as hell to type in manually, but at least it's a standard, and every programming language in use today can handle it in a standard way - DOM, SAX, whatever.

    1. Re:Really? by Anonymous Coward · · Score: 5, Funny
      To paraphrase:

      XML is like:

        * SGML without configurability
        * HTML without forgivingness
        * LISP without functions
        * CSV without flatness
        * PDF without Acrobat
        * ASN.1 without binary encodings
        * EDI without commercial semantics
        * RTF without word-processing semantics
        * CORBA without tight coupling
        * ZIP without compression or packaging
        * FLASH without the multimedia
        * A database without a DBMS or DDL or DML or SQL or a formal model
        * A MIME header which does not evaporate
        * Morse code with more characters
        * Unicode with more control characters
        * A mean spoilsport, depriving programmers the fun of inventing their own syntaxes during work hours
        * The first step in Mao's journey of a thousand miles
        * The intersection of James Clark and Oracle
        * The common ground between Simon St. L and Henry Thomson
        * The secret love child of Uche and Elliotte
        * Microsoft's secret weapon against Sun's Open Office
        * Sun's secret weapon against Microsoft's Office
        * The town bicycle
    2. Re:Really? by phrantic · · Score: 2, Insightful

      If programming was easy everyone could/would do it.

      Yeah, I am sure that someone can make a compiler that allows you to feed in pseudo code in clear English, written with crayons on the back of a cereal packet, but you are robbing Peter to pay Paul, you will have to take the hit somewhere....

      --
      --My sig is bigger than your sig--
    3. Re:Really? by Anonymous Coward · · Score: 2, Interesting

      While Lisp functions are overkill for a dataformat, lisp syntax (sexps) are not, and are more mature, simpler, and clearer than XML.

      Even for C programs, I tend to use Lisp sexps as my persistent file format.
      A simple Lisp parser is smaller and faster than an XML parser, for no loss of expressivity.
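
      To give a rough sense of how small such a parser can be, here is a Python sketch of a reader for the simplest kind of s-expression. It is an assumption-laden toy, not what the poster uses: the name `parse_sexp` is made up, atoms are plain whitespace-separated tokens, and quoted strings, comments and numbers are not handled.

```python
def parse_sexp(text):
    """Parse one s-expression of parentheses and whitespace-separated atoms.

    Lists become Python lists; atoms stay as strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            # Recurse until the matching close parenthesis is reached.
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        return tok

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result
```

      For example, `parse_sexp("(user (name bob) (id 42))")` yields a nested list with the same shape as the input.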

    4. Re:Really? by stand · · Score: 4, Informative

      It is customary to attribute quotations when you publish them. Otherwise it's called plagiarism. Credit where credit is due and all that.

      Unless, of course this particular AC is Rick Jelliffe, in which case I apologize.

      --
      Four fifths of all our troubles in this life would disappear if we would just sit down and keep still. -C. Coolidge
  4. A good point by shish · · Score: 3, Insightful

    Sure it sucks, but it's a *standard* that everyone can use, and there are many libraries for it so you don't need to write your own parsing code

    --
    I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
    1. Re:A good point by jilles · · Score: 3, Insightful

      Not only is it a standard, it appears to be the only widely accepted standard. Not using it currently boils down to going back to the hacked together, generally incompatible data formats of the past. Reinventing the wheel still is a popular way of passing time but it has never been very productive.

      People often fail to see the point of widely adopted standards but the bottom line is that it makes it easier to reuse functionality that conforms to the standard. There are now both SAX and DOM based parsers for most common programming languages. Basically if you spend some time figuring out how these APIs work you can work with XML from almost any language.

      That is not the problem. What is a problem is that everybody is introducing their own xml based languages and in many cases forgetting to publish the appropriate xml schema/dtd.

      Now the guy who is complaining here is a perl programmer who has to process data that is passed to him in XML form. His point is that it is easier for him to throw together a bunch of regular expressions to do his thing than it is to use some off the shelf validating parser with a generic DOM/SAX based API. Good for him that his job is so simple that a bunch of regular expressions do the trick for him. I'd hate to maintain his code though and I suspect he doesn't have much reuse beyond the odd copy paste.

      --

      Jilles
    2. Re:A good point by EvilTwinSkippy · · Score: 3, Insightful
      Amen, and amen.

      Yes standards suck. But they suck in a way that is consistent and allows other sucky things to talk to other sucky things.

      I'll bet 802.11b is a really crappy standard. But as long as I can pick up interchangeable devices for $50 at the local computer store I'll live in ignorant bliss.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    3. Re:A good point by jilles · · Score: 2, Interesting

      Assuming that these data streams have something in common you'd probably spend a week or so developing a generic, maintainable solution using e.g. SAX and reuse that in each particular case. The ad hoc solution of using regular expressions probably saves you time in the short term, but in the long term you'll probably keep reinventing the wheel.

      However, this is all beside the point since we've now established that there's nothing wrong with XML but that it's just the tools to manipulate it which are still lacking in certain ways. I'd be the first to agree that the SAX and DOM APIs are a bit overkill for some situations. However, concluding from that that XML is not a good solution goes too far IMHO.

      --

      Jilles
  5. It's about tools, libraries by Anonymous Coward · · Score: 5, Interesting

    Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.

    Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.

    There's a number of tools and libraries (with Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.
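
    A small Python sketch of the attribute-order and multi-line problem mentioned above. The sample document, the `NAIVE` pattern and both function names are invented for the illustration; the pattern is deliberately the simple kind that assumes a fixed attribute order on one line.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <item> swaps attribute order
# and spreads its attributes over two lines.
DOC = """<items>
  <item id="1" name="alpha"/>
  <item name="beta"
        id="2"/>
</items>"""

# Naive pattern: assumes id comes before name, separated by one space.
NAIVE = re.compile(r'<item id="(\d+)" name="(\w+)"/>')


def names_by_regex(text):
    """Extract item names with the fixed-order regular expression."""
    return [name for _id, name in NAIVE.findall(text)]


def names_by_parser(text):
    """Extract item names with a real XML parser."""
    root = ET.fromstring(text)
    return [item.get("name") for item in root.iter("item")]
```

    With this input the regex finds only the first item while the parser finds both. A more elaborate pattern can cope with reordering, which is where the "quickly gets messy" part comes in.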

    1. Re:It's about tools, libraries by kinnell · · Score: 4, Informative

      As he says in the article, the reason he uses Perl regexp is that the tools/libraries have to read the entire file. If this is a long stream, it's grossly inefficient - you have to load the entire thing into a massive tree structure in memory. If the job can be done serially with regexps without using a noticeable amount of memory or time, then it is often better. This is the point of the article - there is a choice between using a method which is often grossly inefficient for real world problems (XML libs) and a fast but messy method (Perl regexp). Neither of these is really satisfactory, hence the complaint.

      --
      If I seem short sighted, it is because I stand on the shoulders of midgets
    3. Re:It's about tools, libraries by Sique · · Score: 5, Interesting

      No. It is not. It is about basic computer science.

      XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction. What he is complaining about is exactly this: Lots of parsing to get a simple datum.

      With regexp your parsing is much faster, because you can concentrate on substrings, you can parse them without using a stack, you can use them in stream context. But regexp are Regular Expressions (Chomsky Type 3 grammar), so they are in fact just a subset of XML and not able to parse XML completely.

      One of the links in the article points to another rant, where the author wants some regulations for a limited XML. Unfortunately, the ideas he is proposing are in fact context sensitive, and as such they are Chomsky Type 1 (context sensitive grammar) and a superset of XML instead of a simplified subset. Does anyone remember the Earley algorithm, with something that can be described as a multi-dimensional stack?

      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

      --
      .sig: Sique *sigh*
    4. Re:It's about tools, libraries by Anonymous Coward · · Score: 2, Insightful

      I don't buy it.

      There's two ways: DOM-like, where you read the file and have tree-like access. It's simple, and here the inefficiency complaint holds, very much so for large files.

      There's SAX-like, where you process events. Plain SAX is fast. It's somewhat inconvenient, but not that much worse than regexps. I've co-developed a large open source app using SAX: it works, it's efficient for large files, so SAX is certainly doable.
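
      For readers who have not seen the event style, here is a minimal Python sketch using the standard library's `xml.sax`. It is not the app mentioned above; the `title` element name, the `TitleCollector` class and `collect_titles` are assumptions made up for the example.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as events stream past."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # list of text chunks while inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title" and self._buf is not None:
            self.titles.append("".join(self._buf))
            self._buf = None


def collect_titles(stream):
    """Parse a file-like object and return all <title> texts in order."""
    handler = TitleCollector()
    xml.sax.parse(stream, handler)
    return handler.titles
```

      The handler keeps only the text it cares about, so memory use depends on what is collected rather than on the size of the input. The cost is the bookkeeping in `_buf`, which is the "somewhat inconvenient" part.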

      But there's more: Tim Bray's blog message has created attention elsewhere, and on xml-dev one person introduced a Perl API based on SAX which lets you easily extract information from the stream. See:
      http://lists.xml.org/archives/xml-dev/200303/msg00536.html

      So... I still say: Proper tools exist. Use them, be happy!

    5. Re:It's about tools, libraries by PigleT · · Score: 4, Informative

      I agree that it's about tools and libraries. And this is what I think about them, too.

      At work, I brush up against XML occasionally, mostly for documentation or data-resultset purposes. In my own time, I use it in my photo gallery - result-sets from database queries get converted to XML and then spat out through XSLT in Sablotron, straight to web. For all the hoops it goes through, it's actually still quite nippy.

      However, I also dislike it intensely.

      I've written a blog-like system-news announcement board using a Ruby CGI against postgresql as a backend. I can pull back a result-set - a simple table-thing with each row being a text announcement, half a dozen fields (when posted, by whom, etc). And I wanted to output this in HTML form for the web, in plain-text to send to a user who wanted it via email every day, and in s-exp form for my own gratification.
      However, the first problem you run into is the formatting. A textarea in an HTML form gives no line-wrapping (wanted for plaintext output, but only in specific fields) and embeds ^M characters everywhere. When the output is HTML, those ^Ms want to become br tags. When the output is plaintext or sexp, they want to become \n. Simple, if ONLY there were a way of doing either elementary reformatting or search-n-replace in XSLT. There is, but s/// is about 10 lines' worth, if my googling is to be believed. That makes it non-optimal for one of its primary uses: making transformations on big blocks of text-based data, and it can't even edit within a node correctly? Pathetic.
      Why shouldn't I just write 3 output methods in my Ruby CGI script that take the result-set directly to text, HTML or sexp formats, with the power of ruby to do a #gsub("^M", "\n") on just the fields I want, in a tiny few extra characters of code?
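
      The per-format substitution described above, sketched in Python rather than the poster's Ruby. The function name `render_field` and the two format labels are made up for the example, and the HTML escaping step is an addition the post does not mention.

```python
import html


def render_field(text, fmt):
    """Render one free-text field for the given output format.

    fmt is 'html' or 'text' (hypothetical labels for this sketch).
    """
    # Normalise CRLF and bare CR (shown as ^M in many editors) to LF first.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    if fmt == "html":
        # Escape markup characters, then turn line breaks into <br> tags.
        return html.escape(normalised).replace("\n", "<br>\n")
    if fmt == "text":
        return normalised
    raise ValueError(f"unknown format: {fmt}")
```

      Applied only to the fields that need it, this is a couple of string replacements per output method.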

      Now to tackle what you've said:

      "Using Perl regexps to parse XML is silly"

      No, it's not. Perl regexps are a highly featureful, pre-existing, code. I'd be surprised if libxml *didn't* use regexps in its XML parsers, frankly.

      "e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling."

      These things are not a problem. You can easily match an attribute occurring, as it does, within an opening-tag, and pull out both the name and the contents. Using that to set a variable of given name in your program - a highly important part, given that XML is a data-transfer format and it's the internal representation afterwards that is its whole raison-d'etre - is trivial. Thus, perl wins.
      Multi-line matching is explicitly catered-for in perl, with /m or /s on the end of the regexp.

      "There's a number of tools and libraries "...

      Indeed there are. And you know what? When I've got a small paragraph (under 10 lines) of data that I want to transfer from A to B, the last thing I'm going to do is invoke a 600Kb library so I can use a pompous and fashionable set of functions to produce "XML", when perl/ruby/sh have all had perfectly valid "print" or "echo" commands for the past decade or more. If the output is valid XML, you've no reason to diss the method used to produce it.

      As a final example, I've also had a few documents to be writing, of my own, at work. I've had two options: either sit down, set up emacs to handle XML sources smoothly so I can open and close tags at the push of a key-chord the way I *want* to create the stuff, or program a small sub-language. Lisp, in the form of _librep_, won the day, with a few small functions to produce strings based on the input. And guess what? Because this is a programming language rather than a mere text-transforming language, I made a CGI out of it, and can embed programs within my "data", too, without feeling the urge to write to the W3C about it.
      Editing it is an absolute dream - opening and closing paragraphs of text is a piece of cake and fits the way I want to work. (Maybe you like looking at spikey angle-bracket characters, I dunno.)
      In short, "programmed text" won the day for me.

      --
      ~Tim
      --
      .|` Clouds cross the black moonlight,
      Rushing on down to the circle of the turn
    6. Re:It's about tools, libraries by Boiotos · · Score: 2, Interesting
      Shouldn't SAX-based tools *not* have to load the entire thing into memory?

      Bray's paper appears to express a strong preference for an XML that would work well with standard regex tools. In it he says, "If I use any of the perl+XML machinery, it wants me either to let it read the whole thing and build a structure in memory, or go to a callback interface." And then he adds that callback "is sufficiently non-idiomatic and awkward that I'd rather just live in regexp-land."

      This, in turn, seems to be based on an article linked to in Bray and advocating the same thing.

      It seems to me that to convince the larger world that this is necessary, some other options would have to be excluded. Aren't regexs of some sort going to be in v. 2 of XSLT? None of its successful implementations require loading the document into memory, and it nicely magics away the namespace kerfuffle that Gregorio's examples illustrate.

      What I took away from the article was considerable amazement that one of the markup luminaries uses such low-level tools to process XML.

    7. Re:It's about tools, libraries by Sique · · Score: 3, Insightful

      It is not about the number of elements. It is about the depth to which you can nest them. Think about normal algebraic terms (a+b*5-(3*(7-4))). It's often very reasonable to have such terms in XML. But they are unparseable via regexp, because regexp doesn't have a stack and can't count parentheses. And don't reply with RPN (reverse Polish notation) and argue that this is parentheses-free. It replaces the parentheses with a fixed number of operator arguments. And regexp can't count arguments either. Regexp in fact can't count at all (or only up to a predefined limit, which is mathematically equivalent).
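
      To make the counting point concrete, a short Python sketch that tracks nesting depth with a counter (a stack with a single symbol). The function name `max_depth` is made up for the example; the point is only that the depth is unbounded state, which a classical regular expression has no place to keep.

```python
def max_depth(expr):
    """Return the maximum parenthesis nesting depth of expr.

    Raises ValueError if the parentheses are not balanced.
    """
    depth = 0
    deepest = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing parenthesis without a matching open")
    if depth != 0:
        raise ValueError("unclosed parenthesis")
    return deepest
```

      The same loop with element start and end tags in place of parentheses is, in essence, what any well-formedness check has to do.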

      --
      .sig: Sique *sigh*
    8. Re:It's about tools, libraries by Len · · Score: 3, Informative
      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

      You're right, but the problem is that "deal with it" may equate to "don't use XML" in a lot of cases, which makes XML less of the universal data representation language than it wants to be.

      When the parser uses a lot of memory (like DOM reading the entire input into a tree) it becomes inefficient, sometimes infeasible, to handle large input documents. That's one of the specific problems mentioned by Tim Bray and others.
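
      For contrast with the DOM approach, a callback (SAX-style) parser can walk a document without keeping a tree around. The following is a rough sketch using Python's standard xml.sax module; the class and function names are made up for illustration, and the sample document is invented.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags by name; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once per opening tag as the parser streams through the input.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(stream):
    handler = ElementCounter()
    xml.sax.parse(stream, handler)
    return handler.counts


sample = io.BytesIO(b"<log><entry/><entry/><entry/></log>")
counts = count_elements(sample)
print(counts)
```

      The only state that grows here is the dictionary of tag names, not a copy of the document, which is why this style copes with large inputs. The price is the callback structure Bray complains about.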

    9. Re:It's about tools, libraries by Sique · · Score: 4, Interesting

      No, I am suggesting that in general you have to use a stack machine. Surely you can use degenerate trees instead of fully balanced trees to store your data. And a concatenation of elements is a regular expression (and a degenerate tree). But then you are already making assumptions about the data you get. With such limiting assumptions you can easily streamline your code, but you are losing the full power of XML on the way. And you need a grammar that makes sure you don't mix terminals and nonterminals.

      It starts already if you are using escape characters to mark nonterminals and escape those characters with themselves to mark them as terminals. Those markings are still regular, but you already lose some speed-ups. For instance \\ matches \\" and \\\", but one means just \ and the end of the string, and the other one means \" and the string continues. The only way to stay out of the mess is to make sure you are using a strictly left-bound parser: first parse for all escape characters and then for the nonterminals, which already makes your parser a (local) 2-pass parser.

      --
      .sig: Sique *sigh*
    10. Re:It's about tools, libraries by Ed+Avis · · Score: 2, Interesting

      There are two more methods: interfaces like SAX where you read individual tokens, and callback interfaces like Perl's XML::Twig where you can efficiently scan the whole file and only construct in-memory trees for the parts you're interested in.

      The best method might be a lazy programming language where you can say

      tree.a[4].b[6].contents

      and only when this expression is evaluated will the necessary bit of the tree be parsed.
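
      The "build trees only for the parts you care about" idea can be approximated in Python with ElementTree's iterparse. This is only a sketch under assumptions: the function name and the sample feed are invented, and unlike a real twig-style library this version still lets iterparse build (and keep) elements other than the requested tag.

```python
import io
import xml.etree.ElementTree as ET


def iter_subtrees(stream, tag):
    """Yield each complete <tag> element, then empty it to release its children."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Runs when the caller asks for the next item: drop text and children.
            elem.clear()


doc = io.BytesIO(
    b"<feed><item><title>one</title></item>"
    b"<junk>x</junk>"
    b"<item><title>two</title></item></feed>"
)
titles = [item.findtext("title") for item in iter_subtrees(doc, "item")]
print(titles)
```

      Each item is a small in-memory tree that exists only while the caller is looking at it, so the working set stays close to one item at a time.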

      --
      -- Ed Avis ed@membled.com
    11. Re:It's about tools, libraries by protonman · · Score: 2, Interesting

      I know, but I thought you'd get that with a finite number of elements, you can't nest them infinitely... (I'm counting tags as "elements" here, a bit sloppy I admit).

      My point was that in *practical* XML you simply don't have stuff like [a][a][a][a]... ...[/a][/a][/a][/a].

      As long as you want to parse a FINITE number of terms, you can do that with regexps.

      If your example string with parentheses is the ONLY one you want to parse, I can do that (in sed/perl-like syntax) like this:

      \(a\+b\*5-\(3\*\(7-4\)\)\)

      If you want to parse all algebraic terms like in your example with a length less than 5 (!) you can start with this...

      (\w|\d)
      \((\w|\d)\)

      (to get 9 and (0) and (a), for example)

      and

      \((\w|\d)[+*-](\w|\d)\)

      to get (9+b),(a*b) etc.. etc..

      I know, it's gonna be a LONG list, but since the number of possibilities is limited, it's not infinite! (and obviously, I can't use * on the parentheses!)

      A problem arises when you want to be able to parse a string of arbitrary length with an arbitrary number of parentheses. That's of course impossible for reasons you stated. :-)

      But IN PRACTICE, the number of possibilities in your XML file is NOT arbitrary, it is fixed and predictable, so you can use regexps.
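
      The "fixed and predictable" case can be sketched in Python: instead of listing every possibility by hand, generate one regex per nesting level. This is an illustration only (the helper name is hypothetical), and it shows both halves of the argument: the pattern works up to the chosen depth and fails one level beyond it.

```python
import re


def bounded_paren_pattern(max_depth: int) -> str:
    """Build a regex matching balanced parentheses nested at most max_depth deep."""
    pattern = r"[^()]*"  # depth 0: no parentheses at all
    for _ in range(max_depth):
        # One more level: plain characters, or a parenthesised shallower term.
        pattern = r"(?:[^()]|\(" + pattern + r"\))*"
    return pattern


depth_3 = re.compile(bounded_paren_pattern(3))

print(bool(depth_3.fullmatch("(a+b*5-(3*(7-4)))")))  # three levels: accepted
print(bool(depth_3.fullmatch("((((x))))")))          # four levels: rejected
```

      The pattern roughly doubles in size with each extra level, which is the "LONG list" in compact form.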

      I'm nitpicking, I know, but it still is CS. :-)

      --
      The man of knowledge must be able not only to love his enemies but also to hate his friends.
    12. Re:It's about tools, libraries by ajs · · Score: 2, Interesting
      Come Perl 6, of course, you'll have the best of both worlds:
      $data = STDIN.getlines().join('');
      if ($data =~ qr{ ^ (<xml>) $ }) {
          my XML $parsed = $1;
          if (my $n = $parsed.findnode('sometagiwant')) {
              print "Yep, it's there:\n$n\n";
          } else {
              print "Failed to find sometagiwant\n";
          }
      }
      And depending on what you want (memory vs speed) your "xml rule" in that regexp can do whatever annotation, datastructure building, etc that you want.
    13. Re:It's about tools, libraries by Anonymous._.Coward · · Score: 2, Interesting

      There's more than SAX and DOM out there. What about data binding tools? Generate some classes from your DTD/schema, call bind(xmlFile) and you've got objects to work with.

      There are even partial matching binding architectures. The best one I've seen is SNAQue.

      --

      take a triptonica to subthunk

    14. Re:It's about tools, libraries by Loma · · Score: 5, Informative
      You have used many big words, and you may have your language levels incorrect, but you are clearly wrong in one respect:

      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.


      Well, I've written my own XML parser, as well as a compiler for a simplified version of C, so I think I'm somewhat qualified to talk on this. A generalized XML parser is not memory intensive, unless you are a very bad programmer. All you need is a depth-first stack, which will be as high as your XML tree is deep. And given that a stack of size N can handle a tree of up to X^N nodes (for a branching factor of X), you are definitely going to run out of disk space before you run out of RAM. In other words, the memory required for parsing an XML tree is trivial.
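
      The stack claim is easy to check with a streaming parser. The sketch below uses Python's standard xml.parsers.expat bindings; the function name is hypothetical and the code is an illustration of the point, not the parser described in this comment.

```python
import xml.parsers.expat


def max_element_depth(xml_bytes: bytes) -> int:
    """Parse a document keeping only a stack of open element names."""
    stack = []
    deepest = 0

    def start(name, attrs):
        nonlocal deepest
        stack.append(name)                    # one entry per currently open element
        deepest = max(deepest, len(stack))

    def end(name):
        stack.pop()                           # closing a tag shrinks the stack again

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)             # True marks this as the final chunk
    return deepest


print(max_element_depth(b"<a><b><c/></b><b/></a>"))
```

      However many siblings the document has, the stack never grows beyond the nesting depth.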

      An XML parser is one of the simplest parsers imaginable. It's a sophomore task to create a state machine to process the generic L(1) (or is it L(0)?) XML grammar. And as you should know, a state machine for an L(1) grammar is as fast as you can get.

      Anything you do with regular expressions will be much more complicated. As I'm sure you know, regular expressions are turned into state machines before being used to process the input. And almost all regular expression state machines are much more complicated than the state machine you need for an XML parser. In an XML parser, definite boundaries exist on elements such as:
      '<' and '>'


      Regular expressions are not this smart. For example, looking for the substring "abc" in the longer string "abababaaabbbabcabababac" already generates a state machine that is more complicated than the one needed for XML parsers.

      Back to the "memory" intensive nature of XML parsers. If you parse your XML tree into a nested hashmap structure, then the memory needed will be proportional to the number of nodes in the XML tree. Maybe this is what you meant by "memory intensive". However, this is totally unnecessary. You can easily construct an XML parser to look for the specific elements you care about. Then you only get those elements, and you only need to allocate the memory for the elements required.
  6. I tend to agree. by NetDanzr · · Score: 3, Funny

    The last book on XML I read and understood was XML for Dummies.

  7. Re:But XML is great for computers... by Max+Romantschuk · · Score: 3

    First of all IDNRTA (I Did Not Read The Article)

    OK... This is exactly why you SHOULD read the articles... I just posted blatantly off topic due to an annoying quick-read = misread mistake... yay me :)

    Mod me down, I deserve it ;)

    --
    .: Max Romantschuk :: http://max.romantschuk.fi/
  8. Maybe he should have read Knuth by thogard · · Score: 4, Insightful

    XML parsing (just like the TeX language) has a problem that when there are problems in the input files, the situation diverges into two different cases, one requires an infinite memory and the other infinite time to deal gracefully with errors.

    None of this would have ever been needed had CS been taught properly. There are other concepts to describe how files are to be organized. Some of the systems date from the 1950's. BNF (which seems to work very well for programmers to describe file formats to other programmers) dates from the early 1960's. What was needed is a BNF type grammar that is machine readable.

    Would XML have ever taken off if the web used something sane and not a hacked version of a nasty text formatting system from decades ago?

    1. Re:Maybe he should have read Knuth by Ed+Avis · · Score: 5, Informative
      XML parsing (just like the TeX language) has a problem that when there are problems in the input files, the situation diverges into two different cases, one requires an infinite memory and the other infinite time to deal gracefully with errors.

      WTF? Perhaps you could explain more about these two cases. As far as I know, general XML parsers such as Expat do not require unlimited memory to parse any finite input document, nor do they require infinite time.

      The Document Type Definition (DTD) system is equivalent to a BNF grammar for XML documents. It's not quite as flexible as a full BNF because it enforces that elements are correctly nested, but I don't see this as a bad thing.

      And yes, DTDs are machine readable. Other grammars for XML documents such as DSD, XML Schema or Relax-NG are also machine readable.

      Just as with BNF grammars and flex(1), you can take a DTD and generate an efficient parser from it using FleXML.

      Comparisons with TeX aren't really appropriate because TeX is a Turing-complete language, and so impossible to parse automatically in 100% of cases (unless you want to allow that your program will sometimes fail to terminate, ie hang, on particular input files). I don't know what you mean by your subject line 'Maybe he should have read Knuth'...

      --
      -- Ed Avis ed@membled.com
    2. Re:Maybe he should have read Knuth by Sique · · Score: 2, Informative

      Comparisons with TeX aren't really appropriate because TeX is a Turing-complete language, and so impossible to parse automatically in 100% of cases (unless you want to allow that your program will sometimes fail to terminate, ie hang, on particular input files). I don't know what you mean by your subject line 'Maybe he should have read Knuth'...

      Maybe you should read Knuth also... There are two different things: One is the grammar and the other one is the language. You can write a Turing-complete language in a regular grammar (Chomsky Type 3), completely parseable with regexp (think: (([linenumber] ((INC|DEC) [register])|JMZ [linenumber])[newline])*). You can also write a primitive-recursive language using a free grammar (Chomsky Type 0) (think: your average English book about primitive-recursive languages), which is unparseable within finite time and memory.

      So TeX is a Turing-complete language written in a Chomsky Type 1 grammar (it should be LL2, but I am not sure). XML by itself is a Turing-incomplete way to describe Chomsky Type 2 grammars.

      --
      .sig: Sique *sigh*
    3. Re:Maybe he should have read Knuth by Minna+Kirai · · Score: 2, Informative

      I don't know what you mean by 'in theory'. A finite input file requires finite resources. Period.

      He probably means "taken to the limit". A way of characterizing the performance of a system: how does it fail when faced with an overwhelming amount of work? (It's like O-notation, which assumes the problem size is infinite to eliminate lower-order effects from the description.)

      An infinite input file could require infinite memory to parse it. So what?

      The intention probably was to point out that a program which extracts from a non-XML database can be written to use constant memory, regardless of file size (or log memory, to be pedantic). Whereas with XML, the memory used increases as long as the file size does.

      (There are tricks which can reduce the memory use, but they usually come down to making assumptions about the formatting of the file, which can lead to skipping over malformed XML chunks)

      (I'm not espousing those views, just attempting to translate for you)

    4. Re:Maybe he should have read Knuth by Minna+Kirai · · Score: 3, Insightful

      I think the root of that difficulty comes from using XML to solve two different problems. One problem is data transmission between systems, which XML was designed for and handles adequately. When receiving a data chunk from an external source that might not be trustworthy, a safety-conscious program really has to read the whole thing and verify it complies with the format. Skipping over some sections to reach the part you're interested in isn't allowed.

      But, for data storage within an application (or a set of tightly coupled systems that trust each other to function correctly), XML is less advisable. Traditional (SQL) databases, or hand-rolled file formats, may be a better solution when high speed and scalability are needed.

      JoelOnSoftware has a long article on why XML is suboptimal for the latter use.

    5. Re:Maybe he should have read Knuth by J.+Random+Software · · Score: 2, Informative

      It's fairly common to comment out markup when hand-editing, since <![IGNORE[...]]> can't be used within a document. Skipping non-markup in the document should be just a matter of matching the Perl regex

      ( [^<]+ | <!--.*?--> | <\?.*?\?> | <!\[CDATA\[.*?\]\]> )* <foo
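
      For anyone who wants to try it, here is the same idea transliterated to Python's re module as a rough sketch. The variable names and sample text are invented, the literal brackets of the CDATA delimiters are escaped, and a \b after foo is an addition of this sketch so that a tag such as <foobar> is not mistaken for <foo>.

```python
import re

SKIP_TO_FOO = re.compile(
    r"""
    (?: [^<]+                    # character data
      | <!--.*?-->               # comment
      | <\?.*?\?>                # processing instruction
      | <!\[CDATA\[.*?\]\]>      # CDATA section
    )*
    <foo\b                       # the start tag we are after
    """,
    re.VERBOSE | re.DOTALL,      # DOTALL lets comments and CDATA span lines
)

text = "<!-- <foo hidden='1'> --><![CDATA[<foo>]]> some text <foo id='real'>"
match = SKIP_TO_FOO.match(text)
print(text[match.end():])        # what follows the real <foo
```

      Like the original, this only skips non-markup; it gives up at the first start tag that is not <foo, so it is a scanner for one flat case rather than a parser.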

      If someone else defines a foo element in a different namespace, I don't see how you can do anything other than ignore it--it's almost certainly not what you were looking for, and you have no idea what it might mean.

  9. Re:xml by Pyromage · · Score: 4, Informative

    XML isn't intended for web pages. That's what you missed:

    Its biggest use right now is data interchange. Moving bits between one magic widget and another. And for that, HTML sucks. It just can't represent arbitrary data. Programming languages (C++, Java) are for instructions, not data.

    XML fits in perfectly where it's at use-wise. Tim Bray is talking about programming for it: The available interfaces are very counter-intuitive, and that's what Bray's getting at.

  10. Re:But XML is great for computers... by CoolVibe · · Score: 5, Insightful
    Having a standard, structured, text-based, and editable-by-hand-when-necessary format is a godsend. Period.

    You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.

    My point: XML is over-used for a lot of things. In some places it makes sense, but in many places it doesn't.

  11. Re:xml by CynicTheHedgehog · · Score: 2, Informative

    When you're writing an application and you have to decide what format messages should be written in, or what type of file configuration data should be stored in, most people say, "Why, XML, of course. That way we're guaranteed that it is extensible, transformable, and readable by anyone who would ever need to read it." Granted, there are lots of other document formats in which that is the case, but they are not industry standard. As long as there is a schema, everyone will accept it. And if it's not in the format that they would like, they are free to run it through an XSL transformation. Easy as pie.

    XML is not hard, but it is a discipline. It requires a lot of reading and a fair amount of practice, but once you have it down, that's it. And from now on, your document storage design decisions (barring any space/memory constraints) are made for you.

  12. Re:xml by BFKrew · · Score: 2, Informative

    On the web, a big problem is that the content of the page is mixed in with the formatting. So, this content cannot be displayed easily on a PDA, phone or even across different browsers to an extent.

    Separating the content from how it is displayed makes it easier to display it in pretty much any format. By taking a single XML document you could create a page that looks great on Mozilla, great on IE, a WAP enabled phone, Opera, Microwave, Fridge - whatever!

    XML is NOT a programming language. It is more like a way of describing data and one MAJOR benefit in my opinion is that it is human as well as machine readable. I can ask my 'pointy haired boss' to make an amendment to an XML document and he will pretty much be able to read it quite easily.

    It has plenty of uses such as a way of sharing data. There is no reason, for example, why an XML source could not be used in other webpages, as an input source for a database, or even as a way of getting output from your C++ program into my Java app, my ASP.NET page or even another C++ program!

  13. Re:xml by Uller-RM · · Score: 2, Insightful

    Since you apparently know nothing about XML, try reading the article. You'll learn something new, and you won't have to talk out your ass on this topic.

    XML's not a language -- it's a grammar, a guide of sorts, for hierarchical data storage. You design file formats that conform to XML. The goal is that it's easy to read that file format in any language or platform (given an XML processor/parser for that platform), since your data is stored in plain human-readable UTF8-encoded text.

    Might as well poke fun at the rest of your idiocy -- as it happens, HTML 4 is pretty close to being XML-conformant, and the W3C's now pushing XHTML which is fully conformant.

    Granted, a lot of people treat XML as another buzzword, the way that OOP once was. It's not a magic bullet -- it's just a guide to making cross-platform file formats, and it works pretty well for that.

  14. Re:This does not bode well by JimDabell · · Score: 5, Insightful

    Did you actually read the article?

    I can sum it up very easily:

    • Callbacks irritate him.
    • It's not always practical to build a tree in-memory.

    He's looking for a nicer api for processing XML, he's not looking to replace XML entirely.

  15. Short summary by Anonymous Coward · · Score: 5, Informative

    Tim Bray thinks that callback based XML apis are a bit awkward to use. He would prefer to use something like a pull parser (see for example http://www.xmlpull.org for examples in Java) to the current Perl xml apis.

    And he would probably want to be able to parse parts of documents ("XML Fragments"), rather than whole documents.

    I agree with his views (not using perl too much, though). But this is *not* the end of XML or anything. Tim just has some thoughts about how the xml api could be better in perl. Not very exciting, perhaps...
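
    For readers who have not met the pull style, here is a small sketch using Python's standard XMLPullParser (not the Java xmlpull API linked above, and not anything from Bray's article). The function name and sample chunks are invented. The caller feeds data as it arrives and asks for events in an ordinary loop, so no callbacks are needed.

```python
import xml.etree.ElementTree as ET


def pull_titles(chunks):
    """Feed a document piece by piece and collect the text of every <title>."""
    parser = ET.XMLPullParser(events=("end",))
    titles = []

    def drain():
        # The caller decides when to look at whatever has been parsed so far.
        for event, elem in parser.read_events():
            if elem.tag == "title":
                titles.append(elem.text)

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # pick up anything the parser held back until the end of input
    return titles


chunks = [
    "<feed><entry><ti",
    "tle>first</title></entry>",
    "<entry><title>second</title></entry></feed>",
]
print(pull_titles(chunks))
```

    Note that the chunk boundaries fall in the middle of a tag; the parser copes with that, which is handy when the input arrives over a network.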

  16. He is right, I think. by expro · · Score: 3, Interesting

    Among other things ...

    (1) They need to eliminate the doctype can of worms. Unfortunately, this cries out for an alternative solution for character entities.

    (2) Namespaces need to be simplified and better integrated into the core of the language. Expanding on this, there need to be much better mechanisms for modularizing parts of the markup so that it isn't necessary to parse and hold everything in memory to make sense of it.

    (3) There needs to be clean-up and standardization of element id's and references, integrating it with (1) and (2).

    Do others have more? Should this be done compatibly with XML?

    I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

    1. Re:He is right, I think. by kalidasa · · Score: 4, Insightful

      1. Doctype is necessary. Perhaps you've never tried handling a very complex text (a big DocBook text or a big TEI text). You need to know what kind of text you're dealing with, and there's no way to come up with one universal solution for all kinds of texts. The only character entities needed are the handful of named entities that are part of the standard: &lt; &gt; &amp; etc. The rest can be handled by Unicode (including the PUA) and transcoding (if you are using an ISO 8859 encoding and you need a character outside that encoding, then you need to rethink the encoding you've chosen to use. UTF-8 is your friend). Entities really are good for more complex units (strings, etc.), rather than single characters. What character entities have to do with DOCTYPE is beyond me.

      2. True

      3. Standardize element IDs? Element IDs are part of the text, not part of the structure. They're simply a way of simplifying the difficulty of accessing random parts of text.

      I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

      So you're saying we need a meta-meta-language? The *MLs are a standard for arbitrary abstract data (and text) models (because not all texts are hierarchical like DBs).

      I think the problem here is that DB programmers (I'm excepting Bray from this) are overusing XML for very simple DB tasks that it wasn't intended for. If you're just doing a 40 field, 30,000 record flat DB, XML is NOT the solution. But it is the best solution for complex non-hierarchical data (i.e., books, etc.).

      As for Bray, I don't think he's saying XML itself (the markup standard alone) is too hard, that it should be abandoned. I think he's saying we haven't come up with simple enough ways of accessing XML data through APIs. But of course that wouldn't be a spicy enough meatball for the Taco.

  17. Re:This does not bode well by ChimChim · · Score: 2, Insightful

    He never said his work, XML, is not well done. What he said was that the programming languages, APIs, and Environments haven't made the task of processing XML easy enough. XML itself is sound, or as sound as many alternatives.

    The thing is, back in the day when people wore onions on their belts, programmers had to be convinced that UNIX's "file is a bag of bytes" form of data access was better than the more direct/powerful/convenient methods they'd been used to, like raw access to the drive. But programmers aren't users, and what's great for users, or has benefits beyond the realm of CS will always complicate things for the programmer. However, the more complicated things are for programmers, the longer it will take to build systems and get usable products. So Tim Bray is basically saying that XML has succeeded in the data-interchange model, but is failing to also make programmers' lives easier, which is also important.

  18. Re:Alas, XML by 6hill · · Score: 2, Insightful
    I have yet to see the xml come out of the dark ages, and until it decides to define exactly what it is or what it wants to be, I don't think it will.

    D'oh. What is the nature of the alphabet? To provide a common set of basic symbols from which to build the contents of a natural language.

    XML is a meta-language; it is specifically designed so that you, the user/code monkey/designer can define exactly what it is in terms of your projects. Unlike Java or other programming languages, XML is as free from in-built semantics as possible (i.e. "formless" as you put it) because it was meant to be that way! It's not a programming language, it's an alphabet.

    As for the uses of XML, I see a few things where it would be and is of great use:

    • storing representation-free data (i.e. same data could be imported into several programs that would then draw a graph, present a table, or devise a representational dance based on it)
    • easily interpreted building blocks for configuration (etc.) languages; readable by humans, operable by machines, structured by definition
    • protocol languages along the lines of SOAP

    And then there's the usual suspects: multichannel publishing, information sharing a la Amazon Associates, etc. XML bends to all these shapes, that's what makes it so beautiful.

  19. Re:This does not bode well by Random+Walk · · Score: 3, Insightful
    After reading the article, I would say he tries to use XML for something it is not very suitable for, and argues that in this case the available libraries are not useful (surprise ...).

    XML is not a stream - it has a hierarchical tree structure, and IMHO is not useful for anything that (a) by its very nature is a continuous stream of data (say, a log file), or (b) wants to be processed as a stream (because it's big, and would require too much memory to be handled as a single data structure).

    The problem seems to be that XML is good for portability and standardization, and therefore is abused for things it's not well suited for (the well-known 'if all you have is a hammer, every problem looks like a nail' syndrome).

  20. His idiom. by palad1 · · Score: 5, Insightful

    He's stating that he'd basically like other coders to write more code the way he sees fit.
    [quote]
    while () {
    next if (XX);
    if (X|||X)
    { $divert = 'head'; }
    elsif (XX)
    { &proc_jpeg($1); }
    # and so on...
    }
    [/quote]

    Repeat after me: I will never leave parsing XML up to a regexp especially if my xml may contain CDATA and Comment sections. I will never...

    Unless you are 100% certain the file you are parsing is directly under your control, ie: no comments, no CDATA sections, params always in the same order, same indentation, same bloody encoding [pardon my french], well, you just will have to access the data using some kind of DOM or abstract tree representation.
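
    A quick way to see the problem is to run a naive regexp and a real parser over the same document. The sketch below is in Python with a made-up sample document; it is an illustration, not code from the article.

```python
import re
import xml.etree.ElementTree as ET

doc = """<album>
  <!-- <photo src="old.jpg"/> -->
  <note><![CDATA[<photo src="not-markup.jpg"/>]]></note>
  <photo src="real.jpg"/>
</album>"""

# The regexp sees three "photo tags", two of which are not markup at all.
regex_hits = re.findall(r'<photo src="([^"]*)"', doc)

# The parser knows about comments and CDATA and reports only the real element.
parser_hits = [photo.get("src") for photo in ET.fromstring(doc).iter("photo")]

print(regex_hits)
print(parser_hits)
```

    The regexp could be patched to skip comments and CDATA, but each such patch is a step toward writing your own parser.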

    I don't think he thinks no one uses XML, he seems to deplore the fact that some people don't get it at all and resort to heavy duty tools for trivial tasks [thus justifying his example above].

    Basically XML is quite simple, but that's not the matter, the problem is that XML bundles ACTUAL DATA, it's all about the complexity of those data, not the API used to access it [although writing a DOM implementation is a real pain]

  21. XML is good by Ender+Ryan · · Score: 4, Interesting
    I don't understand why so many people complain about XML so much. It's really quite useful for storing arbitrary data. We have several hundred thousand text-based documents where I work, and it has been a total nightmare, until I converted the whole thing (well, I'm not done yet...) to XML.

    The documents are generally displayed as HTML on the web, but they're also read by a couple different programs for different purposes. When I first started here, it was mostly a mess of poorly hand-written HTML, but thankfully there were *only* about 20k documents at the time.

    I was charged with the task of writing said programs to read these damn files. Unfortunately, they weren't all marked up the same...

    Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.
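
    The "they don't work if they're not properly marked up" property is what makes a check like the following possible. It is a minimal sketch in Python (the function name is hypothetical, and it checks well-formedness only, not validity against a schema).

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<doc><p>ok</p></doc>"))
print(is_well_formed("<doc><p>oops</doc>"))  # <p> is never closed
```

    Hand-written HTML of the kind described above would typically fail this check, which is why the conversion catches the inconsistencies once instead of every reader program having to cope with them.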

    Yay for XML! :)

    So, to sum up, XML is doing what it was meant to do, no less. Unfortunately, it's also probably doing a bit more as well, XSL anyone? Yeck, why not just have a standard XML scripting language, why the need for the language to be valid XML itself?

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  22. Re:of course there is! (sorry for the prev post) by borgdows · · Score: 3, Funny

    arggh!!! fuck'in XML tags!! lol

    <?xml version="1.0" encoding="bork">
    <troll>
    <sovietrussiathing>In SOVIET RUSSIA, XML standardizes YOU!!</sovietrussiathing>
    <offtopic>Let's bomb the french!</offtopic>
    <flamebait>Anyway, XML is for loosers!</flamebait>
    </troll>

  23. WTF? by samael · · Score: 4, Informative

    XML isn't a replacement for Java or C++. Neither is HTML. You're looking at three separate areas there.
    HTML is a page description language.
    C++ and Java are data processing languages.
    XML is a data description language.

    You can certainly describe a page using XML, and I see no reason why you couldn't construct a programming language using XML syntax, but how on earth are you going to store data in C++ or Java?

  24. "Load into memory" vs. "Callbacks" by itsallinthemind · · Score: 4, Informative
    Say what you will about Microsoft - and many of you have - but they really got it right with their XmlReader class in .NET. It streams the document like SAX (the "callback" interface Tim mentions in his comments), but allows the programmer to cursor over the document manually rather than having to handle everything in thrown event handlers (which I agree can be a real headache, especially in highly variable or deeply nested documents.)

    XML is just one of the tools in our collective toolbox. Use it where it helps you solve a problem. Don't bother if it doesn't.

  25. XML is not a programming language... by borgdows · · Score: 2, Insightful

    ... it's a convenient format to store and retrieve hierarchical information, that's all.

  26. Re:Don't Blink by samael · · Score: 2, Informative

    You can use XSL to translate any XML document into a different format. So your old documents should be convertible.

    If your subdialect keeps changing, that's down to the people defining the syntax, not the language itself.

  27. XML: bad implementation of a good idea by g4dget · · Score: 4, Interesting
    I have to agree that XML has serious problems.

    Now, I have to say: a universal syntax for tree-structured data is very useful: experience since the 1970s with one such universal syntax, Lisp, has shown that. It is unfortunate that XML is about the worst imaginable implementation of that idea. XML combines being a nuisance to type with having comparatively complex semantics and lots of redundant features.
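
    To show how closely the two notations correspond, here is a toy converter from an XML element tree to a Lisp-style expression, sketched in Python. The function name and the (@name "value") convention for attributes are inventions of this sketch, and it ignores mixed content (text that follows a child element) and string escaping.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem) -> str:
    """Render an element as (tag (@attr "value")... "text" children...)."""
    parts = [elem.tag]
    parts += [f'(@{name} "{value}")' for name, value in elem.attrib.items()]
    if elem.text and elem.text.strip():
        parts.append(f'"{elem.text.strip()}"')
    parts += [to_sexpr(child) for child in elem]
    return "(" + " ".join(parts) + ")"


book = ET.fromstring('<book id="1"><title>XML</title></book>')
print(to_sexpr(book))
```

    The output carries the same tree with no closing tag names to repeat, which is the typing nuisance complained about above.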

    What is ironic is that the same "real world programmers" who wax ecstatic about XML also condemn Lisp as too complicated and too difficult to read. The universal syntax that XML aspires to, Lisp syntax delivered many decades ago. It's just that prejudice and ignorance caused people to re-invent the wheel (and in square form, too) in the form of XML.

    I am pretty torn between whether XML is a blessing or a curse. We really need something like it, but XML is so bad that it may not even live up to the level of "poorly designed industry standard but better than nothing".

  28. In related news... by arvindn · · Score: 3, Funny
    It's now official. C++ creator admits it was all a hoax! Read on for the details of the stunning scoop...

    On the 1st of January, 2003, Bjarne Stroustrup gave an interview to the IEEE's 'Computer' magazine.

    Naturally, the editors thought he would be giving a retrospective view of twelve years of object-oriented design, using the language he created.

    By the end of the interview, the interviewer got more than he had bargained for and, subsequently, the editor decided to suppress its contents, 'for the good of the industry' but, as with many of these things, there was a leak.

    Here is a complete transcript of what was said, unedited, and unrehearsed, so it isn't as neat as planned interviews.

    Interviewer: Well, it's been a few years since you changed the world of software design, how does it feel, looking back?

    Stroustrup: Actually, I was thinking about those days, just before you arrived. Do you remember? Everyone was writing 'C' and, the trouble was, they were pretty damn good at it. Universities got pretty good at teaching it, too. They were turning out competent - I stress the word 'competent' - graduates at a phenomenal rate. That's what caused the problem.

    Interviewer: Problem?

    Stroustrup: Yes, problem. Remember when everyone wrote Cobol?

    Interviewer: Of course, I did too.

    Stroustrup: Well, in the beginning, these guys were like demi-gods. Their salaries were high, and they were treated like royalty.

    Interviewer: Those were the days, eh?

    Stroustrup: Right. So what happened? IBM got sick of it, and invested millions in training programmers, till they were a dime a dozen.

    Interviewer: That's why I got out. Salaries dropped within a year, to the point where being a journalist actually paid better.

    Stroustrup: Exactly. Well, the same happened with 'C' programmers.

    Interviewer: I see, but what's the point?

    Stroustrup: Well, one day, when I was sitting in my office, I thought of this little scheme, which would redress the balance a little. I thought 'I wonder what would happen, if there were a language so complicated, so difficult to learn, that nobody would ever be able to swamp the market with programmers?' Actually, I got some of the ideas from X10, you know, X windows. That was such a bitch of a graphics system, that it only just ran on those Sun 3/60 things. They had all the ingredients for what I wanted. A really ridiculously complex syntax, obscure functions, and pseudo-OO structure. Even now, nobody writes raw X-windows code. Motif is the only way to go if you want to retain your sanity.

    Interviewer: You're kidding...?

    Stroustrup: Not a bit of it. In fact, there was another problem. Unix was written in 'C', which meant that any 'C' programmer could very easily become a systems programmer. Remember what a mainframe systems programmer used to earn?

    Interviewer: You bet I do, that's what I used to do.

    Stroustrup: OK, so this new language had to divorce itself from Unix, by hiding all the system calls that bound the two together so nicely. This would enable guys who only knew about DOS to earn a decent living too.

    Interviewer: I don't believe you said that...

    Stroustrup: Well, it's been long enough, now, and I believe most people have figured out for themselves that C++ is a waste of time but, I must say, it's taken them a lot longer than I thought it would.

    Interviewer: So how exactly did you do it?

    Stroustrup: It was only supposed to be a joke, I never thought people would take the book seriously. Anyone with half a brain can see that object-oriented programming is counter-intuitive, illogical and inefficient.

    Interviewer: What?

    Stroustrup: And as for 're-useable code' - when did you ever hear of a company re-using its code?

    Interviewer: Well, never, actually, but...

    Stroustrup: There you are then. Mind you, a few tried, in the early days. There was this Oregon company - Mentor Graphi

    1. Re:In related news... by Rich0 · · Score: 2, Funny
      Actually, I've always liked this story. My favorite lines:

      We stopped when we got a clean compile on the following syntax:
      for(;P("\n"),R--;P("|"))for(e=C;e--;P("_"+(*u++/8)%2))P("|"+(*u/4)%2);
      At one time, we joked about selling this to the Soviets to set their computer science progress back 20 or more years.
  29. Re:But XML is great for computers... by Ed+Avis · · Score: 5, Insightful
    You mean like most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)? These have standard layouts, text-based, can be edited by hand and can be easily parsed.

    You just gave the best argument for adopting XML as widely as possible. Yes, all these can be parsed (with the possible exception of sendmail's config files which may be Turing-complete) but they all require *different* code for each config file. If they were in XML you'd still need different semantic code, of course, but a whole wodge of syntax issues (how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc) would be dealt with. Maybe not in the way that you or I think is perfect - IMHO XML is a little bit verbose compared to say Lisp- or Tcl-style encodings. But they would be dealt with *once*. No need to learn a new or almost-the-same-but-slightly-different set of syntactic conventions for every single config file.
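
    A minimal sketch of the "dealt with *once*" point, in Python with the standard xml.etree module. The XML layouts for hosts and resolv.conf are invented for illustration (no real system uses them), and the function names are hypothetical. One load() call handles quoting, escaping and comments for both files; only the small semantic functions differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML renderings of two /etc files. The element and attribute
# names are invented for illustration; no real system uses this format.
HOSTS_XML = """<hosts>
  <!-- comments, quoting and escaping are the XML parser's job -->
  <host ip="127.0.0.1"><name>localhost</name></host>
  <host ip="192.168.0.7"><name>build &amp; test</name></host>
</hosts>"""

RESOLV_XML = """<resolv>
  <nameserver>10.0.0.1</nameserver>
  <nameserver>10.0.0.2</nameserver>
  <search>example.com</search>
</resolv>"""


def load(text):
    """Syntax layer: one call parses every config file."""
    return ET.fromstring(text)


def hosts_table(root):
    """Semantic layer for hosts: still specific to this file."""
    return {h.findtext("name"): h.get("ip") for h in root.iter("host")}


def nameservers(root):
    """Semantic layer for resolv: again specific to this file."""
    return [n.text for n in root.iter("nameserver")]


hosts = hosts_table(load(HOSTS_XML))
servers = nameservers(load(RESOLV_XML))
print(hosts)
print(servers)
```

    The "build &amp; test" entry is not a legal hostname; it is only there to show that the escaped ampersand comes back as a plain "&" without any file-specific code.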

    Maybe XML is over-used for a lot of things, but making up your own file format is definitely over-used a lot more. Simple line-oriented files are reasonable to have as plain text, for everything else please avoid the temptation to reinvent the wheel by devising a new syntax and block structure.

    --
    -- Ed Avis ed@membled.com
  30. Still good for some things by krygny · · Score: 2, Interesting

    The hype and promise of XML have gone too far. It's a boon for document-type data: semantic content like documentation, on-line content, even spreadsheets and email. (E.g., why isn't there a standard address book format based on XML that any application on any platform can use interchangeably?)

    But using XML to build relational databases is slipping a round peg into a square hole. You need something to putty the corners.

    --
    Research shows that 67% of those who use the term "research shows", are just making shit up.
  31. Oh please! by gwappo · · Score: 5, Interesting
    It's annoying when posters get presumptuous. The people complaining in the article are by all means elite programmers; proclaiming XML is okay because "programming *is* a hard task" is nonsense and in the same league as "HLL's are for wussies, real men code in assembly" and other crap.

    The criticism of XML is accurate, correct, valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing-work and 10% business-solution. That 90% plumbing-work leaves opportunity for _a lot of bugs_ to be created and for any solution using XML to become a resource-hog.

    Having a standard interchange format like XML is a fun-thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

    1. Re:Oh please! by jallen02 · · Score: 3, Interesting

      Isn't interfacing with a library by definition "plumbing" though?

      I did find the SAX API (in Java) a little tedious to work with for maybe a few days, but after I got used to the idiom it was pretty straightforward. The interfacing with the library was not really a lot of "extra" code. Most of my SAX parsing code spends its time in a content handler firing off events based on the XML it is processing.
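
      For readers who have not seen the idiom, here is a rough sketch of a SAX content handler. It uses Python's standard xml.sax module rather than the Java API mentioned above, and the order/item element names and the OrderHandler class are made up for the example.

```python
import xml.sax


class OrderHandler(xml.sax.ContentHandler):
    """Collects (sku, quantity) pairs; the element names are hypothetical."""

    def __init__(self):
        super().__init__()
        self.orders = []
        self._text = []
        self._sku = None

    def startElement(self, name, attrs):
        if name == "item":
            self._sku = attrs.get("sku")
            self._text = []

    def characters(self, content):
        # The parser may deliver one text node in several pieces,
        # so accumulate instead of assigning.
        self._text.append(content)

    def endElement(self, name):
        if name == "item":
            quantity = int("".join(self._text).strip())
            self.orders.append((self._sku, quantity))
            self._sku = None


def parse_orders(xml_bytes):
    """Run the handler over a document and return what it collected."""
    handler = OrderHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.orders


orders = parse_orders(
    b"<order><item sku='A1'>2</item><item sku='B7'>10</item></order>"
)
print(orders)
```

      All of the state lives in the handler object, which is the "plumbing" part; the rest of the program only sees parse_orders().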

      I still cleanly separated the XML interfacing from the server. Once the plumbing is set up, my server doesn't even have to know it is there for the most part. And I rarely have to deal with the interfacing to the library after the initial separation. I either go below the parser level via filter streams or above it, but the XML parser just does its job.

      It is a tough question to answer, but doesn't having a certain level of configurability necessitate some level of complexity? I think C# does a decent job at keeping the XML processing simpler while still giving the configurability, but to tap into that configurability there is still complexity involved. I think that the problem is easy to identify and the solution will take many more brain cycles to find :)

      Jeremy

  32. too hard by PhilipMatarese · · Score: 5, Funny

    Admitting something is too hard is too hard for programmers.

    Now I'll go read the article.

  33. Hahahah finallly something I know a lot about. by BeerSlurpy · · Score: 4, Interesting

    We use XML heavily in a project I'm working on at my company. Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation. Naturally we also make heavy use of DTD and SAX. Lots of XML related technologies.

    I can tell you now that XML is a Bad Thing. It strives to excel at too many things at once, and becomes inefficient and complex as a result.

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with. Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form. XML document tree traversal = 10000x more complex than getting column data out of a ResultSet... Unfortunately it is also a billion times slower to parse XML than it is to perform a medium complexity database query.

    The real problem is that XML only partly addresses the problems that relational databases solved years ago (organizing data and making it accessible), but it does it without any of the efficiency benefits of a well designed database server. In my opinion, 90+ percent of the places where XML is being used today would be better served by using columns in a relational database table to store object fields. You get indexing, you get universal, simple and efficient searching, and you get speed.

    XML has too many faults to really list in one short post. The truth of the matter is that it tries to do too many things and DOESN'T DO ANYTHING WELL. Sort of like if someone tries to be skilled in all musical instruments but ends up being, at best, mediocre in a few of them.

    1. Re:Hahahah finallly something I know a lot about. by kalidasa · · Score: 5, Insightful

      If you're working with data that can be meaningfully represented with columns, you're using the wrong damned tool. XML is for complex structured data, which it does fine. It is not for tables. Don't blame the tool, blame the idiot who thought that XML was a good way to do DBs.

    2. Re:Hahahah finallly something I know a lot about. by Eric+Savage · · Score: 2, Insightful

      XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.

      This is true if you are parsing your own data, but what about parsing third party data? I did that for years and every day was full of dealing with corruption, misformatted files, or formats that varied from the documentation because some new guy was making them on the other end.

      True, these problems can happen with XML but they are much easier to spot. Send me a file and a DTD/Schema and I can tell you in a second if any future files are bad.

      My view of XML is that what it does really well is transfer data. As far as storing data, well I only consider it when a database isn't available.

      --

      This is not the greatest sig in the world, this is just a tribute.
    3. Re:Hahahah finallly something I know a lot about. by Arandir · · Score: 2, Insightful

      Executive Summary: XML is not an RDBMS, which makes it damn hard using this XML screwdriver to hammer in RDBMS nails.

      Your main problem is that you think a tree should be a table. I think you need to get off of your RDBMS religion and realize that there's a whole world of data out there that is perfectly capable of not being shoved in a table before it can be used.

      --
      A Government Is a Body of People, Usually Notably Ungoverned
  34. XML is a MARKUP language by kahei · · Score: 3, Insightful

    ...and for doing generic markup in a relatively simple way, it's good.

    For storing arbitrary data, and use as a message format (as in SOAP), it's not so good because it has markup-like features, such as the distinction between attributes and elements and the distinction between text and element nodes. (The latter in particular is a huge pain, I wish people would agree to only use text nodes in leaf elements.)
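
    A small illustration of those two distinctions, using Python's standard xml.etree module. The sample documents and the user_id() helper are invented for the example; the point is only that a consumer has to handle both mixed content and the attribute-or-element choice.

```python
import xml.etree.ElementTree as ET

# Markup-style: text and elements interleaved (mixed content).
markup = ET.fromstring("<p>Press <b>Save</b> to continue.</p>")

# The text is split between .text (before the first child) and each
# child's .tail (after that child), which is what makes it awkward.
pieces = [markup.text, markup[0].text, markup[0].tail]

# Data-style: the same fact can be an attribute or a child element, and
# the consumer has to know which one the producer chose.
as_attr = ET.fromstring('<user id="42"/>')
as_elem = ET.fromstring("<user><id>42</id></user>")


def user_id(elem):
    """Accept either encoding of the id (a hypothetical convention)."""
    return elem.get("id") or elem.findtext("id")


print(pieces)
print(user_id(as_attr), user_id(as_elem))
```

    If text were only allowed in leaf elements, the .tail bookkeeping would disappear entirely.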

    This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements.

    However, it's the standard, so we might as well just shut up and use it.

    My opinions have no special importance but it *is* important to remember that XML is a markup format that is being used mostly for things other than markup.

    --
    Whence? Hence. Whither? Thither.
  35. similar problem with MathML by e**(i+pi)-1 · · Score: 5, Insightful

    It might be too late to correct some things in XML.
    The good thing about XML is that, whatever emerges in the future,
    it will always be possible to convert old documents into any
    new form, using simple tools.

    The critics have a point: unlike Latex or HTML, which
    can be written easily by hand, XML can become too bloated to
    be authored directly by humans.

    Similar problem with MathML:

    Latex: $x^5+3x-9=0$

    MathML:

    <mrow>
      <mrow>
        <msup>
          <mi>x</mi>
          <mn>5</mn>
        </msup>
        <mo>+</mo>
        <mrow>
          <mn>3</mn>
          <mo>&InvisibleTimes;</mo>
          <mi>x</mi>
        </mrow>
        <mo>-</mo>
        <mn>9</mn>
      </mrow>
      <mo>=</mo>
      <mn>0</mn>
    </mrow>

    You can write complicated formulas in Latex directly but it is
    almost impossible to do so in MathML, where one has to rely
    on tools to generate it (i.e. export it with Mathematica or
    TeX -> MathML converters). Wouldn't it be nice if browsers
    would understand a basic version of Latex? (That it is possible
    has been shown with IBM's techexplorer plugin).

    1. Re:similar problem with MathML by metasyntactic · · Score: 2, Insightful

      One thing that you seem to forget is that XML is useful for putting down the structure of the object in question, while leaving the presentation up to some third-party app.

      The XML snippet is indeed more verbose; however, it carries much more semantic meaning than your latex snippet, which is just pure text.

      How is this useful? Well, assume that I'm blind and I use applications that speak text to me. I'll end up with:

      "dollar-sign x caret 5 ..."

      Whereas with MathML my text-to-speech agent can actually say:

      "x to the fifth plus 3 x minus nine equals zero".

      I write latex a lot, and it's a joy to write expressions that will end up looking great. However, I know that when I do so, I'm leaving the mathematical world for the one of fascinating typesetting.

      You say that XML can become too bloated to be edited by humans. On that point you are 100% correct. However, remember that one of the tenets of XML is that it should be possible, but not necessarily fun or easy, to hand-code input, as stated by the W3C. All that's required is that the format be human-legible and reasonably clear. If you find writing MathML too difficult (something which would not surprise me at all), then I suggest you work on a tool that converts Latex to MathML. Hell, I'll even help you with it. But given my experience with Latex I am extremely wary as I have no idea how that complicated beast works and I would imagine it would be quite difficult to infer a lot of the mathematical semantics from most Latex snippets.
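
      To make the text-to-speech argument concrete, here is a toy sketch in Python that walks the MathML from the parent post and produces a spoken form. It is nowhere near a real MathML reader: the speak() function and its two small lookup tables are assumptions made for this example, it only knows the five tags used above, it says digits rather than number words, and the &InvisibleTimes; entity is written as the character U+2062 because the standard parser does not know MathML entity names.

```python
import xml.etree.ElementTree as ET

# The formula from the parent post; U+2062 stands in for &InvisibleTimes;.
MATHML = (
    "<mrow><mrow>"
    "<msup><mi>x</mi><mn>5</mn></msup>"
    "<mo>+</mo>"
    "<mrow><mn>3</mn><mo>\u2062</mo><mi>x</mi></mrow>"
    "<mo>-</mo><mn>9</mn>"
    "</mrow><mo>=</mo><mn>0</mn></mrow>"
)

# Tiny vocabularies, just enough for this one formula.
OPERATORS = {"+": "plus", "-": "minus", "=": "equals", "\u2062": ""}
POWERS = {"2": "squared", "3": "cubed", "5": "to the fifth"}


def speak(node):
    """Return a spoken rendering of a (very small) subset of MathML."""
    if node.tag in ("mi", "mn"):
        return node.text.strip()
    if node.tag == "mo":
        symbol = node.text.strip()
        return OPERATORS.get(symbol, symbol)
    if node.tag == "msup":
        base, exponent = node
        power = speak(exponent)
        return speak(base) + " " + POWERS.get(power, "to the power " + power)
    # mrow (and anything unknown): speak the children in order,
    # dropping silent operators such as the invisible times sign.
    return " ".join(part for part in map(speak, node) if part)


spoken = speak(ET.fromstring(MATHML))
print(spoken)
```

      The msup branch is the part a plain-text formula cannot offer: the tree says which thing is the base and which is the exponent, so nothing has to be guessed from typesetting commands.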

  36. Huh? by Sparky69 · · Score: 2, Interesting

    What was his argument again? Reading the whole thing into memory is too slow? Ok, agreed, hence SAX. When you're a Perl programmer everything is a regular expression. Look, Perl was the first language I learned. I'm all for Perl; it's wonderful, poetic and fun. And it handles XML perfectly.

    Are you telling me that using relational databases is easier than XML? That you can just sit down and start doing it without reading some books or at least a couple of online tutorials? That's nonsense. The benefits of XML outweigh its shortcomings IMHO. Especially Schema validation. I love knowing that I don't have to rewrite the same goddamn code to make sure my input is sane! I make a schema for it and voila. Yes, the schema spec is big. But have you read the full SQL spec? Of course not. You use a nice little subset and get your work done. Same with the schema spec. I use about 4 tags for 90% of the documents I need to create.

    So let's summarize XML in a couple of rules (there is one caveat, see below):
    1. Every element is in between angle brackets.
    2. Close every tag you open in the reverse order (like a stack, but this is far too complicated a subject for people programming, there are NO stacks in computers....right).

    Does anyone force you to use XML? Of course not. That's a weak argument but it's true. XML gives you the choice to not reinvent a structured data format. I'm not a programming guru by anyone's hallucination. I've been working with XML for a while now (3 years) and it's been terrific. Yes, you have to learn some stuff, and yes, some of the APIs are a bit terse, but show me something that isn't.

    What I've come to realize is that if you want to move forward you do have to change. Programmers bitch and whine about how end users don't want to change their UI. Well, this sounds like programmers that don't want to move their brains a little and stop seeing things as regular expressions and start seeing them as XML. Stop trying to reinvent the wheel every time you need to parse a document and move up an abstraction. And it strikes me as odd that one of the co-creators doesn't seem to "get it". The whole point of making a standardized format is so that you can abstract the parsing, transformation and validation functionality. Just my 2 cents CAD. Andrew

  37. XML parsing models by HalfFlat · · Score: 3, Informative

    If I understand it correctly, the author is lamenting that neither of the standard ways of parsing XML in a scripting language fits the straightforward model of scanning for something relevant and then acting upon it, where the two models are: 1) read in the whole file and make a tree (takes up too much memory, is slow, etc.); or 2) use a callback interface.

    The style of perl script he was seeking was a simple loop model:
    while (<>) {
      next if /ignorable/;
      if (/thing-one/) { ... }
      elsif (/thing-two/) { ... }
      ...
    }

    To me the thing that distinguishes this the most from the provided XML parsing interfaces is that it has a minimal amount of state.

    So isn't what is needed a corresponding structure to the while (<>) above that iterates over the tree-nodes of the XML-encoded data structure, in a depth-first preorder traversal (to avoid having to build the whole tree first)? One could imagine a parser object that scans through the XML file returning nodes (and their parent history) while maintaining an absolute minimum of state. If one wanted to build an in-memory representation of a subtree given a node, then one can always do so when one finds the node one wants.

    Such an interface wouldn't be good for integrity verification or the like, but for the sort of application the author was talking about, it would seem ideal. Much less flexible than the normal models, sure, but much easier to work with when the problem fits this sort of description. Perhaps I'm underestimating the difficulty of the task, but it doesn't sound too hard to write, given that it is doing so much less than the fully-featured XML parsing interfaces.
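
    One way such an interface could look, sketched in Python on top of the standard library's iterparse rather than in Perl. The walk() generator and the log/entry/msg sample document are hypothetical; the only state kept is the stack of open ancestors, which serves as the "parent history".

```python
import io
import xml.etree.ElementTree as ET


def walk(source):
    """Yield (path, element) for each start tag, in document order.

    At the time an element is yielded only its tag and attributes are
    reliable; its text and children have not been read yet. Finished
    elements are cleared so their content is not kept in memory.
    """
    path = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            yield tuple(path), elem
        else:
            path.pop()
            elem.clear()


doc = io.BytesIO(
    b"<log><entry id='1'><msg>ok</msg></entry><entry id='2'/></log>"
)
seen = []
for path, elem in walk(doc):
    if path[-1] == "msg":
        continue  # the "next if /ignorable/" case
    if path[-1] == "entry":
        seen.append(("/".join(path), elem.get("id")))
print(seen)
```

    The loop body has the same shape as the line-oriented script: skip what is ignorable, act on what matches. Anything that needs an element's text would have to react to the end event instead.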

    The other problem is the awkwardness of the use of XML in O-O languages, as addressed in the article linked to by Tim Bray in his article. Though I haven't used this particular program, this seems to be the problem that FleXML is trying to address. When you don't need all of the flexibility that XML can provide, but instead have a fixed schema that your XML-representation follows, why not have your parser automatically built to read it? People have used lex/flex for scanning text files for decades --- in these days of XML Schema, it should be even easier. If FleXML lives up to its promise, it will be. Has anyone here used FleXML and is willing to comment on how well it addresses these sorts of problems?

  38. Re:xml by FireAtWill · · Score: 2, Interesting

    I've been working on EDI applications for many years now. I view XML as another attempt to solve the same problem as the ANSI X12 standards. The problem is, 'that problem' was never *the* problem.

    In the old days (in my industry), there was a COBOL oriented file structure called the National Standard Format (NSF). It was typically documented as a set of maybe 10-20 hierarchical record formats. The mechanics for reading the files were immediately obvious. The problem was understanding what needed to be done with the data. Of course, there was often a need for a new data element and it got shoved into some filler field, resulting in the National Standard Format becoming the Nearly Similar Format.

    To resolve this issue, the industry jumped on the ANSI X12 bandwagon. ANSI X12, like XML, offered a flexible, platform-independent standard for representing hierarchical data structures.

    Platform-independent means that it's equally difficult to use on all platforms. The 10 pages or so of NSF COBOL record layouts were replaced by a couple of binders worth of standards. One for X12 and one containing the various industry-specific transaction sets. Expensive tools emerged to read the new files and cram them back into the familiar and more workable structures.

    'Flexible standards' turned out to be an oxymoron. There are so many options that it is extremely difficult to anticipate what sort of odd interpretations you'll be forced to deal with. And deal with them we must, because the Feds have mandated the way in which we must exchange data (HIPAA).

    And still we find ourselves needing extra pieces of data for specific trading partners that we put into places that are beyond the standard.

    I'd rather use XML than ANSI X12, but I'd rather not use either. They add much complexity and infernal flexibility in order to 'solve' what used to be a trivial task - agreeing on a data format.

    If we want something truly useful, we'd forget about markup languages and specify an open database format similar to Access that actually has value beyond the narrow problem being addressed.

  39. Perl suggestion by skillet-thief · · Score: 2, Insightful

    I don't know what's going on in Perl 6, but it seems like Perl needs some kind of built-in way of running through an xml file by tags, in a way similar to the standard line by line file reading operator. Rather than grabbing a single line at a time, or having to slurp in the whole file before whacking it up, you should be able to pass a regex to the input operator so that it will stop when it gets to the end of a chunk of text defined by an end tag.

    Obviously, there are ways of getting around this by using a line-by-line approach, but I'm pretty sure that if such a thing existed and was easy to implement, it would get used a lot and would make Perl far more xml friendly.
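
    For comparison, here is roughly what "read one chunk per end tag" looks like where it already exists. This sketch is Python, not Perl, and uses the standard library's iterparse; the records() generator and the feed/entry/title sample are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def records(source, tag):
    """Yield each complete <tag> element as soon as its end tag is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop the chunk once the caller is done with it


feed = io.BytesIO(
    b"<feed>"
    b"<entry><title>one</title></entry>"
    b"<entry><title>two</title></entry>"
    b"</feed>"
)
titles = [entry.findtext("title") for entry in records(feed, "entry")]
print(titles)
```

    The caller's loop reads much like a line-by-line file loop, except that the unit of iteration is an element instead of a line.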

    --

    Congratulations! Now we are the Evil Empire

  40. Re:But XML is great for computers... by Anonymous Coward · · Score: 4, Interesting

    Right, so instead of using one regexp for /etc/hosts and another regexp for /etc/passwd, I'd have to use ten pages of getTheGodDOMObjectFromTheGodDOMXMLFile crap for /etc/hosts.xml and another ten pages for /etc/passwd.xml.

    How, exactly, has XML simplified *anything*?

  41. I agree, of course... by alispguru · · Score: 4, Insightful
    Given my .sig, how could I disagree?

    XML got one thing right over unadorned S-expressions - document packaging, specifically versioning and character-set labeling. XML inherited this from SGML, and it's one of the few things it took from there that was actually worth keeping.

    For a good laugh, read the Origin and Goals section of the XML spec. Of the ten goals for XML listed there:

    XML shall be straightforwardly usable over the Internet.

    XML shall support a wide variety of applications.

    XML shall be compatible with SGML.

    It shall be easy to write programs which process XML documents.

    The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

    XML documents should be human-legible and reasonably clear.

    The XML design should be prepared quickly.

    The design of XML shall be formal and concise.

    XML documents shall be easy to create.

    Terseness in XML markup is of minimal importance.

    I'd say two of them were met, but were bad ideas (SGML compatibility, terseness unimportant), and five of them were completely missed (ease of use, human legibility, quickly designed, formal and concise, ease of creation).

    Thirty per cent is a failing grade, folks...

    --

    To a Lisp hacker, XML is S-expressions in drag.
    1. Re:I agree, of course... by Twylite · · Score: 2, Interesting

      Shameless self-plug, but I have a critique of XML's failure to meet its goals on my home page. You may find it interesting.

      --
      i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
    2. Re:I agree, of course... by g4dget · · Score: 2, Insightful
      One other nice thing about XML is that opening tags are matched with closing tags. If you leave off a closing paren in Lisp, the parser will give you an error but it can't pinpoint where you screwed up. But an XML parser can spot which closing tag is missing, which means you don't have to hunt for it yourself.

      That would be a valid argument if XML were designed to be regularly input by humans. But XML is so cumbersome otherwise that almost all of it will be either machine generated or edited in special editors. And balancing closing tags is easy in Lisp if you use a special editor.

      Also, most versions of Lisp give you two separate, equivalent pairs of parens that you can use for checking. So, you write:

      [item (part-no 123456) (available 5) (stores 3 7 9)]

      And checks can be incorporated into the definition of specific constructs. So, you could have:

      (item (part-no 123456) (available 5) (stores 3 7 9) enditem)

      Or, you could make this an optional part of the syntax, allowing people to close a list starting with "x" with "/x", but not requiring it:

      (item (part-no 123456) (available 5) (stores 3 7 9) /item)
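
      To show how little separates the two notations, here is a hedged sketch in Python (used purely as a neutral illustration language) that reads the item example above and re-emits it as XML. The read_sexpr() and to_element() functions are invented for this example and handle only atoms and nested lists; square and round brackets are treated as the two interchangeable pairs, and a mismatched closer is reported as an error.

```python
import re
import xml.etree.ElementTree as ET

CLOSER = {"(": ")", "[": "]"}


def read_sexpr(text):
    """Parse one S-expression into nested lists of strings."""
    tokens = re.findall(r"[()\[\]]|[^\s()\[\]]+", text)
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token in CLOSER:
            items = []
            while tokens[pos] != CLOSER[token]:
                items.append(parse())
            pos += 1  # consume the matching closer
            return items
        if token in CLOSER.values():
            raise ValueError("mismatched " + token)
        return token

    return parse()


def to_element(expr):
    """First item names the element; atoms become text, lists children."""
    elem = ET.Element(expr[0])
    atoms = [item for item in expr[1:] if isinstance(item, str)]
    elem.text = " ".join(atoms) or None
    for item in expr[1:]:
        if isinstance(item, list):
            elem.append(to_element(item))
    return elem


tree = read_sexpr("[item (part-no 123456) (available 5) (stores 3 7 9)]")
xml_text = ET.tostring(to_element(tree), encoding="unicode")
print(xml_text)
```

      The bracket check gives part of what named closing tags give: closing "[item ...)" with the wrong kind of bracket is caught at the point where the wrong closer appears.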

      Also, one of the major ideas of XML is to separate code from data, as opposed to Lisp where code and data are the same thing. Similar syntax, different philosophy, I guess.

      Lisp programs separate code from data all the time, just like well-written programs in any other language. It's just that on those occasions when you do have to deal with code, you can do so using the same syntax as you use for data. In other words, separating code from data does not require code and data to have different syntax.

      The fact that several web standards use incompatible syntax (DTD, CSS, etc.) is actually a big problem. And the fact that almost no web code is written in XML syntax means that all those scripts are inaccessible to XML parsers and easy automatic analysis. Just imagine how nice it would be if the stuff inside the JavaScript tags could be analyzed and indexed with a bit more confidence.

  42. what the hell are you talkin` about? by Ender+Ryan · · Score: 2, Insightful
    It strives to excel at too many things at once, and becomes inefficient and complex as a result.

    I agree with this, to an extent. If you don't like/need all the fluff, don't use it. XML is only as complicated and inefficient as you want it to be.

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with.

    It's not just about writing parsers for a single program. What happens when you have several programs that read the same type of file? What if said file-type is somewhat complex. XML keeps things simpler and easier for these cases.

    Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form.

    What on earth are you talking about? YOU define the format of your XML data. If it doesn't need to be complicated, don't complicate it!

    XML document tree traversal = 10000x more complex than getting column data out of a ResultSet...

    Again, what? Keep the XML simple, and it will be just as easy.

    Unfortunately it is also a billion times slower to parse XML than it is to perform a medium complexity database query.

    Then XML isn't the proper solution for your problem. Just because some dipshit tries to force XML to do things it isn't optimized for doesn't make XML any less useful.

    *snip* the rest of your comments comparing XML to relational databases.

    XML files are not high performance databases... Use the right tool for the job, and you will be much happier.

    It sounds to me like XML isn't your problem. Your problem is the "genius" at your company that needs to be beat over the head with a clue stick. If I were you, I'd be sure to beat him hard.

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  43. Stay on topic - problem isn't XML standard by cdthompso1 · · Score: 5, Interesting
    Tim Bray's article, if you didn't read it, is right on the money. The last paragraph basically states that XML is the best alternative to the data interchange problem because it provides a consistent format. Some of you guys who are rounding up the mob and lighting buildings on fire calling for book burnings and the downfall of all XML have to read the article! You're not in agreement with Tim when you say, "Sure, I think XML sucks, too."

    So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry!) Just imagine the chaos that will ensue once MS Office saves all documents in true XML.

    My take on the problem Tim's really talking about: inconsistency and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a language domain, because vendors want to put their own spin on everything. The alphabet soup that results confuses the hell out of people. This has even happened in the open source world, where I can do a Google search on "php xml parsing" and read articles on no less than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract and store the data in our database," you need a standard approach. Not to stifle thought and innovation, yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each author offering his opinion with a slight variation of how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about, and not necessarily the one that time has proven out to be the most efficient.

    Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?

  44. Re:But XML is great for computers... by Ed+Avis · · Score: 2, Interesting

    (Replying to AC post, please mod it up if you can.)

    I admit that interfaces like DOM are rather clunky. But your regexps would break if a new field were added to /etc/passwd, or probably even if the format were changed to allow comments. So files like /etc/passwd become fossilized over time.

    The answer is a better interface for reading XML files, one that knows about the format (which is described in a DTD or other grammar) and can present a neat interface like

    passwd.user["abc01"].real_name

    (or whatever the syntax of your preferred language looks like). DOM is so awkward because it knows nothing about whether a element would be present, or whether there might be more than one of them, or whether whitespace before and after the element is significant, so it has to provide an API to explicitly wade through all that just in case you want it. A tool like FleXML which knows that must appear exactly once and in a particular place can put it into a single field.

    (Actually FleXML isn't ideal for this example because the parsing code it generates will stop working when the file format is extended, if new elements started appearing inside . But if you made the generated code only a little bit slower it could skip over these extensions to the file format, so existing apps would continue to work when new things were added to the DTD.)

    The answer I think is for programming languages which better support XML, which can read a document and put it into the language's native data structures. Libraries like Perl's XML::Simple try to do this, but they do so without any knowledge of what the legal documents are, so the resulting interface is still rather awkward.
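
    A rough sketch of the kind of interface described above, in Python. This is not FleXML and not any existing library: the XML layout for passwd, the User class and load_passwd are hypothetical, and the "grammar" is simply hard-coded (exactly one uid and one real_name per user). Elements the loader does not know about are skipped, so the code keeps working when the format is extended.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical XML rendering of passwd; <nickname> stands in for a later extension.
PASSWD_XML = """<passwd>
  <user name="abc01"><uid>1001</uid><real_name>A. B. Clark</real_name><nickname>abc</nickname></user>
  <user name="xyz02"><uid>1002</uid><real_name>X. Y. Zed</real_name></user>
</passwd>"""


@dataclass
class User:
    name: str
    uid: int
    real_name: str


def _exactly_one(parent, tag):
    """Enforce the 'must appear exactly once' rule and return the element text."""
    found = parent.findall(tag)
    if len(found) != 1:
        raise ValueError(
            f"<{parent.tag}> needs exactly one <{tag}>, found {len(found)}"
        )
    return (found[0].text or "").strip()


def load_passwd(text):
    """Map the document onto native objects, keyed by user name."""
    root = ET.fromstring(text)
    users = {}
    for elem in root.findall("user"):
        user = User(
            name=elem.attrib["name"],
            uid=int(_exactly_one(elem, "uid")),
            real_name=_exactly_one(elem, "real_name"),
        )
        users[user.name] = user  # unknown child elements are simply ignored
    return users
```

    With that in place, load_passwd(PASSWD_XML)["abc01"].real_name reads much like the passwd.user["abc01"].real_name syntax above. A real tool would generate this mapping from the DTD instead of having it written by hand.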

    --
    -- Ed Avis ed@membled.com
  45. C doesn't have it. by torpor · · Score: 2, Insightful

    Really.

    There's *still* nothing out there that can take my structs, parse them out to XML, then load them back again when needed, seamlessly.

    The embedded sphere - where XML is *USEFUL*, and where *C* is *ALSO USEFUL* - has no chance with XML right now.

    It's either libexpat and a monster callback module, or bust.

    --
    ; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
  46. Re:xml by frisket · · Score: 2
    >>XML isn't intended for web pages

    Wrong. This is precisely what XML was intended for. Go and read the Spec.

    Where we went wrong was in using XML for spreadsheet/database-style rectangular data, for which it was never designed, and for which it is grotesquely unsuited.

  47. XML is bad like Democracy is bad by Washizu · · Score: 4, Insightful

    XML is bad like Democracy is bad. It's just better than the alternatives.

    I had a problem at work when we switched from AutoCAD to Solidworks. Our manufacturing software couldn't read the new BOM files, which were Excel's .xls. Without ever having looked at our system's BOM files before, I wrote a program that read the .xls and built a proper XML BOM file our system could read. If our system hadn't been using XML, who knows how long it would have taken me to figure out the intricacies of a proprietary file format.

    --
    OddManIn: A Game of guns and game theory.
  48. RFC822 by semanticgap · · Score: 2, Insightful

    Before XML there was (and still is) RFC822 which describes how headers are formatted in e-mail, HTTP and a slew of other protocols.

    I've been down the route where I tried to use XML where something as simple as "key: value" would do, and before I knew it, my program became bloated and reliant on third-party XML libs, the config files were only marginally human-readable and a lot of time was wasted thinking about the virtues of DOM vs SAX. In the end I learned that using XML for the sake of XML isn't worth it.

    I think XML is OK if used appropriately - for example I think XML is perfect for something like storing word processing documents. But the idea that every config file and every bit of network traffic should be XML is stupid IMHO.
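
    For comparison, a minimal sketch of the "key: value" approach in Python, reusing the standard library's RFC822-style header parser instead of an XML lib. The config keys and the load_config helper are invented for the example.

```python
from email.parser import Parser

# Hypothetical config in RFC822 header style; a line starting with
# whitespace continues the previous header.
CONFIG_TEXT = """\
Host: db.example.com
Port: 5432
Description: primary database,
 read-write
"""


def load_config(text):
    """Parse header-style text into a dict with lower-cased keys."""
    message = Parser().parsestr(text, headersonly=True)
    # Collapse the whitespace left behind by folded (continued) lines.
    return {key.lower(): " ".join(value.split()) for key, value in message.items()}
```

    Everything comes back as a string, so converting Port to an integer is still the caller's job, but there is no third-party dependency and the file stays readable.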

  49. Re:Meta XML by rabidcow · · Score: 3, Informative

    This is bad XML design.

    This would be better:
    <date year="2003" month="3" day="18"/>

    I used to think XML was just horribly bloaty and ugly, now I think it's more like VB in that it's easy to make something that's very poorly designed.
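
    A small sketch of why the attribute form is pleasant to consume, in Python with the standard library. The parse_date helper is made up for the example. Note that XML requires attribute values to be quoted, so the parser accepts only the quoted spelling of the element.

```python
import datetime
import xml.etree.ElementTree as ET


def parse_date(xml_text):
    """Turn a <date year=".." month=".." day=".."/> element into a date."""
    elem = ET.fromstring(xml_text)
    return datetime.date(
        int(elem.get("year")), int(elem.get("month")), int(elem.get("day"))
    )
```

    Calling parse_date('<date year="2003" month="3" day="18"/>') gives a datetime.date for 18 March 2003 without walking any child nodes.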

  50. Re:But XML is great for computers... by rabidcow · · Score: 2, Informative

    most other non-xml config files in /etc, like say hosts, DNS zone files, named.conf, passwd/shadow, hosts.allow/deny, sendmail.mc or resolv.conf (etc. etc.)

    all these can be parsed but they all require *different* code for each config file.

    Nonsense, if you're smart about your parser, you'll need about 3. If you're not smart about your parser, you'd probably design lousy XML anyway.

    how do I quote strings, how do I escape newlines, how do I mark nested scopes, what happens when the string delimiter character occurs inside a string, how do I deal with comments, what is the character set, is there a formal grammar for the document, etc etc

    afaik, most config files ignore these issues, but you could easily separate these options from the core of the parser. Pass them in as a traits class or something.
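
    As a sketch of the "pass them in as a traits class" idea, here is one small line-oriented parser in Python whose per-format differences live in a Traits object. The names Traits, parse, PASSWD and HOSTS are invented, and real formats have quirks (quoting, escapes, nesting) that this deliberately ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Traits:
    """Per-format options, kept separate from the parser core."""
    separator: Optional[str] = None  # None means "split on runs of whitespace"
    comment: Optional[str] = "#"     # None means "format has no comments"


def parse(text, traits):
    """Return one list of fields per non-empty, non-comment line."""
    records = []
    for line in text.splitlines():
        if traits.comment:
            line = line.split(traits.comment, 1)[0]  # drop trailing comment
        line = line.strip()
        if not line:
            continue
        records.append([field.strip() for field in line.split(traits.separator)])
    return records


# Two assumed trait sets: colon-separated passwd, whitespace-separated hosts.
PASSWD = Traits(separator=":", comment=None)
HOSTS = Traits(separator=None, comment="#")
```

    The same parse function then handles both files, for example parse(hosts_text, HOSTS) and parse(passwd_text, PASSWD), with only the traits differing.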

  51. It takes more than a set of tools by apankrat · · Score: 3, Insightful

    > However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

    It takes more than a set of good tools for a technology to become 'a widespread success'. A clear justification of why XML is better than existing standard marshalling techniques (ASN.1 DER, simple container LSB serialization and others) would be a good starting point.

    I'm probably beating the dead horse here but XML has at least two properties rendering it useless for any performance-aware application:

    (a) unlike, say, TLV it does not allow efficiently skipping parts of the data you don't need or aren't aware of. I.e. in order to skip a section, you need to read and parse it first.

    (b) XML is a lazy man's ASN.1 DER. In DER it's all there in a much more compact and elegant form. The only 'drawback' in the eyes of the XML crowd is that it's binary. Sure, everyone knows that encoding numbers as strings is a definite way to improve upon the performance and scalability of everything from network protocols (SOAP, BXXP, UPNP) to basic document processing. Right on.

    The bottom line is that XML has probably reached its acceptance limits. Whoever took XML for granted or is stuck with it or is not willing to learn about alternatives will keep on whining about tools being sucky. That's life, but OTOH it's only a small part of it.
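
    To illustrate point (a), here is a minimal tag-length-value sketch in Python. It is a simplified layout invented for the example (1-byte tag, 2-byte big-endian length), not ASN.1 DER, and encode_tlv and find_tag are hypothetical names. The reader jumps over any record it does not want using only the length field, without looking inside the value.

```python
import struct

HEADER = ">BH"  # assumed layout: 1-byte tag, 2-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)


def encode_tlv(tag, value):
    """Prefix raw bytes with their tag and length."""
    return struct.pack(HEADER, tag, len(value)) + value


def find_tag(buffer, wanted):
    """Return the value of the first record with the wanted tag, or None."""
    offset = 0
    while offset + HEADER_SIZE <= len(buffer):
        tag, length = struct.unpack_from(HEADER, buffer, offset)
        offset += HEADER_SIZE
        if tag == wanted:
            return buffer[offset:offset + length]
        offset += length  # skip the value without reading or parsing it
    return None
```

    With XML the equivalent skip still means scanning every character up to the matching close tag, because nothing in the format says how long a section is.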

    --
    3.243F6A8885A308D313
  52. Re:But XML is great for computers... by Dalroth · · Score: 2, Interesting

    In C# at least:

    XmlDocument Doc = new XmlDocument();
    Doc.Load("/etc/passwd.xml");
    string Password = Doc.SelectSingleNode("/users/user[@name='dalroth']/@password").Value;
    Really doesn't seem that difficult to me. Bryan
  53. Re:But XML is great for computers... by Zaiff+Urgulbunger · · Score: 2, Insightful

    Indeedy.

    And I've said it before, but I'll say it again -- XML as most people see it is *just* the serialised form of an XML structure. The same as Databases don't actually have to store lists of data in the order that you read it in.

    But as you quite rightly point out, having a standard, very accessible (if slightly verbose) method to create and edit data structures is indeed a godsend!

    Here's an idea (which I've also said before!) - imagine if all those config files were XML based. So you could edit them using a text editor - same as now except slightly more cumbersome to edit.
    But we're agreed that being able to use a basic tool such as a text editor is a good thing right?

    Okay, so next up from that would be an XML editor so you can navigate the structure to find the element you want to tweak. The nice thing here is that you've got a standard tool that works with any XML file and therefore any config file.

    You can also build standard tools to work with these standard files so automating the update of a number of config files would be easy.

    Now let's go back to the whole thing about serialisation -- we're just manipulating data structures. The text-based, serialised form of these structures is called XML. The good thing is being able to edit with a text editor -- available on *any* platform including non-current platforms where no active development is occurring.

    But we're not limited, and we can build tools to work with these structures more efficiently. And we don't *have* to use the serialised form if we don't want to -- it just happens that at this point in time, where the tools are not as evolved as they will be, it makes sense to use the text based form.

    In the future we could for example have a file system that is structured like an XML file? So then all those separate config files become part of the one structure, and thus even easier to manage.

    I'm rambling, so I'll stop now! My points are simply that, yep, XML isn't perfect, but don't get too hung up on its being large-verbose-text-files, because it isn't -- that's just how it is currently being presented. Instead look at how it bridges the divide between old school proprietary, closed, binary formats, and the accessibility of text files.

  54. Java XML Parsing by SurfTheWorld · · Score: 3, Interesting

    Let's decompose the XML parsing "problem" (if one actually exists) into smaller components that we can reasonably discuss. XML parsing is too broad a topic to intelligently discuss, but if you limit it to XML parsing in Java you suddenly have a topic small enough to be manageable. So let's discuss XML parsing in Java.

    When XML was first introduced, there were no standard libraries in the JDK to facilitate parsing. What's more, the few projects out there varied wildly in how you actually used their DOM tree or SAX callback mechanism. This isn't necessarily a Bad Thing (tm), it's the same problem every emerging technology faces: immature tools. This is basic biology - lots of competing implementations (life forms), each struggling for community (resources).

    So, time goes by, and eventually a handful of implementations emerge dominant. Some dominate due to performance, and some dominate because of ease of use of the API. The victors in this game then sometimes go through a merging process of their own, where the performance victors lend technology to ease of use API victors. After a lot of merging (and flames usually), one or two projects emerge out of the XML kingdom as the dominant players. In my opinion, in the world of Java these are Xalan (Xerces) and Dom4J.

    During the maturation process, Sun comes along and looks at the technology and says "Wow this XML stuff is really here to stay. What implementations are out there, and what similarities exist between them? How can we facilitate growth of these projects?" They realize that certain classes (like org.xml.sax.InputSource) are common entities in both projects (even if the class InputSource doesn't exist), and they standardize it. For a reference to all of the XML standards implemented in the JDK, do a search on java.sun.com for JAXP, JAXM, and JAXB (just to name a few).

    At this point, the XML projects come back and work in support so that they can be "JAXP compatible" (again this is part of the biological process of evolution). This ensures that the projects work well with whatever Sun ships in the JDK.

    In the end (which is really where we are now) you end up with a pluggable architecture, where the JDK provides some common functionality or interfaces that are implemented by open source projects.

    Java XML parsing was damn hard back in the day - you had to marry your code to a specific project. But these days with the standardization that has taken place (thanks Sun!), as long as you write code that makes use of the JAXP specification you can plug in any JAXP-compliant parser into your app and things *should* work.

    The difficult problem is getting other entities (Application Servers for example) to get up-to-date with the standards. WebLogic 6.1 comes with a non-JAXP compliant parser, and thus doesn't work with the latest JDK, Xalan, etc.

    --
    Do it for da shorties
  55. Re:But XML is great for computers... by Smallpond · · Score: 2, Insightful


    Had you read the article, his point was that you shouldn't have to slorp in the whole file just to read one field. In fact, he's using perl and regexp to avoid having to do things like Doc.Load.

    The author claims that existing tools are oriented toward either converting to a big internal data structure, or to processing gradually using callbacks, neither of which is optimal for small fast code or simple programming.
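
    For what it's worth, a pull-style reader sits between those two extremes. Below is a minimal sketch in Python using the standard library's ElementTree.iterparse; the users/user layout with name and password attributes is assumed from the C# example in the parent post, and find_password is a made-up helper. It needs no callback class, returns as soon as the wanted record has been seen, and discards records it has already passed.

```python
import io
import xml.etree.ElementTree as ET


def find_password(stream, user_name):
    """Scan <user> elements in document order and stop at the first match."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "user":
            if elem.get("name") == user_name:
                return elem.get("password")
            elem.clear()  # drop data from users we have already passed
    return None


# Hypothetical data, held in memory here; a real file object works the same way.
SAMPLE = io.BytesIO(
    b'<users>'
    b'<user name="alice" password="a1"/>'
    b'<user name="dalroth" password="s3cret"/>'
    b'<user name="zed" password="z9"/>'
    b'</users>'
)
```

    Calling find_password(SAMPLE, "dalroth") hands back that user's password attribute. On a large file the parser reads input in chunks, so the rest of the file after the match is never loaded.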

  56. Not really a joke anymore by duck_prime · · Score: 2, Funny
    It's now official. C++ creator admits it was all a hoax!
    In a stunning move, C++ creator Stroustrup identifies the fine line between a ridiculous self-parody of over-engineering, and soul-destroying evil, and pole-vaults over it.

    Repeat after me:
    You don't overload whitespace.
    You don't overload whitespace.
    You don't overload whitespace.
    You don't overload whitespace.
  57. Re:The API is XPath by Ed+Avis · · Score: 2, Interesting

    But XPath, at least its implementation in current languages, takes a string as its path. If you specify an element which doesn't exist in the XML then this error will not be caught until run time. Whereas if the compiler knew about the grammar of the XML file it could tell you immediately 'there cannot be a element at this level' or 'no such attribute'. You could even hit Tab in your editor to see what the available subelements are at the current point in the tree.

    Also, knowing the grammar (DTD or XML Schema or whatever) of the XML will help generate more efficient code, better than an XPath implementation could be because the general XPath has to work with all possible XML files, not just those restricted to a certain grammar.

    It's like the difference between the putative code

    int x = a.b[6]->c["hello"];

    which is checked at compile time and compiles down into efficient code, and

    int x = tree_query("a/b 6/c 'hello");

    which walks some data structure at run time. It's better if the language can help you with the data structures.

    --
    -- Ed Avis ed@membled.com
  58. plagiarism? by pwarf · · Score: 2, Insightful

    I am not the author of the post you responded to, but I felt compelled to comment.

    Plagiarism, in the most commonly used sense, is taking credit for someone else's words or ideas. Since he posted as an anonymous coward, he is unable to take credit. Therefore, he didn't commit plagiarism in the usual sense.

    He deserves the lesser charge of failure to cite. As long as we are throwing out accusations, I would accuse you of libel (http://dictionary.reference.com/search?q=libel), but since he's an AC, I can't claim that it damages his reputation. Hmm, never mind. :)

  59. Re:xml by cicho · · Score: 2, Insightful

    Moderators on crack, the parent is not a troll, he's just about right.

    Read any introductory article on XML, or the first chapter of a book - it's so plain and simple and inviting and looks like a great idea. By page 50 of the book you're crawling through a dense pile of industrial trash. A book on XML I bought lists over thirty classes in the OpenXML implementation - over THIRTY classes, that's hundreds of methods; do I want to dig into this just to read and write a simple file of records? Where simple and robust alternatives exist? Hell, no.

    --
    "Only the small secrets need to be protected. The big ones are kept secret by public incredulity." - Marshall McLuhan