Slashdot Mirror


XML Co-Creator says XML Is Too Hard For Programmers

orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied lately with XML and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always had a divided response among the technical community. The anti-XML community has several sites stating their positions."

36 of 562 comments

  1. Really? by leecho · · Score: 3, Interesting

    Well, programming *is* a hard task, and simplifying it is about building layers and layers of better abstractions to machine code and binary data.

    Without XML, what would you normally do? Create a flat text file and read it using whatever syntax you'll like that day. I agree XML is ugly as hell to type in manually, but at least it's a standard, and every programming language in use today can handle it in a standard way - DOM, SAX, whatever.

    1. Re:Really? by Anonymous Coward · · Score: 2, Interesting

      While Lisp functions are overkill for a dataformat, lisp syntax (sexps) are not, and are more mature, simpler, and clearer than XML.

      Even for C programs, I tend to use Lisp sexps as my persistent file format.
      A simple Lisp parser is smaller and faster than an XML parser, for no loss of expressivity.
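
The claim that an s-expression reader is small can be illustrated with a minimal sketch. It is written in Python rather than C or Lisp, it handles only parentheses and whitespace-separated atoms (no string literals, quoting, or comments), and the names `tokenize`, `parse`, and `read_sexp` are made up for this example.

```python
def tokenize(text):
    """Split s-expression text into parentheses and bare atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and return one atom or one nested list."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # discard the ")"
        return items
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    return token


def read_sexp(text):
    """Parse exactly one s-expression from text."""
    tokens = tokenize(text)
    result = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after expression")
    return result


# Hypothetical config data, kept as nested lists of strings.
config = read_sexp("(server (host example.org) (port 8080))")
```

Everything comes back as nested lists of strings, so interpreting atoms as numbers or symbols is left to the caller.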

  2. It's about tools, libraries by Anonymous Coward · · Score: 5, Interesting

    Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.

    Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.

    There's a number of tools and libraries (with Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those, and stop complaining.
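
The variability described above (attribute order, tags spanning lines) can be shown with a small sketch. It uses Python instead of Perl, the `photo` element and its attributes are invented for illustration, and the regex is deliberately naive; a more careful pattern could cope with both documents.

```python
import re
import xml.etree.ElementTree as ET

# Two spellings of the same (hypothetical) element.
doc_a = '<photo id="7" title="Dawn"/>'
doc_b = '<photo\n   title="Dawn"\n   id="7"/>'

# A naive pattern that bakes in attribute order and single-line layout.
naive = re.compile(r'<photo id="(\d+)" title="([^"]*)"')


def attrs_via_regex(text):
    """Return (id, title) if the naive pattern matches, else None."""
    match = naive.search(text)
    return match.groups() if match else None


def attrs_via_parser(text):
    """Return (id, title) using a real XML parser."""
    elem = ET.fromstring(text)
    return (elem.get("id"), elem.get("title"))
```

Here `attrs_via_regex(doc_b)` finds nothing, while `attrs_via_parser` returns the same pair for both documents.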

    1. Re:It's about tools, libraries by PigleT · · Score: 2, Interesting

      I agree that it's about tools and libraries. And this is what I think about them, too.

      At work, I brush up against XML occasionally, mostly for documentation or data-resultset purposes. In my own time, I use it in my photo gallery - result-sets from database queries get converted to XML and then spat out through XSLT in Sablotron, straight to web. For all the hoops it goes through, it's actually still quite nippy.

      However, I also dislike it intensely.

      I've written a blog-like system-news announcement board using a Ruby CGI against postgresql as a backend. I can pull back a result-set - a simple table-thing with each row being a text announcement, half a dozen fields (when posted, by whom, etc). And I wanted to output this in HTML form for the web, in plain-text to send to a user who wanted it via email every day, and in s-exp form for my own gratification.
      However, the first problem you run into is the formatting. A textarea in an HTML form gives no line-wrapping (wanted for plaintext output, but only in specific fields) and embeds ^M characters everywhere. When the output is HTML, those ^Ms want to become br tags. When the output is plaintext or sexp, they want to become \n. Simple, if ONLY there were a way of doing either elementary reformatting or search-n-replace in XSLT. There is, but s/// is about 10 lines' worth, if my googling is to be believed. That makes it non-optimal for one of its primary uses: making transformations on big blocks of text-based data, and it can't even edit within a node correctly? Pathetic.
      Why shouldn't I just write 3 output methods in my Ruby CGI script that take the result-set directly to text, HTML or sexp formats, with the power of ruby to do a #gsub("^M", "\n") on just the fields I want, in a tiny few extra characters of code?

      Now to tackle what you've said:

      > Using Perl regexps to parse XML is silly

      No, it's not. Perl regexps are highly featureful, pre-existing code.

      > e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling.

      These things are not a problem. You can easily match an attribute occurring, as it does, within an opening-tag, and pull out both the name and the contents. Using that to set a variable of given name in your program - a highly important part, given that XML is a data-transfer format and it's the internal representation afterwards that is its whole raison-d'etre - is trivial. Thus, perl wins.
      Multi-line matching is explicitly catered-for in perl, with /m or /s on the end of the regexp.

      > There's a number of tools and libraries

      Indeed there are. And you know what? When I've got a small paragraph ( characters, I dunno.)
      In short, "programmed text" won the day for me.

      --
      ~Tim
      --
      .|` Clouds cross the black moonlight,
      Rushing on down to the circle of the turn
    2. Re:It's about tools, libraries by Sique · · Score: 5, Interesting

      No. It is not. It is about basic computer science.

      XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction. What he is complaining about is exactly this: Lots of parsing to get a simple datum.

      With regexp your parsing is much faster, because you can concentrate on substrings, you can parse them without using a stack, you can use them in stream context. But regexp are Regular Expressions (Chomsky Type 3 grammar), so they are in fact just a subset of XML and not able to parse XML completely.

      One of the links in the article points to another rant, where the author wants some regulations for a limited XML. Unfortunately, the ideas he is proposing are in fact context sensitive, and as such they are Chomsky Type 1 (context sensitive grammar) and a superset of XML instead of a simplified subset. Does anyone remember the Earley algorithm, with something that can be described as a multi-dimensional stack?

      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

      --
      .sig: Sique *sigh*
    3. Re:It's about tools, libraries by Boiotos · · Score: 2, Interesting
      Shouldn't SAX-based tools *not* have to load the entire thing into memory?

      Bray's paper appears to express a strong preference for an XML that would work well with standard regex tools. In it he says, "If I use any of the perl+XML machinery, it wants me either to let it read the whole thing and build a structure in memory, or go to a callback interface." And then he adds that callback "is sufficiently non-idiomatic and awkward that I'd rather just live in regexp-land."

      This, in turn, seems to be based on an article linked to in Bray and advocating the same thing.

      It seems to me that to convince the larger world that this is necessary, some other options would have to be excluded. Aren't regexs of some sort going to be in v. 2 of XSLT? None of its successful implementations require loading the document into memory, and it nicely magics away the namespace kerfuffle that Gregorio's examples illustrate.

      What I took away from the article was considerable amazement that one of the markup luminaries uses such low-level tools to process XML.

    4. Re:It's about tools, libraries by Anonymous Coward · · Score: 1, Interesting
      > XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction.

      Are you suggesting that to get to the 15005-th transaction in the file I have to fully parse all of the 15004 previous transactions? That could be true for some of the more complicated context-free grammars, but in XML you can easily discard a nonterminal without fully parsing it - just look for the matching closing symbol. This does not even require a stack, just a simple counter, in case your nonterminal is nested.

      > Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.

      Which theorem says that XML has to be memory intensive and cannot be fast? Observe, that theorems about context-free grammars only guarantee the existence of hard context free grammars, not general hardness of the grammars.
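
The counter idea can be sketched roughly as follows, in Python. This is only an illustration under strong assumptions: the input has no comments, CDATA sections, or `>` characters inside attribute values, and `skip_element` and the `tx` element are names invented for the example.

```python
import re


def skip_element(text, start, name):
    """Return the index just past the element `name` that opens at `start`.

    Only a depth counter is kept: +1 for each <name ...>, -1 for each
    </name>; self-closing <name .../> tags leave the counter unchanged.
    """
    tag = re.compile(r"<(/?)" + re.escape(name) + r"(?=[\s/>])[^>]*?(/?)>")
    depth = 0
    pos = start
    while True:
        match = tag.search(text, pos)
        if match is None:
            raise ValueError("unbalanced element")
        closing, self_closing = match.group(1), match.group(2)
        if closing:
            depth -= 1
        elif not self_closing:
            depth += 1
        if depth < 0:
            raise ValueError("closing tag without matching opening tag")
        pos = match.end()
        if depth == 0:
            return pos


# Hypothetical transaction file: skip the first (nested) <tx> without
# building any tree for its contents.
sample = '<file><tx id="1"><tx id="1a"/><note>n</note></tx><tx id="2">wanted</tx></file>'
after_first = skip_element(sample, sample.index('<tx id="1">'), "tx")
```

Other element names inside the skipped region are never inspected, which is the point being made: only tags with the same name affect the counter.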
    5. Re:It's about tools, libraries by Sique · · Score: 4, Interesting

      No, I am suggesting, that in general you have to use a stack machine. Surely you can use degenerated trees instead of fully balanced trees to store your data. And a concatenation of elements is a regular expression (and a degenerated tree). But then you are already making assumptions about the data you get. But with such limiting assumptions you can easily streamline your code. But you are losing the full power of XML on the way. And you need a grammar that makes sure you don't mix terminals and nonterminals.

      It starts out already if you are using escape characters to mark nonterminals and escape those characters with themselves to mark them terminal. Those markings are still regular, but you already lose some speed-ups. For instance \\ matches \\" and \\\", but one means just \ and the end of the string, and the other one means \" and the string continues. The only way to stay out of the mess is to make sure you are using an only left bound parser, first parse for all escape characters and then for the nonterminals, which makes your parser already a (local) 2-pass-parser.

      --
      .sig: Sique *sigh*
    6. Re:It's about tools, libraries by Ed+Avis · · Score: 2, Interesting

      There are two more methods: interfaces like SAX where you read individual tokens, and callback interfaces like Perl's XML::Twig where you can efficiently scan the whole file and only construct in-memory trees for the parts you're interested in.

      The best method might be a lazy programming language where you can say

      tree.a[4].b[6].contents

      and only when this expression is evaluated will the necessary bit of the tree be parsed.
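
A rough Python analogue of the "only build trees for the parts you care about" approach can be sketched with the standard library's `xml.etree.ElementTree.iterparse`. This is not XML::Twig and not a lazy language; the helper name `iter_records` and the `log`/`entry` elements are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each completed <tag> element, then empty it to limit memory use."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop text, attributes and children after use


# Hypothetical input; in practice `source` would be a large file on disk.
sample = io.BytesIO(
    b"<log><entry id='1'>a</entry><entry id='2'>b</entry><other/></log>"
)
entries = [(e.get("id"), e.text) for e in iter_records(sample, "entry")]
```

Each element is only valid while the caller is handling it; once the generator resumes, the element is cleared, so anything worth keeping has to be copied out first.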

      --
      -- Ed Avis ed@membled.com
    7. Re:It's about tools, libraries by protonman · · Score: 2, Interesting

      I know, but I thought you'd get that with a finite number of elements, you can't nest them infinitely... (I'm counting tags as "elements" here, a bit sloppy I admit).

      My point was that in *practical* XML you simply don't have stuff like [a][a][a][a]... ...[/a][/a][/a][/a].

      As long as you want to parse a FINITE number of terms, you can do that with regexps.

      If your example string with parentheses is the ONLY one you want to parse, I can do that (in sed/perl-like syntax) like this:

      \(a+b\*5-\(3\*\(7-4\)\)\)

      If you want to parse all algebraic terms like in your example with a length less than 5 (!) you can start with this...

      (\w|\d)
      \((\w|\d)\)

      (to get 9 and (0) and (a) i.e.)

      and

      \((\w|\d)[+*-](\w|\d)\)

      to get (9+b),(a*b) etc.. etc..

      I know, it's gonna be a LONG list, but since the number of possibilities is limited, it's not infinite! (and obviously, I can't use * on the parentheses!)

      A problem arises when you want to be able to parse a string of arbitrary length with an arbitrary number of parentheses. That's of course impossible for reasons you stated. :-)

      But IN PRACTICE, the number of possibilities in your XML file is NOT arbitrary, it is fixed and predictable, so you can use regexps.

      I'm nitpicking, I know, but it still is CS. :-)
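
The bounded-nesting argument can be made concrete with a short sketch. Instead of listing every term by hand, it builds (in Python) one regular expression that accepts balanced parentheses up to a fixed depth; `nested_parens_regex` is a hypothetical helper, and the pattern checks only the parentheses, not the arithmetic between them.

```python
import re


def nested_parens_regex(max_depth):
    """Compile a regex for strings whose parentheses balance and nest at
    most `max_depth` levels deep. No recursion or counting is used: the
    pattern is simply expanded once per allowed level."""
    pattern = r"[^()]*"  # depth 0: no parentheses at all
    for _ in range(max_depth):
        pattern = r"[^()]*(?:\(" + pattern + r"\)[^()]*)*"
    return re.compile(pattern)


two_levels = nested_parens_regex(2)
three_levels = nested_parens_regex(3)
```

With these, `three_levels.fullmatch("(a+b*5-(3*(7-4)))")` succeeds while `two_levels` rejects the same string, and the pattern roughly doubles in length for every extra level allowed.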

      --
      The man of knowledge must be able not only to love his enemies but also to hate his friends.
    8. Re:It's about tools, libraries by ajs · · Score: 2, Interesting
      Come Perl 6, of course, you'll have the best of both worlds:
      $data = STDIN.getlines().join('');
      if ($data =~ qr{ ^ (<xml>) $ }) {
          my XML $parsed = $1;
          if (my $n = $parsed.findnode('sometagiwant')) {
              print "Yep, it's there:\n$n\n";
          } else {
              print "Failed to find sometagiwant\n";
          }
      }
      And depending on what you want (memory vs speed) your "xml rule" in that regexp can do whatever annotation, datastructure building, etc that you want.
    9. Re:It's about tools, libraries by Anonymous._.Coward · · Score: 2, Interesting

      There's more than SAX and DOM out there. What about data binding tools? Generate some classes from your DTD/schema, call bind(xmlFile) and you've got objects to work with.

      There are even partial matching binding architectures. The best one I've seen is SNAQue.
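
What data binding looks like from the calling side can be sketched as below. This is a hand-written Python illustration, not output from any DTD or schema compiler and not SNAQue; the `Announcement` class, the `bind` function, and the document layout are all assumptions made for the example.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Announcement:
    """Plain object standing in for a class a binding tool might generate."""
    author: str
    posted: str
    body: str


def bind(xml_text):
    """Turn every <announcement> element into an Announcement object."""
    root = ET.fromstring(xml_text)
    return [
        Announcement(
            author=item.get("author", ""),
            posted=item.get("posted", ""),
            body=(item.findtext("body") or "").strip(),
        )
        for item in root.iter("announcement")
    ]


# Hypothetical document; the application code only ever sees objects.
board = bind(
    '<board><announcement author="tim" posted="monday">'
    "<body> Disk swap tonight </body></announcement></board>"
)
```

Elements that are not `announcement` are ignored, which is roughly what a partial-matching binding would do.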

      --

      take a triptonica to subthunk

  3. He is right, I think. by expro · · Score: 3, Interesting

    Among other things ...

    (1) They need to eliminate the doctype can of worms. Unfortunately, this cries out for an alternative solution for character entities.

    (2) Namespaces need to be simplified and better integrated into the core of the language. Expanding on this, there need to be much better mechanisms for modularizing parts of the markup so that it isn't necessary to parse and hold everything in memory to make sense of it.

    (3) There needs to be clean-up and standardization of element id's and references, integrating it with (1) and (2).

    Do others have more? Should this be done compatibly with XML?

    I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

  4. XML is good by Ender+Ryan · · Score: 4, Interesting
    I don't understand why so many people complain about XML so much. It's really quite useful for storing arbitrary data. We have several hundred thousand text-based documents where I work, and it has been a total nightmare, until I converted the whole thing(well, I'm not done yet...) to XML.

    The documents are generally displayed as HTML on the web, but they're also read by a couple different programs for different purposes. When I first started here, it was mostly a mess of poorly hand-written HTML, but thankfully there were *only* about 20k documents at the time.

    I was charged with the task of writing said programs to read these damn files. Unfortunately, they weren't all marked up the same...

    Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.

    Yay for XML! :)

    So, to sum up, XML is doing what it was meant to do, no less. Unfortunately, it's also probably doing a bit more as well, XSL anyone? Yeck, why not just have a standard XML scripting language, why the need for the language to be valid XML itself?

    --
    Sticking feathers up your butt does not make you a chicken - Tyler Durden
  5. JAXB by Hellvetica · · Score: 1, Interesting

    Java Programmers: Take a look at the Java Architecture for XML Binding (JAXB), available in the Java Web Services Developer Pack V 1.1 (see article here). From my basic understanding of it, it "binds" XML to a set of Java content classes, saving you the time and effort of traversing a DOM tree or dealing with SAX. I have yet to use it, but it looks perfect for my application, which uses an XML-based configuration file.

    Actually, I'd be interested if anybody here has used this yet? Is it ready for prime time?

  6. Yea can be hard by Anonymous Coward · · Score: 1, Interesting

    Writing an XML document is easy. I looked at a sample document and was able to produce xml documents without reading any books on the subject.

    Parsing is another issue. Last night I spent some time parsing XML data in perl that was being retrieved from a daemon I wrote in C. Producing the XML output was easy. Parsing it in perl was hard. I think maybe the author is talking about the lack of really good, easy to use libraries (abstractions) for parsing XML data. I'm a believer that a lot of work in the backend produces ease of use in the front end. In other words, I'd like to parse XML data with ease in just a few lines of code in the application. All the work will be done in the library. XML::Parser proves that this is just not the case.

  7. XML: bad implementation of a good idea by g4dget · · Score: 4, Interesting
    I have to agree that XML has serious problems.

    Now, I have to say: a universal syntax for tree-structured data is very useful: experience since the 1970s with one such universal syntax, Lisp, has shown that. It is unfortunate that XML is about the worst imaginable implementation of that idea. XML combines being a nuisance to type with having comparatively complex semantics and lots of redundant features.

    What is ironic is that the same "real world programmers" who wax ecstatic about XML also condemn Lisp as too complicated and too difficult to read. The universal syntax that XML aspires to, Lisp syntax delivered many decades ago. It's just that prejudice and ignorance caused people to re-invent the wheel (and in square form, too) in the form of XML.

    I am pretty torn between whether XML is a blessing or a curse. We really need something like it, but XML is so bad that it may not even live up to the level of "poorly designed industry standard but better than nothing".

  8. Still good for some things by krygny · · Score: 2, Interesting

    The hype and promise of XML has gone too far. It's a boon for document type data. Semantic content like documentation, on-line content, even spreadsheets and email. (e.g., why isn't there a standard address book format based on XML that any application on any platform can use interchangeably?)

    But using XML to build relational databases is slipping a round peg into a square hole. You need something to putty the corners.

    --
    Research shows that 67% of those who use the term "research shows", are just making shit up.
  9. Oh please! by gwappo · · Score: 5, Interesting
    It's annoying when posters get presumptuous. The people complaining in the article are by all means elite programmers, proclaiming xml is okay because "programming *is* a hard task" is non-sense and in the same league as "HLL's are for wussies, real men code in assembly" and other crap.

    The criticism on XML is accurate, correct, valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing-work and 10% business-solution. That 90% plumbing-work leaves opportunity for _a lot of bugs_ to be created and for any solution using XML to become a resource-hog.

    Having a standard interchange format like XML is a fun-thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling and that gap needs to be addressed for XML to become a widespread success, instead of another buzzword hype.

    1. Re:Oh please! by jallen02 · · Score: 3, Interesting

      Isn't interfacing with a library by definition "plumbing" though?

      I did find the SAX API (In Java) a little tedious to work with for maybe a few days, but after I got used to the idiom it was pretty straight-forward. The interfacing with the library was not really a lot of "extra" code. Most of my SAX parsing code spends its time in a content handler firing off events based on XML it is processing.

      I still cleanly separated the XML interfacing from the server. Once the plumbing is set up, my server doesn't even have to know it is there for the most part. And I rarely have to deal with the interfacing to the library after the initial separation. I either go below the parser level via filter streams or above it, but the XML parser just does its job.
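
The content-handler idiom described above can be sketched with Python's standard `xml.sax` module rather than the Java SAX API. The `TitleCollector` class and the `title` element are invented for this illustration; the handler method names are the ones `xml.sax.ContentHandler` defines.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside <title>, otherwise None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(xml_bytes):
    """The only function the rest of the program needs to call."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles
```

The plumbing stays inside `TitleCollector`; callers of `collect_titles` never see the parser, which mirrors the separation described in the comment.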

      It is a tough question to answer, but doesn't having a certain level of configurability necessitate some level of complexity? I think C# does a decent job at keeping the XML processing more simple while still giving the configurability, but to tap into that configurability there is still complexity involved. I think that the problem is easy to identify and the solution will take many more brain cycles to find :)

      Jeremy

  10. Hahahah finally something I know a lot about. by BeerSlurpy · · Score: 4, Interesting

    We use XML heavily in a project I'm working on at my company. Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation. Naturally we also make heavy use of DTD and SAX. Lots of XML related technologies.

    I can tell you now that XML is a Bad Thing. It strives to excel at too many things at once, and becomes inefficient and complex as a result.

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with. Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into meaningful form. XML document tree traversal = 10000x more complex than getting column data out of a ResultSet... Unfortunately it is also a billion times slower to parse XML than it is to perform a medium complexity database query.

    The real problem is that XML only partly addresses the problems that relational databases solved years ago (organizing data and making it accessible), but it does it without any of the efficiency benefits of a well designed database server. In my opinion, 90+ percent of the places where XML is being used today would be better served by using columns in a relational database table to store object fields. You get indexing, you get universal, simple and efficient searching, and you get speed.

    XML has too many faults to really list in one short post. The truth of the matter is that it tries to do too many things and DOESN'T DO ANYTHING WELL. Sort of like if someone tries to be skilled in all musical instruments but ends up being, at best, mediocre in a few of them.

  11. Huh? by Sparky69 · · Score: 2, Interesting

    What was his argument again? Reading the whole thing into memory is too slow? OK, agreed, hence SAX. When you're a Perl programmer, everything is a regular expression. Look, Perl was the first language I learned. I'm all for Perl; it's wonderful, poetic and fun. And it handles XML perfectly.

    Are you telling me that using relational databases is easier than XML? That you can just sit down and start doing it without reading some books or at least a couple of online tutorials? That's nonsense. The benefits of XML outweigh its shortcomings IMHO. Especially Schema validation. I love knowing that I don't have to rewrite the same goddamn code to make sure my input is sane! I make a schema for it and voila. Yes, the schema spec is big. But have you read the full SQL spec? Of course not. You use a nice little subset and get your work done. Same with the schema spec. I use about 4 tags for 90% of the documents I need to create.

    So let's summarize XML in a couple of rules (there is one caveat, see below):

    1. Every element is in between angle brackets
    2. Close every tag you open in the reverse order (like a stack, but this is far too complicated a subject for people programming, there are NO stacks in computers....right).

    Does anyone force you to use XML? Of course not. That's a weak argument but it's true. XML gives you the choice to not reinvent a structured data format. I'm not a programming guru by anyone's hallucination. I've been working with XML for a while now (3 years) and it's been terrific. Yes, you have to learn some stuff, and yes, some of the API's are a bit terse, but show me something that isn't.

    What I've come to realize is that if you want to move forward you do have to change. Programmers bitch and whine about how end users don't want to change their UI. Well, this sounds like programmers that don't want to move their brains a little and stop seeing things as regular expressions and start seeing them as XML. Stop trying to reinvent the wheel every time you need to parse a document and move up an abstraction. And it strikes me as odd that one of the cocreators doesn't seem to "get it". The whole point of making a standardized format is so that you can abstract the parsing, transformation and validation functionality.

    Just my 2 cents CAD.

    Andrew

  12. Re:xml by FireAtWill · · Score: 2, Interesting

    I've been working on EDI applications for many years now. I view XML as another attempt to solve the same problem as the ANSI X12 standards. The problem is, 'that problem' was never *the* problem.

    In the old days (in my industry), there was a COBOL oriented file structure called the National Standard Format (NSF). It was typically documented as a set of maybe 10-20 hierarchical record formats. The mechanics for reading the files were immediately obvious. The problem was understanding what needed to be done with the data. Of course, there was often a need for a new data element and it got shoved into some filler field, resulting in the National Standard Format becoming the Nearly Similar Format.

    To resolve this issue, the industry jumped on the ANSI X12 bandwagon. ANSI X12, like XML offered a flexible, platform-independent standard for representing hierarchical data structures.

    Platform-independent means that it's equally difficult to use on all platforms. The 10 pages or so of NSF COBOL record layouts were replaced by a couple of binders worth of standards. One for X12 and one containing the various industry-specific transaction sets. Expensive tools emerged to read the new files and cram them back into the familiar and more workable structures.

    'Flexible standards' turned out to be an oxymoron. There are so many options that it is extremely difficult to anticipate what sort of odd interpretations you'll be forced to deal with. And deal with them we must, because the Feds have mandated the way in which we must exchange data (HIPAA).

    And still we find ourselves needing extra pieces of data for specific trading partners that we put into places that are beyond the standard.

    I'd rather use XML than ANSI X12, but I'd rather not use either. They add much complexity and infernal flexibility in order to 'solve' what used to be a trivial task - agreeing on a data format.

    If we want something truly useful, we'd forget about markup languages and specify an open database format similar to Access that actually has value beyond the narrow problem being addressed.

  13. Re:But XML is great for computers... by Anonymous Coward · · Score: 4, Interesting

    Right, so instead of using one regexp for /etc/hosts and another regexp for /etc/passwd, I'd have to use ten pages of getTheGodDOMObjectFromTheGodDOMXMLFile crap for /etc/hosts.xml and another ten pages for /etc/passwd.xml.

    How, exactly, has XML simplified *anything*?

  14. Stay on topic - problem isn't XML standard by cdthompso1 · · Score: 5, Interesting
    Tim Bray's article, if you didn't read it, is right on the money. The last paragraph basically states that XML is the best alternative to the data interchange problem because it provides a consistent format. Some of you guys who are rounding up the mob and lighting buildings on fire calling for book burnings and the downfall of all XML have to read the article! You're not in agreement with Tim when you say, "Sure, I think XML sucks, too."

    So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry!) Just imagine the chaos that will ensue once MS Office saves all documents in true XML.

    My take on the problem Tim's really talking about: inconsistency and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a language domain, because vendors want to put their own spin on everything. The alphabet soup that results confuses the hell out of people. This has even happened in the open source world, where I can do a Google search on "php xml parsing" and read articles on no less than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract and store the data in our database," you need a standard approach. Not to stifle thought and innovation, yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each author offering his opinion with a slight variation of how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about, and not necessarily the one that time has proven out to be the most efficient.

    Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?

  15. Re:Too hard? by Pxtl · · Score: 2, Interesting

    Whoever modded this troll is a jingoistic zealot. The poster is just saying that VB, for all its faults, is good for database RAD. Which many people would agree with.

  16. Re:But XML is great for computers... by Ed+Avis · · Score: 2, Interesting

    (Replying to AC post, please mod it up if you can.)

    I admit that interfaces like DOM are rather clunky. But your regexps would break if a new field were added to /etc/passwd, or probably even if the format were changed to allow comments. So files like /etc/passwd become fossilized over time.

    The answer is a better interface for reading XML files, one that knows about the format (which is described in a DTD or other grammar) and can present a neat interface like

    passwd.user["abc01"].real_name

    (or whatever the syntax of your preferred language looks like). DOM is so awkward because it knows nothing about whether a element would be present, or whether there might be more than one of them, or whether whitespace before and after the element is significant, so it has to provide an API to explicitly wade through all that just in case you want it. A tool like FleXML which knows that must appear exactly once and in a particular place can put it into a single field.

    (Actually FleXML isn't ideal for this example because the parsing code it generates will stop working when the file format is extended, if new elements started appearing inside . But if you made the generated code only a little bit slower it could skip over these extensions to the file format, so existing apps would continue to work when new things were added to the DTD.)

    The answer I think is for programming languages which better support XML, which can read a document and put it into the language's native data structures. Libraries like Perl's XML::Simple try to do this, but they do so without any knowledge of what the legal documents are, so the resulting interface is still rather awkward.
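
The kind of interface being asked for can be approximated in a few lines of Python. The sketch below is not grammar-aware (it reads no DTD), and the `passwd`/`user`/`real_name` layout is a made-up XML rendering of /etc/passwd, so treat it only as an illustration of the calling style and of why extra fields need not break readers.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def load_passwd(xml_text):
    """Map <user name="..."> elements to objects with one attribute per
    child element. Child tags are assumed to be valid Python identifiers."""
    users = {}
    for user in ET.fromstring(xml_text).iter("user"):
        fields = {child.tag: (child.text or "") for child in user}
        users[user.get("name")] = SimpleNamespace(**fields)
    return SimpleNamespace(user=users)


# Hypothetical file contents, including a field older code never knew about.
passwd = load_passwd(
    '<passwd><user name="abc01">'
    "<real_name>A. B. Cee</real_name>"
    "<shell>/bin/sh</shell>"
    "<quota>unlimited</quota>"
    "</user></passwd>"
)
```

With this, `passwd.user["abc01"].real_name` reads the field directly, and code that only looks at `real_name` keeps working when `quota` appears. What it cannot do, lacking a grammar, is know that an element is required or may repeat.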

    --
    -- Ed Avis ed@membled.com
  17. Re:Too hard? by kryonD · · Score: 1, Interesting

    don't knock VB until you need to code a quick dbaccess (or other simple) app in a couple of days for internal use.

    Maybe if you're a beginning programmer. My shop codes exclusively in C and I can even create rather complex apps in a few days because:

    #1 I know what I'm doing, and..

    #2 It's called libraries....be it STL, MFC, MyStack.h or whatever. Code re-use is the key to rapid and robust application development.

    And my code is platform independent and usually weighs in at less than 100K (for a simple DB app). Web-based, Web-based, Web-based...can I make it any clearer? I'm talking about real-world, mission-critical data applications where bandwidth is paid for and no one gives a fsck if the button turns neon pink and spins in a circle when you mouse over it.

    VB has its place in small businesses and first year programming courses where it's not a big deal if the code is messy, non-portable, slow and bloated. If your company is paying full time salaried VB programmers who have no other skills, start familiarizing yourself with the procedures involved in signing up for unemployment. Your company is eventually going to grow to a point where VB totally fails and you find that your job was the one cut in order to dish out the money for someone else's software that actually works. Either that or your company just goes tits up. Dot Com anyone?

    --
    I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
  18. Re:I agree, of course... by Twylite · · Score: 2, Interesting

    Shameless self-plug, but I have a critique of XML's failure to meet its goals on my home page. You may find it interesting.

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
  19. Re:But XML is great for computers... by Dalroth · · Score: 2, Interesting

    In C# at least:

    XmlDocument Doc = new XmlDocument();
    Doc.Load("/etc/passwd.xml");
    string Password = Doc.SelectSingleNode("/users/user[@name='dalroth']/@password").Value;

    Really doesn't seem that difficult to me.

    Bryan
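
    For comparison, roughly the same lookup can be sketched in Python with the standard library's xml.etree.ElementTree. The document contents and the password_for helper are hypothetical. ElementTree only implements a subset of XPath: it can filter on an attribute but cannot select one, so the attribute value is read with .get() rather than with a trailing /@password step.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of passwd.xml, shaped to match the path used above.
DOC = """
<users>
  <user name="dalroth" password="s3cret"/>
  <user name="someone_else" password="hunter2"/>
</users>
"""

def password_for(xml_text, name):
    root = ET.fromstring(xml_text)  # root is the <users> element
    # Filter on the name attribute, then read the password attribute.
    # Note: building the path by string formatting is fine for a sketch,
    # but untrusted names should be validated first.
    user = root.find(f"./user[@name='{name}']")
    return None if user is None else user.get("password")

print(password_for(DOC, "dalroth"))
```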
  20. Java XML Parsing by SurfTheWorld · · Score: 3, Interesting

    Let's decompose the XML parsing "problem" (if one actually exists) into smaller components that we can reasonably discuss. XML parsing is too broad a topic to intelligently discuss, but if you limit it to XML parsing in Java you suddenly have a topic small enough to be manageable. So let's discuss XML parsing in Java.

    When XML was first introduced, there were no standard libraries in the JDK to facilitate parsing. What's more, the few projects out there varied wildly in how you actually used their DOM tree or SAX callback mechanism. This isn't necessarily a Bad Thing (tm), it's the same problem every emerging technology faces: immature tools. This is basic biology - lots of competing implementations (life forms), each struggling for community (resources).

    So, time goes by, and eventually a handful of implementations emerge dominant. Some dominate due to performance, and some dominate because of ease of use of the API. The victors in this game then sometimes go through a merging process of their own, where the performance victors lend technology to ease of use API victors. After a lot of merging (and flames usually), one or two projects emerge out of the XML kingdom as the dominant players. In my opinion, in the world of Java these are Xalan (Xerces) and Dom4J.

    During the maturation process, Sun comes along and looks at the technology and says "Wow this XML stuff is really here to stay. What implementations are out there, and what similarities exist between them? How can we facilitate growth of these projects?" They realize that certain classes (like org.xml.sax.InputSource) are common entities in both projects (even if the class InputSource doesn't exist), and they standardize it. For a reference to all of the XML standards implemented in the JDK, do a search on java.sun.com for JAXP, JAXM, and JAXB (just to name a few).

    At this point, the XML projects come back and work in support so that they can be "JAXP compatible" (again this is part of the biological process of evolution). This ensures that the projects work well with whatever Sun ships in the JDK.

    In the end (which is really where we are now) you end up with a pluggable architecture, where the JDK provides some common functionality or interfaces that are implemented by open source projects.

    Java XML parsing was damn hard back in the day - you had to marry your code to a specific project. But these days with the standardization that has taken place (thanks Sun!), as long as you write code that makes use of the JAXP specification you can plug in any JAXP-compliant parser into your app and things *should* work.
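
    The same "write against the standard interface, plug in any parser" idea can be sketched outside Java. The example below is a Python analogue, not JAXP: it uses the standard xml.sax module, where the handler is written only against the SAX ContentHandler interface and the parser is whatever xml.sax.make_parser() returns (expat in a stock install). ElementCounter and count_elements are hypothetical names for the illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

class ElementCounter(ContentHandler):
    """Counts start tags; depends only on the SAX handler interface."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1

def count_elements(xml_bytes, parser=None):
    # make_parser() returns whichever SAX driver is available; a caller can
    # pass in a different XMLReader without touching the handler code.
    parser = parser or xml.sax.make_parser()
    handler = ElementCounter()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.counts

print(count_elements(b"<a><b/><b/></a>"))
```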

    The difficult problem is getting other entities (Application Servers for example) to get up-to-date with the standards. WebLogic 6.1 comes with a non-JAXP compliant parser, and thus doesn't work with the latest JDK, Xalan, etc.

    --
    Do it for da shorties
  21. Re:A good point by jilles · · Score: 2, Interesting

    Assuming that these data streams have something in common you'd probably spend a week or so developing a generic, maintainable solution using e.g. SAX and reuse that in each particular case. The ad hoc solution of using regular expressions probably saves you time in the short term, but in the long term you'll probably keep reinventing the wheel.

    However, this is all beside the point since we've now established that there's nothing wrong with XML but that it's just the tools to manipulate it which are still lacking in certain ways. I'd be the first to agree that the SAX and DOM APIs are a bit overkill for some situations. However, concluding from that that XML is not a good solution goes too far IMHO.

    --

    Jilles
  22. Re:The API is XPath by Ed+Avis · · Score: 2, Interesting

    But XPath, at least its implementation in current languages, takes a string as its path. If you specify an element which doesn't exist in the XML then this error will not be caught until run time. Whereas if the compiler knew about the grammar of the XML file it could tell you immediately 'there cannot be a element at this level' or 'no such attribute'. You could even hit Tab in your editor to see what the available subelements are at the current point in the tree.

    Also, knowing the grammar (DTD or XML Schema or whatever) of the XML will help generate more efficient code, better than an XPath implementation could be because the general XPath has to work with all possible XML files, not just those restricted to a certain grammar.

    It's like the difference between the putative code

    int x = a.b[6]->c["hello"];

    which is checked at compile time and compiles down into efficient code, and

    int x = tree_query("a/b 6/c 'hello");

    which walks some data structure at run time. It's better if the language can help you with the data structures.
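
    A small Python sketch of what "knowing the grammar" buys you, under heavy assumptions: GRAMMAR is a hand-written table standing in for a DTD, and check_path is a hypothetical helper, not part of any XPath library. It rejects an impossible path before any document is read, which is the kind of early error a grammar-aware compiler or editor could report.

```python
# Hypothetical grammar: which child elements each element may contain.
GRAMMAR = {
    "a": {"b"},
    "b": {"c"},
    "c": set(),
}

def check_path(path, root="a"):
    """Validate a slash-separated element path against GRAMMAR up front."""
    steps = path.split("/")
    if steps[0] != root:
        raise ValueError(f"path must start at <{root}>, not <{steps[0]}>")
    for parent, child in zip(steps, steps[1:]):
        if child not in GRAMMAR.get(parent, set()):
            raise ValueError(
                f"there cannot be a <{child}> element inside <{parent}>"
            )
    return steps

check_path("a/b/c")      # accepted
try:
    check_path("a/c")    # rejected: <c> is not allowed directly under <a>
except ValueError as err:
    print(err)
```

    With a plain string-based query the second path would only fail, or silently match nothing, when it is run against a real document.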

    --
    -- Ed Avis ed@membled.com
  23. Re:Too hard? by whereiswaldo · · Score: 2, Interesting


    This is the lamest story I've ever heard on Slashdot. I almost left for good after reading this. If the next week's worth of news doesn't get any less lame, I probably will.

    Slashdot, don't be fucking lame. This is news for *nerds*, not for simps and wannabees. XML too hard? Then you shouldn't be a programmer cause that's about as easy as it gets unless you're just a hobbyist.

  24. Re:Too hard? by kryonD · · Score: 2, Interesting

    Obviously all you know is C. It must be some kind of "geek pride" thing.

    I've been programming for 16 years...here is a short list of the languages I have used in real-world (i.e. I got paid) applications:

    C, C++, COBOL, VB (eventually rewritten in C when it hit the scalability wall), Intel x86 ASM, Motorola 6809 ASM, and Motorola 6502 ASM.

    The list of languages I have worked with, either in private or in an academic setting, is quite large; they are not listed above because either I wouldn't trust them for real work, or my employer wouldn't trust them.

    ADO and OLEDB...Oracle

    Proprietary. Proprietary. Proprietary, but at least somewhat portable; however, waaayyy too expensive unless you are dealing with massive amounts of data/users or are coding for government/businesses that require namebrand stuff.

    some people ... write virii ... the REST ... write groupware.

    This is true. However, I have yet to run into anything that I couldn't replicate in C/C++ using RFC standards. Some of the more nifty features of Exchange would need some reverse engineering, but I've never had the need to provide them.

    why the hell are you still writing in C? I thought Perl, Java, and PHP4 were the gold standard for web apps... Aren't you afraid of buffer overruns??? Lord knows half the system calls in C are vulnerable...

    Don't get me started on the gross mis-management job Sun has done on JAVA. It has never lived up to Sun's promise of being platform independent. Security is another problem depending on whether you are talking about client side, or server side. What happens if you have a customer whose security policy disables JAVA on the browsers? For server side, I challenge you to name something you can do in JAVA that you can't do just as easily in C/C++. The language has its advantages, but most of them can be reproduced in other languages with minimal effort.

    Perl and PHP are very nice for simple straight forward page production. However, I code for US DOD and the security issues with both of those as well as a general distrust of anything open source has prevented their use on a general basis. I have seen some stuff done for DOD in those languages, but it was either in violation of policy, or contracted out and not on a .mil server. Additionally, they are interpreted languages. If you need to pull 4 million items into memory, consolidate the duplicates, calculate usage stats over multiple time periods, then filter out those that don't meet a usage to property hit list, scripted languages are either way too slow, or simply incapable of doing that kind of complex filtering on a large quantity of data. The above process can be done in about 400 lines of C code, most of which is copied and pasted loops and if statements, and it's fast.

    Buffer overruns are easy....don't rely on the server to feed your script data. Write the code to pull the data from the server and set a cutoff limit where extra data is ignored. Write a simple filter command to break attempts at embedding malicious SQL commands in data and you're done. You can do this in any language, but yet you still occasionally see AIVAs about buffer overflow vulnerabilities in everything under the sun.

    System calls? Don't know what to tell you there. Been coding web based stuff for two years in C and never had to make one. Or are you referring to anything that handles I/O as a system call? If so, read your input one character at a time and COUNT them...stop when you hit your buffer's pre-defined limit. If you do hit a limit, have the app make a log entry. Either your code has failed to expect a weird user need that requires sending large amounts of data, or someone is trying to attack your script....the latter is far more likely. I'd rather have a random user complaint once in a blue moon for lack of flexibility, than all my users pissed because someone rooted the box and defaced the web site.
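
    The "count your input, stop at the limit, log it" policy can be sketched in a few lines. This is a Python illustration rather than the C described above, and it swaps the character-at-a-time loop for a single bounded read; read_bounded, MAX_INPUT and the logger name are hypothetical. Python does not have C-style buffer overruns, so what the sketch shows is only the policy: cap the size, ignore the excess, and leave a log entry.

```python
import io
import logging

log = logging.getLogger("cgi_input")  # hypothetical logger name
MAX_INPUT = 64                        # illustrative limit, in bytes

def read_bounded(stream, limit=MAX_INPUT):
    """Read at most `limit` bytes; log and ignore anything beyond that."""
    data = stream.read(limit + 1)     # one extra byte reveals oversized input
    if len(data) > limit:
        log.warning("input exceeded %d bytes; excess ignored", limit)
        return data[:limit], True
    return data, False

payload, truncated = read_bounded(io.BytesIO(b"x" * 100), limit=10)
print(len(payload), truncated)
```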

    --
    I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
  25. Re:Too hard? by Anonymous Coward · · Score: 1, Interesting

    If you think SOAP is nice for document transfer you should check out HTTP. It's great. And most firewalls let it through. You can use HTTPS to encrypt, too!