Slashdot Mirror


Effective XML

James Edward Gray II writes "I'm not an XML junkie and I thought this was a very good book, so I'm betting that XML aficionados will love it. Effective XML covers 50 best practices that all developers should know and use. This amounts to a book of distilled wisdom that will push you a good distance up the chart of XML mastery." Read on for the rest of Gray's review.

Effective XML
author: Elliotte Rusty Harold
pages: 304
publisher: Addison-Wesley
rating: 8
reviewer: James Edward Gray II
ISBN: 0321150406
summary: A guide to the correct use of XML.

Before I tell you what's inside though, let me tell you what you won't find in these pages. Primarily you need to know that this book does not teach XML. I know a lot of books say that, yet still include an introduction or appendix that covers the basics, but this isn't one of them. You're expected to know XML from page one. Even syntax is only covered from a proper usage angle. Personally, I appreciated this. It always bothers me when an obvious non-beginner's book starts off by wasting a chapter on things I should already know. You just need to be aware when you buy that you won't learn XML here. Knowledge of namespaces, DTDs, the W3C's Schema Language, XSLT, and more isn't strictly required to get something out of this book, but it certainly would help you get a lot more out of it.

What you will get here is coverage of fifty miscellaneous topics spread across four sections on "Syntax", "Structure", "Semantics", and "Implementation". In "Syntax", ten topics delve into the details of things like DTDs, entity references and the XML declaration itself. It may sound silly to dig deep into a single line of XML that simply declares the format, but I doubt you will think so after reading that topic. There's a lot going on in that line and you want to be in control of those decisions instead of just copying and pasting. Entity references are an even smaller chunk of XML output, but they too get illuminated by a rare insight on how and when they should be used, and for what. Did you know that it is possible to write a namespace savvy DTD? I do now and I learned that in this section as well.

The second section of the book covers "Structure", and to me it was the best part. This collection of seventeen topics is loaded with good advice about how to build an XML document that will be ideal for anyone who needs to work with it. Here you see how metadata should be stored in XML, get tips on embedding binary content, learn which schema language is better for which tasks, and finally understand rare XML constructs like processing instructions and exactly what they are for. Additionally, there's a lot of general advice on the right way to mark up content that's really worth its weight in gold. Just one example of what I learned here is that I underappreciate mixed content for great constructs like <name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.
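
To show why that mixed-content construct is handy, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (my choice for illustration, not something taken from the book). The same markup yields both the structured parts of the name and its readable form:

```python
import xml.etree.ElementTree as ET

# The mixed-content snippet quoted in the review.
doc = "<name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>"
name = ET.fromstring(doc)

# Child elements carry the structured parts of the name.
given = name.findtext("given")
family = name.findtext("family")
title = name.findtext("title")

# itertext() yields element text plus the loose text between elements,
# so the display form comes out of the same markup with no extra fields.
display = "".join(name.itertext())
print(display)
```

The space and the comma live between the child elements, which is exactly the mixed content the example relies on.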

Section three, "Semantics", deals primarily with parsers and their APIs. Again, you won't learn any APIs here. What's covered is their strengths and weaknesses and why you should choose a given API for a given task. SAX and DOM are the main focus of these ten topics, but there are other details sprinkled in, like XPath.

The fourth and final section is all about "Implementation". The thirteen topics here address client-side XML styling, server-side transformations, signatures, encryption, compression, and more. My favorite topic here was a terrific coverage of Unicode and how it affects XML. All developers should know at least as much about Unicode as what's printed here and this is a fine source to learn it from.

One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML. He will tell you where the design process was less than perfect, which tools have little practical value, and some of the problems with where XML technologies are headed. This isn't complaining though. All of this is targeted at how it affects XML developers today. You learn what you can safely skip and what should be outright avoided. The author even tells you what XML is bad at and gives you advice about when you shouldn't use it. That's the mark of a man who knows his subject, if you ask me.

All told, I think the author failed to completely convince me his way is perfect on only 2 topics. That means I learned 48 expert XML tricks. Surely that's worth the cost of the book in time and money. This isn't the first XML book you need, but I think it is the second XML book everyone should read.

You can purchase Effective XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

269 comments

  1. Binding by 2.7182 · · Score: 3, Funny

    I love the book, but once it encountered a humid day the binding fell apart. Anyone else have this experience?

    1. Re:Binding by dauthur · · Score: 0, Redundant

      No, but similarly with another book, the staples disintegrated.

    2. Re:Binding by elid · · Score: 1

      I'm wondering why so many bulky computer books are softcover...

    3. Re:Binding by Anonymous Coward · · Score: 1, Funny

      Good book. I found that when I spilled milk all over the book the pages stuck together. But still, the content is good.

    4. Re:Binding by Monkelectric · · Score: 2, Funny

      I once tried to "dry" a book that was rain soaked in the oven at very low temp ... my father turned the oven up to preheat some food ... physics book al larange

      --

      Religion is a gateway psychosis. -- Dave Foley

    5. Re:Binding by xaqar · · Score: 5, Funny
      physics book al larange

      Surely you mean physics book a lagrange ...
    6. Re:Binding by TripMaster+Monkey · · Score: 1
      Surely you mean physics book a lagrange


      I guess putting the book into a Trojan point would dry it out in a hurry...

      --
      ____

      ~ |rip/\/\aster /\/\onkey

    7. Re:Binding by Anonymous Coward · · Score: 0

      Just because someone prefers a book in a good state doesn't mean that they are lazy or dumb. I understand your point, which, I believe, is that the goal is to get the job done [i.e. write some code] and it shouldn't matter what the condition of the book is, but you should expect a publisher to at least produce a book that stays in one piece. And btw, Americans are as fond of knowledge as anyone!

      Peace.

    8. Re:Binding by Dasch · · Score: 1, Funny

      In France only old people read books...

    9. Re:Binding by Anonymous Coward · · Score: 0

      physics book al larange

      Is that anything like Little Mookey Al Larange? Seriously, I could see how it would get burned, but why did your father put orange sauce on it?

    10. Re:Binding by Anonymous Coward · · Score: 0

      I thought it smelled like something died in this thread.

    11. Re:Binding by Anonymous Coward · · Score: 0

      I smell a user .sig troll coming with parent post

    12. Re:Binding by Anonymous Coward · · Score: 0

      What is a .sig troll ?

    13. Re:Binding by Anonymous Coward · · Score: 0

      I found that when I spilled milk all over the book the pages stuck together.

      somehow I doubt that was milk. You're a freak AND a pervert

    14. Re:Binding by Anonymous Coward · · Score: 0

      get modded +5 and change the .sig to something offensive/trollish. slashcode introduced the -- separator to make these easier to spot, if the troll made the .sig look like part of the body of the message. Try it some time. It's fun.

    15. Re:Binding by Anonymous Coward · · Score: 0

      Then why are you all so fucking stupid?

      I guess caring about something and acquiring it are two different things.

      I.e., I care about Ferraris but I don't have any.

    16. Re:Binding by LordoftheWoods · · Score: 1

      Well, for one thing, computer books need not last very long because the content quickly becomes obsolete. It's not like a history textbook, as nothing is set in stone in fast-paced tech fields. Publishers want you to buy the newer editions, so they often publish only in soft-cover. Also, who needs them to be hard-cover? They're expensive enough as it is and they usually don't need to be re-read many times.

  2. hmmm by elid · · Score: 1, Interesting
    All told, I think the author failed to completely convince me his way is perfect on only 2 topics.

    Any ideas what those 2 are?

    1. Re:hmmm by johannesg · · Score: 1
      All told, I think the author failed to completely convince me his way is perfect on only 2 topics.

      Any ideas what those 2 are?

      1. XML is a good idea.

      2. XML is an efficient format for wire protocols, internal program messages, and databases.

      Actually I'm just kidding; there are definitely places where it has a purpose. Although I will probably never get why a closing tag requires a repeat of the file opening tag name...

    2. Re:hmmm by computational+super · · Score: 2, Informative
      Although I will probably never get why a closing tag requires a repeat of the file opening tag name

      Not sure if you were serious here or not, but this is necessary to disambiguate the following improperly formed XML:

      <start> Now is the time for all good men to come to the aid of their <noun>country</noun></phrase>

      which is either missing a "phrase" start tag or mixed up the start & end tags... in a long XML document, the parser can give you a better hint where to look for the error.
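
      A quick way to see that hint in practice is the sketch below. It assumes Python's standard xml.etree.ElementTree (built on expat); the variable names are just for illustration, and other parsers word their messages differently:

```python
import xml.etree.ElementTree as ET

# The malformed example from above: <start> is closed by </phrase>.
bad = ("<start> Now is the time for all good men to come to the aid "
       "of their <noun>country</noun></phrase>")

try:
    ET.fromstring(bad)
    error = None
except ET.ParseError as exc:
    # exc.position is a (line, column) tuple for where parsing failed.
    error = exc

print(error)
```

      Because the end tag carries a name, the parser can report a mismatch at a specific line and column instead of only noticing an imbalance at the end of the document.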

      Or you were kidding and I missed the joke, in which case I'm about to be called all sorts of impolite things... (I might even be referred to as Sean Penn).

      --
      Proud neuron in the Slashdot hivemind since 2002.
    3. Re:hmmm by elharo · · Score: 5, Insightful

      Ever try to debug deeply nested LISP in a plain vanilla text editor? Ever try to find exactly which closing parenthesis is missing where? That's why end-tags have names. It's pure human factors. Computers don't care about this. People do.

      SGML (XML's precursor) did have minimized end-tags like . Experience proved this caused more pain than it alleviated. Hence the lack of minimized end-tags in XML.

    4. Re:hmmm by Anonymous Coward · · Score: 0, Offtopic

      I haven't, but I imagine it's exactly the same problem you'd have doing Java in a "plain vanilla text editor" -- only in Lisp the closing parens are all ), where in Java it's a combination of ) ] and }. I *have* tried to debug Java in NOTEPAD.EXE, and all the different types of close-paren *do* screw me up.

      I don't think the moral of this story is "don't use Lisp/SGML", or even "make closing tags really verbose". I think the moral of the story is either:
      - don't use a "plain vanilla text editor" for debugging "deeply nested" code -- use a real editor -- or:
      - use languages like Python and YAML that don't *need* closing parens in the first place.

      Since this is slashdot, I now expect several people who have never written a line of Python in their lives to respond, saying how Python's use of indentation is brain-dead. *Yawn*.

      (Oh, and the parent was probably trying to say "minimized end-tags like <tag/text here/", but forgot to escape his <.)

    5. Re:hmmm by JamesOfTheDesert · · Score: 1
      - or: - use languages like Python and YAML that don't *need* closing parens in the first place.

      Ever try finding the end of a large chunk of YAML text? Ever try to verify that the indentation is correct?

      If you add or omit a closing tag or paren, you get an invalid state and the software tells you. But, with indentation-sensitive formats, if you add or remove spaces, you can get another technically valid state, but not the state you want, and there may be no way to (easily|reliably) check for this.

      --

      Java is the blue pill
      Choose the red pill
  3. Bah by Anonymous Coward · · Score: 0, Funny

    xml is just html with a waffle iron attached to it.

    1. Re:Bah by Anonymous Coward · · Score: 4, Funny

      The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well. - Phil Wadler

      XML is not the end of our problems, it is the beginning of our problems. - ditto

      Shortly after the release of XML, some folks, including some very important folks in W3C and its members, who had been big supporters of XML, actually got around to reading the spec, and discovered to their horror that they had an XML which included entities, DTDs, PIs, and assorted other baggage. - Tim Bray

      When XMI came out, I had just been studying up on UML, and I thought "Cool! I'll print out the DTD so that I can look it over on the subway ride home!" When I saw how big the XMI DTD was, I decided not to print it out--I prefer not to spend that much time in the subway. - Robert DuCharme

      XML was monocase until quite late in its design, when we ran across this ugliness. I had a Java-language processor called Lark - the world's first - and when XML went case-sensitive, I got a factor of three performance improvement, it was all being spent in toLowerCase().- Tim Bray

      XML-based technologies seem particularly susceptible to the "if we standardize it, everyone will use it" fallacy. - Simon St. Laurent

    2. Re:Bah by Anonymous Coward · · Score: 0
      XML was monocase until quite late in its design, when we ran across this ugliness. I had a Java-language processor called Lark - the world's first - and when XML went case-sensitive, I got a factor of three performance improvement, it was all being spent in toLowerCase().- Tim Bray

      Not that surprising - seeing XML uses Unicode. Making Unicode case-insensitive is hard. I do hope we get a better "universal" serialisation format soon, though. YAML looks good.
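
      For a feel of why it is hard, here is a tiny sketch using Python's built-in str methods (an illustration of Unicode case mapping in general, not of anything Lark or XML actually did):

```python
# Case mapping in Unicode is not always one character to one character.
sharp_s = "\u00df"            # LATIN SMALL LETTER SHARP S
upper = sharp_s.upper()       # uppercases to two characters
folded = sharp_s.casefold()   # form intended for caseless comparison

# Lowercasing the uppercased form does not give back the original,
# so "just lowercase both names and compare" is not a safe shortcut.
round_trip = upper.lower()
print(len(sharp_s), len(upper), round_trip == sharp_s)
```

      A case-insensitive XML would have had to pin down rules like these for every name comparison.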

    3. Re:Bah by Anonymous Coward · · Score: 0

      One of your quotes is not like the others. One of your quotes just isn't the same.

    4. Re:Bah by Anonymous Coward · · Score: 0

      quotes from three of xml's biggest jerks hardly counts as worth anything.

  4. The Problem With XML by osewa77 · · Score: 5, Interesting

    Is that it's not a very machine-friendly language (more wordy than it ought to be; parsing of tags is not very efficient) and it's not a very human-friendly language (the human style is free-style, really). I don't think it's a very good universal data description language. sorry that I had to go on a bit of a tangent...

    1. Re:The Problem With XML by elid · · Score: 1

      What is inefficient about machine-parsing of XML (not a troll, I'm really curious)? It seems to me to be a logical structure for information.

    2. Re:The Problem With XML by Further82 · · Score: 0

      sorry to disagree but it's an extremely human-friendly data description language. Free-style is only good for us humans when you are talking about written languages. For representing discrete data, XML is the best out there. Now having said that, you still have to have well designed XML in order for it to be easily read and understood by a human (and by virtue easily interpreted by a program written by a human). While it's not very fast for machines to parse and it is wordy, I'd gladly give up speed and terseness for easily understandable and parseable data, and that, my friend, is the whole point.

    3. Re:The Problem With XML by Further82 · · Score: 1, Insightful

      it is inefficient machine-wise because string parsing is an extremely slow and computationally expensive operation. While Perl and friends make it seem easy to the programmer, the machine is still trudging through the text one character at a time. Try writing an XML parser from scratch in C (no std::string) and see how difficult it is.

    4. Re:The Problem With XML by trewornan · · Score: 1

      You can read an XML data file (eg OpenOffice files) and understand immediately how the data is arranged/formatted, how to extract bits you're interested in, etc. That's a powerful advantage even if it's not what XML was originally designed for.

    5. Re:The Problem With XML by Anonymous Coward · · Score: 0

      Those who marked this guy as a troll have probably never used this fucking language.

    6. Re:The Problem With XML by GeckoX · · Score: 1

      ...it is inefficient machine wise because it's parsing bytes one bit at a time which is an extremely slow and computationally expensive operation.

      Try writing an XML parser in binary (no compiler) and see how difficult it is.

      Seriously, your arguments are not sound in this context.

      --
      No Comment.
    7. Re:The Problem With XML by Rei · · Score: 1

      Yes, it is human-friendly. But what % of xml files are written or read by humans? As far as machines are concerned, xml is awkward and bulky, and they're the ones who deal with the xml the most.

      --
      Don't take a knife to a gunfight, or even a knife to a knife fight. Take a gun to a knife fight.
    8. Re:The Problem With XML by arkanes · · Score: 1

      In fairness, this is only true if there's been an effort made by the XML schema designer to make it so. It's perfectly possible, and even easy, to make perfectly valid and well-formed XML which is inscrutable.

    9. Re:The Problem With XML by Anonymous Coward · · Score: 0

      how would anyone use std::string in C in the first place?

    10. Re:The Problem With XML by cecom · · Score: 1

      The parent is not trolling! What are the moderators drinking today ?

      Anyway, while I agree that XML suffers from both these problems (not very human readable and not efficiently parsable), there doesn't seem to be anything better and I am not sure there could be. These are contradictory goals.

      That said, it is amazing how slow XML parsing can be! A 200-300 line file sometimes takes 100 ms to parse on a modern machine. Turbo Pascal used to compile a thousand lines of code in less than a second on an IBM PC/XT :-)

    11. Re:The Problem With XML by SoSueMe · · Score: 1
      You should keep an eye on the W3C's XML page.

      More specifically, their XML Binary Characterization page.

      The XML Binary Characterization Working Group is tasked with gathering information about use cases where the overhead of generating, parsing, transmitting, storing, or accessing XML-based data may be deemed too great for a particular application, characterizing the properties that XML provides as well as those that are required by the use cases, and establishing objective, shared measurements to help judge whether XML 1.x and alternate (binary) encodings provide the required properties.
    12. Re:The Problem With XML by Further82 · · Score: 4, Insightful

      They are supposed to be written so people can make programs to read the data without spending hours reading huge cryptic implementation manuals. You forget that computers do not program themselves yet. People still need to do that and XML is easier for people to read and thus easier for them to make programs to read. When machines can program themselves...we wouldn't be having this conversation.

    13. Re:The Problem With XML by Further82 · · Score: 0

      They wouldn't; I was merely pointing out that you don't have access to that resource in C.

    14. Re:The Problem With XML by Anonymous Coward · · Score: 0

      Troll? FFS!

    15. Re:The Problem With XML by cluckshot · · Score: 3, Interesting

      To be specific having spent the last 3 years working on XML I can suggest that there are numerous problems with XML.

      XML Tagging is tedious and stupidly top heavy in overhead. Contrary to being human friendly it isn't. XML Tagging should be shortened to a simple set of defined tag names and then type definitions. After that each name would be addressed by an index. Typing of data should be contained in a process to extract that is associated with either the tagging index or an over the top wrapper which is similar in function to the DTD. But frankly the whole process is currently a mess.

      The expansion of data with tagging currently can be as much as 3 or 4 to one. Because of the recursive parsing process, recovering data is a geometrically expanding time consumer. If you use linear display the process is nearly worthless for anything but a single display process. It works great for short things. In short it just eats up processing time and bandwidth. It makes a good universal file storage structure and that is it!

      Once the file is retrieved it should be crunched into something like MySQL or such if any real processing is going to happen.

      Nothing really is gained by such a markup system over just a series of hashed tags that are indexed. Such tagging and indexing is a lot less of a tax on bandwidth.

      This having been said, XML works and is OK for many uses. I am not sure it really has any advantage over flat files or such. It drinks bandwidth and program operations time. I think in time it will turn out to be a fun toy but not much else. Of course someone else might find a good way to tell me why I should use 40 characters to transmit what should have taken 10 characters and how it should have been faster or more efficient some way to use it. The whole concept was definitely good for a lot of programmer payroll time.

      --
      Never Politically Correct ~ I prefer the facts If you don't like what I say, get a life, or comment yourself.
    16. Re:The Problem With XML by Eric+Giguere · · Score: 1

      In particular note the WAP Binary XML format (WBXML) that is used to transfer XML to and from mobile devices.

      Eric
    17. Re:The Problem With XML by Further82 · · Score: 2, Informative

      The point is that string parsing is easy for neither the programmer nor the machine. Compare finding a specific set of data in XML, with its variable-length branching sets of elements etc., to finding something in a SQL database where all data is at fixed offsets. With SQL the computer only needs to know how big each row is, and what row it's looking for, then it can skip to (size of row)*(row number) just like that. That's fast. With XML, the whole file has to be parsed first, then once it's in memory a faster lookup can be done. I'm not sure how XML databases work, but they look like they would alleviate this problem.
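
      The fixed-offset lookup can be sketched like this. It is only an illustration, assuming a made-up 24-byte row layout and Python's struct module; real SQL engines use more elaborate page and index structures, so treat the layout and the read_row helper as hypothetical:

```python
import io
import struct

# Hypothetical fixed-width row: 10-byte first name, 10-byte last name,
# 4-byte unsigned salary. "<" means no padding, so every row is 24 bytes.
ROW = struct.Struct("<10s10sI")

rows = [(b"Joe", b"Smith", 48000),
        (b"Jane", b"Smith", 50000),
        (b"Steve", b"Shmo", 65000)]
table = io.BytesIO(b"".join(ROW.pack(*r) for r in rows))

def read_row(f, row_number):
    """Jump straight to (size of row) * (row number) and decode one row."""
    f.seek(ROW.size * row_number)
    first, last, salary = ROW.unpack(f.read(ROW.size))
    # Short names were padded with NUL bytes by struct; strip them off.
    return first.rstrip(b"\0").decode(), last.rstrip(b"\0").decode(), salary

third = read_row(table, 2)
print(third)
```

      No earlier row is read or parsed to reach the third one, which is the contrast being drawn with a text format whose element lengths vary.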

    18. Re:The Problem With XML by icebattle · · Score: 1
      You say it's not very machine-friendly... which machines did you ask, and what did they say?

      If I have this straight, you want a machine-friendly language that is also human-friendly. Since you can't have both, why not convert a machine into a human, or a human into a machine? Then use the appropriate format language.

    19. Re:The Problem With XML by Anonymous Coward · · Score: 0

      This is not a Troll. It is a valid criticism. Unfortunately only those who drink the koolaid get moderator points.

    20. Re:The Problem With XML by pyrrho · · Score: 2, Insightful

      one of the original ideas of XML was that a simple (SAX like) parser can be written by "a graduate student in two weeks".

      The validation etc is more difficult, but then it's not a matter of parsing the XML in the first place.

      It matters what you mean, but in general XML is easily parsed by machines... and easily represented in internal data structures, which are as efficient as you make them.

      --

      -pyrrho

    21. Re:The Problem With XML by Frostalicious · · Score: 1

      What is inefficient about machine-parsing of XML

      Well among other problems, you typically have to parse & load the whole document in order to extract even a single piece of information. A DOM parser operates thus. A SAX parser may let you read from the top to where your data is.

      Contrast this with a binary format where you could navigate directly to your data and read a value. Imagine if you had to parse your whole database file in order to run a select statement.
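
      Here is a rough sketch of the difference, assuming Python's standard xml.etree.ElementTree. Its iterparse function stands in for SAX here (it is an event-style streaming API, not SAX itself), and the sample document and helper name are made up:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b"""<employees>
  <employee><first-name>Joe</first-name><salary>48000</salary></employee>
  <employee><first-name>Jane</first-name><salary>50000</salary></employee>
  <employee><first-name>Steve</first-name><salary>65000</salary></employee>
</employees>"""

def first_salary_streaming(source, wanted):
    """Read from the top and stop as soon as the wanted employee is complete."""
    seen = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "employee":
            seen += 1
            if elem.findtext("first-name") == wanted:
                return elem.findtext("salary"), seen
            elem.clear()  # drop finished records we do not need
    return None, seen

# Tree-style access: the whole document is parsed before any lookup.
root = ET.parse(io.BytesIO(xml_bytes)).getroot()
tree_total = len(root.findall("employee"))

# Streaming access: only the records up to the match are examined.
salary, visited = first_salary_streaming(io.BytesIO(xml_bytes), "Jane")
print(tree_total, salary, visited)
```

      Even the streaming version still has to scan everything ahead of the record it wants; it just avoids building and keeping the rest of the tree.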

    22. Re:The Problem With XML by Anonymous Coward · · Score: 1, Insightful

      Are you kidding? Let's say you have three employees. You want to send the following data on each to a remote machine: first name, last name, salary in dollars.

      Joe Smith $48000
      Jane Smith $50000
      Steve Shmo $65000

      Method #1: send the following as a string:

      <?xml version="1.0" encoding="iso-8859-1"?>
      <!DOCTYPE employees .. blah blah blah lots of crap here>
      <employees>
      <employee>
      <first-name>Joe</first-name>
      <last-name>Smith</last-name>
      <salary>48000</salary>
      </employee>
      <employee>
      <first-name>Jane</first-name>
      <last-name>Smith</last-name>
      <salary>50000</salary>
      </employee>
      <employee>
      <first-name>Steve</first-name>
      <last-name>Shmo</last-name>
      <salary>65000</salary>
      </employee>
      </employees>

      Method #2: send the following as a single string:

      3R3:Joe5:Smith5:480004:Jane5:Smith5:500005:Steve4:Shmo5:65000

      I.e., number of records, followed by R, then length as decimal string, colon, string in iso-8859-1, repeated three times for First name, last name, salary, repeated for each row

      There's no way a parser for #1 is going to be more efficient than #2. In fact a parser for #2 can be made secure more easily because you can pre-allocate your buffers.

      #2 is easier to generate, and easier to explain than a full XML parser.
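
      To give an idea of how little code #2 needs, here is a minimal sketch of a reader in Python. The function name and the fields_per_row parameter are made up, the message is the example written without embedded spaces, and it works on characters rather than raw iso-8859-1 bytes to keep things short:

```python
def parse_records(data, fields_per_row=3):
    """Parse '<count>R' followed by repeated '<length>:<text>' fields."""
    count_text, _, rest = data.partition("R")
    pos = 0
    rows = []
    for _ in range(int(count_text)):
        row = []
        for _ in range(fields_per_row):
            colon = rest.index(":", pos)      # end of the decimal length
            length = int(rest[pos:colon])
            start = colon + 1
            row.append(rest[start:start + length])
            pos = start + length              # next field begins right here
        rows.append(tuple(row))
    return rows

message = "3R3:Joe5:Smith5:480004:Jane5:Smith5:500005:Steve4:Shmo5:65000"
records = parse_records(message)
print(records)
```

      Each field's length is known before its text is read, which is what makes pre-allocating buffers possible in a lower-level language.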

      Yeah, #2 is a little harder to read. That's because it's a *machine* format, designed to be easy for a machine to parse, and somewhat easy for a human to debug occasionally. If you need to read lots of it, write a program to dump it in a human-readable format. (I bet your human-readable format won't look anything like XML.. it'll probably look more like YAML)

      The advantage of XML is that you *don't* have to write the parser at all. It is slightly more programmer-efficient in many situations. And the tags give you the illusion of understanding the meaning. (I say illusion because there's no way to know which tags are optional in my example. You still need a description (a schema or DTD), which you need for method #2 anyway).

    23. Re:The Problem With XML by Pxtl · · Score: 1

      Nuts to that. It's overly verbose for humans, and overdesigned - I have yet to see any document XML that actually uses mixed tags, which are the entire justification for its distinction between attributes and content. Plus, the redundantly-named closing tags are excessive, making the files look even more cluttered to read.

      Compare vs. YAML or any other similar solutions, and it's obvious.

      The principle of XML is nice - but for text documents only, as a superset of HTML. For the attribute files, properties, scripts, etc. that I see it being used for, it's hideous.

    24. Re:The Problem With XML by agraupe · · Score: 1

      I could probably do such a thing, minus the namespaces, because those confuse me (although, I'm sure if I found out, I could do it). Given that XML is so standardized, it would be a relatively easy task to create a struct that holds all the information for a given tag. Writing a validator would be the difficult part. It is easier to offload that responsibility and blame people for writing bad XML.

    25. Re:The Problem With XML by Procyon101 · · Score: 4, Funny

      I think that I shall never smell
      A standard worse than XML.
      A standard I am loath to use
      Though offered parsers to abuse;
      The designers couldn't pass a class,
      CS201 can kiss their ass;
      A structure no one can traverse
      pre and post order routes are cursed;
      What are its types you cannot tell;
      Though it promised self referential.
      Standards are assigned by committee,
      But any fool can make a tree.

    26. Re:The Problem With XML by Further82 · · Score: 0

      You're right of course. I guess what I meant was XML parsing is not very fast, compared to relational databases. And currently, unless you are using some kind of XML database, the XML must be parsed into memory every time you want to use it, which is a rather wasteful and repetitive spending of computer time.

    27. Re:The Problem With XML by Surt · · Score: 1

      I did both this afternoon during lunch. Neither one seemed particularly difficult.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    28. Re:The Problem With XML by Anonymous Coward · · Score: 0

      They are supposed to be written so people can make programs to read the data without spending hours reading huge cryptic implementation manuals.

      That is a huge illusion. Just because you can read the tags doesn't mean you have any idea what the data types are, or which tags are optional, or which are required, or what the constraints are.

      XML offers the programmer an easy way to glance at the data and see what it's "about". In fact in many documents the TAGS use more bytes than the actual data!

      I challenge you to write a large effective application for, say, a hospital records system or a billing system that uses one of the standardized business XML schemas out there that have 50-page descriptions. You will be "spending hours reading huge cryptic implementation manuals" to understand the schema and its modules and variations. Your code may or may not work correctly with all possible permutations of data. I.e. somebody might come along and add a valid set of tags to your data and your app breaks.

      The value of XML is simply that you don't have to write a parser front-end. That's all. You save a few hours up front, and hope that it doesn't cause thousands of hours of headaches later. In many apps it doesn't, in a few simple ones you win out and XML justifies itself.

    29. Re:The Problem With XML by Procyon101 · · Score: 1

      I believe good old TP cheated and was compiling while you typed or something. I have never seen a compiler approach the speed of good old TP, either on modern machines or old school.

    30. Re:The Problem With XML by eap · · Score: 4, Insightful
      Of course someone else might find a good way to tell me why I should use 40 characters to transmit what should have taken 10 characters and how it should have been faster or more efficient some way to use it. The whole concept was definitely good for a lot of programmer payroll time.

      I would not be so quick to dismiss XML because of traditional arguments. Having worked with several different ways of storing and transmitting structured information, I can say without question XML comes out easiest in the end.

      If you're only transmitting 10 characters, then yes XML is not for you. However, if you're describing dynamically changing, complex data, even in large amounts, XML is very handy.

      There are turnkey parsers for XML that are well tested and which allow the client to see an abstracted view of the data as an object, at any level of detail desired.

      Platform independence is built in.

      It's easy to syntactically validate XML, as it's done automatically. It's also easy to isolate logical validation into discrete units since XML couples easily to object oriented designs.

      Very large XML messages can be processed quickly using a pull parser. Pull parsing is faster than SAX and has the intuitive benefit of being client driven, not event driven.
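
      As an illustration of the client-driven style, here is a small sketch assuming Python's standard xml.etree.ElementTree.XMLPullParser. The chunked sample data is made up, and the sketch shows only the control flow, not any speed comparison with SAX:

```python
import xml.etree.ElementTree as ET

# Pretend these chunks arrive one at a time, e.g. from a socket.
chunks = [
    "<employees><employee><first-name>Joe</first-name>",
    "<salary>48000</salary></employee><employee>",
    "<first-name>Jane</first-name><salary>50000</salary>",
    "</employee></employees>",
]

parser = ET.XMLPullParser(events=("end",))
salaries = {}
for chunk in chunks:
    parser.feed(chunk)                         # the client supplies data...
    for _event, elem in parser.read_events():  # ...and asks for events
        if elem.tag == "employee":
            salaries[elem.findtext("first-name")] = int(elem.findtext("salary"))
            elem.clear()                       # free the finished record
parser.close()
print(salaries)
```

      The loop belongs to the calling code rather than to callbacks registered with the parser, which is the "client driven, not event driven" distinction.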

    31. Re:The Problem With XML by Anonymous Coward · · Score: 0

      there doesn't seem to be anything better and I am not sure there could be

      Wha?

      How about choosing a data format based on the type of data? If you're sending tabular data, use CSV. If you're sending plain text, send a length string followed by your text string. If you're sending JPGs, send compressed binary data. And so on.

      You wanna know why Turbo Pascal was so fast? Because the authors knew the only type of data it had to process was, you know, Pascal source code, so they could optimize it. Why does my billing application have to have a general-purpose XML parser???

      XML is a solution for a very unique problem: the problem of having to think before you write a program.

      Hell, even XML can be improved by using SGML's notation for unambiguous closing tags: </>

    32. Re:The Problem With XML by fedor · · Score: 1, Interesting

      (first-name=Joe,last-name=Smith,salary=48000),
      (first-name=Jane,last-name=Smith,salary=50000)
      etc...
      ]]>

      --
      :wq!
    33. Re:The Problem With XML by dustmite · · Score: 2, Interesting

      In my experience the main reason our clients want their data in XML is that most of them are afraid of single-vendor lock-in to proprietary formats, especially to smaller vendors they perceive could more easily go under - in other words, they want data longevity and a format in which they can easily process their data if they need to. And this trumps the inefficiency. Especially as people mostly transfer such documents across high-speed LANs and store them on modern 120+ GB hard disks and open them on machines with 512MB+ RAM ... in all of which cases inefficiency doesn't cause any problems.

      There are also generic XML content editors which, although rather pricy, help reduce a lot of the negatives associated with working with XML (i.e. you would be crazy these days to be writing XML in e.g. Notepad).

      I personally agree that XML is overrated, but many people want it because they understand one thing: if their data is in XML format, you can't in the long run lock them in to your software with excessive prices, and if you disappear, they can still get their data.

    34. Re:The Problem With XML by cecom · · Score: 1

      No, it wasn't compiling while you typed. There was a command line version (TPC.EXE) which was as fast as the IDE.

      I believe these were the reasons for its speed:
      - The Borland Unit format (.TPU): allowed for very fast recalculation of dependencies and for linking in-memory (no need to store the executable to disk)
      - Recursive descent parser written in assembler
      - Single pass compiler - it didn't perform any optimizations and generated code on the fly.

      Pascal is a really straightforward language to compile and was designed for efficient LL(1) parsing.

      All in all, very reasonable compromises for those times. It was a work of art.

    35. Re:The Problem With XML by MSBob · · Score: 1

      And all that wonderful magic can't be accomplished with something like ANTLR, because...?

      --
      Your pizza just the way you ought to have it.
    36. Re:The Problem With XML by Anonymous Coward · · Score: 0

      I've had quite some discussions about the usefulness of XML compared to DBMSes, comma separated files etc...

      IMHO every good programmer should learn to work with XML and then decide if s/he should use it or not for the task at hand. The same goes for every distinct enough idea that 'looks' interesting. In the case of XML there are only a few other languages with similar characteristics. My point is, a good programmer doesn't just master every detail of his tools, but also should know which tools are out there. Not understanding what XML is and not wanting to is IMHO equal to a lack of interest in software development as a whole.

    37. Re:The Problem With XML by Anonymous Coward · · Score: 0

      You don't have an actual job, do you? I hope you don't write code for a living.

      The number one reason for XML is interoperability. Publish a schema, anyone can implement it. Very good for these things called "web services", which you may have heard of, but without doubt think are a stupid waste of time.

    38. Re:The Problem With XML by Anonymous Coward · · Score: 1, Interesting

      > The advantage of XML is that you *don't* have to write the parser at all.

      Give the man a cigar. People got tired of reinventing that wheel and watching different parser implementations mangle the data slightly differently each time, so they went with one uber-parser for any structured data. They also tried to standardize datatypes (with XSD) and made a horrible hash out of it, but XML just works whether it's a webserver config or a HR database dump. The fact that it's not all things to all people doesn't make it suck, it just makes it a format like any other.

    39. Re:The Problem With XML by JacobO · · Score: 1
      There are also generic XML content editors which, although rather pricy, help reduce a lot of the negatives associated with working with XML (i.e. you would be crazy these days to be writing XML in e.g. Notepad).
      ... or poor
    40. Re:The Problem With XML by Anonymous Coward · · Score: 0

      We had a bunch of slightly different document formats that some hapless and unintelligent programmer elected to describe with XML. Unfortunately, XML has no control structures, the tools don't handle includes robustly, there are no conditional constructs, no loops, etc. I attempted to use "entity" includes and was compelled to remove them. So if you have two invoice forms, one with 10 lines for detail items and one with 11, you are stuck with cut-and-paste-and-manually-edit as your primary software development model. Now, if you have 100 slightly different invoiceformat.xml files, and you want to add one extra field to all of them, what the hell do you do?

    41. Re:The Problem With XML by Dolda2000 · · Score: 1
      I don't think it's a very good universal data description language.
      You don't think it's a very good universal data description language? Well, let me confirm your suspicions: XML absolutely and totally sucks as a universal data description language.

      The thing is, I don't understand for the life of me why people got the idea that XML should be used for data description to begin with -- it wasn't designed as such. XML was designed to be a document markup language, and that's precisely what it is. It's a good extensible document markup language (I don't say very good, since there are still some annoyances, but it certainly is good), but it is by no means a data description language.

      XML is, quite possibly, the most awkward way there is to describe arbitrary data structures. First, it's by far too verbose for machines, and I would argue even for humans. Second, it's by far too verbose for the majority of data structures. Third, it's too awkward to write by hand. Fourth, it's too high-level to describe simpler structures in an elegant way. Fifth, it's too awkward to parse simply. I could probably continue for ten more pages to spit out arguments against XML, but I hope I've made my point.

      Sure, XML is platform independent. So has every official internet protocol (STD document) been since the inception of the internet. The only reason I can think of why XML is so popular is because most platforms happen to ship with XML parsers (which is, of course, because it became so popular for that purpose to begin with -- I have no idea why that happened to begin with).

      Honestly, inventing a new format or protocol that suits your ends better is not that hard. If nothing else, at least use s-exps instead of XML.
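
      For comparison, here is a minimal sketch of how an XML element might be written as an s-expression, using Python's xml.etree.ElementTree. The mapping (attributes as "(@name value)" pairs, text as a quoted string) is an assumption made for illustration, not a standard, and it ignores mixed content and quote escaping.

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Render an element as (tag (@attr "value")... "text" child...)."""
    parts = [elem.tag]
    parts += [f'(@{key} "{value}")' for key, value in elem.attrib.items()]
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    parts += [to_sexp(child) for child in elem]
    return "(" + " ".join(parts) + ")"

# Hypothetical record, invented for this sketch.
SAMPLE = '<employee id="7"><first-name>Joe</first-name><salary>48000</salary></employee>'
print(to_sexp(ET.fromstring(SAMPLE)))
```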

      I'd really like to know why XML is so popular as it is. Can someone tell me?

    42. Re:The Problem With XML by DogDude · · Score: 1

      Hey, I've got some XML files from a 3rd party app that I use quite heavily that are pretty impossible to decipher, even with the silly "descriptors". If you've got some extra time, you can try to prove me wrong by deciphering these nasty things. It's just as easy (if not easier) to use a delimited file, with the first line of the file being the descriptions of the data.

      --
      I don't respond to AC's.
    43. Re:The Problem With XML by redhog · · Score: 2, Informative

      Because you need to _parse_ it in any way at all. Simply Hollerith/runlength-encoding the data would be much better.

      Take XPath as an example. How do you extract the fragment pointed to by the expression

      foo/bar/fie[@naja='hehe']

      ? You read the document, counting opening and closing tags, until you read in a foo-tag at top level; then you continue, counting as before, until you, before a foo-ending tag at top level, reach a bar-tag at second level, and then until you reach a fie-tag with the attribute naja set to 'hehe' at third level. Then you read on, counting opening and closing tags, until you reach its ending tag and return the string between the opening and ending tags, including those tags, as the result.

      Thus, if the foo-tag is at the end of the document, you have read the entire document just to extract those tiny bytes at the end of it.
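
      The tag-counting walk described above can be sketched with Python's xml.etree.ElementTree.iterparse. The sample document and the find_streaming helper are assumptions made for the example, with foo taken to be the root element. Note that the scan really does consume every byte before the match.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this sketch.
DOC = """<foo>
  <bar><fie naja="nope">skip me</fie></bar>
  <bar><fie naja="hehe">found it</fie></bar>
</foo>"""

def find_streaming(xml_text, path, attr, value):
    """Track the stack of open tags and return the first matching element."""
    stack = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag)
        else:
            # On an end event the element is complete, so it can be tested.
            if stack == path and elem.get(attr) == value:
                return ET.tostring(elem, encoding="unicode").strip()
            stack.pop()
    return None

print(find_streaming(DOC, ["foo", "bar", "fie"], "naja", "hehe"))

# The same lookup with ElementTree's limited XPath support, which parses
# the whole document into memory first.
print(ET.fromstring(DOC).find("bar/fie[@naja='hehe']").text)
```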

      If you coded each tag something like

      4711 characters

      this task would directly be greatly minimized, as you could "jump" over big chunks of the file at once. Changing the coding of that 4711 to binary would also minimize the hassle, as reading the number would be a simple 4 byte read operation (one machine instruction).
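
      A minimal sketch of that length-prefixed idea in Python follows. The chunk layout (a 4-byte tag, then a 4-byte big-endian length, then the payload) and the helper names are assumptions chosen for illustration, not the exact encoding proposed here.

```python
import io
import struct

def write_chunk(out, tag, payload):
    """Write a 4-byte tag, a 4-byte big-endian length, then the payload."""
    out.write(tag.ljust(4)[:4].encode("ascii"))
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)

def find_chunk(stream, wanted):
    """Return the first payload tagged `wanted`, seeking past everything else."""
    wanted = wanted.ljust(4)[:4].encode("ascii")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None  # ran off the end without finding the tag
        tag = header[:4]
        (length,) = struct.unpack(">I", header[4:])
        if tag == wanted:
            return stream.read(length)
        stream.seek(length, io.SEEK_CUR)  # jump over the payload unread

buf = io.BytesIO()
write_chunk(buf, "bar", b"x" * 4711)  # a big chunk we do not care about
write_chunk(buf, "fie", b"hehe")
buf.seek(0)
print(find_chunk(buf, "fie"))
```

      The reader touches only the 8-byte headers of the chunks it skips, which is the "jump" the comment is after.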

      Even better would be to have tags not contain any information, but just pointers (indexes into the file) to the information, so that changing the file destructively to add some extra info would be possible without re-writing the whole (possibly big) file.

      All of this is old knowledge however. Go read up on SUN RPC, Corba, or, heaven forbid, ASN.1...

      --
      --The knowledge that you are an idiot, is what distinguishes you from one.
    44. Re:The Problem With XML by Anonymous Coward · · Score: 0

      And that is better than the "single string" example, because XML has very specific rules for how to handle Unicode characters and linebreaks, whereas with plain-old-text systems anything goes.

    45. Re:The Problem With XML by Proc6 · · Score: 4, Insightful
      This is like comment #492 that XML is slow and a poor format to use for databasing.

      People are trying to use XML for something other than for which it was intended then complaining at the sub-standard results. Surprise? XML is a common format to make it possible to move data between different, I'll use the word "domains" (as in division not URL), it should be used for "just" that.

      In other words, XML should be a "transport" mechanism. It's so I'm not writing a new parser by hand everytime some wanker like you sends me a file in yet another made-up-on-the-spot type. Your example is relatively clean but in the real world as the data gets harder to describe, humans start to make more ignorant made-up-on-the-spot rules like "Well ok if theres a sub record the line will start with a -, well ok it could be a + too, if the subrecord can only contain numbers... no you know what lets make it -n if the sub records can contain numbers only..". No matter how ingenious your "format" is, the problem isn't your format, its that your format isn't my other customers format.

      XML should be used in scenarios where the time saved by using all the readily available XML parsing and validating tools, instead of re-inventing the wheel, is more than the milliseconds spent parsing a longer document "once".

      Don't use XML as your main, permanent, datastore for a gigantic database and complain. It's not for that. Its for when I need a copy of your data and I don't want to pay for a copy of "JackoffDb version 5" that you run, or hire a team of programmers to write a translator just to read your files. Gimme XML, I can take that and understand its contents and schema with ease, then Ill import it into my own system here.

      --

      I'm Rick James with mod points biatch!

    46. Re:The Problem With XML by pyrrho · · Score: 1

      that is a good point and xml is an interchange format... if you are going to load information regularly from disk then you very likely do want to consider a binary format better indexed and pre-parsed to address the issues you mention.

      --

      -pyrrho

    47. Re:The Problem With XML by EddWo · · Score: 2, Informative

      Sounds like you've been reading Joel.
      http://www.joelonsoftware.com/articles/fog0000000319.html

      --
      "Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
    48. Re:The Problem With XML by Anonymous Coward · · Score: 0

      Actually, the problem XML solves is when you let your users input data. Unless you have a very specialized application that forces the user to only input data of a certain type, you need some sort of general-purpose parser to figure out what the user is saying, and you can't trust your users.

    49. Re:The Problem With XML by sapgau · · Score: 2, Interesting

      Yes, that's implementation.

      But the question was if it is a universal data description language. Sending binary will kill your data the first time you try to communicate with a Macintosh or Unix system (big endian, little endian).

      The common lowest denominator is just text, so to describe any structure we have trees in XML.
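
      The byte-order problem can be shown with a minimal Python sketch using the standard struct module. The value 48000 is just a sample number, and the "reader" here is simulated by unpacking with the opposite byte order.

```python
import struct

value = 48000

little = struct.pack("<I", value)  # 4 bytes, least significant byte first
big = struct.pack(">I", value)     # 4 bytes, most significant byte first

# A big-endian reader that is handed the little-endian bytes unchanged
# decodes a different number.
misread = struct.unpack(">I", little)[0]

# The decimal text form carries no byte order at all.
as_text = str(value)

print(little.hex(), big.hex(), misread, as_text)
```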

      Probably the confusion is the influence of Object Oriented design with Entity Relationship schemas in databases. The way that one-many relationships are described in both areas makes sparks fly.

      Pivoting on table data is something OO makes look easy but is complicated in ER. For these kinds of problems XML is just the messenger.

      I might be wrong but doesn't Oracle allow you to return data in xml format? I wonder how efficient that is.

    50. Re:The Problem With XML by sapgau · · Score: 1

      Two words: Web Services.

      It doesn't have to be SOAP. But if you are going to publish a web service you need at least to describe what the data looks like. And the client is not going to install your software to establish communication, he/she will only make a call to your server like a http request and you will respond with a stream of data.

      That data would be VERY useful if it is described in a standard format like XML.

    51. Re:The Problem With XML by Anonymous Coward · · Score: 0

      Nice. Now extend it to cope with colons in the data fields; and while you're at it, how about line breaks too? Oh, and don't forget data fields with non-ISO-8859-1 characters.

      And yes, while you're at it can you do me a parser for your format in javascript, C, C++, java, Perl, Scheme, PHP, C#...

      Say what? You're going to have to implement and maintain (as you mature your ridiculously naive format) separate parsers on all platforms and languages?

      Mmm... in that case, just stick with XML. Thanks all the same.
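
      To make the point concrete, here is a minimal sketch using Python's xml.etree.ElementTree that round-trips fields containing a colon, markup characters, a line break, and non-ISO-8859-1 text. The record contents and the to_xml/from_xml helpers are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record with awkward characters, invented for this sketch.
RECORD = {
    "name": "Smith: Jane & <Co>",
    "note": "line one\nline two",
    "city": "Москва",
}

def to_xml(record):
    """Serialize a flat dict; the library escapes &, < and > for us."""
    root = ET.Element("employee")
    for field, value in record.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="utf-8")

def from_xml(data):
    """Parse the bytes back into a flat dict."""
    return {child.tag: child.text for child in ET.fromstring(data)}

wire = to_xml(RECORD)
print(wire.decode("utf-8"))
print(from_xml(wire) == RECORD)
```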

    52. Re:The Problem With XML by aspx · · Score: 1

      My cow-orker writes XML, but parses it by ordinal position. Blank stares when I explain what is wrong with that.

    53. Re:The Problem With XML by sapgau · · Score: 1
      I think it's been said previously. Communication protocols are not easy to implement and if on top of that you add your own data representation, it will basically lock everybody into a very inflexible solution (not Universal). The thing is that XML excels at transporting data between computer systems. I'm talking
      Windows ->Macintosh ->Unix ->VMS Mainframe ->"Rusianiski" ->Windows
      Instead of agreeing with the authors/creators of all these systems in defining how a number or a date or a floating point is going to be sent over the wire, they can all agree on XML.

      So if the Russian guys have this very exotic computer that saves data in a very special format, it can still share data with the outer world by using XML as the standard.

      So what happens when we want to talk to a Chinese or Indian system? Good luck trying to put everybody together again because the Chinese want to send their special characters in a conflicting encoding that is already in use by the Russians. (FOR EXAMPLE).

      The beauty part is that those battles have already been fought and XML has risen as a good standard.
    54. Re:The Problem With XML by Anonymous Coward · · Score: 0

      Yes, that's the theory, and yet of all the legacy file formats I've ever had to support in my life, the XML-based ones have been by far the most complex.

      Every use of XML I've ever seen has been the "solution" to a non-problem. I'll take a plain text format (in UTF-8) with a couple of examples over some XML/DTD/DOM/bleh monstrosity any day.

    55. Re:The Problem With XML by Dolda2000 · · Score: 1
      Instead of agreeing with the authors/creators of all these systems in defining how a number or a date or a floating point is going to be sent over the wire, they can all agree on XML.
      And just what of this does XML help to alleviate? How is it easier to agree on a common XML schema than to agree on a binary protocol or any other ASCII-based protocol? It's exactly the same thing.
      So if the Russian guys have this very exotic computer that saves data in a very special format, it can still share data with the outer world by using XML as the standard.
      Again, how does XML alleviate this problem? How can you say that data can just automatically be read and interpreted as long as it's using XML? It's not as if you magically get automatic interpretation just because your protocol/file format uses XML for external representation.
      if on top of that you add your own data representation, it will basically lock everybody into a very inflexible solution (not Universal)
      How is XML more universal than, say, LISP s-exps? How is XML to be preferred over the (extensible and flexible) X11 protocol? Any file format/protocol is just as extensible as its parser. Indeed, I'd care to say that LISP s-exps are to be preferred over XML -- they don't impose any extraneous structure, they're much more concise, and they can handle many more kinds of data types without using ad-hoc parsing. They can represent anything that XML can represent, and they can also represent less than XML is capable of.
      Communication protocols are not easy to implement
      I'm sorry, but this I just don't get. Just what part of a communication protocol is not easy to implement? Writing a parser? Also, even if I assume that your statement is indeed correct, how does XML alleviate this problem?
      The thing is that XML excels at transporting data between computer systems.
      Indeed, to recap -- just what part of XML makes it excel more than any other format at representing data (not to be nitpicking, but XML does not transport data, it represents it)?
    56. Re:The Problem With XML by Anonymous Coward · · Score: 0

      I am not sure it really has any advantage over flat files or such.

      There's no such thing as a "flat file". If you think that, it's because you've made up your own syntax, written your own parser, and pretended you haven't.

      Of course someone else might find a good way to tell me why I should use 40 characters to transmit what should have taken 10 characters

      I hope you write your programs in machine code and not some "inefficient" language like C.

    57. Re:The Problem With XML by zootm · · Score: 1

      Oh, it can. But with XML, it's done already. There's no need to create a new format, just a new subset tailored to the information you're sending. Hell, we could write SQL-like languages which query over tailor-made relational representations of data, but we generally don't, since we have SQL already.

    58. Re:The Problem With XML by MSBob · · Score: 1
      Heh. Bad example as SQL severely limits what you can do in relational algebra but I got your point :-)

      However your pre-cooked parser comes with a severe limitation... complete lack of flexibility which implies verbosity. Grammar files are immensely more flexible and just as precise as XML schemas. But it's true that for many people they seem to look harder to develop.

      --
      Your pizza just the way you ought to have it.
    59. Re:The Problem With XML by zootm · · Score: 1
      XML Tagging is tedious and stupidly top heavy in overhead. Contrary to being human friendly it isn't. XML Tagging should be shortened to a simple set of defined tag names and then type definitions. After that each name would be addressed by an index. Typing of data should be contained in a process to extract that is associated with either the tagging index or an over the top wrapper which is similar in function to the DTD. But frankly the whole process is currently a mess.
      XML's biggest failing, and arguably greatest strength, is its lack of typing. Any kind of data can be represented, since we have functions to represent any type of data as text. However this can get space-heavy, yes. XML Schema can provide restrictions on data types and so forth like you seem to want, as for defined tag names, that defies the polymorphic nature of XML. If you want to simplify the process like this, it's completely possible (a restricted version of XML like this is RDF), but XML seeks to provide a generic data-representation language, and if you restrict the tag names, they may as well not be there, and you may as well be using flat files.
      Once the file is retrieved it should be crunched into something like MySQL or such if any real processing is going to happen.
      As I'm sure you know, this is a complex process. There are a number of existing generic approaches to doing this, such as ATGs and Silkroute, but you need to bear in mind that relational databases are not designed to hold data as structured as XML can be, and that designing schema to show how information is to be inserted into the relational systems is fairly critical if effective use is to be made of the data. Papers like this one attempt to automate the process of translating data types in a standardised way.
      Nothing really is gained by such a markup system over just a series of hashed tags that are indexed. Such tagging and indexing is a lot less of a tax on band width.
      And it's more of a tax on manpower and time, as parsers for custom types need to be written and tested, along with the fact that it makes interpretation of data between non-homogenous datasets far more open to interpretation.
      This having been said, XML works and is OK for many uses. I am not sure it really has any advantage over flat files or such. It drinks band width and program operations time. I think in time it will turn out to be a fun toy but not much else. Of course someone else might find a good way to tell me why I should use 40 characters to transmit what should have taken 10 characters and how it should have been faster or more efficient some way to use it. The whole concept was definitely good for a lot of programmer payroll time.
      Because, at the moment, in the tasks that we use XML for, we have bandwidth and processor cycles to burn, and it's quicker to have a generic data representation which can be parsed and interpreted by generic tools, than to hand-roll a new data format and parser every time we wish to represent another file. It might be quicker and less bandwidth-heavy, but does that really matter in the context of today's - and the future's - technology?
    60. Re:The Problem With XML by zootm · · Score: 1

      Well, XML Schemas are theoretically turing-complete (as is XML, interestingly), so they're not exactly less "flexible" as such. The actual representation they spit out is restricted to XML, but since this is a representation of data, it's not really a restriction at all. Grammars are excellent at representing complicated languages and so on, but data representation does not usually need such complicated syntactical rules. XML is a format more tailored to data representation, as opposed to language specification, and that's the distinction here. Theoretically, most standard, sensible data representations are easier to deal with in XML, and complex representations can be made if required. Obviously for simpler things, like a small mapping, writing one's own grammar is probably easier, but once you start getting into nested data types and so on, it quickly gets more complex, particularly if you want to ensure you can extend the representation at some later date.

    61. Re:The Problem With XML by jadavis · · Score: 1

      Huh? I don't think the SQL spec has anything to do with record sizes in the on-disk representation.

      Many RDBMSs use variable length records.

      I think you're talking about fixed-length record databases, like might be found in some embedded databases (like Berkeley DB, which also offers variable-length records).

      An RDBMS is very different from XML data.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    62. Re:The Problem With XML by sapgau · · Score: 1

      Well, the intention was not to say that XML is better or the best but to explain what it is used for.

      So it happens that the history of XML allowed it to be used and distributed easier than LISP s-exps. Any other standard could be used, the usefulness is that at least someone can agree to use it so we don't have to reinvent the wheel everytime.

      I think the confusing part is that we are used to working with XML by reading it from a file, and so the risk of attaching it early to a specific format... but with Web Services XML could be streamed on the fly between servers to achieve some level of communication. This could help build new ideas/systems based on what it's agreed on. Meaning that if one system uses one binary representation that is different from my binary representation I have to account for the conversion before I can work with the data.

      The little I studied about communication protocols is that you get involved with automata theory, where you have to account for all possible states where your automata could end up based on a particular stage of your communication protocol. And I just thought that on top of that you still have to account for all possible types of dates or floating point data interpretations and surely it would become a very complex system.

      It's not that it's written in stone, new alternatives could always be used and based on the complaints of this thread I'm sure that something will come up. As an example, SOAP was preached as the best solution for distributed RPC calls but it is so extensive that already a simpler alternative has been adopted extensively. Look at XML-RPC, it's a good read.
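
      For a feel of what XML-RPC puts on the wire, here is a minimal sketch using Python's standard xmlrpc.client module. The method name payroll.record and the parameter values are made up for the example, and no server is contacted; the call is only marshalled and unmarshalled locally.

```python
import datetime
import xmlrpc.client

# Hypothetical call parameters: an int, a float, a string and a date.
params = (42, 3.5, "Jane Smith", datetime.datetime(2004, 1, 15, 9, 30))

# Marshal the call into the XML text that would be sent over HTTP.
wire = xmlrpc.client.dumps(params, methodname="payroll.record")
print(wire)

# Unmarshal it again, as the receiving side would.
decoded, method = xmlrpc.client.loads(wire, use_builtin_types=True)
print(method, decoded)
```

      Each value travels as typed text (int, double, string, dateTime.iso8601), so neither side needs to know the other's binary layout for numbers or dates.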

    63. Re:The Problem With XML by uradu · · Score: 1

      > Contrary to being human friendly it isn't. XML Tagging should be shortened
      > to a simple set of defined tag names and then type definitions. After that
      > each name would be addressed by an index.

      You must be kidding me. You're dissing XML for not being human friendly, and then you propose to replace human readable redundant tags with dictionary indexes. While what you're proposing is indeed one of the mechanisms in binary XML, its furthest goal would be human friendliness.

      > It makes a good universal file storage structure and that is it! [...]
      > I am not sure it really has any advantage over flat files or such.

      There are "better" alternatives to XML for any given problem domain, but no single one comes close to XML's flexibility. For two-dimensional data sets CSV files are as simple and concise as it gets, for configuration storage it doesn't get much simpler and easier to read than INI files. But for storing generic and unforeseen data formats, XML is very close to the intersection point of simplicity and flexibility. One single parser reads them all, and with a bit of code can transform the data into just about any other format.

      > It drinks band width and program operations time.

      I'm sorry but this is becoming more and more a bogus argument. Have you ever tried zipping up a large XML file and observed the compression ratio? You get huge ratios precisely because of the redundancy. Even simple Huffman coding removes practically the entire overhead of verbosity and redundancy. The more verbose and profusely tagged the data, the higher the compression ratio. In typical use a one megabyte XML file often compresses to under 50KB. Just about any programming language nowadays provides easy access to compression routines, be it file based or in-memory string compression. There is absolutely no need to transmit uncompressed XML over the wire, and in fact it should be a criminal offense. Besides, simply enabling HTTP gzip compression will buy you most of this compression on the fly. Yeah, it takes some processor cycles, but compared to how much work Windows does just to repaint the GUI while I'm typing this, those 2000 MHz are really not being overtaxed.
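
      Anyone who wants to check this on their own data can start from a minimal Python sketch like the following, which uses the standard zlib module. The generated document is synthetic and highly repetitive, so the ratio it prints is only indicative; real files will vary.

```python
import zlib

def build_doc(count):
    """Generate a synthetic, heavily tagged XML document as UTF-8 bytes."""
    rows = "".join(
        "<employee>"
        f"<first-name>Jane</first-name><last-name>Smith{i}</last-name>"
        f"<salary>{50000 + i}</salary>"
        "</employee>"
        for i in range(count)
    )
    return f"<employees>{rows}</employees>".encode("utf-8")

raw = build_doc(5000)
packed = zlib.compress(raw)
ratio = len(raw) / len(packed)

print(len(raw), len(packed), round(ratio, 1))
```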

      > I think in time it will turn out to be a fun toy but not much else.

      Given the proper time scale, this will no doubt be true. But considering that XML 1.0 was officially defined in 1998, and its use and rate of adoption can still be considered as exponentially growing, you would have to choose a very large scale to be able to look BACK to XML. Till then, better learn to compress and find yourself an alternative to Notepad that displays XML in a more human readable form.

    64. Re:The Problem With XML by R.Caley · · Score: 1
      Of course someone else might find a good way to tell me why I should use 40 characters to transmit what should have taken 10 characters

      I might need to read that data 10 years from now.

      Of course, you could define a format and document it well, and provide an API to access it which will run on whatever system I am using 10 years from now, but let's face it, you won't. Even the best of us cut corners on that stuff under real world pressures.

      10 years from now when you are struggling to decode some bizarre, but efficient, data format created by someone who read your comment, please don't beat your head on the desk too hard:-).

      The whole concept was definitely good for a lot of programmer payroll time.

      Having implemented more or less the same data storage API on top of simple text files and XML, I can state that XML eats much less programmer time. All the parsing and worrying about funny characters was done for me by a library.

      I do agree with more or less everybody that XML as a specific instance of a generic markup is a bit of a pig. However, a standard pig is better than 10000 neat non standard solutions for some things.

      --
      _O_
      .|<
      The named which can be named is not the true named
    65. Re:The Problem With XML by Bill+Dog · · Score: 1
      I.e. somebody might come along and add a valid set of tags to your data and your app breaks.

      This is not true. With the DOM parser your code wouldn't even drop into child nodes it didn't know about, and with SAX, you're also only processing those tags thrown at you that you know about/are interested in.
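
      A minimal sketch of the SAX case, using Python's standard xml.sax module: the handler reacts only to the salary tag, so a document that gains extra tags yields the same result. The two sample documents and the SalaryHandler class are invented for the example.

```python
import xml.sax

class SalaryHandler(xml.sax.ContentHandler):
    """Collect <salary> values and ignore every other tag."""

    def __init__(self):
        super().__init__()
        self.in_salary = False
        self.buffer = []
        self.salaries = []

    def startElement(self, name, attrs):
        if name == "salary":
            self.in_salary = True
            self.buffer = []

    def characters(self, content):
        if self.in_salary:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "salary":
            self.salaries.append(int("".join(self.buffer)))
            self.in_salary = False

def salaries(xml_text):
    handler = SalaryHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.salaries

# Hypothetical documents: NEW adds tags the handler has never heard of.
OLD = "<staff><emp><salary>48000</salary></emp><emp><salary>50000</salary></emp></staff>"
NEW = ("<staff><emp><badge>17</badge><salary>48000</salary></emp>"
       "<emp><salary>50000</salary><office>B2</office></emp></staff>")

print(salaries(OLD), salaries(NEW))
```

      Code that reads children by ordinal position would not survive the same change, which is the difference between the two approaches.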

      --
      Attention zealots and haters: 00100 00100
    66. Re:The Problem With XML by R.Caley · · Score: 1
      But what % of xml files are written or read by humans?

      It should be that a reasonably high proportion of them are.

      XML is for:

      • Interchange between independently developed systems.
      • Storage which may need to be examined by people.
      • Input where there may be a need for human intervention, say in an emergency situation (``argh! the system won't let me change the tax rate on this transaction and it has to go through today!'')
      Of course the latter 2 are special cases of the first with a different meaning for `designed'.

      So, 2/3 of the reasons to use it involve people.

      If the data can never be seen by another system, nor by a person, you shouldn't be using XML, 'though you should be providing a way to export to XML.

      Now, having said that, if the data is accessed rarely enough (say once per run and no more than a handful of runs per day), it may be more efficient to store in XML and take the access overhead in exchange for not implementing export and import mechanisms. But it's a hack.

      --
      _O_
      .|<
      The named which can be named is not the true named
    67. Re:The Problem With XML by Bill+Dog · · Score: 1

      This is overstated. If, say, you don't know the domain, then the items and their data are going to be foreign to you whether it's in XML or not. But otherwise, the schema designer would have to intentionally and knowingly make the data less decipherable by doing something unorthodox, like storing the numeric value of the year minus 1900 between year tags, instead of the more obvious actual year number itself. And nesting could only fail to convey hierarchical information if the tags were made cryptic, such as with mutilated abbreviations.

      --
      Attention zealots and haters: 00100 00100
    68. Re:The Problem With XML by Anonymous Coward · · Score: 0

      What's a cow-orker?

    69. Re:The Problem With XML by Anonymous Coward · · Score: 0

      What are it's types you cannot tell

      "its".

    70. Re:The Problem With XML by Anonymous Coward · · Score: 0
      But using 40GB to send 10GB of data is "the bomb"!

      gzip is your friend. XML files naturally compress way down.
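
      A rough way to check this for your own data, using Python's standard gzip module. The document below is synthetic and highly repetitive (an assumption that flatters the result), so treat the ratio it produces as an upper-end illustration rather than a typical figure.

```python
import gzip


def compression_ratio(xml_text):
    """Return uncompressed size divided by gzip-compressed size."""
    raw = xml_text.encode("utf-8")
    packed = gzip.compress(raw)
    return len(raw) / len(packed)


# Synthetic sample: 1000 near-identical records, so the tag names repeat constantly.
records = "".join(
    f"<item><id>{i}</id><name>widget</name><price>9.99</price></item>"
    for i in range(1000)
)
document = f"<items>{records}</items>"

ratio = compression_ratio(document)
```

      Most of the saving comes from the repeated tag names, which is exactly the overhead the parent is complaining about.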

    71. Re:The Problem With XML by Anonymous Coward · · Score: 0
      Besides, simply enabling HTTP gzip compression will buy you most of this compression on the fly.
      How so?
    72. Re:The Problem With XML by jgrahn · · Score: 1
      I might need to read that data 10 years from now. Of course, you could define a format and document it well, and provide an API to access it which will run on whatever system I am using 10 years from now, but let's face it, you won't.

      And how is XML different here? The tags in the file don't magically describe the semantics ...

      Having implemented more or less the same data storage API on top of simple text files and XML, I can state that XML eats much less programmer time. All the parsing and worrying about funny characters was done for me by a library.

      Two words for you: lex and yacc ...

    73. Re:The Problem With XML by tigersha · · Score: 1


      Pascal, as a language, is designed so that everything can only refer to things above it in a file, except where explicitly declared. The idea of simply ordering your methods/procedures as you wish and calling them at will does not exist. This means that the compiler can be written as a pipeline that emits code as it goes through. Modern languages must load everything into memory, resolve dependencies and then emit stuff.

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
    74. Re:The Problem With XML by R.Caley · · Score: 1
      The tags in the file don't magically describe the semantics ...

      No, but if I give you XML you will at least get the structure, and the tag names should be a hint that `this bit is the user's bank account number' or whatever. If I give you a hunk of highly optimally stored binary data, you have a major research programme in front of you. If you doubt that, take a look at the binary files making up a complex mysql database, and try and reverse engineer the data to the point you'd start from if it were an XML dump.

      Two words for you: lex and yacc

      Two words for you: Japanese and why-should-I-write-an-LALR-grammar-and-supporting-infrastructure-when-I-can-link-to-a-library-and-write-one-call.

      OK, I cheated.

      --
      _O_
      .|<
      The named which can be named is not the true named
    75. Re:The Problem With XML by smittyoneeach · · Score: 1

      xml is an interchange format
      Then why are these JavaJocks using it for a build system (Ant) as well as, along with reflection, to build half the object model in their J2EE app servers (Tomcat, et al.)?
      Doing smallish amount of app configuration (Subversion) in XML doesn't sound like a horrible idea, but the fundamental concept of platform-agnostic data representation is long since flushed down ye olde marketing toilet.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    76. Re:The Problem With XML by Dolda2000 · · Score: 1
      Meaning that if one system uses one binary representation that is different from my binary representation I have to account for the conversion before I can work with the data.
      Again, I don't see how XML alleviates this problem. If two sites use two different XML schemas, the data still has to be converted, and I would claim that it's not like writing an XSLT filter is easier than writing a C routine to translate one binary format into another. And to be honest, it's not like it's easier to agree on an XML schema than a binary schema.
      I think the confusing part is that we are used to working with XML by reading it from a file, and so the risk of attaching it early to a specific format... but with Web Services XML could be streamed on the fly between servers to achieve some level of communication.
      I don't really understand this either. How are things made easier just because you read from a socket rather than from a file? I'm sorry if I missed the point somehow.
      The little I studied about communication protocols is that you get involved with automata theory, where you have to account for all possible states where your automata could end up based on a particular stage of your communication protocol.
      First, that's not actually hard. My protocol/file format parsers usually end up in around 100 lines of C code: almost nothing, in other words. And that's when I write in C. Using Perl, LISP, PHP or virtually any other high-level language makes it even less. Second -- when in doubt, use Bison. It will do the job for you with hardly any work at all.
      And I just thought that on top of that you still have to account for all possible types of dates or floating point data interpretations and surely it would become a very complex system.
      Again, XML doesn't help alleviate this. XML only has strings, so you still have to interpret numerical formats manually. At least LISP actually specifies distinct formats for both integers and floating point numbers. If you are referring to representing numbers binarily, then maybe you should consider how the Internet has always been able to do this in a fully cross-platform manner using binary protocols such as IP, ICMP, UDP, TCP, DNS, etc.
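
      To make the "XML only has strings" point concrete, here is a small sketch with Python's standard xml.etree.ElementTree. The element names (reading, count, temp) are invented for the example; without a schema-aware layer on top, every value comes back as text and the application decides how to interpret it.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<reading><count>42</count><temp>21.5</temp></reading>")

# The parser hands back plain strings, whatever the content looks like.
count_text = doc.findtext("count")

# Interpreting the numeric format is the application's job.
count = int(count_text)
temp = float(doc.findtext("temp"))
```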

      I'm not one who is in favor of binary application-level protocols, however. I always write ASCII-based protocols whenever I have a chance. Still, binary protocols have their place as well, but mostly where one wants to preserve bandwidth. Surely, you would not want the Internet to be using XML on the network layer?

      As an example, SOAP was preached as the best solution for distributed RPC calls but it is so extensive that already a simpler alternative has been adopted extensively. Look at XML-RPC it's a good read.
      In my workplace, these technologies are only praised by the non-technical people (managers etc.) that don't really know what they're speaking about. I've only seen it being used voluntarily by programmers once or twice. Ever. (The only implementation that comes to mind is freshmeat's version submittal system)
      So it happens that the history of XML allowed it to be used and distributed easier than LISP s-exps. Any other standard could be used, the usefulness is that at least someone can agree to use it so we don't have to reinvent the wheel everytime.
      It may just be me, but, to me, that sounds like "there are no technical advantages to XML before any other standard, it just so happens that it was used in many projects and became popular".

      Also, since LISP is around 40 years older than XML, I don't really see how the history of XML could allow it to be used easier. The thing is, nothing of the like has happened with any other format than XML. Assuming (which I do), that XML has no technical advantages in and of itself, I believe that it is purely because of hype that it happens.

    77. Re:The Problem With XML by uradu · · Score: 1

      A simple googling of "http gzip" would yield pages such as this. It explains quite well how HTTP compression works, but also shows that not all web servers support it equally well.

    78. Re:The Problem With XML by arkanes · · Score: 1

      It's actually pretty easy to make cryptic XML even when you're familiar with the domain, because for most reasonably complicated domains there's any number of ways to store data. One particular problem which XML deals with poorly is references, where you've got a common data element which is referenced from multiple locations. I object to the idea that a data file is somehow automatically more understandable just because it's in XML.

    79. Re:The Problem With XML by sapgau · · Score: 1

      * sigh *

      Well I could care less for XML, the fact that I get paid to code for it is my only inclination to understand it.

      I don't understand if XML is such a bad idea why haven't other superior alternatives been more popular? If it's because of the ignorant and stupid (which I agree they are) managers imposing it to their developers and engineers? Is marketing really that powerful?

      By reading this thread in slashdot (and others) I hear a lot of grief but I can't find an alternative that other people agreed on using.

      So, in the end, if it takes 40 hours instead of 1 to accomplish the same thing I will get paid anyway. I still can't believe technical merit for any standard could be such a terrible victim.

      peace.

    80. Re:The Problem With XML by pyrrho · · Score: 1

      well, there are a lot of possible examples like that...
      Ant seems like one of the more defensible ones though... I don't use Ant, but the parsing of Ant config files has to be a minimal part of the cost of the build.

      The thing that makes XML appear cool (assuming one wants to try) is just the fact that it's HTML where you can make your own tags. But that is pretty cool.

      --

      -pyrrho

    81. Re:The Problem With XML by Anonymous Coward · · Score: 0
      That's a problem with querying XML files, not with XML. If you want efficient querying, then import into a real database and query to your heart's content.

      Plus, there might be more efficient ways of executing a 'bunch of' xml queries on a file at the same time with just one read of the file. You could collect the relevant data to the queries as you read the file once. That's fine when you are just importing the stuff you want into a database - you are only doing it once anyway. Plus, CPU cycles are usually cheaper than programmers. It's ok to burn a few to save programmer time.
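
      One way to sketch the "several queries, one read" idea is a streaming pass with ElementTree's iterparse. The two "queries" here (a price total and a list of featured item names), the element names, and the function scan_once are all hypothetical; the pattern is simply to answer every question while each record is in hand, then discard it.

```python
import io
import xml.etree.ElementTree as ET


def scan_once(xml_bytes):
    """Answer two separate questions about the items in a single pass."""
    total = 0.0      # query 1: sum of all prices
    featured = []    # query 2: names of items marked featured="yes"
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "item":
            total += float(elem.findtext("price"))
            if elem.get("featured") == "yes":
                featured.append(elem.findtext("name"))
            # Drop the record's children so memory stays flat on large files.
            elem.clear()
    return total, featured


SAMPLE = (
    b"<items>"
    b"<item featured='yes'><name>a</name><price>1.5</price></item>"
    b"<item><name>b</name><price>2.5</price></item>"
    b"</items>"
)
```

      For an import job, the body of the loop would be where rows get handed to the database.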

    82. Re:The Problem With XML by Anonymous Coward · · Score: 0

      All that stuff you don't use is there for problems you haven't encountered yet. You can use a simple subset of xml and feel secure in the knowledge that it will scale to problems that you haven't encountered yet because people using it for everything imaginable have already run into those problems and the solutions have found their way into the standard. Be happy that you don't use all that stuff. The fact that you are using XML means that you don't have to know about it until you need it (as opposed to some other format where you would always have to be planning ahead). With XML you can do 'faith based programming', having faith that even though you don't know what you are doing, any future obstacles that you may encounter have been cleared away by bigger XML-heads than you already.
      Just act naively and you will find that it's all smooth sailing. The answer to any question you might have can be looked up on w3c.org .

    83. Re:The Problem With XML by Black+Perl · · Score: 1

      We had a bunch of slightly different document formats that some hapless and unintelligent programmer elected to describe with XML.

      Agreed. It would have been better to describe them in XML Schema (that's what it's for!), or in an XSL Template.

      Unfortunately, XML has no control structures

      XSL has control structures. It looks like your programmer has tried to use XML when it didn't quite fit.

      the tools don't handle includes robustly

      xinclude has been robust for me. As long as you're using modern parsers/processors, it should be robust.

      there are no conditional constructs, no loops,

      Again, it seems like what you wanted was XSL templated documents. Actually, any form of templating with conditionals and looping would probably work. I have done cool things with Template Toolkit. But when working with end-to-end XML, you can't beat XSL.

      So if you have two invoice forms, one with 10 lines for detail items and one with 11, you are stuck with cut-and-paste-and-manually-edit as your primary software development model

      Editing XML files is not software development! You should generate them. There are many ways to do it, but more details would be necessary to find the best way.

      The problem with XML is that some people don't use it correctly. But that's true with any technology. There's nothing intrinsically wrong with XML.

      --
      bp
    84. Re:The Problem With XML by Bobbysmith007 · · Score: 1

      That's all well and good until you realise that your 40 lines of xml could have been sent as a three variable form post to the exact same webservice and everything would have been easier. On everyone. It's way easier to send an HTTP Post than it is to create a bunch of crap that wraps my three variables and explains that these three variables have some type (which is laughable since it's coming in as a string).
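
      For a sense of scale, here are the same three variables encoded both ways with Python's standard library. The field names and the request wrapper element are invented for the comparison, and a real SOAP envelope would be considerably larger than this bare XML.

```python
from urllib.parse import parse_qs, urlencode
import xml.etree.ElementTree as ET

# Three hypothetical variables, all strings on the wire either way.
fields = {"user": "alice", "item": "42", "qty": "3"}

# Form-encoded body, as sent by an ordinary HTTP POST.
form_body = urlencode(fields)

# The same data wrapped in a minimal XML document.
root = ET.Element("request")
for key, value in fields.items():
    ET.SubElement(root, key).text = value
xml_body = ET.tostring(root, encoding="unicode")

# Decoding the form body gives the values back as strings too.
decoded = parse_qs(form_body)
```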

      XML is really cool until you realise that it is almost entirely pointless. You can release any specification for communication and have everyone implement it. Personally though HTTP Post is the way to go on web service.

      Oh yeah, nice troll, you reeled me in.

    85. Re:The Problem With XML by Dolda2000 · · Score: 1
      I don't understand if XML is such a bad idea why haven't other superior alternatives been more popular? If it's because of the ignorant and stupid (which I agree they are) managers imposing it to their developers and engineers? Is marketing really that powerful?
      The same thing could be said of Windows. ;-)

      Seriously, though -- marketing is no doubt that powerful. Think VHS vs. Betamax, for example. There is an almost infinite list of similar examples.

      By reading this thread in slashdot (and others) I hear a lot of grief but I can't find an alternative that other people agreed on using.
      Actually, there are two standards that other people agreed upon in almost every protocol/file format: 8-bit bytes and ASCII. Unlike XML, these actually provide real advantages. Since 8-bit bytes are being used, all systems can communicate over the same network. Since more or less every text editor and terminal emulator uses ASCII, it allows more or less every such program to manipulate files and network streams.

      The difference between the 8-bit byte and ASCII standards and XML is that XML is redundant. 8-bit bytes and ASCII describe data types at the primitive level, where no underlying description exists, while XML just restructures already structured data. As I see it, you create abstractions for two reasons: either to describe something that has no existing description, or to combine several distinct concepts into one. XML does neither -- it just redundantly restructures data.

      Don't get me wrong, though. I do love XML. I just don't like when it, like virtually every other web standard, is used out of context. XML is great as an eXtensible document Markup Language, HTTP is great as a HyperText Transfer Protocol, and the web is great as the global hypermedia it was intended as. When XML is being used to represent arbitrary data structures, HTTP is being used for anything between heaven and earth (the latest stupid thing I saw was some large standard that just uses HTTP to connect to a service and transfer arbitrary binary data (I think it was some part of Apple's iTunes, but IPP also does it) -- TCP already covers this!), and the web is being used for creating interactive user interfaces, then I get upset. It's like plugging round holes with square pins, or teaching pigs how to fly.

      My take on this is the pointy-haired-bosses started learning about the web, and suddenly all web-related concepts were buzzwords that were forced on unsuspecting engineers, and now students are learning in school to do things "the web way". I find no other explanation.

      </rant>

  5. Join the Dark Side by TripMaster+Monkey · · Score: 2, Funny

    One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML.

    [Obligatory Star Wars joke]

    --
    ____

    ~ |rip/\/\aster /\/\onkey

    1. Re:Join the Dark Side by TripMaster+Monkey · · Score: 5, Funny

      XML: You killed my father!

      HTML: No, XML....I am your father!

      XML: That's impossible!

      HTML: Grep your code...you know it to be true.

      XML: NOOOOOOOOOOOOOOOOOO!

      --
      ____

      ~ |rip/\/\aster /\/\onkey

    2. Re:Join the Dark Side by p0rnking · · Score: 1

      I know that this was an attempt at being funny, but I'm curious, did you say this because the author mentioned "darkside" in his review, or because html came before xml, or a combination of both?
      The reason why I ask, is because I thought "XML" is older than HTML, and is a simpler version of SGML (?), which HTML is also derived from ... no?

    3. Re:Join the Dark Side by m95lah · · Score: 1

      I guess that the dark side he is talking about is the fact that the parent was born after the child...
      I can almost hear banjos playing in the background too...

    4. Re:Join the Dark Side by johndiii · · Score: 2, Informative

      HTML significantly predates XML. Though both are derived from SGML, they are in somewhat different categories (HTML being an application of SGML, while XML is a profile). HTML is a closed development path, however; future versions will be XHTML, which is a derivative (application) of XML.

      --
      Floating face-down in a river of regret...and thoughts of you...
    5. Re:Join the Dark Side by atomm1024 · · Score: 1
      1. "Dark side" mention vs. HTML predating XML -- probably both.
      2. XML is not older than HTML. HTML was an application of SGML, but XML was only devised as an SGML replacement about 7 years after the first HTML applications appeared.
      3. I wouldn't say XML is "simpler" than SGML. SGML is much easier for human read/writability. XML's syntax is much stricter (requiring a single root element, all attributes quoted, all tags closed, etc.), so it's easier to parse.
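
      The strictness in point 3 is easy to see with any conforming parser. A minimal sketch using Python's xml.etree.ElementTree, where is_well_formed is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


ok = is_well_formed('<p class="x">hi</p>')        # quoted attribute, closed tag
unquoted = is_well_formed("<p class=x>hi</p>")    # attribute value not quoted
unclosed = is_well_formed("<p>hi")                # tag never closed
two_roots = is_well_formed("<a/><b/>")            # more than one root element
```

      The parser rejects the last three outright instead of guessing, which is what keeps XML parsers simple compared with SGML or tag-soup HTML parsers.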
      --
      Signature.
    6. Re:Join the Dark Side by Anonymous Coward · · Score: 0

      XML: did you fuck my wife? did you fuck my wife? did you fuck my WIFE?

      XML's Wife: I am your wife.

      XML: that doesn't matter, did you fuck my wife? answer!?

      XML's Wife: Ok Ok, I admit it, I fucked your wife, I am your wife and I fucked her, happy?

    7. Re:Join the Dark Side by Anonymous Coward · · Score: 0

      Shhhh, nigger, shhhh. Good boy.

    8. Re:Join the Dark Side by Anonymous Coward · · Score: 0

      happy?

      only if you made a video.

    9. Re:Join the Dark Side by Anonymous Coward · · Score: 0
      HTML: Grep your code...you know it to be true.

      HTML: Read my DTD...you know it to be true.

  6. damn by pyrrho · · Score: 5, Funny

    I want to say something funny about XML, but there is nothing.

    --

    -pyrrho

    1. Re:damn by Ivan+Todoroski · · Score: 5, Funny

      I completely agree with you .

    2. Re:damn by pyrrho · · Score: 1

      it was an honor to be your straight man!

      --

      -pyrrho

    3. Re:damn by pyrrho · · Score: 1

      .

      this part is especially funny... roflmao.

      --

      -pyrrho

    4. Re:damn by charlieo88 · · Score: 2, Insightful

      HA! I'd mod you up if you weren't already maxed out.

  7. n00b - help! by dsginter · · Score: 4, Interesting

    After seeing what can be done with simple javascript and XML, I'm wanting to get into this. Can someone point me to the best OSS way to do this (I can hear the groans now). I like Postgres but I don't see much in the way of getting it to spit out XML. I like documentation... MySQL? Am I missing something?

    --
    More
    1. Re:n00b - help! by Anonymous Coward · · Score: 0

      You didn't describe your application, but perl is an excellent choice for communicating with PostgreSQL through the DBI module, and can easily create and parse XML documents. It handles XML-RPC also if you're trying to share the data across network applications.

    2. Re:n00b - help! by Further82 · · Score: 0

      of course what you linked to could easily be done without XML at all. The xmlhttprequest object is fairly misnamed. A lot of times it is simpler and easier to pass data with a simple GET or POST request and return it as plain old text. XML is useful only when more complex sets of data need to be passed, and even then, of course, the XML is serialized before being sent to and from the server. XML just makes it easier to interpret large data sets on either end.

    3. Re:n00b - help! by aldoman · · Score: 5, Insightful

      XML is totally overhyped, which sadly makes people think it is a lot more complex than it is.

      Think of it more like CSV than mySQL. It's just a format for representing structured data. It also happens to be that it's quite easily read by humans.

      Yes, you can do incredibly advanced things with XML, but there is nothing you can do in XML that you couldn't do in your own proprietary data storage format.

      The reason people use XML instead of writing their own data storage format is simple: there are a lot of tools for parsing it, which you'd have to write yourself if you had your own format.

      As for the javascript and XML example, it's impressive, but it's far more javascript than XML.

    4. Re:n00b - help! by DarkHelmet · · Score: 1
      Why don't you have your xml file really be PHP that exports XML.. Then you can do something like:
      <?php
      // skip mysql init stuff:
      // (the mysql_* functions used below were removed in PHP 7;
      // mysqli or PDO are the current equivalents)

      $dom = new DOMDocument();
      $root = $dom->createElement("root");
      $dom->appendChild($root);

      // <items> is the container for one <item> per database row
      $items = $dom->createElement("items");
      $root->appendChild($items);

      $sql = "select * from items order by Id asc";
      $qu = mysql_query($sql);

      while ($s = mysql_fetch_array($qu, MYSQL_ASSOC))
      {
          $item = $dom->createElement("item");
          $items->appendChild($item);

          // one child element per column; column names must be valid XML names
          foreach ($s as $k => $v)
          {
              $x = $dom->createElement($k);
              $textn = $dom->createTextNode($v);
              $x->appendChild($textn);
              $item->appendChild($x);
          }
      }

      echo $dom->saveXML();
      exit;
      ?>
      BTW, I am just typing this from my head... so no guarantees on it working properly or not :)
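
      For comparison, roughly the same row-to-element mapping can be sketched in Python with only the standard library. This swaps MySQL for an in-memory sqlite3 database so it runs anywhere; the items table, its Id and Name columns, and the function rows_to_xml are assumptions made for the example, not part of the parent's setup.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn):
    """Dump every row of the (assumed) items table as <root><items><item>..."""
    conn.row_factory = sqlite3.Row  # lets us read column names per row
    root = ET.Element("root")
    items = ET.SubElement(root, "items")
    for row in conn.execute("SELECT * FROM items ORDER BY Id ASC"):
        item = ET.SubElement(items, "item")
        # One child element per column; column names must be valid XML names.
        for key in row.keys():
            ET.SubElement(item, key).text = str(row[key])
    return ET.tostring(root, encoding="unicode")


# Throwaway in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO items (Name) VALUES (?)", [("bolt",), ("nut & washer",)]
)
xml_out = rows_to_xml(conn)
```

      As with the DOM version, building the tree through the library rather than concatenating strings means characters such as & are escaped for you.
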
      --
      /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
    5. Re:n00b - help! by daveed · · Score: 1

      I wouldn't start off with the database. Start off with plain XML stored as files. Then get working apps to pump out the XML as HTML or whatever using XSLT. Write some XSD to validate the XML before parsing it though.
      As for languages, use PHP, Java, .Net Mono. They all have pretty good XML parsers.
      www.w3schools.com has a pretty good XML tutorial (and all web type technologies).
      Hope that helped

    6. Re:n00b - help! by davez0r · · Score: 1

      you're (probably) not going to want to output XML directly from your database. with the XMLHttpRequest object, you're (probably) going to be requesting a specially formatted document from your webserver. so to create this document, instead of generating HTML tags in your PHP (or whatever), you generate XML.

      you then read this XML with the javascript on your (normal HTML) page, and change its formatting/data appropriately.

      here is a link about it

    7. Re:n00b - help! by Anonymous Coward · · Score: 0

      That depends on what you want to do with it. If it's web development, you can simply output XML as you would HTML, along with an application/xml Content-Type HTTP header. Or you can read it in using your favourite language's XML libraries. Pretty much every language under the sun has an XML parser available.

      XML is just markup. You can create XML files with a text editor.

    8. Re:n00b - help! by Anonymous Coward · · Score: 0

      MS-SQL comes with a HTTP-based interface that automagically returns XML from queries.

      Sounds nice in theory, but the XML it returns is really verbose and fugly, so most of the time you are better off writing your own document builder. See DarkHelmet's PHP example.

    9. Re:n00b - help! by ciroknight · · Score: 1

      I'll just use my XSLT to parse my XTC to convert my XML to XHTML, of course checking it with my XSD, because my XCOCK isn't long enough, and my XHEAD isn't filled to the brim with XLANGs yet.

      This is the one thing I've always hated about XML. It's an incomplete solution; you end up needing six different programming languages/abstraction layers/formatting layers/document validators/rendering layers... it seems like all we're doing is adding more and more overhead to something that used to be as simple as "".

      Please people, consider this before you start writing/converting everything to XML. Most of the time it adds countless thousands of un-needed characters to parse through, and could very well be obsoleted by the next big thing..

      --
      "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
    10. Re:n00b - help! by Piquan · · Score: 5, Insightful
      The coolness of XML is not in the format (which sucks); it's in the technologies around it.

      RelaxNG, for instance, lets you verify that your XML file is built correctly for your app: you write a RelaxNG spec for your XML file format, and then it verifies that all the mandatory fields are there, in whatever order is necessary, with the correct datatypes, etc, etc. RelaxNG processors are part of most major XML libraries now, so if you're writing Perl you can just tell your Perl library to validate your file and it's done. If you're editing in Emacs (with nxml-mode), you can point Emacs at your RelaxNG file, and have tab completion, error highlighting, etc, etc-- all customized for your file format.

      XSLT lets you take an XML file and perform transformations on it into another (possibly XML) file format. Need to convert XML into SQL INSERTS? Piece of cake. I use it to extract particular parts of an XML file and convert them into a significantly differently-ordered Lisp structure.

      Most modern web browsers are becoming CSS engines rather than HTML engines. So you can stick a CSS stylesheet reference at the top of your XML file, and have the CSS generate something that looks like what you want the user to see. The data file looks good to the app, and looks good to the user. You can also (with some browsers) use more powerful transformations using something like DSSSL or XSLT.

      DOM for a standard data manipulation API, so each program you write doesn't have a different data access language. XPath as a language to perform more complex queries. XML Namespaces to let users or apps tag their data with extensions. XInclude for data sharing. All of these are things you get for free with XML.
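
      As a small taste of the XPath part, Python's standard xml.etree.ElementTree understands a limited subset of XPath (full XPath needs a library such as lxml). The catalog document and its element and attribute names below are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only to exercise the query.
CATALOG = """<catalog>
  <book lang="en"><title>First</title></book>
  <book lang="fr"><title>Second</title></book>
  <book lang="en"><title>Third</title></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# Select the titles of every book whose lang attribute is "en".
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]
```

      The query is a one-line string, and the same expression would work unchanged in any other XPath-capable library or language.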

      All of these are general technologies, not specific apps. So they should be usable in most major libraries in most languages. (If you're using Perl, I'd recommend XML::LibXML.)

      Don't think of XML as just a file format, because that part sucks. Think of it as a buffet table of technologies. When you write a program, 10% is to do the program's processing; the other 90% is to handle I/O, data management, and other housekeeping. Using XML lets you get a lot of that for free.

      PS: I'm not an XML fanatic. A year ago, I was told to use XML for one particular project and was disgusted at the idea. I still think that XML gets a lot wrong, but I've come to recognize what benefits XML provides.

    11. Re:n00b - help! by runderwo · · Score: 1

      Look here. Several of the products that he mentions are addons for PostgreSQL.

    12. Re:n00b - help! by Anonymous Coward · · Score: 0

      n00b is right... XMLHttpRequest is badly named, it simply issues HTTP requests for *any* type of data.

      If you use XmlHttpRequest with XML, and I use compact strings, my app will be faster and more reliable than yours, every time.

      Why do you want Postgres to spit out XML? Do that in your application, let's not contaminate the database with arbitrary text file formats. Maybe *I'd* like Postgres to spit out S-expressions! Or MIME-encoded data!

    13. Re:n00b - help! by Anonymous Coward · · Score: 0

      xmlhttprequest object is fairly misnamed

      Good point. Perhaps because HTTP syntax is a subset of XML, it was thus named? Maybe the intent is for the functionality to remain in other markup languages besides http?

      By the way, were you "b!tchslapped" by one of the editors? Your past comments don't seem to be trolls, but your score defaults to 0 (after the "Women encouraged to go for IT" thread). But you would probably see your scores as defaulting to 1 if you were.

      Posting anonymously to avoid getting b!tchslapped myself.

    14. Re:n00b - help! by Stanistani · · Score: 1

      let me see if I can reduce this to a human-readable folk saying:

      If the only tool you have is a machine gun, everything starts to look like enemy soldiers.

    15. Re:n00b - help! by Anonymous Coward · · Score: 0

      HTTP has nothing to do with XML.

      XMLHTTPRequest got its name because it was copied by everyone from the Microsoft MSXML SDK, and every object in that sdk starts with "XML".

    16. Re:n00b - help! by zipwow · · Score: 1

      What's your beef with the file format? And what's your proposal or suggestion for an improvement?

      --
      I don't know which is more depressing, that 2/3 didn't care enough to vote, or that 1/2 of those that did are crazy.
    17. Re:n00b - help! by Further82 · · Score: 1

      I don't know if I was 'b!tchslapped', er, or even how I'd find out. I just figured the Karma system sucks. After many comments in a row modded Insightful, Funny, or just left alone, it only took one comment modded troll to give me 'bad' karma. Now it seems that my karma is positive again. The Karma system kinda reminds me of SimCity 2000, where every time you raised taxes a percentage the citizens would boo you. So after raising it 10 percent and being booed 10 times, lowering it back to 9 would cause them to cheer again. Short memory.

    18. Re:n00b - help! by Anonymous Coward · · Score: 1, Interesting

      I still prefer CSV. I've saved countless maintenance hours by ripping out XML from old projects and replacing it with CSV. I've saved even more time by requiring that our business partners use CSV for data exchange. I refuse all requests to "upgrade" to XML. Usually I give them a choice -- choose CSV files or choose ISO X.12 EDI files. They always choose CSV files.

    19. Re:n00b - help! by Doctor+Faustus · · Score: 2, Funny

      XSLT lets you take an XML file and perform transformations on it into another (possibly XML) file format. Need to convert XML into SQL INSERTS? Piece of cake. I use it to extract particular parts of an XML file and convert them into a significantly differently-ordered Lisp structure.

      I really like XSLT for code generators, with the meta-data in XML. I do, however, miss the sheer perversity of using Access VBA to generate Java.

    20. Re:n00b - help! by Anonymous Coward · · Score: 0

      You're stupid. That XML is extensible and self-describing gives it a huge advantage over cobbling together your own file format.

    21. Re:n00b - help! by Piquan · · Score: 1

      I got quite a few responses here, but yours gave me a real chuckle; thanks!

      You're stupid.

      Ah, personal attacks, the last refuge of an argument without decent support. Or is it the first? It's so hard to remember.

      That XML is extensible and self-describing gives it a huge advantage over cobbling together your own file format.

      You're comparing two orthogonal attributes. One attribute is the power of the language, that is, that it's extensible and self-describing. The other attribute is that it's a standard.

      You can have languages that sit in any quadrant of the plane you just described: powerful standard languages (sexps), powerful non-standard languages (how most good programming languages start out), weak standard languages (HTML 1.0), weak non-standard languages (a deluge of config file formats come to mind). The fact that XML is extensible and self-describing has nothing to do with the fact that it's standard, rather than cobbled together by myself (or some other random hacker). It is possible for somebody to cobble together a language that is extensible and self-describing, and it is possible for a committee to standardize a format that is neither. So claims that a language's position on one axis gives it an advantage over languages that are positioned in a particular manner on the orthogonal axis are specious.

    22. Re:n00b - help! by Piquan · · Score: 1

      I really like XSLT for code generators, with the meta-data in XML. I do, however, miss the sheer perversity of using Access VBA to generate Java.

      For the project I'm working on, I considered doing just that. In the end, I decided it would be easier to use XSLT to transform the XML into domain-specific Lisp sexps, and then use Lisp to transform the data into the code format I need. But it certainly is fun!

    23. Re:n00b - help! by Piquan · · Score: 1

      What's your beef with the file format?

      Part of it is just the usual: too verbose, the data gets lost in the formatting, as a human-readable format it fails miserably but human readability shaped its form considerably. I also find that not everything fits nicely into a tree structure, like XML wants. (This is partially shaped by my latest project, which requires more general graphs.) Now there's some things like XPointer and the like, which are the beginnings of pretty lame attempts to pretend that XML doesn't force everything into a tree.

      The difference between attributes and elements isn't clear-cut, and providing both in such an arbitrary manner promotes confusion. Several people suggest good rules for when to use which, but it's silly to have a file format that apparently needs such recommendations to be repeated so many times. "Attributes are for metadata", the claim goes, "and elements are for data". I'll briefly ignore (as most such recommendations do) the fact that metadata may need to be structured as much as, or more than, data. My main point in this paragraph is that the distinction between data and metadata becomes blurred, particularly when you have different processors working on the same document. And isn't that what XML is for? If everything were clear-cut between data and metadata, we wouldn't have things like processing instructions (meta-meta-data?) and the like. Certainly, the xmlns attribute is on a different metalevel than some program's name attribute; why not draw a new metalevel distinction for that? Trying to draw a line between data and metadata the way XML does is a futile exercise. Everybody's metadata is somebody else's data, so just use namespaces or something instead of trying to make a distinction inherent to the file format.

      Most people these days use object-oriented programming, and it's nice to be able to read in a file and have it generate an object tree of appropriate classes. But surprise! it doesn't work that way. The best time to do type assignment is during validation, and there's three main validation systems: DTDs, XML Schema, and Relax NG. DTDs are woefully inadequate in more ways than I can describe. Relax NG, while my favorite validation structure, doesn't assign type information to the data as it's validating because of the holy "Thou shalt not augment the infoset" mandate of Relax NG. Even though it has several constructs that just scream OOP subclassing, those constructs aren't visible to the user, just the parser. XML Schema puts type information into the infoset, so XPath etc can get at it. Other than that, once you get beyond the trivial, it sucks in just about every other way: it's not orthogonal, its extensibility and reuse constructs are as pathetic as anything I've ever seen, and it reeks of something that was built by passing a slapped-together application-specific spec into a committee after it's been pushed in several directions without any thought of graceful growth (which, of course, it was).

      This is just what came to me quickly (although I've been wrestling with these issues for a while); I haven't stopped typing to think about what I'm typing. So I haven't carefully spelled out my issues, but that's okay; other people have issues too that you can read about. There's lots of places on the web to talk about the suckage of XML's file format. My personal favorites usually come from discussions among Lispers, because any Lisp programmer sees XML as sexps in drag.

      Now that I've ranted about XML for a while, don't miss the point of my post (the GPP). XML brings some fine things to the table; I've spent the last year working heavily with XML by my own choice. But it's not a great file format. Many people see just the file format, and think that XML sucks. My point was to bring other aspects of XML to light, aspects that make XML a good thing overall.

      And what's your propos

    24. Re:n00b - help! by Kjella · · Score: 1

      Think of it more like CSV than mySQL. It's just a format for representing structured data. It also happens to be that it's quite easily read by humans.

      Never mind easily read by developers. That is my big "wohoo" about it. I do some work using binary protocols, and it is a pain in the butt to update. More often than not, an XML problem can be solved by doing a debug ->toString() and see wtf just went wrong with your logic.

      As a format, XML is to binary protocols what high-level languages (flamewar-proof formulation) are to assembler. Bigger, more bloated, abstracted out, easier to read but it saves the thing that matters most in 95% of the cases - developer time. If I can have a working XML solution out the door by the time you're still stuck fixing your buggy homemade format, you're SOL.

      One of the few structures of any size I use it for is 42k raw data, which gets XML'd to ~168k, then zipped back down to 70-80k for transfer. Close to 100% overhead. But 30k? What's that today? Nothing. I can understand if you have huge datasets with millions of records, but otherwise I couldn't care less that it is bloated.

      Kjella

      --
      Live today, because you never know what tomorrow brings
    25. Re:n00b - help! by Anonymous Coward · · Score: 0
      That's a great attitude. How long do you think it will take your business partners to choose to partner with somebody that doesn't force their own preferences on them? Or more likely, how soon before your employer realizes your hard-headedness is going to drive their business partners to a more flexible competitor, and dumps you for somebody who plays well with others?

      Seriously, at first I thought you were just a troll, but your use of hard line breaks speaks volumes of your 'gotta have it my way' attitude.

    26. Re:n00b - help! by spuke4000 · · Score: 1

      I don't think you need a database necessarily, and you don't need 1337 XML skillz either. Check this out for a simple tutorial on XMLHttpRequests.

      --
      This post cannot be rebroadcast without the express written consent of Major League Baseball.
    27. Re:n00b - help! by 14erCleaner · · Score: 1
      Look at XPriori (where I work) if you want a native XML database that you can try and use for personal use, for free.

      XML in, XML out, with a web-based console and APIs for C++, Java, and .NET.

      --
      Have you read my blog lately?
    28. Re:n00b - help! by zipwow · · Score: 1

      So... is your saying that "XML sucks as a data storage language" kind of like Winston Churchill saying, "Democracy is the worst form of government except for all those others that have been tried."?

      Which I read to mean, "It works, but there's room for improvement"?

      -Zipwow

      --
      I don't know which is more depressing, that 2/3 didn't care enough to vote, or that 1/2 of those that did are crazy.
    29. Re:n00b - help! by Anonymous Coward · · Score: 0

      My last Ford was an XSLT.

  8. Hey, come on... by Anonymous Coward · · Score: 5, Funny

    XML is all about loosely bound interfaces.

    Get with the program.

    1. Re:Hey, come on... by LiquidCoooled · · Score: 1

      CLASSIC!

      You owe me a new keyboard and cup of coffee.

      --
      liqbase :: faster than paper
    2. Re:Hey, come on... by sapgau · · Score: 1

      Right. And if the boundaries are between remote systems, all the better.

      Not the same as "program to an interface, not an implementation"... (The three amigos book)

  9. Dear XML-Junkies, by Letter · · Score: 5, Funny

    <letter>
    <salutation>Dear XML-Junkies</salutation>
    <body>
    I type all my business letters in <link href="http://www.google.com/?q=XML>XML</link>. Sometimes it can be a bit <link href="http://dictionary.reference.com/search?q=ver bose">verbose</link>.
    </body>
    <signature>
    <name ><nickname>Letter</nickname></name>
    </signature>
    </letter>

    1. Re:Dear XML-Junkies, by Anonymous Coward · · Score: 0

      That's considerably less verbose than the HTML to render the typical slashdot comment.

    2. Re:Dear XML-Junkies, by Anonymous Coward · · Score: 0

      Dear Letter,

      I see from your comments page you have pulled yourself out from trolldom.
      WELL DONE!

      I enjoy reading your postings, keep up the good work :)

      A Fan

    3. Re:Dear XML-Junkies, by refactored · · Score: 2, Funny

      nsgmls:letter.xml:1:0:E: no document type declaration; will parse without validation
      nsgmls:letter.xml:4:78:W: character "" is the first character of a delimiter but occurred as data
      nsgmls:letter.xml:4:78: open elements: letter body
      nsgmls:letter.xml:4:114:W: character "" is the first character of a delimiter but occurred as data
      nsgmls:letter.xml:4:114: open elements: letter body
      nsgmls:letter.xml:4:132:E: net-enabling start-tag not immediately followed by null end-tag
      nsgmls:letter.xml:4:132: open elements: letter body
      nsgmls:letter.xml:4:46:E: literal is missing closing delimiter
      nsgmls:letter.xml:4:46: open elements: letter body

  10. XML Seems Cool by Aknaton · · Score: 2, Insightful

    XML seems cool to me. I like the thought of being able to design a schema to suit my personal needs. But when it comes time to make use of that schema and actually keep data in it, it seems to be useless, at least as far as an end user (non programmer) is concerned.

    Do I have the wrong impression?

    1. Re:XML Seems Cool by gizmofan · · Score: 2, Insightful

      XML is a way of decorating data with meaning, but it's not the most efficient or effective way of doing it. From a software point of view it's expensive to parse (incredibly so when heavily nested or structured), and it can be huge relative to the raw data that it's actually transmitting. The main problem I have with the way XML is often used is the fact that it's the worst of both worlds. It documents the data that it encapsulates badly from a human point of view (it's difficult to read and repetitive) and verbosely from a machine point of view (ditto). Why not use something more apt from a machine point of view (lisp s expressions?) and something more apt from a human point of view (a document?).

    2. Re:XML Seems Cool by Anonymous Coward · · Score: 0

      First of all, you don't "keep data in" a schema. A schema describes the structure of some data.

      An XML file is simply that, a *file*. XML is a file format, like GIF or CSV. It isn't a database, or even a model for accessing and processing data. It's just a way of storing data as a stream of characters.

      There are some folks who are trying to invent a hierarchic data model for XML that looks like DOM, but they are quite misguided. I should say "reinvent" because the hierarchic data model was tried and rejected in the 60s and 70s as not being general enough, and being hard to formalize.

      So XML isn't really "useless", but keeping data in XML files is probably a bad idea. What if you mistype one character in one tag for instance? What does your document mean now?

      If you need to store data, do it in a database that enforces constraints and guarantees your data conforms to your schema at all times. Then put a UI on it that makes it easy to find and enter data. That's how apps should be designed. Data and UI with some glue in between.

    3. Re:XML Seems Cool by Creosote · · Score: 2, Insightful
      XML isn't really "useless", but keeping data in XML files is probably a bad idea. What if you mistype one character in one tag for instance? What does your document mean now?
      This is sort of like saying that programming in C is a bad idea, because what happens if you mistype a function name, and your program refuses to run? That's what debuggers are for. Likewise, the XML world is full of open-source or low-cost schema-aware editors and validators. Minimally you should use an editor that knows which elements and attributes are legal while you're entering data. If you design a schema appropriately for your data, you can constrain data types with a great degree of precision.
    4. Re:XML Seems Cool by elharo · · Score: 2, Informative

      Please don't tar XML with the schema brush. One of the unique innovations of XML is that schemas are optional, and need not be agreed on. Schemas can be useful as I discuss in Item 37. However, they are misused and overused far more often than they're used correctly.

      Really, schemas are just convenient tools for a few special purposes. Not everyone needs them, and no one needs them all the time. Schemaless XML is a lot more interesting and practical.

    5. Re:XML Seems Cool by zipwow · · Score: 1

      Why not? Because those two documents (lisp s expressions and documentation thereof) are only in synch once, at creation. From there forward, never again.

      Everyone here at /. whines about the readability of XML. I have yet to see an example of an improvement.

      As for performance, for 99% of your applications, it just doesn't matter. Software analysis and development time is much more expensive than clock cycles.

      Would I use XML for a database? Probably not without a lot of convincing. Do I use it for data exchanges? Absolutely, and I don't know of a better tool.

      -Zipwow

      --
      I don't know which is more depressing, that 2/3 didn't care enough to vote, or that 1/2 of those that did are crazy.
    6. Re:XML Seems Cool by R.Caley · · Score: 1
      Why not use something more apt from a machine point of view (lisp s expressions?)

      Have you ever tried finding where a brace has been lost in a machine produced s-expression?

      Of course, you can indent the expression to make error detection and correction easier, but then you have created something with exactly the same type of redundancy as XML, for the same reason, except it's actually harder to hand edit.

      Redundancy is a Good Thing in an archiving and interchange format.

      --
      _O_
      .|<
      The named which can be named is not the true named
  11. Yes, it's a great book ... so far by page275 · · Score: 2, Informative

    I just bought the book a couple of days ago. It's a great one so far. Even though it does not teach you XML, it is still great for anyone who has even a little experience with XML. Just like me, you will pick things up really fast.

  12. Mod parent up by atomm1024 · · Score: 1
    Come on, mods, a "troll"? That wasn't a troll, it was the truth!

    I agree that it's too wordy and hard to parse, and I definitely don't think it's human-friendly. (Only if one's been immersed in it for a while does it become easily readable.)

    I also dislike the XML data model altogether. I strongly prefer the RDF data model (not to be confused with the bad XML serialization of RDF), basically a set of subject-predicate-object triples. It's a much more natural data model: things have properties, and they do actions. It's as simple as that. XML's inherently tree-like structure is much more awkward for real-world and purely electronic data alike.

    Personally, my favorite structuring language is Notation 3 (a very readable extended RDF serialization).

    --
    Signature.
    1. Re:Mod parent up by agraupe · · Score: 1

      It was moderately okay before people tried doing complex things with it, which brought in schemas and namespaces and other things of that nature. Writing a pure XML file to be paired with CSS for browser display, free of any namespaces or other elements that may have a point but decrease readability, is relatively easy to do, and is later much easier to read than HTML.

    2. Re:Mod parent up by Anonymous Coward · · Score: 1, Insightful

      I strongly prefer the RDF data model

      Ugh, no. How do I say that object X has the following TWO properties? I can't. I have to say: "person has first name tom". "person has last name jones". I can't say "person has first name tom and last name jones".

      The Relational model is the best model for data because 1) it allows multiple attributes in a single predicate and 2) you don't have to repeat the attributes, they just go in the relation header. But these are DATA MODELS, they aren't TEXT FORMATS, which is what XML is. They are trying to reverse-engineer a hierarchic data model for XML, but hierarchic data models are flawed because they are optimized for certain uses and not others (i.e., what if the data I want is at the leaves of the tree?)

    3. Re:Mod parent up by elharo · · Score: 1

      There is no such thing as "the XML data model". There are XML data models, in fact any number of them. For instance, right now I'm working on a program that processes XML as a linear stream of events, with little if any hint of a tree structure anywhere to be found.

      There is not now, never has been, and never will be one canonical XML data model. XML is about syntax, not data models. Data models are local and non-exchangeable. Syntax is interoperable and transferable. This is one of the points I try to bring out in the book.
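
      As a rough illustration of the "linear stream of events" view, here is a minimal sketch using Python's standard xml.sax module. The handler class, function name, and sample document are made up for this example and are not from the book or the comment above.

```python
import xml.sax


class TagLogger(xml.sax.ContentHandler):
    """Record start/end events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def stream_events(data):
    """Feed the document to a SAX parser; no tree is ever built here."""
    handler = TagLogger()
    xml.sax.parseString(data, handler)
    return handler.events


if __name__ == "__main__":
    # Hypothetical sample document.
    for event in stream_events(b"<items><item>apple</item><item>pear</item></items>"):
        print(event)
```

      The same bytes could just as well be loaded into a tree by a different program, which is the point being made: the syntax is shared, the data model is a local choice.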

    4. Re:Mod parent up by atomm1024 · · Score: 1

      <#person>
      :firstName "Tom";
      :lastName "Jones".

      --
      Signature.
    5. Re:MOD PARENT UP by Anonymous Coward · · Score: 0

      no

  13. Tip #1 by Anonymous Coward · · Score: 1, Insightful

    1) XML is not designed to be used for everything under the sun.

  14. FYI by Anonymous Coward · · Score: 5, Informative

    Bookpool has it for $28.50. Don't click the bn sponsored link (where it's a whopping $44.95).

    PS, I don't work for Bookpool, I hate it when /. gets a kickback from doing something dumb like clicking the link to overpriced merchandise.

  15. Try the other "Effective" books, too by Eric+Giguere · · Score: 4, Informative

    If you like this book, don't forget to check out Scott Meyers' Effective C++ or Joshua Bloch's Effective Java. Both are great. I devoured Meyers' book when it first came out, and I was happy to see Bloch's book was similarly useful. There is also an Effective Perl book out, but I don't know how good it is -- it follows the same general format, but hasn't been updated since 1997. (Neither has the C++ book, but C++ hasn't changed that much since then.)

    Eric
    See your HTTP headers here
  16. What XML is good for. by mikeumass · · Score: 1

    XML is excellent for data exchange and providing an open standard for interoperability. It provides a way to present data that can be used in software designed by different vendors and even on different architectures. However it does have it's downfall, and that is that it is wordy and overly inefficient. Any programmer worth what he is being paid knows that you don't represent your data internally by XML. When your program starts you parse the XML into a nice data structure that can be quickly accessed and modified. When the program closes you convert the data structure back to XML and save it.
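
    A rough sketch of that load-at-start, save-at-exit pattern, using Python's standard xml.etree.ElementTree. The settings/entry layout and the function names are invented for illustration and are not from the comment above.

```python
import xml.etree.ElementTree as ET


def load_settings(xml_text):
    """Parse <settings><entry name="...">value</entry></settings> into a dict."""
    root = ET.fromstring(xml_text)
    return {entry.get("name"): entry.text or "" for entry in root.iter("entry")}


def dump_settings(settings):
    """Serialize the in-memory dict back to the same (hypothetical) XML layout."""
    root = ET.Element("settings")
    for name, value in settings.items():
        ET.SubElement(root, "entry", name=name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    settings = load_settings('<settings><entry name="lang">en</entry></settings>')
    settings["theme"] = "dark"      # work on the plain dict while running
    print(dump_settings(settings))  # convert back to XML only when saving
```

    While the program runs it only touches the dict; XML appears at the two edges.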

    1. Re:What XML is good for. by Anonymous Coward · · Score: 0

      it does have it's downfall

      "its".

  17. Just because you CAN... by IGnatius+T+Foobar · · Score: 4, Insightful

    Sometimes, the most effective use of XML is to simply not use XML at all. XML is a wonderfully useful tool when applied correctly. It's architecture-independent and is a great way to communicate unstructured and/or hierarchical data.

    Sometimes, though, your data can be simple enough that XML is overkill. Software developers need to make themselves aware of situations when they might be better served by a simple "flat file" of delimited data. In situations like this, using XML can amount to what I like to call "gratuitous complexity."

    Always use the right tool for the job.

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
    1. Re:Just because you CAN... by stratjakt · · Score: 1

      You can't learn that from a book, only from experience.

      The problem with most "how to program" books is that they use trivial examples to show an advanced concept. For instance, every tutorial about recursion out there uses calculating a factorial as an example. What they never mention is that when you calculate 1000! recursively, you push a thousand function calls onto the stack, and basically waste a whole lot of the computer's time. "for(i=1;i<=1000;++i) result*=i;" is a much more efficient and practical solution.
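
      To make the comparison concrete, here is a small sketch of both versions in Python (chosen only for illustration; the one-liner above is C-style). The function names are made up. Note that CPython's default recursion limit is typically 1000, so the recursive version may well fail outright for 1000! rather than merely run slower.

```python
def factorial_recursive(n):
    """Textbook version: one stack frame per multiplication."""
    return 1 if n <= 1 else n * factorial_recursive(n - 1)


def factorial_iterative(n):
    """Loop version: constant stack depth regardless of n."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


if __name__ == "__main__":
    print(factorial_recursive(10))
    # Python integers are arbitrary precision, so 1000! is fine in a loop.
    print(len(str(factorial_iterative(1000))), "digits in 1000!")
```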

      I'm working with a kid who's just cutting his teeth in the real world. He's fresh out of learning about parsing trees and basic compiler theory, and for every simple data validation task, he wants to write some 2000 line object oriented parsing subsystem, when a simple regex not only does the job, but does it faster and with less resources.

      They tend to think that "skillz"="complex code". One day it'll dawn on him just how much time he's wasted doing it all the hard way, and that skill is "working smarter, not harder", to apply a cliche.

      I saw a system that just persisted a counter, a single integer, to a file. And it was all in XML, wrapped in a jillion tags. Of course, it was completely unreadable, because there's so much crap, it gets hard to find the actual data, which was the number "9".

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:Just because you CAN... by gremlins · · Score: 1

      I don't really agree that using XML is overkill. In fact I think XML is great for just about everything. What causes most problems is the code used to process XML into something you want to use. When you use XML it gives you the ability to later have another application use that same data a lot quicker. With a flat file you would need to write up a way for your new app to interpret your old file format. I do agree that coders have to be aware of the overhead of the XML processing libs they choose to use; however, it is just as easy to regex simple XML.

      --
      just because you're a schizophrenic doesn't mean people aren't really out to get you
    3. Re:Just because you CAN... by sfjoe · · Score: 1

      Sometimes, though, your data can be simple enough that XML is overkill

      True enough, however, simple data all too often becomes complex data. That's why it's a good thing to be "extensible".

      --
      It's simple: I demand prosecution for torture.
    4. Re:Just because you CAN... by Tablizer · · Score: 1

      True enough, however, simple data all too often becomes complex data. That's why it's a good thing to be "extensible".

      Delimited can also be extensible if there is a field definition header. The problem is that there is no standard.

      Wiki notes on improved delimited format suggestions.

    5. Re:Just because you CAN... by elharo · · Score: 2, Informative

      These days data has to be pretty damn simple to justify using a flat file rather than XML. I wrote more about this in my previous book, Processing XML with Java than in this one, though. Chapters 1-4 discuss this in some detail.

      Real-world data often gets messy in ways that don't lend themselves to flat files. For instance, two of the thorniest problems:

      1. How do you handle encoding detection and international characters?
      2. What do you do when the data contains characters you're using as field delimiters?

      Both of these are completely solved by XML with no extra effort on your part, and these are hardly the only issues.

      I certainly agree that it's easier to write a parser for a flat file format than it is to write a parser for XML. However, it's much easier (and much more reliable) to use one of the existing well-tested, debugged XML parsers than it is to write your own flat-file parsing code.
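
      The delimiter problem is easy to reproduce. The sketch below compares a hand-rolled comma split against a round trip through Python's standard xml.etree.ElementTree; the record/field element names, the sample values, and the function names are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical field values containing a comma, quotes, and a line break.
FIELDS = ["Smith, John", 'said "hi"', "line one\nline two"]


def naive_csv_roundtrip(fields):
    """Hand-rolled flat file: join on commas, split on commas."""
    return ",".join(fields).split(",")


def xml_roundtrip(fields):
    """Same data through an existing XML serializer and parser."""
    record = ET.Element("record")
    for value in fields:
        ET.SubElement(record, "field").text = value
    text = ET.tostring(record, encoding="unicode")
    return [field.text for field in ET.fromstring(text).iter("field")]


if __name__ == "__main__":
    print(naive_csv_roundtrip(FIELDS))  # the first field is split in two
    print(xml_roundtrip(FIELDS))        # fields come back unchanged
```

      A careful CSV library would also cope with these values; the comparison here is only against the quick hand-written parser described above.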

    6. Re:Just because you CAN... by snorklewacker · · Score: 1

      2. What do you do when the data contains characters you're using as field delimiters?

      Both of these are completely solved by XML with no extra effort on your part, and these are hardly the only issues.


      CDATA is delimited at the end by ]]>. There is no way to escape this delimiter. If you need to enclose one XML fragment in another using CDATA, you had best base64 encode it, because there simply isn't any way to nest them.

      This is not what I call "well thought out"

      --
      I am no longer wasting my time with slashdot
    7. Re:Just because you CAN... by elharo · · Score: 1

      CDATA sections don't need to nest. If you're trying to nest them, you're doing something wrong. CDATA sections are merely syntax sugar. (Items 9, 14 and 15) You absolutely can include the three character sequence ]]> in XML documents. You just have to escape the greater than sign as &gt;.

      The point is not that escaping is not necessary when creating an XML document. The point is that the escapes you need are predefined and understood by the parser. You don't need to think about them.

      I've seen way too many CSV and similar flat-file parsers that keel over and die (or worse, corrupt data without noticing a problem) when presented with data that contains commas, tabs, quotation marks, line breaks and the like.

      XML avoids this by providing necessary escapes. Furthermore, when you receive an XML file, you know what the escapes are. You don't have to guess whether this file uses \" or "" or some other mechanism for escaping otherwise reserved characters. It's not that XML's escape mechanism is fundamentally better or worse than other escape mechanisms. It's just that it's standard enough that we can stop worrying about it.

    8. Re:Just because you CAN... by magi · · Score: 1

      Like, I recently encountered a large project where they have, after many years of development, published a "revolutionary" XML format for...tables.

      XML exists because using relational databases with inherently hierarchical data creates a lot of problems. You have to arrange the data artificially into numerous - sometimes even hundreds - of tables, and make tedious normalizations to make them efficient. Then you need to write complex JOIN queries to get your data back, which requires very complex and efficient RDBMSs. But still, they have stuck with the relational databases for decades because tables are so damned fast to process.

      And now, people are representing tables in XML, and are looking for techniques to process it fast. Damn, can't even make random access any longer...

      Well, the only excuse for using XML for tables would be uniformity, as when you have tables embedded in HTML.

    9. Re:Just because you CAN... by mwlewis · · Score: 1
      What they never mention is that when you calculate 1000! recursively, you push a thousand function calls onto the stack, and basically waste a whole lot of the computer's time. "for(i=1;i<=1000;++i) result*=i;" is a much more efficient and practical solution.
      Ok, ignoring your actual point, but I couldn't resist...What's practical about 1000!? And what were the specs on the machine that ran the code?
      --
      JOIN US FOR PONG!
    10. Re:Just because you CAN... by snorklewacker · · Score: 1

      > CDATA sections don't need to nest. If you're trying to nest them, you're doing something wrong. CDATA sections are merely syntax sugar. (Items 9, 14 and 15) You absolutely can include the three character sequence ]]> in XML documents. You just have to escape the greater than sign as &gt;.

      You are wrong. Entities do not decode in CDATA, so you do not get a greater-than sign, you get ampersand, g, t, and semicolon. CDATA sections are the only way to get to CDATA mode, whereas the rest of XML is in PCDATA mode (the P stands for "parsed"). You've failed to impress me with the official answer of "don't do that".

      The fact is, they could have come up with an escape sequence for just that delimiter, and an escape for that escape. It's not infinite regress -- that's as far as you have to go, and it's not something that has to be special cased any more than that escape sequence has to be in the first place.
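
      Both behaviors argued over here can be checked with any conforming parser. The sketch below uses Python's standard xml.etree.ElementTree; the sample documents and the helper name are made up. The third document shows a commonly used workaround that nobody in this thread mentions: splitting the ]]> sequence across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text):
    """Parse a one-element document and return its character data."""
    return ET.fromstring(xml_text).text


# Ordinary (parsed) content: the entity reference is decoded by the parser.
PARSED = "<x>a ]]&gt; b</x>"
# CDATA section: the same characters come back verbatim, ampersand and all.
CDATA = "<x><![CDATA[a &gt; b]]></x>"
# Workaround (an assumption of this sketch): split "]]>" over two sections.
SPLIT = "<x><![CDATA[a ]]]]><![CDATA[> b]]></x>"

if __name__ == "__main__":
    for doc in (PARSED, CDATA, SPLIT):
        print(repr(text_of(doc)))
```

      So the escape works in ordinary content but not inside a CDATA section, which is consistent with both of the posts above; whether the split trick counts as "well thought out" is left to the reader.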

      --
      I am no longer wasting my time with slashdot
  18. Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

    Why, oh WHY do people generate XML this way? Compared with:

    echo ("<root><items>");
    $qu = mysql_query("select * from items order by Id asc");
    while ($s = mysql_fetch_array($qu, MYSQL_ASSOC))
    {
        echo("<item>");
        foreach ($s as $k => $v)
        {
            echo("<$k>$v</$k>");
        }
        echo("</item>");
    }
    echo("</items></root>");

    If you're going to be passing around DOM trees in your script, then fair enough, but if you are simply outputting data, it's quicker and easier to simply write it out.

    1. Re:Dear Lord make the madness stop! by DarkHelmet · · Score: 1
      Yeah, you're right. I'm just used to having to deal with a DOM passed to me anyway.

      Also, most of my stuff nowadays involves transforming with XSLT, so I need to create a DOM object anyway.

      BTW, the example I did was for PHP5.

      --
      /^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
    2. Re:Dear Lord make the madness stop! by Osty · · Score: 1

      If you're going to be passing around DOM trees in your script, then fair enough, but if you are simply outputting data, it's quicker and easier to simply write it out

      Except that now you introduce the potential to have bugs that you wouldn't get from a DOM tree. For example, building a document through a DOM tree makes it impossible to forget a / on a close tag, or to open a tag as "<foo>" but close it as "</Foo>". Sure, it's quicker, and in trivial examples you're not likely to have a problem. Real-world problems are not always as simple as examples, and you're trading the memory used by building a DOM for the accuracy of building your tree within the DOM rather than ad-hoc by hand.

    3. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      For example, building a document through a DOM tree makes it impossible to forget a / on a close tag, or to open a tag as "<foo>" but close it as "</Foo>".

      But if you do that, you find out about it immediately anyway, since any conformant XML parser will point it out to you when you try and use that data.

      You might as well complain that it's possible to make a syntax error in your PHP - sure, it's possible, but it's not a problem because it's immediately obvious and easily fixable.
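
      To make the "any conformant parser will point it out" claim concrete, here is a minimal sketch using PHP's DOM extension. The isWellFormed() helper and the sample strings are made up for illustration; they are not from anyone's code in this thread.

```php
<?php
// Hypothetical helper: returns true when $xml parses as well-formed XML.
function isWellFormed(string $xml): bool
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$good = '<root><foo>text</foo></root>';
$bad  = '<root><foo>text</Foo></root>'; // opened as foo, closed as Foo

var_dump(isWellFormed($good));
var_dump(isWellFormed($bad));
```

      Note that this only catches the mistake for documents that actually get generated and parsed, which is the point the rest of the thread argues about.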

    4. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      Heh, or just do this:

      $qu = mysql_query("select * from items order by Id asc");
      foreach (mysql_fetch_fields($qu) as $k)
          echo "$k\n";
      while ($s = mysql_fetch_array($qu, MYSQL_ASSOC))
          foreach ($s as $v)
              echo "$v\n";

      You don't have to repeat the tags on every piece of data, in fact you don't need *tags* at all for data interchange.

    5. Re:Dear Lord make the madness stop! by Ivan+Todoroski · · Score: 1

      Not always. As soon as you start dealing with more complex XML, which can contain certain optional elements depending on complex conditions, things will get messy. Errors in rarely used parts of the generated XML document can get past you easily.

      Also, even if the tags in your example are correct, your code still has a serious bug: what if your $v variable sometimes gets a '<' character in it (e.g. as a result of user input)? Your program would fail immediately, whereas DOM automatically escapes such characters before inserting the text into the document tree. You just proved how easy it is to introduce bugs using your method.
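
      A rough sketch of the escaping problem, assuming a value containing '<' and '&'. The variable names are illustrative only; htmlspecialchars() with ENT_XML1 is one way to escape by hand, and a DOM text node is the other route.

```php
<?php
// A value that might arrive from user input.
$v = 'a < b & c';

// Naive interpolation: the result is not well-formed XML.
$naive = "<item>$v</item>";

// Escaping by hand before interpolation.
$escaped = '<item>' . htmlspecialchars($v, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</item>';

// Letting the DOM serializer escape the text node.
$dom = new DOMDocument();
$item = $dom->createElement('item');
$item->appendChild($dom->createTextNode($v));
$dom->appendChild($item);
$viaDom = $dom->saveXML($item);

echo $naive, "\n", $escaped, "\n", $viaDom, "\n";
```

      The text goes in through createTextNode() rather than the second argument of createElement(), since the latter does not escape ampersands for you.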

    6. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      Congratulations, you've just corrupted the data every time the application encounters records with new lines.

      XML exists because it's a standard syntax. When you invent your own syntax, you fail to consider exceptional cases. Once you start making up rules about how to handle the exceptional cases, implementing the special cases in your custom-written parser, deciding which delimiters to use and so on, you soon realise that it's quicker, easier and more reliable to simply use a standard format that has already done all of this for you.
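
      The newline problem can be shown in a few lines. This is only a sketch with invented sample data, assuming the reader splits the stream on "\n":

```php
<?php
// Two records, one of which contains an embedded newline.
$records = ["first line\nsecond line", 'plain value'];

// Newline-delimited output, in the spirit of the echo "$v\n" approach.
$wire = implode("\n", $records);

// The reader splits on newlines and sees three values, not two.
$decoded = explode("\n", $wire);

printf("wrote %d values, read back %d\n", count($records), count($decoded));
```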

    7. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      Errors in rarely used parts of the generated XML document can get past you easily.

      Not if you test properly. In practice, I haven't found this to be a problem.

      what if your $v variable sometimes gets a '<' character in it

      I omitted the escaping because I couldn't remember off the top of my head which of the three(?) different PHP markup-escaping functions works for XML. I'd certainly expect anybody implementing something like this to be aware of simple stuff like that, it's just I haven't written PHP recently.

      Your program would fail immediately

      Precisely. It's not a bug that goes unnoticed, it's a syntax error that you immediately fix. The impact of such bugs is twenty seconds of typing. I believe the impact of maintaining twice as much code (actually, much more in anything other than simple examples) is far greater.

    8. Re:Dear Lord make the madness stop! by Ivan+Todoroski · · Score: 1
      You're missing the point.

      Errors in rarely used parts of the generated XML document can get past you easily.

      Not if you test properly. In practice, I haven't found this to be a problem.

      With the DOM approach, you don't have to test the minutiae of the XML generation part of the program. Instead, you can spend that time more productively by better testing other more relevant parts of your program logic. Also, the argument that generating XML directly is easier is true only for simple examples like this, it gets complicated pretty fast as your requirements grow more complex.

      what if your $v variable sometimes gets a '<' character in it

      I omitted the escaping because I couldn't remember off the top of my head which of the three(?) different PHP markup-escaping functions works for XML.

      Exactly. With the DOM approach, you don't have to remember this. It's easy to miss it in more complex XML generation routines, and after you've done all the escaping and error checking that DOM already does for you, will your program really be that much more readable than the DOM approach?

      Your program would fail immediately

      Precisely. It's not a bug that goes unnoticed, it's a syntax error that you immediately fix.

      Umm... no. Your program will get compiled (or parsed by the interpreter) like nothing is wrong. There will be no syntax error. I meant above that it would fail to generate proper XML as soon as it receives an incorrect character during runtime, which can happen well after development, when the program is already deployed to customers. All that hassle could have been avoided by using APIs that were specifically designed for XML parsing/generation/manipulation, which have been debugged and field-tested countless times before.

      You won't catch this failure unless you specifically test for this type of escaping bug, for that exact variable. The more complex your XML generation routines, the more difficult it is to test each possible combination of program inputs to exercise each possible XML output. Multiply this with many XML generation routines and many programs. You will miss a test, sooner or later. It is naive to generalize from your simple example above, and to think that you can test everything. Sure, bugs will always happen, but why compound the problem when you can reduce the number of bugs ever so slightly by using safe APIs that can also make your code more organized?

      Of course, DOM doesn't solve everything, you will still have tests even in the DOM case, but at least the DOM API is a safety net that will catch and handle some of these most basic corner cases that your tests might miss.

      I've been bitten with small stuff like this (not only XML related) enough times to know better than try to splice strings together to generate XML, or try to take other quick & dirty shortcuts in bigger programs. I learned the hard way that shortcuts don't always cut short.

      One additional bonus of using the DOM API is that the XML can optionally be output indented by any half-decent DOM serializer, which makes the generated XML easier to debug in protocol dumps or whatever. Trying to generate properly indented XML from nested if/for/whatever code split across multiple functions would require non-trivial additional effort better spent elsewhere.

      And it's not only simple things like escaping variables or mismatching tags... when you also factor in namespace handling in complex documents with multiple namespaces, the DOM API with its automatic hierarchical namespace & prefix management starts to look really good. Similar arguments apply to XML entities, processing directives, CDATA, etc.
    9. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      With the DOM approach, you don't have to test the minutiae of the XML generation part of the program.

      No, you're not seeing the bigger picture. You are saying that you'll miss the errors because the malformed part of the XML document isn't always generated.

      ANY decent testing - testing that has to take place anyway - would immediately uncover this problem. You are talking as if you need to test the XML syntax specifically, but that is simply not the case, you get it for free because XML parsers require well-formed documents.

      What you are saying is that if you don't test all the branches of your code, you won't find this error. I completely agree - but if you aren't testing those branches, you've got a hell of a lot more to worry about than XML problems!

      Also, the argument that generating XML directly is easier is true only for simple examples like this, it gets complicated pretty fast as your requirements grow more complex.

      What can I say, my experience is the exact opposite of yours; I find that generating XML through the DOM gets increasingly unwieldy as the documents increase in complexity, while simple echo()ing scales appropriately.

      With the DOM approach, you don't have to remember [escaping].

      Remembering it isn't an issue if you are actually working with that language. Omitting it because I couldn't remember the API for a language I haven't used in a while is hardly proof that this is some onerous burden.

      It's easy to miss it in more complex XML generation routines

      What makes you say that? It's a simple case of an escape call every time you include raw data in the output.

      after you've done all the escaping and error checking that DOM already does for you, will your program really be that much more readable than the DOM approach?

      Oh come off it! You are talking about "all the escaping and error checking", when in reality, it's a case of escapefunction($foo)! Don't you think you're being a bit misleading by using such language?

      Yes, the program really will be more readable than the DOM approach. Echoing:

      echo("<root><items><items id=\"1\">This is a test.</items>");
      for ($i = 2; $i <= 10; $i++) {
      $text = escapefunction($data[$i]);
      echo("<item id=\"$i\">$text</item>");
      }
      echo("</item></root>");

      Using the DOM:

      $dom = new DomDocument();
      $root = $dom->createElement("root");
      $dom->appendChild($root);
      $items = $dom->createElement("items");
      $initial = $dom->createElement("section");
      $initial->setAttribute("id", "1");
      $text = $dom->createTextNode("This is a test.");
      $initial->appendChild($text);
      root->appendChild($initial);
      for ($i = 1; $i <= 10; $i++) {
      $item = $dom->createElement("item");
      $item->setAttribute("id", "$i");
      $items->appendChild($data[$i]);
      }
      echo $dom->saveXML();

      That's about as simple as the real world gets. If you're generating something even moderately complex, the number of lines of code shoots way up. You typically need over half a dozen function calls to output a single XHTML image!

      Umm... no. Your program will get compiled (or parsed by the interpreter) like nothing is wrong. There will be no syntax error. I meant above that it would fail to generate proper XML as soon as it receives an incorrect character during runtime, which can happen well after development

      I understand the mechanisms, thanks. What I am assuming is that there is a halfway competent testing procedure in place - not specifically for the XML, but for the application itself. If you haven't caught a syntax error in your XML, it's because you haven't tested something.

    10. Re:Dear Lord make the madness stop! by Azghoul · · Score: 1

      Just thought I'd point out that I've read the whole thread; I agree with you completely. The AC doesn't seem to realize that with DOM you don't have to worry about echo()ing exactly the right stuff at the right time.

      For almost anything non-trivial I'd use DOM over echo. The only reason not to is laziness or 1337ness.

    11. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      The AC doesn't seem to realize that with DOM you don't have to worry about echo()ing exactly the right stuff at the right time.

      Listen, I'm not exactly unfamiliar with the DOM. I've been on the opposite end of the argument and argued for the DOM in some applications. But for just exporting in a controlled environment, simply echo()ing is almost always the better solution.

      If you disagree, then by all means expand upon "worrying about echo()ing the right stuff at the right time", because as far as I can tell, that statement applies to all forms of programming everywhere.

    12. Re:Dear Lord make the madness stop! by Ivan+Todoroski · · Score: 1
      Again, you fall prey to using trivial examples to argue something which applies to more complex documents.

      Even then, you still fail it. Your echo() example has a bug again! You have an <items> element that is being terminated by </item>.

      Also, you don't do DOM justice in your DOM example, it can be simplified. It also has a bug: you never attach the $items element anywhere, so I don't know exactly what your XML is supposed to look like.

      My PHP is very rusty, but I'll try to indulge your apparent fondness for trivial examples. I know you can certainly simplify things by stringing DOM calls together, to create whole branches of your XML in one go, like this:

      $dom->appendChild($dom->createElement("root"))->appendChild($dom->createElement("section"))->setAttribute("id", "1");

      Also, I'm not sure if PHP allows in-place assignment, or white-space after the method call operator (->), but in languages like Java/C++/C#/etc you can do something like this:

      dom.appendChild(root = dom.createElement("root")).
      appendChild(items = dom.createElement("items")).
      appendChild(whatever = dom.createElement("whatever")).setAttribute("blah", "23");

      That is, create the XML branch and assign various sub-nodes to variables for later use in one go. As a bonus, you also make your code follow the tree-like structure of your XML. This way your code ends up manipulating (and looking like) tree fragments.

      At least for me (and I suspect I'm not alone), it's much more natural to think of the XML as a tree-like structure of nodes that I manipulate.

      The human brain is not a stack-based machine, and thinking of XML as a nested series of open/close tags is much less intuitive, as you demonstrated by bungling your close tag in your echo() example. It's not that you can't do it, but why burden your mind with banalities like remembering the proper close tag at a given point, or remembering to escape values, when you can apply it to higher level problems? Let the machine handle the mechanical details.

      One additional bonus of using the DOM API is that the XML can optionally be output indented by any half-decent DOM serializer, which makes the generated XML easier to debug in protocol dumps or whatever.

      Just dump the output through xmllint or view it in a web browser.

      Never said it was the ultimate advantage, just a minor bonus of using DOM.

      Trying to generate properly indented XML from nested if/for/whatever code split across multiple functions

      That's downright disingenuous. DOM routines can also be in "nested if/for/whatever code split across multiple functions". How that code is arranged is utterly irrelevant, it's how the output can be arranged that matters, and there is no difference between the two methods in this respect.

      I'm sorry, this is just wrong. With your echo() approach, generating properly indented XML is more difficult, because you have to pass a "whitespace" or a "level of nesting" parameter around in your multiple functions for generating the XML text. With DOM, you do no such thing. You just pass your nodes around as usual, and the indentation is done by DOM at the end.
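
      For what it's worth, here is roughly what "the indentation is done by DOM at the end" looks like in PHP 5's DOM extension. This is a sketch with made-up element names and data; the only relevant part is the formatOutput property on DOMDocument.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true; // ask the serializer to indent nested elements

$root  = $dom->appendChild($dom->createElement('root'));
$items = $root->appendChild($dom->createElement('items'));

foreach ([1 => 'This is a test.', 2 => 'Another item'] as $id => $text) {
    $item = $items->appendChild($dom->createElement('item'));
    $item->setAttribute('id', (string) $id);
    $item->appendChild($dom->createTextNode($text));
}

$xml = $dom->saveXML();
echo $xml;
```

      None of the code that builds the nodes knows or cares about the nesting depth; the serializer works it out from the tree.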

      Anyway, nobody said it is impossible to produce correct XML code with the echo() approach, but it is definitely more bug-prone, and you'll end up wasting more time going back to your code and fixing those bugs. Your repeated failure to produce correct XML code in even the most simplistic examples is an excellent illustration of my point.

      And no, "we'll catch it in testing" is not an excuse for writing shoddy code in the first place, when there are better code practices to help you and guide you along the way.

      P.S. Sorry for not providing more complicated examples with namespaces etc.,

    13. Re:Dear Lord make the madness stop! by Azghoul · · Score: 1

      Go back and reread this thread from the beginning. It's obvious there are circumstances when both methods make sense. I'm happy you're familiar with DOM, but come on, if you can't see that manipulating a DOM is better for you in the long run, then have a gay-all time playing with string concatenation.

      I'm certainly not about to convince you otherwise.

    14. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      Again, you fall prey to using trivial examples to argue something which applies to more complex documents.

      Again, you say that without backing it up. Like I say, I find the more complex the documents get, the more overbearing the DOM becomes. You haven't offered a single reason to believe otherwise.

      Even then, you still fail it. Your echo() example has a bug again! You have an <items> element that is being terminated by </item>.

      Great! Explain to me how that bug would not be noticed immediately.

      Making a typo without realising is something that happens to all programmers, every day. I could just as easily have mistyped the element type name in the createElement call when using the DOM with similar results - in fact, worse results as the error would not be an immediately obvious syntax error.

      The errors the DOM protects you from are a small fraction of the type of errors that happen every single day to every single developer without consequence. Sure, you can cut out 0.1% of those errors by using the DOM. You'll still be making the other 99.9% of errors, and the mechanisms that catch those would have caught the 0.1% as well. Only now you need to deal with lots more code - something that does have consequences.

      I know you can certainly simplify things by stringing DOM calls together, to create whole branches of your XML in one go, like this

      That wasn't possible in previous versions of PHP, but I've just tried it in PHP5 and it seems you can dereference objects returned from functions now. Thanks for the shortcut.

      Unfortunately, the shortcuts you describe aren't as effective in more complex documents. You can't use the shortcut to append three siblings to a single parent, for example, or add half a dozen attributes to an element in one go. You simply can't be as concise as echo("<img src='$filename' alt='$alt' width='$width' height='$height' title='$title'>");
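
      One common way around the verbosity is a small wrapper function. The el() helper below is hypothetical, not part of the DOM API, and the element names and values are invented; it is only a sketch of how several attributes and several siblings could be added in one call while the serializer still handles escaping.

```php
<?php
// Hypothetical helper, not part of the DOM API: builds an element with
// attributes and children (strings become text nodes) in one call.
function el(DOMDocument $dom, string $name, array $attrs = [], array $children = []): DOMElement
{
    $node = $dom->createElement($name);
    foreach ($attrs as $key => $value) {
        $node->setAttribute($key, (string) $value);
    }
    foreach ($children as $child) {
        $node->appendChild(is_string($child) ? $dom->createTextNode($child) : $child);
    }
    return $node;
}

$dom = new DOMDocument();
$dom->appendChild(
    el($dom, 'p', [], [
        el($dom, 'img', ['src' => 'a&b.png', 'alt' => 'A "quoted" caption', 'width' => 10, 'height' => 20]),
        el($dom, 'em', [], ['first sibling']),
        el($dom, 'em', [], ['second sibling']),
    ])
);
$xml = $dom->saveXML($dom->documentElement);
echo $xml, "\n";
```

      It is still more code than a single echo(), so whether the trade is worth it remains the judgment call this thread is arguing over.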

      It's not that you can't do it, but why burden your mind with banalities like remembering the proper close tag at a given point, or remembering to escape values, when you can apply it to higher level problems?

      I don't see any difference between the "banality" of closing tags and the "banality" of closing parentheses in a function call. Yeah, occasionally you forget them or make a typo, but you find out straight away and it's a doddle to fix. The same goes for escaping, but perhaps I'm just used to it because I deal with markup every day.

      With your echo() approach, generating properly indented XML is more difficult, because you have to pass a "whitespace" or a "level of nesting" parameter around in your multiple functions for generating the XML text.

      You missed my point. Why on earth would your program need to pretty print serialised XML? If you are debugging something, just use xmllint/a web browser/an XML editor to view it. No need for your program to handle that.

      So no, there is absolutely no difference whatsoever between the DOM and echo() when it comes to pretty printing.

      Anyway, nobody said it is impossible to produce correct XML code with the echo() approach, but it is definitely more bug-prone, and you'll end up wasting more time going back to your code and fixing those bugs. Your repeated failure to produce correct XML code in even the most simplistic examples is an excellent illustration of my point.

      You're only looking at one half of the equation in that statement though. If the DOM didn't have the downside of needing more code, then I'd take exactly the same attitude as you. But it does. That extra code, the lower readability and the extra complexity all have prices to pay.

      I'm paying the price for typos in my code already. I already have methods for dealing with typos. Every programmer does. In that context, the DOM gives me very little, and it costs more.

      And please. Typos in a textarea a f

    15. Re:Dear Lord make the madness stop! by Anonymous Coward · · Score: 0

      Go back and reread this thread from the beginning. It's obvious there are circumstances when both methods make sense.

      Please take your own advice. I have agreed from my very first comment that there are circumstances when both methods make sense.

      Now, if you want to actually contribute something to the discussion instead of AOLing, then please do so.

  19. overstock by Quiet_Desperation · · Score: 2, Informative

    $28.27 at overstock.com.

  20. Really? by stratjakt · · Score: 1

    <name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.

    What if I don't like that? What if I hate trivial useless examples that don't mean anything in the real world?

    No one's ever asked me to write a program that prints "hello world" on the console and then exits.

    I'm more interested in using XML as a means for language independent object persistence (not just cheesy .NET XmlSerializer class stuff either). How much coverage of such things is there in the book? I.e., creating an object in Java on one machine, persisting it and its state to an XML file, and recreating it on some other machine in C++ or C#. I'm tired of writing my own "protocols" to migrate running code from one app to another.

    How about binary XML implementations?

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:Really? by sosume · · Score: 2, Insightful

      I'm more interested in using XML as a means for language independent object persistence (not just cheesy .NET XmlSerializer class stuff either). How much coverage of such things is there in the book? I.e., creating an object in Java on one machine, persisting it and its state to an XML file, and recreating it on some other machine in C++ or C#. I'm tired of writing my own "protocols" to migrate running code from one app to another.

      You have obviously never looked into soap, which seems to be able to address every requirement you are describing.

      But, not using Soap is quite common on Slashdot ;)

    2. Re:Really? by stratjakt · · Score: 1

      No, I'm not talking about SOAP. I'm not talking about DCOM, CORBA or any other means of RPC either.

      I'm talking more about migrating running processes from one machine to another, by persisting the objects, and recreating them. More along the lines of having MOSIX built into the application.

      --
      I don't need no instructions to know how to rock!!!!
    3. Re:Really? by elharo · · Score: 1

      I'm very skeptical of so-called binary XML formats, as you'll find in Item 50, Compress if Space is a Problem. There are use cases where XML isn't appropriate (and I discuss these in the book, mostly data scanned from nature such as JPEGs and MP3s) but it isn't at all clear how a binary encoding of XML would help these use cases. There are also environments like the smaller cell phones where XML doesn't (yet) work very well. Again, moving to binary doesn't necessarily address the underlying issues here. Furthermore, developing new formats tailored to special purposes and environments such as cell phones and scientific data tends to deoptimize XML for other uses. XML isn't an optimal format for any one use case, but it's a very nice compromise across many different areas.

      The one use case a binary XML encoding does address well is the need of a number of vendors to sell expensive tools for working with data and hide people's data from them. XML is just too obvious and too cheap to justify lots of expenditures on tools. If you hide the text inside an opaque binary format that programmers need special (even patented) tools to view, why then, companies can sell tools again! Surprisingly, I don't find this use case too compelling. :-)

    4. Re:Really? by sosume · · Score: 1

      No, I'm not talking about SOAP. I'm not talking about DCOM, CORBA or any other means of RPC either.

      I'm talking more about migrating running processes from one machine to another, by persisting the objects, and recreating them. More along the lines of having MOSIX built into the application.


      A decent SOAP implementation can provide you with the XML representation of an object's state, which can be recreated and manipulated on almost any platform. What else would you like to transfer? You cannot store language independent method implementations in XML, it's just the state.

    5. Re:Really? by elharo · · Score: 2, Interesting

      There's a very real tension between making examples too trivial to be interesting and making them too long to be readable. I struggle with it in every book I write, and every other programming book author I know does so too. I've tried putting so-called real-world examples in books, and it's hopeless. It can't be done. There wouldn't be any space left for the explanatory text, nor would anyone put up with reading page after page of code.

      Most importantly, while I tend to be writing about just one topic at a time, real world programs wander all over the map. I may be trying to explain how to use callbacks in SAX, but a realistic program also has to consider network latency, GUI design, error logging, numerical algorithms, internationalization, and a hundred other things that aren't on topic. Covering them all would obscure the subject I'm actually trying to explain. Some things you just have to leave for other books and other authors.

      As an author, I try to strike the right balance between excessive simplicity and excessive length. Sometimes I hit it. Sometimes I don't. I actually think Effective XML hits it fairly well. In fact, this book was one of the toughest I ever had to write, precisely because it was so short that I couldn't spew pages like I did in Processing XML with Java (1100 pages) or the XML 1.1 Bible (1000 pages). I had to be really picky about how much code I included, and make sure that each example carried its weight, demonstrated just the point at hand, and nothing else.

      By the way, the chapter with that specific example is online if anyone cares to see for themselves just what it is that makes names a more interesting and complex problem than "John Doe Ph.D" seems to be at first glance.

    6. Re:Really? by elharo · · Score: 1

      XML works extremely well as a mechanism for language independent object persistence, precisely because XML is language independent. It's not tied to any one language's structures or data types. The key to using it this way is to simply define an appropriate XML format for your data, and then write the code to persist that format. It's actually quite easy to do.

      The problems arise when you start drinking the snake oil that many object-to-XML mappers are trying to sell you, both in the payware and open source worlds. Way too many of these tools treat XML as just a format for persisting objects, and forget that XML structures are much richer than naive object mappings sometimes allow. Mixed content, document order, multiple child elements with the same name, and invalid documents are just some of the bugbears that haunt poorly designed OO-to-XML mapping tools.

      However, if objects are what you start with, it's pretty easy to write them to XML and then read them back in again. Starting with arbitrary XML and going to objects is a lot trickier.

    7. Re:Really? by cookie_cutter · · Score: 1
      The one use case a binary XML encoding does address well is the need of a number of vendors to sell expensive tools for working with data and hide people's data from them. XML is just too obvious and too cheap to justify lots of expenditures on tools. If you hide the text inside an opaque binary format that programmers need special (even patented) tools to view, why then, companies can sell tools again! Surprisingly, I don't find this use case too compelling. :-)

      Is this not the situation which we'll have if there isn't a binary xml standard? The whole point of having a binary xml standard, or maybe it's better described as a standard mapping from text xml to binary xml, is so that we don't have proliferation of a bunch of proprietary binary xml formats trying to fill the void left by the lack of such a standard.

      Or am I missing something here?

    8. Re:Really? by elharo · · Score: 1

      Binary formats are fundamentally more opaque than text formats. You cannot just open up a binary file in emacs or jEdit and start hacking on it. They require special purpose tools to generate, edit, and consume. The more complex the format the more complex (and expensive) the tools become. Currently it's possible to generate completely well-formed, valid XML using nothing more complex than printf(). With a binary format, standard or otherwise, this would no longer be feasible.

  21. Re:fffft! by decipher_saint · · Score: 1

    01001001001000000111000001110010011001010110011001 10010101110010001000000111010001101111001000000111 00110110010101101110011001000010000001100010011010 01011011100110000101110010011110010010000001100101 00101101011011010110000101101001011011000111001100 10110000100000011101000110100001100101011110010010 00000110000101110010011001010010000001101101011101 01011000110110100000100000011011010110111101110010 01100101001000000110001101101111011011100110001101 101001011100110110010100101110

    --
    crazy dynamite monkey
  22. Re:Not only understandable and parseable.. by symbolic · · Score: 1


    Being text, it is also not tied to a specific vendor or platform.

  23. Good for him! by Anonymous Coward · · Score: 0

    John Doe's got a PhD now? Last I heard he turned up dead!

  24. A perfect eXaMpLe of a good use for XML by swrider · · Score: 5, Funny

    There are valid uses for XML. Just look at http://www.x-cp.org/

  25. Disgruntled with XML.... by MSBob · · Score: 1
    I've seen XML used for quite a while in a number of completely inappropriate situations such as configuration data, RPC implementations, scripting etc.

    Yet, every time I see XML (mis)applied in those cases I keep asking the fundamental question. What does it allow me to do that a decent Lexer and Parser does not? You could be sending grammar files just as easily and without the ridiculous verbosity of XML. Most parsers can work with either text or binary and BNF has been a golden standard for decades. XML reinvents the wheel for the umpteenth time and without a single good reason to justify its existence.

    --
    Your pizza just the way you ought to have it.
    1. Re:Disgruntled with XML.... by elharo · · Score: 2, Informative

      Hmm, that's one I haven't been asked before.

      I suspect what it offers is that you don't have to define and write your own BNF grammar, and then implement it in lex and yacc or similar tools.

      Grammar design is non-trivial, especially if you need to consider issues like internationalization. Picking XML as the underlying format means you don't have to do this work yourself. Why reinvent the wheel?

      Sometimes you do need something different, but a lot of alternative formats don't really have a good reason to exist. More often than not, custom parsers just come about because a programmer is more comfortable writing bad parsing code quickly than learning a new, more robust API in order to use someone else's parser.

    2. Re:Disgruntled with XML.... by MSBob · · Score: 1
      I agree with the i18n bit, but it's only handled in XML better by the virtue of XML being newer than most parser generating tools out there.

      As far as XML development being "easier"... I find that questionable (but it may be my personal view). If the problem domain is trivial then it might be the case that your XML schema happens to be simpler than your BNF grammar. In most non-trivial cases I find it's about even. As far as verbosity goes, 99% of the time your custom grammar will be a lot more space efficient than XML unless you define something very Cobolesque ;-)

      I also have an issue with most XML parsers where I cannot work with my own classes but have to use the silly API provided objects like TextNode, ThisNode, ThatNode (At least in Java DOM,JDOM etc...). What I usually end up doing is having a parallel hierarchy of my own classes which contain the functionality I actually need. Something like ANTLR allows me to simply provide my own classes for some or all of the AST nodes. Very handy.

      --
      Your pizza just the way you ought to have it.
    3. Re:Disgruntled with XML.... by elharo · · Score: 1

      Building dual hierarchies is indeed a problem when documents get large relative to available memory. In these cases you're normally better off using a streaming parser like SAX or StAX or System.Xml.XmlReader rather than a tree API like JDOM, DOM, etc. There's little to no extra overhead there, and it's a lot faster than writing your own grammar (more robust too). Possibly you can take a middle ground with XOM and only build subtrees in memory.

      You might want to consider some of the XML data binding APIs. However, you need to be very careful when choosing one, as most of these tools have serious design flaws that are not always apparent at first glance; but if those flaws don't impact your specific application you may be able to get away using one.
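
To make the streaming-versus-tree distinction concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`. It is not one of the APIs named above (SAX, StAX, XmlReader, XOM); it is only a stand-in for the same middle-ground idea of building one small subtree at a time. The `<orders>` document and the `total_quantity` helper are hypothetical names invented for the illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = b"""<orders>
  <order id="1"><item>pizza</item><qty>2</qty></order>
  <order id="2"><item>soda</item><qty>6</qty></order>
  <order id="3"><item>salad</item><qty>1</qty></order>
</orders>"""


def total_quantity(stream):
    """Sum every <qty>, discarding each <order>'s children once it is read."""
    total = 0
    # "end" events fire when an element's closing tag has been parsed,
    # so the <order> subtree is complete at that point.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "order":
            total += int(elem.findtext("qty"))
            elem.clear()  # drop this order's children and text
    return total


print(total_quantity(io.BytesIO(SAMPLE)))
```

The application works with plain integers here rather than a parallel class hierarchy; whether that scales to a real document depends on how self-contained each subtree is.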

    4. Re:Disgruntled with XML.... by MSBob · · Score: 1
      I understand you're talking about tools such as JAXB. Care to elaborate about the aforementioned design flaws? I've not been too involved with advanced XML processing (though I've seen JAXB used and didn't like what I saw but mostly due to my sense of "aesthetics").

      What are the big gotchas of those XML binding APIs as you see them?

      --
      Your pizza just the way you ought to have it.
    5. Re:Disgruntled with XML.... by rossifer · · Score: 1

      I've seen XML used for quite a while in a number of completely inappropriate situations such as configuration data

      Let me start this off by saying that I'm no fanboy of XML. If anything, I'm the local "get XML away from me" person.

      But config files are one place that I actually like XML. And I like it because these files are (1) typically fairly small and (2) I don't want to have to write and debug another lexer/parser. I have a utility (castor) that handles the parser for me based on the objects I hand it. When I want to find out what the config has been set to, I ask castor for the object graph from the config file resource and I have my objects. If I want to allow users to change configuration information, the use case changes the objects, I push the objects to castor, the file is updated. There's simply no effort involved.

      XML is slow, verbose, and is very easy to make unreadable, but none of these downsides affect its utility as a config file manager.
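
The load, change, write-back round trip described above can be sketched without any binding tool. This is not Castor and makes no claim about Castor's API; it is a hand-written Python illustration using only the standard library, and `ServerConfig`, `load` and `dump` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ServerConfig:  # hypothetical config object
    host: str
    port: int


def load(xml_text: str) -> ServerConfig:
    """Build the config object from a small XML document."""
    root = ET.fromstring(xml_text)
    return ServerConfig(host=root.findtext("host"),
                        port=int(root.findtext("port")))


def dump(cfg: ServerConfig) -> str:
    """Serialize the config object back to XML text."""
    root = ET.Element("server")
    ET.SubElement(root, "host").text = cfg.host
    ET.SubElement(root, "port").text = str(cfg.port)
    return ET.tostring(root, encoding="unicode")


cfg = load("<server><host>localhost</host><port>8080</port></server>")
cfg.port = 9090          # the "use case changes the objects" step
print(dump(cfg))         # the "push the objects back" step
```

A binding tool generates the equivalent of `load` and `dump` from the classes, which is where the "no effort involved" comes from.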

      What does it allow me to do that a decent Lexer and Parser does not?

      Well, my answer sounds trite: XML (with toys) gives you access to stream data without needing to put in the effort to make a lexer/parser work. Again, don't get me wrong, I think antlr is great, but it is not trivial to understand exactly what parse error caused the wrong token to be emitted. When I want simple and heavy, I'll concede to XML.

      Regards,
      Ross

    6. Re:Disgruntled with XML.... by MSBob · · Score: 1
      OK. I wasn't specific enough. When I'm talking about config files I'm mostly thinking "Java config files" or even more specifically "J2EE config files". I find that way too much stuff ends up in XML config files that should actually be implemented in the Java code itself, or in some cases scripted with Groovy or Beanshell. XML is not the vehicle for application scripting. Alas, that's what most J2EE "config" files have become.

      Antlr (actually that's my personal favourite) is harder to learn than JDOM but not impossible. In my opinion it's just as easy to use. For very simple grammars it's even simpler than defining your own XML schema.

      As far as parsing APIs go, I really like antlr and its ability to build ASTs with a mix of its own and client provided classes. I haven't worked with castor but if it's anything like JAXB I don't like it... and working with plain DOM/DOM4j or JDOM is a royal Pain In the Arse.

      --
      Your pizza just the way you ought to have it.
    7. Re:Disgruntled with XML.... by Anonymous Coward · · Score: 0

      They're put in config files because that stuff should be done declaratively, not procedurally. When things are done that way, you get tools support trivially as well as simpler debugging and a host of other advantages. XML is a very simple format to get that done in.

    8. Re:Disgruntled with XML.... by elharo · · Score: 1

      The big problem arises when you try to process arbitrary XML by binding it to object structures. Consider, for instance, trying to data bind XHTML. It can be done, but you're unlikely to come up with anything simpler than DOM in which case, why not just use DOM?

      Data binding tools tend to implicitly subset XML. That is they assume things like

      • Documents have schemas or DTDs.
      • Documents that do have schemas and/or DTDs are valid.
      • Structures are fairly flat and definitely not recursive; that is, they look pretty much like tables.
      • Narrative documents aren't worth considering.
      • Mixed content doesn't exist.
      • Choices don't exist; that is, elements with the same name tend to have the same children.
      • Order doesn't matter.

      Different data binding tools have different subsets of these problems, but most have at least some of them.

      The problem is data binding tools tend to view the world through object colored glasses. They assume elements are just a funny kind of serialized object, and they're not really. That said, if all you're doing with XML is serializing objects the limitations of a data binding API may not bother you so much because you already have a class and object centric view of the world. However, if you start with arbitrary XML you're unlikely to be able to bind it to anything much simpler than DOM without throwing information away.
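
Two of the assumptions listed above (no mixed content, order doesn't matter) are easy to demonstrate. The following Python sketch uses a deliberately naive binding scheme invented for this illustration (`naive_bind` is a hypothetical helper, not any real tool's behaviour) and compares what it keeps with what the document actually says.

```python
import xml.etree.ElementTree as ET

# A small piece of narrative markup with mixed content.
MIXED = "<p>XML is <em>verbose</em> but <em>portable</em>.</p>"


def naive_bind(elem):
    """Bind an element to a dict keyed by child tag (deliberately naive)."""
    return {child.tag: child.text for child in elem}


root = ET.fromstring(MIXED)

# The dict keeps one <em> and none of the surrounding text.
bound = naive_bind(root)
print(bound)

# The full text, in document order, is still available from the tree.
full_text = "".join(root.itertext())
print(full_text)
```

Real binding tools are more careful than this, but the shape of the loss is the same: anything that doesn't look like a field of an object has nowhere to go.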

    9. Re:Disgruntled with XML.... by rossifer · · Score: 1

      Well, I think EJB config files have a lot wrong with them, starting with the EJB architecture, so you won't find me defending them :)

      You're right that we're using the term "config file" to mean different things, and my confusion is hereby cleared up.

      One advantage that XML has over other approaches that must be included: the syntax, though heavy, is widely understood. In antlr, you have the power to create a "great, new config file language" (just using config files as an example), but other people (including IT people where the system ends up being deployed) will have to understand and maintain those files and probably don't want to learn something new when XML will suffice.

      As for XML API's, DOM blows pretty hard. There's just no two ways around that. SAX parsers can actually be pretty cool, but antlr is absolutely more fun to work with (in part because of the enormously increased syntactic capture ability). Too bad "more fun" isn't usually a compelling argument to those specifying requirements... :)

      Regards,
      Ross

  26. Alternatives by Tablizer · · Score: 1

    Functional fans will say that Lisp's "ess expressions" are better (more compact), and relational fans will suggest improving/refining delimited formats instead.

  27. Very clever compression technique... by smcdow · · Score: 0, Redundant
    Look everyone, I very cleverly compressed 14 bytes of data into 72 bytes!

    p length( 'John Doe Ph.D.' )
    14

    p length( '<name><given>John</given><family>Doe</family><title>Ph.D.</title></name>' )
    72

    Why, with this compression scheme, I'll soon rule the world!!!

    --
    In the course of every project, it will become necessary to shoot the scientists and begin production.
    1. Re:Very clever compression technique... by Anonymous Coward · · Score: 0

      Yes the first one might be 14 bytes in comparison to the xml version but try exchanging the first line from your custom application with some other remote system that you've got no idea on how it works or is implemented...

      XML is just a way of making information self-descriptive. The 14-byte string contains no information about what it represents.

      Thank you

    2. Re:Very clever compression technique... by Anonymous Coward · · Score: 0

      Your compression is lossy! :p

      Why doesn't anyone mention ASN.1 ?

  28. MOD PARENT UP by n6mod · · Score: 1

    Oh, this is priceless. What a gem!

    Thanks, I needed that today.

    -Z

    --
    You have violated Robot's Rules of Order and will be asked to leave the future immediately.
  29. Delicious irony by dubbayu_d_40 · · Score: 3, Funny

    ridiculing the verbosity of xml, on a web page.

    1. Re:Delicious irony by owlstead · · Score: 3, Funny

      Yeah, but this is slashdot HTML, as far away from XML as it can be.

  30. so do we love or hate Mozilla and FireFox today? by roman_mir · · Score: 4, Insightful

    After all XUL and RDF together with js, css and resource files - that's what makes FireFox tick.

  31. "The Office" reference by Anonymous Coward · · Score: 1, Informative

    Just in case anyone didn't get it - the dept line is a reference to an episode from the BBC series "The Office"... can anyone pick the episode?

    1. Re:"The Office" reference by davisk · · Score: 1

      Unless I'm mistaken, it's from the Christmas special, where Gareth is filling out an online dating application for David Brent.

  32. Good Old Rusty by fm6 · · Score: 1

    This book appears to be for people who already know XML, but need to work on their technique. (I refuse to use that vague term "advanced users".) If you're an XML newbie, you probably need to buy The XML Bible from the same author. Yeah, the title is dumb (computer book publishers have a thing for dumb titles) and the CD is screwed up. But I know of no other book that will allow your typical HTML hacker to make the transition to XML so easily.

  33. I agree by DogDude · · Score: 1

    Even with the "descriptors", you still have to know how the data is laid out. It adds a ton of overhead, with, as far as I have been able to tell, little benefit. Hence, everything I program is still in good ol' comma (or some other character) delimited, without all of the XML fluff.

    --
    I don't respond to AC's.
    1. Re:I agree by Anonymous Coward · · Score: 0

      Quite frankly if you can solve your problem with CSV or Tab files, you shouldn't be using XML.

      That approach tends to collapse as soon as you have to deal with:
      + Multivalue fields
      + Data that contains tabs, spaces, returns that must be quoted or escaped.
      + Data that is exchanged with platforms that have different linebreaks.
      + Non-ASCII characters
      + All of the above.

      Try to solve this with CSV files and you end up with a parser that's only slightly less complex than the XML one that everyone already has.
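
A short Python sketch of the points in the list above, using only the standard library. The sample record is made up for the illustration. It shows a naive comma split breaking on a quoted field, the `csv` module handling the quoting, and XML carrying a multivalue field as repeated elements.

```python
import csv
import io
import xml.etree.ElementTree as ET

# One record with a quoted, comma-containing field and a non-ASCII city.
line = 'Doe,"Ph.D., M.Sc.",Zürich'

# Naive splitting cuts the quoted field in two.
naive = line.split(",")
print(naive)

# A real CSV parser handles the quoting, but the two titles
# are still a single string that needs its own sub-format.
proper = next(csv.reader(io.StringIO(line)))
print(proper)

# In XML the multivalue field is just a repeated element.
record = ET.fromstring(
    "<person><family>Doe</family>"
    "<title>Ph.D.</title><title>M.Sc.</title>"
    "<city>Zürich</city></person>"
)
titles = [t.text for t in record.findall("title")]
print(titles)
```

None of this says CSV is wrong for flat data, only that each item on the list adds parser logic that the XML route already includes.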

    2. Re:I agree by karakal · · Score: 1

      Yes, and there is one of your problems. Maybe you don't have to deal with international characters (is there anything beyond US borders?) but I have. And character-delimited files are no easy way to exchange data (especially with multibyte characters). I like XML. It is a great way to describe data in ordered structures and it enhances interoperability. There is only a problem with "old" programmers. They don't like structured data. They like some weird hacks better.

  34. What's so bad about XML? by rikkus-x · · Score: 3, Insightful

    I give customers a specification showing how I would like data sent to me. They can use the specification to tell them how to store their data, because they can read it. They can check that their data matches the specification, because their machine can read it.

    When I receive their data, I can check that it matches the specification, because my machine can read it. If there is something wrong with their data, I can point out where it's broken, because it's human-readable.
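
As a rough illustration of "I can point out where it's broken", the Python sketch below reports the line and column where a document stops being well-formed. Note the assumption: this checks well-formedness only, not validity against a DTD or schema, which the standard library does not provide. `check` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check(xml_text):
    """Return None if the text is well-formed, else a (line, column) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err.position  # where the parser gave up
    return None


good = "<name><given>John</given><family>Doe</family></name>"
# The third line closes <family> with the wrong tag.
bad = "<name>\n<given>John</given>\n<family>Doe</given>\n</name>"

print(check(good))
print(check(bad))
```

Checking the data against the specification itself would need a validating parser; the position reporting works the same way in principle.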

    Writing specifications is easy. Writing generators and parsers is easy. The tools are ubiquitous. Generation and parsing are usually fast 'enough'. The standards are freely available. Complex data structures may be described. Data may be transformed using a common language based on XML itself.

    Yes, I'd like it to be easier to write XML parsing tools. Yes, I'd like it to be easier to write tools which handle XML more efficiently. No, the two points above don't make XML the devil's data encapsulation.

    Rik

    1. Re:What's so bad about XML? by Xorkid · · Score: 2, Insightful
      Nothing,

      People just fail to realise what XML is (or isn't). Basically XML is just a way for you to define your own (markup) language for any purpose.

      That's it. It is not a database replacement. It won't walk on water or feed the hungry or kill all the communists/terrorists.

      But if you want to persist textual data with structure, in a form that will most probably be readable in 20 years time, XML is for you.

      --
      www.microsoft.com/athome/security/children/kidtalk.mspx Was This Information Useful?
    2. Re:What's so bad about XML? by Anonymous Coward · · Score: 0

      Data may be transformed using a common language based on XML itself.

      You say that like it's a good thing.

      It may also be transformed using a thing called a program. This "program" will run faster, be quicker for me to write, and be easier to read and modify. It's cutting-edge technology, but I'm confident that all of the major 3 platforms will have "programs" by 2006.

  35. By reading the introduction... by sapgau · · Score: 1

    I realize that he was defining the vocabulary of the subjects that he will talk about later. I recognize this as very important first step that many authors take for granted.

    Vocabulary increases our understanding of the entities that we want to work with, so we don't spend our time arguing about what we are trying to say...

    For this I remember Ludwig Wittgenstein and his methodology of achieving the Truth by establishing the meaning of words and their relationship with thoughts and their link to reality. I most likely got that wrong (read it in school long time ago) but that philosophy is heavily based on logic and predicates.

    My point is, I liked how (at least from reading the intro) he is preparing us to talk about the rest of the book. Conclusion: I ordered the book.

    /obscure?

  36. XML as a fall-back standard by galdur · · Score: 2, Interesting

    When it comes to speed, XML sucks. It does provide incomparable interchange of data on a human- and machine-readable level. It would be nice on the other hand to be able to select a faster standard when both ends of a transaction support it. XML would become the lowest common denominator.

  37. I believe you meant... by game+kid · · Score: 3, Informative

    ... but yeah, you're right. Helps do away with the (ugh!) parenthesis matching crap in LISP, so actual people can edit it too, verbose as it may seem.

    --
    You can hold down the "B" button for continuous firing.
  38. Importance vastly overstated by s88 · · Score: 2, Informative

    The review almost sold me on the fact that I could actually learn something from this book. Looking at the sample chapters here told me the truth

  39. Too expensive... by dantheman82 · · Score: 1

    As mentioned previously, shop somewhere else than BN.com. Try fetchbook.info, which is a search engine for new and used books from 110 bookstores.

    What I gleaned was that it's sold used at half.com for under $25 shipped and new at Overstock.com for less than $30 shipped.

    --
    This sig donated to Pater. Long live /.
  40. XSLT is a piece of cake? by sorbits · · Score: 1
    Need to convert XML into SQL INSERTS? Piece of cake.

    Not sure I agree. I had to convert the XML log from subversion into an RSS feed (which is also XML) but while I did use XSLT for 99% of the transformation, I still had to pipe it through perl to do a few things that XSLT couldn't, since it doesn't even do simple string replacements (only translation from one character to another).

    And the stuff it did do for me, I wouldn't call that a piece of cake, more like a lot of complexity for something which should have been trivial.

    1. Re:XSLT is a piece of cake? by Piquan · · Score: 1

      Not sure I agree. I had to convert the XML log from subversion into an RSS feed (which is also XML) but while I did use XSLT for 99% of the transformation, I still had to pipe it through perl to do a few things that XSLT couldn't, since it doesn't even do simple string replacements (only translation from one character to another).

      You may want to check out EXSLT, in your case the str:replace function.

  41. Plug: unofficial companion article by Uche · · Score: 1

    ERH is the best XML teacher in books. I did, however, have a few notes on Effective XML:

    "Thinking XML: Harold's Effective XML" [IBM developerWorks]

    --
    "What thou lovest well remains, the rest is dross" -- E.P.
  42. Compress it by Anonymous Coward · · Score: 0

    The verbose nature of xml is highly redundant. If you think this is bulky, then compress it.
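
A quick way to see the redundancy argument is to compress both forms from the "14 bytes into 72 bytes" comment earlier in the thread. This Python sketch uses `zlib` from the standard library. Repeating one identical record 1000 times is an assumption made purely for illustration and is close to a best case; real data will compress less.

```python
import zlib

record = "<name><given>John</given><family>Doe</family><title>Ph.D.</title></name>"
plain = "John Doe Ph.D."

# Assumption: 1000 identical records, which flatters any compressor.
xml_doc = "<names>" + record * 1000 + "</names>"
flat_doc = (plain + "\n") * 1000

xml_packed = zlib.compress(xml_doc.encode("utf-8"))
flat_packed = zlib.compress(flat_doc.encode("utf-8"))

# Raw size versus compressed size for each form.
print(len(xml_doc), len(xml_packed))
print(len(flat_doc), len(flat_packed))
```

The tags cost a lot before compression and far less after it, though parsing time and memory use are not helped by compressing the file.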