Effective XML
Before I tell you what's inside though, let me tell you what you won't find in these pages. Primarily you need to know that this book does not teach XML. I know a lot of books say that, yet still include an introduction or appendix that covers the basics, but this isn't one of them. You're expected to know XML from page one. Even syntax is only covered from a proper usage angle. Personally, I appreciated this. It always bothers me when an obvious non-beginner's book starts off by wasting a chapter on things I should already know. You just need to be aware when you buy that you won't learn XML here. Knowledge of namespaces, DTDs, the W3C's Schema Language, XSLT, and more isn't strictly required to get something out of this book, but it certainly would help you get a lot more out of it.
What you will get here is coverage of fifty miscellaneous topics spread across four sections on "Syntax", "Structure", "Semantics", and "Implementation". In "Syntax", ten topics delve into the details of things like DTDs, entity references and the XML declaration itself. It may sound silly to dig deep into a single line of XML that simply declares the format, but I doubt you will think so after reading that topic. There's a lot going on in that line and you want to be in control of those decisions instead of just copying and pasting. Entity references are an even smaller chunk of XML output, but they too get illuminated by a rare insight on how and when they should be used, and for what. Did you know that it is possible to write a namespace savvy DTD? I do now and I learned that in this section as well.
The second section of the book covers "Structure", and to me it was the best part. This collection of seventeen topics is loaded with good advice about how to build an XML document that will be ideal for anyone who needs to work with it. Here you see how metadata should be stored in XML, get tips on embedding binary content, learn which schema language is better for which tasks, and finally understand rare XML constructs like processing instructions and exactly what they are for. Additionally, there's a lot of general advice on the right way to mark up content that's really worth its weight in gold. Just one example of what I learned here is that I underappreciate mixed content for great constructs like <name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.
Section three, "Semantics", deals primarily with parsers and their APIs. Again, you won't learn any APIs here. What's covered is their strengths and weaknesses and why you should choose a given API for a given task. SAX and DOM are the main focus of these ten topics, but there are other details sprinkled in, like XPath.
The fourth and final section is all about "Implementation". The thirteen topics here address client-side XML styling, server-side transformations, signatures, encryption, compression, and more. My favorite topic here was a terrific coverage of Unicode and how it affects XML. All developers should know at least as much about Unicode as what's printed here and this is a fine source to learn it from.
One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML. He will tell you where the design process was less than perfect, which tools have little practical value, and some of the problems with where XML technologies are headed. This isn't complaining though. All of this is targeted at how it affects XML developers today. You learn what you can safely skip and what should be outright avoided. The author even tells you what XML is bad at and gives you advice about when you shouldn't use it. That's the mark of a man who knows his subject, if you ask me.
All told, I think the author failed to completely convince me his way is perfect on only 2 topics. That means I learned 48 expert XML tricks. Surely that's worth the cost of the book in time and money. This isn't the first XML book you need, but I think it is the second XML book everyone should read.
You can purchase Effective XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
I love the book, but once it encountered a humid day the binding fell apart. Anyone else have this experience?
Any ideas what those 2 are?
Is that it's not a very machine-friendly language (more wordy than it ought to be; parsing of tags is not very efficient) and it's not a very human-friendly language (the human style is free-style, really). I don't think it's a very good universal data description language. Sorry that I had to go on a bit of a tangent...
One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML.
[Obligatory Star Wars joke]
____
~ |rip/\/\aster /\/\onkey
I want to say something funny about XML, but there is nothing.
-pyrrho
After seeing what can be done with simple javascript and XML, I'm wanting to get into this. Can someone point me to the best OSS way to do this (I can hear the groans now). I like Postgres but I don't see much in the way of getting it to spit out XML. I like documentation... MySQL? Am I missing something?
More
XML is all about loosely bound interfaces.
Get with the program.
<letter>
<salutation>Dear XML-Junkies</salutation>
<body>
I type all my business letters in <link href="http://www.google.com/?q=XML">XML</link>. Sometimes it can be a bit <link href="http://dictionary.reference.com/search?q=verbose">verbose</link>.
</body>
<signature>
<name><nickname>Letter</nickname></name>
</signature>
</letter>
XML seems cool to me. I like the thought of being able to design a schema to suit my personal needs. But when it comes time to make use of that schema and actually keep data in it, it seems to be useless, at least as far as an end user (non-programmer) is concerned.
Do I have the wrong impression?
I just bought the book a couple of days ago. It's great so far. Even though it does not teach you XML, it is still a great book for anyone who has even a little experience with XML. Like me, you will pick things up really fast.
I agree that it's too wordy and hard to parse, and I definitely don't think it's human-friendly. (Only if one's been immersed in it for a while does it become easily readable.)
I also dislike the XML data model altogether. I strongly prefer the RDF data model (not to be confused with the bad XML serialization of RDF), basically a set of subject-predicate-object triples. It's a much more natural data model: things have properties, and they do actions. It's as simple as that. XML's inherently tree-like structure is much more awkward for real-world and purely electronic data alike.
Personally, my favorite structuring language is Notation 3 (a very readable extended RDF serialization).
Signature.
1) XML is not designed to be used for everything under the sun.
Bookpool has it for $28.50. Don't click the bn sponsored link (where it's a whopping $44.95).
/. gets a kickback from doing something dumb like clicking the link to overpriced merchandise.
PS, I don't work for Bookpool, I hate it when
If you like this book, don't forget to check out Scott Meyers' Effective C++ or Joshua Bloch's Effective Java. Both are great. I devoured Meyers' book when it first came out, and I was happy to see Bloch's book was similarly useful. There is also an Effective Perl book out, but I don't know how good it is -- it follows the same general format, but hasn't been updated since 1997. (Neither has the C++ book, but C++ hasn't changed that much since then.)
Eric
XML is excellent for data exchange and providing an open standard for interoperability. It provides a way to present data that can be used in software designed by different vendors and even on different architectures. However, it does have its downfall, and that is that it is wordy and overly inefficient. Any programmer worth what he is being paid knows that you don't represent your data internally as XML. When your program starts, you parse the XML into a nice data structure that can be quickly accessed and modified. When the program closes, you convert the data structure back to XML and save it.
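As a rough illustration of that parse-on-startup, serialize-on-exit pattern, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (settings, option) and the helper names load_settings and dump_settings are made up for the example; a real program would read and write a file rather than a string.

```python
import xml.etree.ElementTree as ET


def load_settings(xml_text):
    """Parse the XML once into a plain dict that is cheap to query and modify."""
    root = ET.fromstring(xml_text)
    return {opt.get("name"): opt.text or "" for opt in root.findall("option")}


def dump_settings(settings):
    """Convert the in-memory dict back to XML text when it is time to save."""
    root = ET.Element("settings")
    for name, value in settings.items():
        opt = ET.SubElement(root, "option", name=name)
        opt.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical document; in practice this would come from disk.
source = '<settings><option name="theme">dark</option></settings>'

settings = load_settings(source)   # parse at startup
settings["font"] = "mono"          # work on the dict, not on the XML
saved = dump_settings(settings)    # serialize at shutdown
```

While the program runs, nothing touches XML at all; it only appears at the two edges.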
Sometimes, the most effective use of XML is to simply not use XML at all. XML is a wonderfully useful tool when applied correctly. It's architecture-independent and is a great way to communicate unstructured and/or hierarchical data.
Sometimes, though, your data can be simple enough that XML is overkill. Software developers need to make themselves aware of situations when they might be better served by a simple "flat file" of delimited data. In situations like this, using XML can amount to what I like to call "gratuitous complexity."
Always use the right tool for the job.
$28.27 at overstock.com.
<name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.
What if I don't like that? What if I hate trivial useless examples that don't mean anything in the real world?
No one's ever asked me to write a program that prints "hello world" on the console and then exits.
I'm more interested in using XML as a means for language independent object persistence (not just cheesy .NET XmlSerializer class stuff either). How much coverage of such things is there in the book? I.e., creating an object in Java on one machine, persisting it and its state to an XML file, and recreating it on some other machine in C++ or C#. I'm tired of writing my own "protocols" to migrate running code from one app to another.
How about binary XML implementations?
I don't need no instructions to know how to rock!!!!
Also, most of my stuff nowadays involves transforming with XSLT, so I need to create a DOM object anyway.
BTW, the example I did was for PHP5.
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i
The essence of XML is this: the problem it solves is not hard, and it does not solve the problem well. - Phil Wadler
XML is not the end of our problems, it is the beginning of our problems. - ditto
Shortly after the release of XML, some folks, including some very important folks in W3C and its members, who had been big supporters of XML, actually got around to reading the spec, and discovered to their horror that they had an XML which included entities, DTDs, PIs, and assorted other baggage. - Tim Bray
When XMI came out, I had just been studying up on UML, and I thought "Cool! I'll print out the DTD so that I can look it over on the subway ride home!" When I saw how big the XMI DTD was, I decided not to print it out--I prefer not to spend that much time in the subway. - Robert DuCharme
XML was monocase until quite late in its design, when we ran across this ugliness. I had a Java-language processor called Lark - the world's first - and when XML went case-sensitive, I got a factor of three performance improvement, it was all being spent in toLowerCase(). - Tim Bray
XML-based technologies seem particularly susceptible to the "if we standardize it, everyone will use it" fallacy. - Simon St. Laurent
Except that now you introduce the potential to have bugs that you wouldn't get from a DOM tree. For example, building a document through a DOM tree makes it impossible to forget a / on a close tag, or to open a tag as "<foo>" but close it as "</Foo>". Sure, it's quicker, and in trivial examples you're not likely to have a problem. Real-world problems are not always as simple as examples, and you're trading the memory used by building a DOM for the accuracy of building your tree within the DOM rather than ad-hoc by hand.
01001001001000000111000001110010011001010110011001 10010101110010001000000111010001101111001000000111 00110110010101101110011001000010000001100010011010 01011011100110000101110010011110010010000001100101 00101101011011010110000101101001011011000111001100 10110000100000011101000110100001100101011110010010 00000110000101110010011001010010000001101101011101 01011000110110100000100000011011010110111101110010 01100101001000000110001101101111011011100110001101 101001011100110110010100101110
crazy dynamite monkey
Being text, it is also not tied to a specific vendor or platform.
There are valid uses for XML. Just look at http://www.x-cp.org/
Yet, every time I see XML (mis)applied in those cases I keep asking the fundamental question. What does it allow me to do that a decent Lexer and Parser does not? You could be sending grammar files just as easily and without the ridiculous verbosity of XML. Most parsers can work with either text or binary and BNF has been a golden standard for decades. XML reinvents the wheel for the umpteenth time and without a single good reason to justify its existence.
Your pizza just the way you ought to have it.
Not always. As soon as you start dealing with more complex XML, which can contain certain optional elements depending on complex conditions, things will get messy. Errors in rarely used parts of the generated XML document can get past you easily.
Also, even if the tags in your example are correct, your code still has a serious bug: what if your $v variable sometimes gets a '<' character in it (e.g. as a result of user input)? Your program would fail immediately, whereas DOM automatically escapes such characters before inserting the text into the document tree. You just proved how easy it is to introduce bugs using your method.
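The escaping problem is easy to reproduce. The sketch below uses Python's standard xml.dom.minidom rather than the PHP under discussion, and the value of user_input and the helper is_well_formed are invented for illustration. It splices a value containing '<' and '&' into a string by hand, then builds the same element through the DOM and lets the serializer do the escaping.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

user_input = "1 < 2 & 3"  # hypothetical value, e.g. typed by a user


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


# String splicing: the raw '<' and '&' land in the markup unescaped.
spliced = "<item>" + user_input + "</item>"

# DOM: the text node is escaped when the tree is serialized.
doc = minidom.Document()
item = doc.appendChild(doc.createElement("item"))
item.appendChild(doc.createTextNode(user_input))
built = item.toxml()

print(is_well_formed(spliced))  # False
print(is_well_formed(built))    # True
print(built)                    # <item>1 &lt; 2 &amp; 3</item>
```

The hand-spliced version only breaks when the input happens to contain a special character, which is exactly why it tends to slip past testing.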
Functional fans will say that Lisp's s-expressions are better (more compact), and relational fans will suggest improving/refining delimited formats instead.
Table-ized A.I.
Oh, this is priceless. What a gem!
Thanks, I needed that today.
-Z
You have violated Robot's Rules of Order and will be asked to leave the future immediately.
ridiculing the verbosity of xml, on a web page.
After all, XUL and RDF together with JS, CSS and resource files are what make Firefox tick.
You can't handle the truth.
Just in case anyone didn't get it - the dept line is a reference to an episode from the BBC series "The Office"... can anyone pick the episode?
This book appears to be for people who already know XML, but need to work on their technique. (I refuse to use that vague term "advanced users".) If you're an XML newbie, you probably need to buy The XML Bible from the same author. Yeah, the title is dumb (computer book publishers have a thing for dumb titles) and the CD is screwed up. But I know of no other book that will allow your typical HTML hacker to make the transition to XML so easily.
Even with the "descriptors", you still have to know how the data is laid out. It adds a ton of overhead, with, as far as I have been able to tell, little benefit. Hence, everything I program is still in good ol' comma (or some other character) delimited, without all of the XML fluff.
I don't respond to AC's.
I give customers a specification showing how I would like data sent to me. They can use the specification to tell them how to store their data, because they can read it. They can check that their data matches the specification, because their machine can read it.
When I receive their data, I can check that it matches the specification, because my machine can read it. If there is something wrong with their data, I can point out where it's broken, because it's human-readable.
Writing specifications is easy. Writing generators and parsers is easy. The tools are ubiquitous. Generation and parsing are usually fast 'enough'. The standards are freely available. Complex data structures may be described. Data may be transformed using a common language based on XML itself.
Yes, I'd like it to be easier to write XML parsing tools. Yes, I'd like it to be easier to write tools which handle XML more efficiently. No, the two points above don't make XML the devil's data encapsulation.
Rik
With the DOM approach, you don't have to test the minutiae of the XML generation part of the program. Instead, you can spend that time more productively by better testing other more relevant parts of your program logic. Also, the argument that generating XML directly is easier is true only for simple examples like this; it gets complicated pretty fast as your requirements grow more complex.
Exactly. With the DOM approach, you don't have to remember this. It's easy to miss it in more complex XML generation routines, and after you've done all the escaping and error checking that DOM already does for you, will your program really be that much more readable than the DOM approach?
Umm... no. Your program will get compiled (or parsed by the interpreter) like nothing is wrong. There will be no syntax error. I meant above that it would fail to generate proper XML as soon as it receives an incorrect character during runtime, which can happen well after development, when the program is already deployed to customers, and all that hassle could have been avoided by using APIs that were specifically designed for XML parsing/generation/manipulation, which have been debugged and field-tested countless times before.
You won't catch this failure unless you specifically test for this type of escaping bug, for that exact variable. The more complex your XML generation routines, the more difficult it is to test each possible combination of program inputs to exercise each possible XML output. Multiply this by many XML generation routines and many programs. You will miss a test, sooner or later. It is naive to generalize from your simple example above, and to think that you can test everything. Sure, bugs will always happen, but why compound the problem when you can reduce the number of bugs ever so slightly by using safe APIs that can also make your code more organized?
Of course, DOM doesn't solve everything, and you will still have tests even in the DOM case, but at least the DOM API is a safety net that will catch and handle some of the most basic corner cases that your tests might miss.
I've been bitten with small stuff like this (not only XML related) enough times to know better than try to splice strings together to generate XML, or try to take other quick & dirty shortcuts in bigger programs. I learned the hard way that shortcuts don't always cut short.
One additional bonus of using the DOM API is that the XML can optionally be output indented by any half-decent DOM serializer, which makes the generated XML easier to debug in protocol dumps or whatever. Trying to generate properly indented XML from nested if/for/whatever code split across multiple functions would require non-trivial additional effort better spent elsewhere.
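To show what that looks like in practice, here is a small Python sketch with the standard xml.dom.minidom (not the PHP DOM extension being discussed; the element names are arbitrary). The tree is built once, and the same nodes can be serialized either compactly or indented without the building code knowing anything about nesting depth.

```python
from xml.dom import minidom

# Build a small tree; appendChild returns the node it was given,
# so each new element can be kept in a variable as it is attached.
doc = minidom.Document()
root = doc.appendChild(doc.createElement("root"))
items = root.appendChild(doc.createElement("items"))
for label in ("one", "two"):
    item = items.appendChild(doc.createElement("item"))
    item.appendChild(doc.createTextNode(label))

compact = root.toxml()                   # everything on a single line
pretty = root.toprettyxml(indent="  ")   # same tree, indented by the serializer

print(pretty)
```

Indentation is a decision made at serialization time, so none of the functions that create nodes need a "level of nesting" parameter.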
And it's not only simple things like escaping variables or mismatching tags... when you also factor in namespace handling in complex documents with multiple namespaces, the DOM API with its automatic hierarchical namespace & prefix management starts to look really good. Similar arguments apply to XML entities, processing instructions, CDATA, etc.
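A sketch of the namespace point, with two caveats: it uses Python's xml.etree.ElementTree, which is a tree API but not the W3C DOM, and both namespace URIs and prefixes are invented for the example. The code names elements by namespace URI only; the serializer decides where the xmlns declarations go and which prefix each tag gets.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces, chosen only for this illustration.
LETTER_NS = "http://example.com/ns/letter"
META_NS = "http://example.com/ns/meta"

# Preferred prefixes; without these, ElementTree generates ns0, ns1, ...
ET.register_namespace("letter", LETTER_NS)
ET.register_namespace("meta", META_NS)

# Elements are identified by {namespace-uri}local-name, never by prefix.
letter = ET.Element(f"{{{LETTER_NS}}}letter")
body = ET.SubElement(letter, f"{{{LETTER_NS}}}body")
author = ET.SubElement(body, f"{{{META_NS}}}author")
author.text = "Example Author"

xml_text = ET.tostring(letter, encoding="unicode")
print(xml_text)
```

With string concatenation, every function that emits a tag would have to agree on the prefixes and on who writes the declarations.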
I realize that he was defining the vocabulary of the subjects that he will talk about later. I recognize this as a very important first step that many authors take for granted.
/obscure?
Vocabulary increases our understanding of the entities that we want to work with, so we don't spend our time arguing about what we are trying to say...
For this I remember Ludwig Wittgenstein and his methodology of achieving the Truth by establishing the meaning of words and their relationship with thoughts and their link to reality. I most likely got that wrong (read it in school long time ago) but that philosophy is heavily based on logic and predicates.
My point is, I liked how (at least from reading the intro) he is preparing us to talk about the rest of the book. Conclusion: I ordered the book.
When it comes to speed, XML sucks. It does provide incomparable interchange of data on a human- and machine-readable level. It would be nice on the other hand to be able to select a faster standard when both ends of a transaction support it. XML would become the lowest common denominator.
... but yeah, you're right. Helps do away with the (ugh!) parenthesis matching crap in LISP, so actual people can edit it too, verbose as it may seem.
You can hold down the "B" button for continuous firing.
The review almost sold me on the fact that I could actually learn something from this book. Looking at the sample chapters here told me the truth
Just thought I'd point out that I've read the whole thread; I agree with you completely. The AC doesn't seem to realize that with DOM you don't have to worry about echo()ing exactly the right stuff at the right time.
For almost anything non-trivial I'd use DOM over echo. The only reason not to is laziness or 1337ness.
As mentioned previously, shop somewhere else than BN.com. Try fetchbook.info, which is a search engine for new and used books from 110 bookstores.
What I gleaned was that it's sold used at half.com for under $25 shipped and new at Overstock.com for less than $30 shipped.
This sig donated to Pater. Long live
Even then, you still fail it. Your echo() example has a bug again! You have an <items> element that is being terminated by </item>.
Also, you don't do DOM justice in your DOM example; it can be simplified. You also have a bug in it: you never attach the $items element anywhere, so I don't know exactly what your XML is supposed to look like.
My PHP is very rusty, but I'll try to indulge your apparent fondness for trivial examples. I know you can certainly simplify things by stringing DOM calls together, to create whole branches of your XML in one go, like this:
$dom->appendChild($dom->createElement("root"))->appendChild($dom->createElement("section"))->setAttribute("id", "1");
Also, I'm not sure if PHP allows in-place assignment, or white-space after the method call operator (->), but in languages like Java/C++/C#/etc you can do something like this:
dom.appendChild(root = dom.createElement("root")).
appendChild(items = dom.createElement("items")).
appendChild(whatever = dom.createElement("whatever")).setAttribute("blah", "23");
That is, create the XML branch and assign various sub-nodes to variables for later use in one go. As a bonus, you also make your code follow the tree-like structure of your XML. This way your code ends up manipulating (and looking like) tree fragments.
At least for me (and I suspect I'm not alone), it's much more natural to think of the XML as a tree-like structure of nodes that I manipulate.
The human brain is not a stack-based machine, and thinking of XML as a nested series of open/close tags is much less intuitive, as you demonstrated by bungling your close tag in your echo() example. It's not that you can't do it, but why burden your mind with banalities like remembering the proper close tag at a given point, or remembering to escape values, when you can apply it to higher level problems? Let the machine handle the mechanical details.
Never said it was the ultimate advantage, just a minor bonus of using DOM.
I'm sorry, this is just wrong. With your echo() approach, generating properly indented XML is more difficult, because you have to pass a "whitespace" or a "level of nesting" parameter around in your multiple functions for generating the XML text. With DOM, you do no such thing. You just pass your nodes around as usual, and the indentation is done by DOM at the end.
Anyway, nobody said it is impossible to produce correct XML code with the echo() approach, but it is definitely more bug-prone, and you'll end up wasting more time going back to your code and fixing those bugs. Your repeated failure to produce correct XML code in even the most simplistic examples is an excellent illustration of my point.
And no, "we'll catch it in testing" is not an excuse for writing shoddy code in the first place, when there are better code practices to help you and guide you along the way.
P.S. Sorry for not providing more complicated examples with namespaces etc.,
Not sure I agree. I had to convert the XML log from Subversion into an RSS feed (which is also XML), but while I did use XSLT for 99% of the transformation, I still had to pipe it through Perl to do a few things that XSLT couldn't, since it doesn't even do simple string replacements (only translation from one character to another).
And the stuff it did do for me, I wouldn't call that a piece of cake, more like a lot of complexity for something which should have been trivial.
ERH is the best XML teacher in books. I did, however, have a few notes on Effective XML:
"Thinking XML: Harold's Effective XML" [IBM developerWorks]
"What thou lovest well remains, the rest is dross" -- E.P.
Go back and reread this thread from the beginning. It's obvious there are circumstances when both methods make sense. I'm happy you're familiar with DOM, but come on, if you can't see that manipulating a DOM is better for you in the long run, then have a gay-all time playing with string concatenation.
I'm certainly not about to convince you otherwise.