Effective XML

Posted by timothy on Monday February 28, 2005 @09:30AM from the under-weaknesses-you-put-xml dept.

James Edward Gray II writes "I'm not an XML junkie and I thought this was a very good book, so I'm betting that XML aficionados will love it. Effective XML covers 50 best practices that all developers should know and use. This amounts to a book of distilled wisdom that will push you a good distance up the chart of XML mastery." Read on for the rest of Gray's review.

Effective XML
author: Elliotte Rusty Harold
pages: 304
publisher: Addison-Wesley
rating: 8
reviewer: James Edward Gray II
ISBN: 0321150406
summary: A guide to the correct use of XML.

Before I tell you what's inside though, let me tell you what you won't find in these pages. Primarily, you need to know that this book does not teach XML. I know a lot of books say that, yet still include an introduction or appendix that covers the basics, but this isn't one of them. You're expected to know XML from page one. Even syntax is only covered from a proper usage angle. Personally, I appreciated this. It always bothers me when an obvious non-beginner's book starts off by wasting a chapter on things I should already know. You just need to be aware when you buy that you won't learn XML here. Knowledge of namespaces, DTDs, the W3C's Schema Language, XSLT, and more isn't strictly required to get something out of this book, but it certainly would help you get a lot more out of it.

What you will get here is coverage of fifty miscellaneous topics spread across four sections on "Syntax", "Structure", "Semantics", and "Implementation". In "Syntax", ten topics delve into the details of things like DTDs, entity references and the XML declaration itself. It may sound silly to dig deep into a single line of XML that simply declares the format, but I doubt you will think so after reading that topic. There's a lot going on in that line and you want to be in control of those decisions instead of just copying and pasting. Entity references are an even smaller chunk of XML output, but they too get illuminated by a rare insight on how and when they should be used, and for what. Did you know that it is possible to write a namespace-savvy DTD? I do now and I learned that in this section as well.
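
As an illustration of why the declaration line matters (a minimal Python sketch, not an example from the book), the snippet below assumes the standard library's xml.etree.ElementTree and a made-up <note> document. The declared encoding decides how the raw bytes are decoded, while the entity and character references are resolved by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document bytes: the declaration says ISO-8859-1,
# so the single bytes 0xE9 and 0xE8 are read as accented letters.
raw = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
       b'<note>caf\xe9 &amp; cr\xe8me &#169;</note>')

note = ET.fromstring(raw)
text = note.text  # references and encoding are already resolved here
print(text)

# Serializing to ASCII turns every non-ASCII character into a numeric
# character reference, and the ampersand back into &amp;.
ascii_out = ET.tostring(note, encoding="us-ascii")
print(ascii_out)
```

Dropping or changing the encoding in the declaration would change how those same bytes are interpreted, which is the kind of decision the review says should not be left to copy and paste.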

The second section of the book covers "Structure", and to me it was the best part. This collection of seventeen topics is loaded with good advice about how to build an XML document that will be ideal for anyone who needs to work with it. Here you see how metadata should be stored in XML, get tips on embedding binary content, learn which schema language is better for which tasks, and finally understand rare XML constructs like processing instructions and exactly what they are for. Additionally, there's a lot of general advice on the right way to mark up content that's really worth its weight in gold. Just one example of what I learned here is that I underappreciate mixed content for great constructs like <name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.
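
To show what the mixed-content name markup buys you, here is a minimal Python sketch (an illustration, not code from the book) that assumes the standard library's xml.etree.ElementTree. The same element yields both a ready-to-print string and the individual name parts.

```python
import xml.etree.ElementTree as ET

# The name markup quoted in the review: the space and the comma between
# the child elements are part of the content (mixed content).
doc = "<name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>"
name = ET.fromstring(doc)

# itertext() yields element text and the text between children in
# document order, so joining it reproduces the display form.
display = "".join(name.itertext())

# The child elements still expose each part as structured data.
parts = {child.tag: child.text for child in name}

print(display)
print(parts)
```

The punctuation lives in the document rather than in the program, so software that only wants to display the name does not need to know how to assemble it.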

Section three, "Semantics", deals primarily with parsers and their APIs. Again, you won't learn any APIs here. What's covered is their strengths and weaknesses and why you should choose a given API for a given task. SAX and DOM are the main focus of these ten topics, but there are other details sprinkled in, like XPath.
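
The trade-off between the two main APIs can be sketched briefly in Python (an illustration only, not taken from the book), assuming the standard library's xml.sax and xml.dom.minidom and a made-up <staff> document. SAX streams events to a handler and keeps no tree, while DOM builds the whole document in memory so it can be queried afterwards.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used for both approaches.
DOC = b"<staff><person>Joe</person><person>Jane</person></staff>"


class PersonCounter(xml.sax.ContentHandler):
    """SAX style: react to events as they stream past, store almost nothing."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "person":
            self.count += 1


handler = PersonCounter()
xml.sax.parseString(DOC, handler)
sax_count = handler.count

# DOM style: the whole tree is loaded first, then navigated at will.
dom = minidom.parseString(DOC)
dom_names = [p.firstChild.data for p in dom.getElementsByTagName("person")]

print(sax_count, dom_names)
```

For a large document where only a count or a single value is needed, the streaming handler avoids holding the tree in memory. The DOM version is more convenient when the program needs to revisit or modify nodes.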

The fourth and final section is all about "Implementation". The thirteen topics here address client-side XML styling, server-side transformations, signatures, encryption, compression, and more. My favorite topic here was the terrific coverage of Unicode and how it affects XML. All developers should know at least as much about Unicode as what's printed here and this is a fine source to learn it from.

One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML. He will tell you where the design process was less than perfect, which tools have little practical value, and some of the problems with where XML technologies are headed. This isn't complaining though. All of this is targeted at how it affects XML developers today. You learn what you can safely skip and what should be outright avoided. The author even tells you what XML is bad at and gives you advice about when you shouldn't use it. That's the mark of a man who knows his subject, if you ask me.

All told, I think the author failed to completely convince me his way is perfect on only 2 topics. That means I learned 48 expert XML tricks. Surely that's worth the cost of the book in time and money. This isn't the first XML book you need, but I think it is the second XML book everyone should read.

You can purchase Effective XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

hmmm by elid · 2005-02-28 09:36 · Score: 1, Interesting

All told, I think the author failed to completely convince me his way is perfect on only 2 topics.
Any ideas what those 2 are?
The Problem With XML by osewa77 · 2005-02-28 09:40 · Score: 5, Interesting

Is that it's not a very machine-friendly language (more wordy than it ought to be; parsing of tags is not very efficient) and it's not a very human-friendly language (the human style is free-style, really). I don't think it's a very good universal data description language. Sorry that I had to go on a bit of a tangent...
1. Re:The Problem With XML by cluckshot · 2005-02-28 10:15 · Score: 3, Interesting
  
  To be specific, having spent the last 3 years working on XML, I can say that there are numerous problems with it.
  
  XML tagging is tedious and stupidly top heavy in overhead. Far from being human friendly, it is the opposite. XML tagging should be shortened to a simple set of defined tag names and then type definitions. After that each name would be addressed by an index. Typing of data should be contained in an extraction process that is associated with either the tagging index or an over-the-top wrapper similar in function to the DTD. But frankly the whole process is currently a mess.
  
  The expansion of data with tagging currently can be as much as 3 or 4 to one. On top of that, the recursive parsing process, if you are recovering data, is a geometrically expanding time consumer. If you use linear display, the process is nearly worthless for anything but a single display pass. It works great for short things. In short, it just eats up processing time and bandwidth. It makes a good universal file storage structure and that is it!
  
  Once the file is retrieved it should be crunched into something like MySQL or such if any real processing is going to happen.
  
  Nothing really is gained by such a markup system over just a series of hashed tags that are indexed. Such tagging and indexing is a lot less of a tax on bandwidth.
  
  This having been said, XML works and is OK for many uses. I am not sure it really has any advantage over flat files or such. It drinks bandwidth and program operations time. I think in time it will turn out to be a fun toy but not much else. Of course someone else might find a good way to tell me why I should use 40 characters to transmit what should have taken 10 characters, and how using it is somehow faster or more efficient. The whole concept was definitely good for a lot of programmer payroll time.
  
  --
  Never Politically Correct ~ I prefer the facts If you don't like what I say, get a life, or comment yourself.
2. Re:The Problem With XML by fedor · 2005-02-28 10:55 · Score: 1, Interesting
  
  (first-name=Joe,last-name=Smith,salary=48000),
  (first-name=Jane,last-name=Smith,salary=50000)
  etc...
  ]]>
  
  --
  :wq!
3. Re:The Problem With XML by dustmite · 2005-02-28 11:12 · Score: 2, Interesting
  
  In my experience the main reason our clients want their data in XML is that most of them are afraid of single-vendor lock-in to proprietary formats, especially to smaller vendors they perceive could more easily go under. In other words, they want data longevity and a format in which they can easily process their data if they need to. And this trumps the inefficiency, especially as people mostly transfer such documents across high-speed LANs, store them on modern 120+ GB hard disks, and open them on machines with 512MB+ RAM ... in all of which cases inefficiency doesn't cause any problems.
  There are also generic XML content editors which, although rather pricy, help reduce a lot of the negatives associated with working with XML (i.e. you would be crazy these days to be writing XML in e.g. Notepad).
  I personally agree that XML is overrated, but many people want it because they understand one thing: if their data is in XML format, you can't in the long run lock them in to your software with excessive prices, and if you disappear, they can still get their data.
4. Re:The Problem With XML by Anonymous Coward · 2005-02-28 11:45 · Score: 1, Interesting
  
  > The advantage of XML is that you *don't* have to write the parser at all.
  
  Give the man a cigar. People got tired of reinventing that wheel and watching different parser implementations mangle the data slightly differently each time, so they went with one uber-parser for any structured data. They also tried to standardize datatypes (with XSD) and made a horrible hash out of it, but XML just works whether it's a webserver config or a HR database dump. The fact that it's not all things to all people doesn't make it suck, it just makes it a format like any other.
5. Re:The Problem With XML by sapgau · 2005-02-28 14:16 · Score: 2, Interesting
  
  Yes, that's implementation.
  
  But the question was whether it is a universal data description language. Sending binary will kill your data the first time you try to communicate with a Macintosh or Unix system (big endian, little endian).
  
  The common lowest denominator is just text, so to describe any structure we have trees in XML.
  
  Probably the confusion is the influence of Object Oriented design with Entity Relationship schemas in databases. The way that one-many relationships are described in both areas makes sparks fly.
  
  Pivoting on table data is something OO makes look easy but that is complicated in ER. For these kinds of problems XML is just the messenger.
  
  I might be wrong, but doesn't Oracle allow you to return data in XML format? I wonder how efficient that is.
n00b - help! by dsginter · 2005-02-28 09:47 · Score: 4, Interesting

After seeing what can be done with simple javascript and XML, I'm wanting to get into this. Can someone point me to the best OSS way to do this (I can hear the groans now). I like Postgres but I don't see much in the way of getting it to spit out XML. I like documentation... MySQL? Am I missing something?

--
More
1. Re:n00b - help! by Anonymous Coward · 2005-02-28 14:15 · Score: 1, Interesting
  
  I still prefer CSV. I've saved countless
  maintenance hours by ripping out XML from old
  projects and replacing it with CSV.
  I've saved even more time by requiring that our
  business partners use CSV for data exchange.
  I refuse all requests to "upgrade" to XML.
  Usually I give them a choice -- choose CSV files
  or choose ISO X.12 EDI files. They always choose
  CSV files.
Re:Really? by elharo · 2005-02-28 11:58 · Score: 2, Interesting

There's a very real tension between making examples too trivial to be interesting and making them too long to be readable. I struggle with it in every book I write, and every other programming book author I know does so too. I've tried putting so-called real-world examples in books, and it's hopeless. It can't be done. There wouldn't be any space left for the explanatory text, nor would anyone put up with reading page after page of code.

Most importantly, while I tend to be writing about just one topic at a time, real world programs wander all over the map. I may be trying to explain how to use callbacks in SAX, but a realistic program also has to consider network latency, GUI design, error logging, numerical algorithms, internationalization, and a hundred other things that aren't on topic. Covering them all would obscure the subject I'm actually trying to explain. Some things you just have to leave for other books and other authors.

As an author, I try to strike the right balance between excessive simplicity and excessive length. Sometimes I hit it. Sometimes I don't. I actually think Effective XML hits it fairly well. In fact, this book was one of the toughest I ever had to write, precisely because it was so short that I couldn't spew pages like I did in Processing XML with Java (1100 pages) or the XML 1.1 Bible (1000 pages). I had to be really picky about how much code I included, and make sure that each example carried its weight, demonstrated just the point at hand, and nothing else.

By the way, the chapter with that specific example is online if anyone cares to see for themselves just what it is that makes names a more interesting and complex problem than "John Doe Ph.D" seems to be at first glance.
XML as a fall-back standard by galdur · 2005-02-28 14:24 · Score: 2, Interesting

When it comes to speed, XML sucks. It does provide incomparable interchange of data on a human- and machine-readable level. It would be nice on the other hand to be able to select a faster standard when both ends of a transaction support it. XML would become the lowest denominator.