Effective XML
Before I tell you what's inside though, let me tell you what you won't find in these pages. Primarily you need to know that this book does not teach XML. I know a lot of books say that, yet still include an introduction or appendix that covers the basics, but this isn't one of them. You're expected to know XML from page one. Even syntax is only covered from a proper usage angle. Personally, I appreciated this. It always bothers me when an obvious non-beginner's book starts off by wasting a chapter on things I should already know. You just need to be aware when you buy that you won't learn XML here. Knowledge of namespaces, DTDs, the W3C's Schema Language, XSLT, and more isn't strictly required to get something out of this book, but it certainly would help you get a lot more out of it.
What you will get here is coverage of fifty miscellaneous topics spread across four sections on "Syntax", "Structure", "Semantics", and "Implementation". In "Syntax", ten topics delve into the details of things like DTDs, entity references and the XML declaration itself. It may sound silly to dig deep into a single line of XML that simply declares the format, but I doubt you will think so after reading that topic. There's a lot going on in that line and you want to be in control of those decisions instead of just copying and pasting. Entity references are an even smaller chunk of XML output, but they too get illuminated by a rare insight on how and when they should be used, and for what. Did you know that it is possible to write a namespace savvy DTD? I do now and I learned that in this section as well.
The second section of the book covers "Structure", and to me it was the best part. This collection of seventeen topics is loaded with good advice about how to build an XML document that will be ideal for anyone who needs to work with it. Here you see how metadata should be stored in XML, get tips on embedding binary content, learn which schema language is better for which tasks, and finally understand rare XML constructs like processing instructions and exactly what they are for. Additionally, there's a lot of general advice on the right way to mark up content that's really worth its weight in gold. Just one example of what I learned here is that I underappreciate mixed content for great constructs like <name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>. If you like that, you'll enjoy this whole section.
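To see why that mixed-content construct is handy, here is a minimal sketch using Python's standard xml.etree.ElementTree module (my choice for illustration, not something taken from the book). The helper name parts_and_text is hypothetical. It shows that the same element gives you the structured parts and the readable text, punctuation included.

```python
import xml.etree.ElementTree as ET

# The mixed-content example quoted in the review.
NAME = "<name><given>John</given> <family>Doe</family>, <title>Ph.D.</title></name>"

def parts_and_text(xml_text):
    """Return (dict of child tag -> text, full display text) for one element."""
    root = ET.fromstring(xml_text)
    # Structured view: each child element is individually addressable.
    parts = {child.tag: child.text for child in root}
    # Display view: itertext() walks element text and the text between
    # the child elements, so the space and comma survive.
    text = "".join(root.itertext())
    return parts, text

parts, text = parts_and_text(NAME)
print(parts)
print(text)
```

The space and the comma live between the child elements, so a program can pull out the family name while a person still reads the name as ordinary text.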
Section three, "Semantics", deals primarily with parsers and their APIs. Again, you won't learn any APIs here. What's covered is their strengths and weaknesses and why you should choose a given API for a given task. SAX and DOM are the main focus of these ten topics, but there are other details sprinkled in, like XPath.
The fourth and final section is all about "Implementation". The thirteen topics here address client-side XML styling, server-side transformations, signatures, encryption, compression, and more. My favorite topic here was a terrific coverage of Unicode and how it affects XML. All developers should know at least as much about Unicode as what's printed here and this is a fine source to learn it from.
One thing that really stands out in the whole text is that the author isn't afraid to cover the dark side of XML. He will tell you where the design process was less than perfect, which tools have little practical value, and some of the problems with where XML technologies are headed. This isn't complaining though. All of this is targeted at how it affects XML developers today. You learn what you can safely skip and what should be outright avoided. The author even tells you what XML is bad at and gives you advice about when you shouldn't use it. That's the mark of a man who knows his subject, if you ask me.
All told, I think the author failed to completely convince me his way is perfect on only 2 topics. That means I learned 48 expert XML tricks. Surely that's worth the cost of the book in time and money. This isn't the first XML book you need, but I think it is the second XML book everyone should read.
You can purchase Effective XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
I just bought the book a couple of days ago. It's a great one so far. Even though it does not teach you XML, the book is still great for anyone who has even a little experience with XML. Just like me, you will pick it up really fast.
Bookpool has it for $28.50. Don't click the bn sponsored link (where it's a whopping $44.95).
/. gets a kickback from doing something dumb like clicking the link to overpriced merchandise.
PS, I don't work for Bookpool, I hate it when
If you like this book, don't forget to check out Scott Meyers' Effective C++ or Joshua Bloch's Effective Java. Both are great. I devoured Meyers' book when it first came out, and I was happy to see Bloch's book was similarly useful. There is also an Effective Perl book out, but I don't know how good it is -- it follows the same general format, but hasn't been updated since 1997. (Neither has the C++ book, but C++ hasn't changed that much since then.)
Eric
See your HTTP headers here
HTML significantly predates XML. Though both are derived from SGML, they are in somewhat different categories (HTML being an application of SGML, while XML is a profile). HTML is a closed development path, however; future versions will be XHTML, which is a derivative (application) of XML.
Floating face-down in a river of regret...and thoughts of you...
point is string parsing is neither easy for the programmer nor for the machine. Compare finding a specific set of data in XML, with its variable-length branching sets of elements etc., to finding something in a SQL database where all data is at fixed offsets. With SQL the computer only needs to know how big each row is and what row it's looking for, then it can skip to (size of row)*(row number) just like that. That's fast. With XML, the whole file has to be parsed first, then once it's in memory a faster lookup can be done. I'm not sure how XML databases work, but they look like they would alleviate this problem.
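The (size of row)*(row number) arithmetic can be sketched in a few lines of Python. The record layout here (a 4-byte id plus a 16-byte name) and the names ROW and read_row are assumptions made up for illustration. Real database engines are considerably more involved, so this only shows the fixed-offset idea from the comment.

```python
import io
import struct

# Hypothetical fixed-width row: 4-byte unsigned id + 16-byte padded name.
ROW = struct.Struct("<I16s")

def read_row(f, n):
    """Jump straight to row n without reading the rows before it."""
    f.seek(n * ROW.size)  # (size of row) * (row number)
    row_id, name = ROW.unpack(f.read(ROW.size))
    return row_id, name.rstrip(b"\0").decode("ascii")

# Build a small in-memory "table" of five rows.
table = io.BytesIO()
for i in range(5):
    table.write(ROW.pack(i, f"row{i}".encode("ascii")))

print(read_row(table, 3))
```

The seek costs the same no matter how many rows precede the one you want, which is the speed advantage the comment is describing.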
$28.27 at overstock.com.
Not sure if you were serious here or not, but this is necessary to disambiguate the following improperly formed XML:
<start> Now is the time for all good men to come to the aid of their <noun>country</noun></phrase>
which is either missing a "phrase" start tag or mixed up the start & end tags... in a long XML document, the parser can give you a better hint where to look for the error.
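As a rough illustration of that hint, here is what one existing parser does with the fragment above. This sketch assumes Python's standard xml.etree.ElementTree; the function name locate_error is hypothetical.

```python
import xml.etree.ElementTree as ET

BROKEN = ("<start> Now is the time for all good men to come to the aid "
          "of their <noun>country</noun></phrase>")

def locate_error(text):
    """Return ((line, column), message) for malformed XML, or None if it parses."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the problem.
        return err.position, str(err)
    return None

print(locate_error(BROKEN))
```

Because the end tag names the element it is supposed to close, the parser can stop at the mismatched tag itself rather than at the end of the document.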
Or you were kidding and I missed the joke, in which case I'm about to be called all sorts of impolite things... (I might even be referred to as Sean Penn).
Proud neuron in the Slashdot hivemind since 2002.
Please don't tar XML with the schema brush. One of the unique innovations of XML is that schemas are optional, and need not be agreed on. Schemas can be useful as I discuss in Item 37. However, they are misused and overused far more often than they're used correctly.
Really, schemas are just convenient tools for a few special purposes. Not everyone needs them, and no one needs them all the time. Schemaless XML is a lot more interesting and practical.
These days data has to be pretty damn simple to justify using a flat file rather than XML. I wrote more about this in my previous book, Processing XML with Java, than in this one, though. Chapters 1-4 discuss this in some detail.
Real-world data often gets messy in ways that don't lend themselves to flat files. For instance, two of the thorniest problems:
Both of these are completely solved by XML with no extra effort on your part, and these are hardly the only issues.
I certainly agree that it's easier to write a parser for a flat file format than it is to write a parser for XML. However, it's much easier (and much more reliable) to use one of the existing well-tested, debugged XML parsers than it is to write your own flat-file parsing code.
Hmm, that's one I haven't been asked before.
I suspect what it offers is that you don't have to define and write your own BNF grammar, and then implement it in lex and yacc or similar tools.
Grammar design is non-trivial, especially if you need to consider issues like internationalization. Picking XML as the underlying format means you don't have to do this work yourself. Why reinvent the wheel?
Sometimes you do need something different, but a lot of alternative formats don't really have a good reason to exist. More often than not, custom parsers just come about because a programmer is more comfortable writing bad parsing code quickly than learning a new, more robust API in order to use someone else's parser.
Just in case anyone didn't get it - the dept line is a reference to an episode from the BBC series "The Office"... can anyone pick the episode?
Because you need to _parse_ it in any way at all. Simply Hollerith/run-length encoding the data would be much better.
Take XPath as an example. How do you extract the fragment pointed to by the expression
foo/bar/fie[@naja='hehe']
? You read the document, counting opening and closing tags, until you read a foo tag at top level. Then you continue, counting as before, until you reach, before the foo end tag at top level, a bar tag at second level, and then a fie tag with the attribute naja set to 'hehe' at third level. Then you read on, counting opening and closing tags, until you reach its end tag, and return the string between the opening and end tags, including those tags, as the result.
Thus, if the foo tag is at the end of the document, you have read the entire document just to extract those few bytes at the end of it.
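The procedure described above can be sketched with a streaming parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree.iterparse; the function name extract_fragment and the sample document are made up for illustration. It keeps a stack of open tags instead of a bare counter, but the work is the same: everything before the match still has to be read.

```python
import io
import xml.etree.ElementTree as ET

def extract_fragment(xml_text):
    """Return the first foo/bar/fie[@naja='hehe'] element as a string, or None."""
    path = []  # tags of the currently open elements, outermost first
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
        else:
            # On "end" the element is complete; check where it sits.
            if path == ["foo", "bar", "fie"] and elem.get("naja") == "hehe":
                elem.tail = None  # do not serialize text that follows the element
                return ET.tostring(elem, encoding="unicode")
            path.pop()
    return None

DOC = ("<foo><bar><fie naja='nope'>skip me</fie>"
       "<fie naja='hehe'>target</fie></bar></foo>")
print(extract_fragment(DOC))
```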
If you coded each tag something like
4711 characters
this task would directly be greatly minimized, as you could "jump" over big chunks of the file at once. Changing the coding of that 4711 to binary would also minimize the hassle, as reading the number would be a simple 4 byte read operation (one machine instruction).
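Here is a minimal sketch of that length-prefixed idea in Python. The record layout (a 2-byte tag length and a 4-byte payload length, followed by the tag and the payload) is an assumption invented for this example, as are the names write_record and find_record. It is not a real format such as ASN.1.

```python
import io
import struct

# Hypothetical header: 2-byte tag length + 4-byte payload length, big-endian.
HEADER = struct.Struct(">HI")

def write_record(buf, tag, payload):
    """Append one record: header, tag bytes, payload bytes."""
    tag_bytes = tag.encode("utf-8")
    buf.write(HEADER.pack(len(tag_bytes), len(payload)))
    buf.write(tag_bytes)
    buf.write(payload)

def find_record(buf, wanted):
    """Scan from the current position, skipping payloads of other records."""
    while True:
        raw = buf.read(HEADER.size)
        if len(raw) < HEADER.size:
            return None  # ran off the end without a match
        tag_len, payload_len = HEADER.unpack(raw)
        tag = buf.read(tag_len).decode("utf-8")
        if tag == wanted:
            return buf.read(payload_len)
        buf.seek(payload_len, io.SEEK_CUR)  # jump over the whole chunk

buf = io.BytesIO()
write_record(buf, "foo", b"x" * 4711)  # a big chunk we never need to inspect
write_record(buf, "fie", b"hehe")
buf.seek(0)
print(find_record(buf, "fie"))
```

The 4711-byte payload is skipped with a single seek instead of being scanned for a matching end tag.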
Even better would be to have tags not contain any information, but just pointers (indexes into the file) to the information, so that changing the file destructively to add some extra info would be possible without re-writing the whole (possibly big) file.
All of this is old knowledge however. Go read up on SUN RPC, Corba, or, heaven forbid, ASN.1...
--The knowledge that you are an idiot, is what distinguishes you from one.
Sounds like you've been reading Joel.
http://www.joelonsoftware.com/articles/fog0000000319.html
"Taligent is still pure vapor. Maybe they'll be the last who jumps up on Openstep... "
... but yeah, you're right. Helps do away with the (ugh!) parenthesis matching crap in LISP, so actual people can edit it too, verbose as it may seem.
You can hold down the "B" button for continuous firing.
The review almost sold me on the fact that I could actually learn something from this book. Looking at the sample chapters here told me the truth