XML in a Nutshell
The Scoop
While one of the original goals of XML was to create a specification simple enough that a computer science student could produce a working parser in a week, a few new developments have complicated things slightly. The sea of W3C-recommended acronyms includes namespaces, XPath, XSL, XPointers, schemas, and dozens of specific XML applications. Adopting the simple rules of well-formed data helps, but the quickly-growing stable of related technologies is enough to make the sturdiest information architect weep. The specifications aren't as easy to read as, say, the latest Terry Pratchett novel, either.
XML in a Nutshell covers just the most important concepts. Cleanly written, it walks through the XML aspects likely to be used in most projects. As it assumes existing familiarity with the subjects, it does not spend much time in tutorial mode. Instead, these are the guts of the subjects, arranged nicely in dissection jars.
The first section covers XML basics. This includes the ubiquitous grove of angle brackets, the semantic intent and implication, a good chapter on DTDs, as well as internationalization concerns. The short discussion of namespaces is the clearest explanation this author has yet encountered.
Part two delves further into the reasons for using XML, exploring documents that use the structure to explain semantic relationships. DocBook and XHTML appear as extended examples. Further, it explores the assistive technologies of XSL, XPath, XLinks, and XPointers. Again, the discussions of XSL and XPath compare very favorably to longer works intended as tutorials. A brief examination of CSS and XSL Formatting Objects rounds out the section.
Part three explores the use of XML as a data transport. In this section, programming languages come into play. There's a strong hint of Java in the air, though most of the discussion follows a language-neutral path. Both the DOM and SAX parsing models have a dedicated chapter. They're short, but the essential pieces are described simply and effectively.
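For readers who have not met the two models, here is a minimal sketch of the difference, using Python's standard library rather than the Java the book leans on. The sample document and the names `DOC`, `TitleHandler`, `titles_with_sax`, and `titles_with_dom` are invented for illustration and do not come from the book.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, invented for illustration.
DOC = "<library><book>XML in a Nutshell</book><book>Java &amp; XML</book></library>"


class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser pushes events at us; we keep only what we need."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None while we are outside a <book> element

    def startElement(self, name, attrs):
        if name == "book":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "book":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_with_dom(text):
    """DOM: the whole document becomes a tree that we then walk."""
    tree = minidom.parseString(text)
    return [
        "".join(child.data for child in node.childNodes
                if child.nodeType == child.TEXT_NODE)
        for node in tree.getElementsByTagName("book")
    ]
```

Both functions return the same list of titles for `DOC`. The SAX version only ever buffers the text of the current `<book>`, while the DOM version keeps the whole tree around for later navigation.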
The final section makes or breaks the book. Luckily, XML in a Nutshell won't have much chance to gather dust. The two-hundred-page reference section includes the most useful information. There's an annotated copy of the XML 1.0 Reference, arranged logically. The XSL reference, in particular, is quite good. DOM and SAX programmers will also enjoy their respective chapters. Finally, it's nice to have a large set of printed character tables handy.
What's to Consider
The parsing examples don't go much beyond DOM or SAX, and there's more than a strong Java flavor. (Of course, the models are very similar in most modern languages.) As well, some of the class interfaces in the SAX reference are hard to read. This is probably due to the complexity of the information instead of any editorial decision. There's also little discussion of actual XML applications. Instead, the book covers the principles behind perhaps 90% of XML usage. Again, this is not a complaint, just a clarification of the intended audience.
The Summary
The value of XML in a Nutshell should be readily apparent to XML developers. The material is well-organized and concise. It's a quintessential Nutshell book, upholding a tradition of utility and quality. Readers who've already been exposed to the presented material will likely keep this book close at hand.
Table of Contents
- XML Concepts
  - Introducing XML
  - XML Fundamentals
  - Document Type Definitions
  - Namespaces
  - Internationalization
- Narrative-Centric Documents
  - XML as a Document Format
  - XML on the Web
  - XSL Transformations
  - XPath
  - XLinks
  - XPointers
  - Cascading Stylesheets (CSS)
  - XSL Formatting Objects (XSL-FO)
- Data-Centric Documents
  - XML as a Data Format
  - Programming Models
  - Document Object Model (DOM)
  - SAX
- Reference
  - XML 1.0 Reference
  - XPath Reference
  - XSLT Reference
  - DOM Reference
  - SAX Reference
  - Character Sets
You can purchase this book at Fatbrain.
I do not expect any more advances in web technologies. HTML 4 is where we are going to stick. Even CSS is not properly implemented in any browser.
XML will be useless in web browsers until one is released with full XSL support, or CSS3 is released and supported. Until this happens, XML is an orphaned technology.
Due to the deprecation of most of the HTML interface in XML, no web author will willingly use the new technology. It is more work for less reward. XML is a buzzword. It offers illusory benefits that only make sense to people who are pedantically concerned about things like using em tags instead of b tags on their web pages.
Denial isn't just a river in Italy
I'm sorry. Really.
</POST>
Take information you want to store and sandwich it between <{name}> and </{name}> where {name} describes the information in between. Mimic the structure of the data, and sprinkle in <{name} otherData="{neatStuff}"> every once in a while. Congratulations, that's XML.
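That recipe can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. The element and attribute names here (`review`, `site`, `topics`, `topic`) and the function `build_review` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_review():
    # Element names mirror the structure of the data being stored.
    review = ET.Element("review", {"site": "Slashdot"})  # the otherData="..." part
    ET.SubElement(review, "title").text = "XML in a Nutshell"
    topics = ET.SubElement(review, "topics")
    for name in ("DTDs", "XPath", "SAX & DOM"):
        ET.SubElement(topics, "topic").text = name
    return review


# Serializing escapes the ampersand in "SAX & DOM" for us.
xml_text = ET.tostring(build_review(), encoding="unicode")
```

Letting a library do the serializing is the one step beyond "sandwich it between tags": it takes care of escaping characters such as `&` and `<` that would otherwise break well-formedness.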
Seen any BadMarketing lately?
The only problem is that, while I learned a lot of the concepts, I usually learn a lot faster with code examples. Anyway, the SAX and DOM areas have a little bit of code, but do not go into huge parsing examples. (Maybe I read it wrong...) Good book; O'Reilly usually doesn't put out bad ones. Hopefully there will be a Java / XML Cookbook. (I know there already is a Java Cookbook.) I love those...
"It takes many nails to build a crib, but one screw to fill it."
I want to learn plain XML, not XML in a bash shell, XML in a csh shell, or any other shell for that matter. What were they thinking?
Unfortunately the book doesn't cover the successor to DTDs: XML Schema.
Some people are under the misapprehension that XML's role is as the successor to HTML; that's a very limited viewpoint. Far more important and interesting is the role of XML as a language- and host-independent way of specifying data, particularly with respect to relational databases, and to type in conventional languages.
Does anyone have any *good* links to online XML references? Whenever I look, all I find are things like "What is XML? ...It's not HTML"
I SURVIVED THE GREAT SLASHDOT BLACKOUT OF 2002!
XML is not going to replace HTML and that's great because XML is better suited to data than display.
I have used XML on several projects not to send to browsers to display, but to transfer data between disparate systems. Finally there is a way that two computers can exchange data & metadata without worrying about memory use, big/little endian, EDI formats, and character positions. XML is great in that almost everyone agrees to use it to transfer information. HTML is great for formatting display to a degree (PostScript people please don't flame me! ;) ).
I have worked with EDI formats before and it is a pain in the butt to set up message positions for all of your data and to work with nested lists of information. XML makes that so much easier and lets you use DTDs to enforce stuff. I also like the fact that XML was made to be read by a human being. We can actually look at the data file and tell what a field is by looking at the tag. This is why XML is going to be ubiquitous.
Don't expect it to be a browser language, it's just data. With nicely structured data you can use that to generate HTML, WML, anything...
The future of data transfer looks bright.
You can purchase this book at Fatbrain.
The link:
http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=0596000588&from=MJF138
Not a bad idea - using a slashdot posting to drive sales through a referral link. I'll be back later - I'm off to find some books to review...
Seen any BadMarketing lately?
Take LISP, make the syntax twice as annoying, and hey presto, XML!
XML is just an annoyingly verbose way of representing s-expressions, the data structures that Lisp was designed around.
So much so, in fact, that it's possible to do a 1:1 mapping of XML into Scheme - see this site for the most sensible way of processing XML: translate it into the equivalent Scheme representation.
This allows you to use all the LISPy tricks in the book to munge your XML data.
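Here is a rough sketch of what such a mapping might look like, written in Python rather than Scheme and only loosely following the SXML-style convention of putting attributes in an `(@ ...)` list. The function name `to_sexpr` is hypothetical, escaping of quotes inside strings is ignored, and this is not the code from the linked site.

```python
import xml.etree.ElementTree as ET


def to_sexpr(element):
    """Render an element as (name (@ (attr "value") ...) children...)."""
    parts = [element.tag]
    if element.attrib:
        attrs = " ".join(f'({key} "{value}")' for key, value in element.attrib.items())
        parts.append(f"(@ {attrs})")
    if element.text and element.text.strip():
        parts.append(f'"{element.text.strip()}"')
    for child in element:
        parts.append(to_sexpr(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"


sample = ET.fromstring('<book isbn="1"><title>XML</title></book>')
print(to_sexpr(sample))  # (book (@ (isbn "1")) (title "XML"))
```

The recursion mirrors the nesting of the document, which is the sense in which the two notations line up.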
Choice of masters is not freedom.
On a related note - O'Reilly's 'Java & XML' book by Brett McLaughlin was eventually released this week after sliding from its original July release date.
Highly useful, and highly recommended.
When I was between jobs earlier this year, I decided to learn XML, and bought this book after perusing several others in the bookstore. I'd had a vague introduction to it at my previous job, and understood the basic ideas behind it. The book gave me a thorough understanding, and I was able to talk about it intelligently (and correctly) at subsequent job interviews. I now work with it on a nearly-daily basis, and the book is a big source of my knowledge.
I've heard of something called AIML. What is this? The next XML? The next HTML? Neither, both? Tell me tell me tell me please, Google has failed me.
As someone already said, XML is the ultimate replacement for the comma-delimited file. For the purposes of storing human readable/modifiable data, it's great, and does fill many of the roles a comma-separated file used to fill. XML itself is pretty darned easy to pick up.
That's not the problem.
The problem is with the description technologies - most of which just add a layer of abstraction to the XML data, and try to pass a secondary version of the data back to an HTML template.
That's all well and good - but quite frankly, the current incarnation of XSL stinks. It's tough to comprehend, easy to butcher, and half the time doesn't make sense.
Much easier (and more useful, I would think) are the parsers which transform an XML document into a data structure you can use in an existing language like Perl or PHP (for the web), or C, or whatever you want. Once you're in a native data format, you're set, and can manipulate the data just as you normally would.
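As a minimal sketch of that approach, in Python rather than Perl or PHP: parse once, then work with plain lists and dicts. The sample feed, its element and attribute names, and the function `load_books` are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the element and attribute names are assumptions.
FEED = """
<books>
  <book id="b1"><title>XML in a Nutshell</title><publisher>O'Reilly</publisher></book>
  <book id="b2"><title>Java &amp; XML</title><publisher>O'Reilly</publisher></book>
</books>
"""


def load_books(text):
    """Turn each <book> into a plain dict: attributes first, then child elements."""
    root = ET.fromstring(text)
    books = []
    for book in root.findall("book"):
        record = dict(book.attrib)
        for child in book:
            record[child.tag] = child.text
        books.append(record)
    return books


# From here on it is ordinary data: sort it, filter it, format it.
books = load_books(FEED)
lines = [
    f"{b['title']} ({b['publisher']})"
    for b in sorted(books, key=lambda b: b["title"])
]
```

Once the records are native dicts, the formatting step can use whatever templating or string handling the host language already provides, with no XSL involved.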
That's the way to leverage the strength of XML. Ditch XSL for now, until it can be made clearer - and use some existing backend technology to format the data once it's in a data structure.
My 2 cents, anyway =)
All HTML documents, by contrast, are HTML documents. Does an H1 element represent a chapter title, a section title, a heading, or just a line of bold text separated from the rest? Who knows? The content and the presentation are mixed together in a one-size-fits-all syntax that forces you to throw away the underlying meaning of your information when you shoehorn it into HTML.
For example, I'm working on a web site to help people affected by breast cancer. The main value of the site is the information it contains, so you can be darn sure that I'm preserving the information's meaning. I'm not using XML as a better HTML but rather as a rich medium that captures all of my information's value. Once captured, the information is easily "extruded" into HTML for web presentation, simple HTML for Palm and hand-held devices, and typeset pages in PDF for offline reading.
Make no mistake about it, XML is already a winner.
Easy, automatic testing for Perl.
XML is not going to replace HTML and that's great because XML is better suited to data than display.
Well, I think XML is a generalization of HTML, prompted by the repeated extension of HTML. The W3C committee designed it so that future extensions wouldn't be as painful. However, this XML thingie became such an unprecedented hit that everything gets encoded in that form, albeit not always efficiently. Because of this, XML is now used to represent databases, and so on.
Just my 2c.
--
Error 500: Internal sig error
I often buy these books with a few questions in mind that I need answered. I always find that such questions are hard to find the answers to even when the book contains the answers. This was the worst case of it.
When do I use an attribute and when do I nest a sub-element? Any "leaf" could be either. The pathetic answer was "duh, nobody's made up their mind about this." Oh well, so much for the genius of O'Reilly and the W3C. (Hint: how about coming up with a good reason to use one or the other?)
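For what it's worth, one practical difference can be shown in a few lines of Python. This is only a sketch of one consideration, not the book's answer: the same leaf can be stored either way, but an attribute name may appear only once per element and cannot nest, whereas child elements can repeat. The sample documents, the placeholder ISBN, and the names `isbn_of` and `is_well_formed` are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# The same leaf value stored two ways; the ISBN is a placeholder.
AS_ATTRIBUTE = '<book isbn="0000000000"/>'
AS_ELEMENT = "<book><isbn>0000000000</isbn></book>"


def isbn_of(text):
    """Read the value from the attribute if present, else from a child element."""
    book = ET.fromstring(text)
    value = book.get("isbn")
    if value is None:
        value = book.findtext("isbn")
    return value


def is_well_formed(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A repeated attribute is rejected by the parser; repeated elements are fine.
two_author_attributes = '<book author="a" author="b"/>'
two_author_elements = "<book><author>a</author><author>b</author></book>"
```

So a leaf that might ever need to repeat, or to grow structure of its own, is a safer bet as an element. Beyond that, the choice is largely a matter of taste.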
The worst thing you can do is have a programmer write a programming manual. The second worst thing you can do is organize these books like school text books.
While everyone seems to agree that XML is important, a book simply about XML may not be as useful as a book with an explanation of XML and some examples of real-life usage.
Possibly a better book (also an O'Reilly title) is O'Reilly's Java & XML (ISBN: 0-596-00016-2 or EAN: 9780596000165). I have read this book and found it to be excellent. Although it is Java-centric, it discusses concepts that could be easily applied to other languages. The book has good coverage of XML as well as usage of SAX, DOM, and JDOM, and using XML with databases, as configuration files, and in wireless devices. It also covers XSL/T and focuses on Apache XML projects.
A GOOD READ for anyone interested in using XML.
I've had trouble with the implementation side of XML. While the concept behind XML is extremely simple, getting it to display is quite another. XSL chose some extremely hard to understand syntax for a data structure designed to be human-readable.
Travis
The book review mentioned a chapter on DTDs, but what about schemas? Aren't schemas the way we're supposed to go? Without coverage of schemas, I will stick with the ageing but excellent "Professional XML" book from Wrox Press.
Apple seems to have utilized XML in a rather remarkable fashion in OS X... makes all those annoying .plist's quite easy to understand.
The point is, there are many incredibly useful places for XML to contribute to web dev and app dev without XML ever being sent to a browser.
Yes, XSLT is a hassle, and no, web developers are not likely to move quickly to a technology that requires strict adherence to syntax rules and well-formed code, and no, browsers are not likely to have decent support for XML display anytime soon... BUT XML and Java are excellent together, and this doesn't even touch on data feeds which are exponentially more reliable and configurable and maintainable using XML than any other format...
Ok I'm meandering now. Just a big fan, having used XML extensively over the last year or so.
La via sola al paradiso incommincia nel inferno
XML is constructed just like any other context-free language. Any language with a context-free grammar (C/C++, Java, Pascal, etc.) can easily be parsed by a functional language such as Scheme, LISP, ML, or OCaml. Because a context-free language is based on a recursive grammar, translating it into a functional language is pretty direct. Manipulating and constructing the AST is also very easy.
The claim of a 1:1 mapping from XML to a functional language representation is highly exaggerated. In ML, for example, one would have to build the table data structure, even though that is easily done. There are still some idiosyncrasies that you have to handle too, albeit not as intricate as the ones in imperative languages like Java or C/C++.
Mapping to an AST does NOT in itself yield the full usable extent of XML. XML is used to describe tuples of data. How can you flatten the AST out into records/structs/classes that are directly usable by the subsequent program? That's not that easy in a functional language either. Moreover, the resulting records are better suited to an imperative language than to a functional one.
--
Error 500: Internal sig error
The purpose of XML is not as an HTML replacement. Those who use XML to generate HTML are doing one moderately interesting thing with some powerful technologies. But the real power of XML is that everyone is speaking the same language.
When you see technologies like SOAP and ebXML, you really start to understand the value of this common language. Don't judge XML as an HTML replacement.
I spend most of every day working with XSL and XML, and continually have to listen to people complain how hard XSL is. It's not. Though it's a different method of writing code than some people are used to, most people I work with have no problem with it once they break out of the C-type syntax of coding. Once you comprehend the template concept of development, XSLT is actually rather easy.
Don't get me wrong, there are limitations to the language, and hopefully, we'll see those limitations removed in 2.0.
But, if you can make the conceptual jump in coding styles, it can be very effective.
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
Every slashdot review does this.
It's a racket.
The problem I have with it is that it doesn't respect a DTD. This places too much dependence on a specific XML file. If I have a node that is allowed to have more than one child, XML::Simple will return different results depending on how many children are in the node. If there is just one, then the data in that node is placed as a scalar. If there is more than one, then the data is put into an array (an arrayref, actually).
Personally, I think it should always be an array if there is a possibility of more than one element. If there is just one thing in there, then the array should have just one element. But you can't tell if there would ever be more than one element inside a node just by looking at the XML file, because that is just one instance. You have to look at the DTD.
XML::Simple would be extremely useful if it returned the same data structure for the same DTD, every single time. Each XML instance would have different data filled out, of course, but the structure of the data would be the same. Maybe this isn't quite possible in Perl.
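A minimal sketch of the behavior being asked for, in Python rather than Perl: the caller names the elements that the DTD allows to repeat, and those always come back as lists, whether the instance contains one occurrence or many. The function `to_data` and its `repeatable` parameter are hypothetical, attributes are ignored, and a repeatable element with zero occurrences simply has no key. (If memory serves, XML::Simple's `ForceArray` option is aimed at the same problem, though it still has to be told which names to force rather than reading the DTD.)

```python
import xml.etree.ElementTree as ET


def to_data(element, repeatable):
    """Convert an element into nested dicts, lists, and strings.

    `repeatable` is a set of tag names that the DTD allows to occur more
    than once; those always become lists, even for a single occurrence.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = to_data(child, repeatable)
        if child.tag in repeatable:
            result.setdefault(child.tag, []).append(value)
        else:
            result[child.tag] = value
    return result


one = to_data(ET.fromstring("<order><id>7</id><item>pen</item></order>"), {"item"})
two = to_data(ET.fromstring("<order><id>7</id><item>pen</item><item>ink</item></order>"), {"item"})
```

With `item` declared repeatable, `one["item"]` and `two["item"]` are both lists, so code that loops over the items does not need to special-case the single-item order.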
I think a lot of the XML development I have seen really ignores the usefulness of DTDs. If you want to make nicely structured data, XML is great. But if you really want to provide something robust and extensible, you have to provide a DTD and test that you will be able to handle anything that DTD provides. Otherwise you are just kidding yourself.
-Mike
If it should be possible for a CS student to write an XML parser, are there any good texts that people know of that go through this very process? I think it would be an interesting side project, but since I am not a computer scientist (or a student), it's not easy to get a start.
Troll Like a Champion Today
Great. I look forward to reading them. Sorry I don't have any other convenient way to pay you for writing it.
the current incarnation of XSL stinks.
It doesn't stink, it just smells different. It's a functional language, not a procedural language, and those of us who didn't grow up at MIT still find that a bit weird. There's certainly a culture shock, but once you start to get it, then it's no harder than anything else.
There's nothing wrong with variables that you can't change the value of! You just need to lose that innate fear of recursion most of us procedural people still carry around.
XPath does look a bit like Martian, granted, but it's no worse than regexes.
A really good text on XSLT needs to go beyond the reprinted standard level, and Michael Kay's is pretty good for that. Lots of useful cookbook stuff, and the 2nd edition is also well up to date.
Now xmlns:xsl="http://www.w3.org/TR/WD-xsl" used to pong a bit... Nested templates ? Blaurgh !
XML is hard to learn, and easy to remember. Nutshell guides are best for complex lists of obscure settings in little-used config files. I have a bunch of similar Nutshell guides, and they see much hard and useful service.
This book isn't a good tutorial (it isn't meant to be) and I see no need for a "handy quick reference" guide to the parts of XML that are covered here. It's not a bad book, but I see no real useful purpose to it.
Sometimes I need to read the XML Spec. This is only ever for really obscure and bizarre minutiae, and in those cases I have to go back to the W3C original. Fortunately that's on-line and already on my desk in a well-thumbed paper copy. I've never felt the slightest need for an XML Nutshell.
Omitting Schema is a real drawback. The Schema spec is one of the very few XML-related specs that's at all large and can't easily be memorised.
I agree with the first post flamebait to an extent; XML is all well and good, nice way for my database guy to get me the goods for Web presentation, but I need to DO something with that data.
The answer is XSL, but I've had to blunder around for what works. There isn't even a decent FAQ anywhere, that I know of. Suggestions anyone? Following is a list of links I've found useful; please don't send me to any of those...
TIA
http://www-106.ibm.com/developerworks/xml/
http://www.ucc.ie/xml/
http://www.vbxml.com/xsl/xsltref.asp
http://www.xmlhack.com/
http://www.xml.com/index.csp
http://www.xmlpitstop.com/ --very good!
http://www.biglist.com/lists/xsl-list/archives/
http://www.xslt.com/
Enjoy!
ceci n'est pas un 'sig'
I own the O'Reilly book, and considering when it came out, it should've had way more on Schemas than it did. (That is, Schemas weren't a W3C recommendation yet, but enough was known to be able to give more coverage in this book.) Instead, the coverage is tilted way toward DTDs.
In fact, even though I think the O'Reilly book did an excellent job covering the most important XML-related standards, the future importance of Schemas keeps me from recommending it. If they had covered the subject as well as they covered everything else, I'd easily say that this was one book that should be on every XML monkey's shelf. I can't say that now, so hopefully they have a second edition in the works where they fix this gaping hole. As for now, I'd probably stick to recommending Holzner's Inside XML, but note that I'm not familiar with the Wrox book that I've seen other people recommend.
That all depends on exactly why you are doing this. If you are doing this just to get practice on building a basic parser, then you probably want to look at some basic compiler books, or the documentation on the common lexical and parser generators (e.g. Flex and Bison). While that may be useful, remember that correct XML requires a little more work than just parsing (opening and closing tag names must match exactly, etc.). You probably want to read the W3C recommendation, or some annotated version of it.
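To make the "tag names must match exactly" point concrete, here is a deliberately simplified sketch in Python of the stack-based check involved. It is nowhere near a conforming parser: the regular expression ignores comments, CDATA sections, processing instructions, and `>` inside attribute values, and the names `TAG` and `tags_balanced` are made up for this example.

```python
import re

# Matches <name ...>, </name>, and <name ... />. A deliberate simplification.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.\-]*)[^<>]*?(/?)>")


def tags_balanced(text):
    """Return True if every opening tag is closed by a tag of the same name."""
    stack = []
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <br/> opens and closes in one step
        if closing:
            # The closing name must equal the most recently opened name.
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack  # anything left on the stack was never closed
```

Note that the comparison is case-sensitive, so `<B>` closed by `</b>` fails, which is one of the places where XML is stricter than the HTML of the day.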
Alternatively, if you just want to be able to read in XML, there are several free or GPL libraries out there already. The one I'm most familiar with is Xerces, the XML parser from the Apache project. You can find it here.
If you are not a CS student, you probably want to make sure you're familiar with some of the basics (a set of languages, basic data structures, etc.) before taking on this sort of project. I'd recommend C++ and its Standard Template Library, but there are many other viable alternatives out there (e.g. C, Python, Java, etc.). There are lots of books which cover this, though none come to mind offhand. If any other reader would like to help, I'd be much obliged.
I hope some of this info helps, and I wish you luck.
So there I was, juggling apples and small animals, when I accidentally bit into the wrong one...
I more or less agree with the review, but I found an inordinate number of typos, particularly in the XML examples (where it matters).
If I can spot them on a more-or-less casual read, how many more did I miss? What about the others who might not catch them?
O'Reilly needs to step up their technical reviewing; it's been lacking lately.
--
Marc A. Lepage
Software Developer
Speaking (loosely) of O'Reilly XML books, my local library messed up recently and actually got some current, useful, tech books. One I picked up there was "Learning XML", and I am finding it a very good read. And I am not a neophyte, XML-wise (no expert either, mind you).
...because XML Schema is the replacement for DTDs (thus making that chapter out-of-date). Here's why:
- XML Schemas are easier to learn than DTDs
- XML Schemas are extensible to future additions
- XML Schemas are richer and more useful than DTDs
- XML Schemas are written in XML
- XML Schemas support data types
- XML Schemas support namespaces
Not to mention that XML Schema is a W3C Recommendation as of May 2nd, 2001.
Download it. It's pretty good. I think the XML guys at MS are pretty good even though they may feel compelled to make MSXML chained to .Net and Office. SQL Server 2000 has some neat stuff and good help files. Oops, I forgot this is a Linux NG. Go with Jython, Python and Perl solutions with Apache/Sax/Saxon thrown in. All for now.
Some decent stuff in it though the character of it is corrupted and possibly redeemed. Like a Christmas Carol. Dickens could have written the tale of .net and IE bringing young XML tags into a corrupt underworld.
How does this book compare with Elizabeth Castro's book on XML?
There are many good books on most computer topics. A review that says "this book is good" is useless. What we need is a comparison of the book being reviewed, with other books which cover the same material.
I use this book a lot at work, and it generally gives out the straight facts, unadorned by interpretation or comment.
There are odd bits of editorializing, for instance a bizarre rant about how useless unparsed external entities are, and how we should replace them all with HREFs. Harold and Means's little rant suggests that they think they're a bad idea simply because they've never found a use for them. Our clients use XML for document markup, with the documents being produced in multiple languages - they think unparsed external entities are fab.
The book's a good reference. Be a bit wary of the opinion.
I've had the Wrox book (Kay's "XSLT : Programmer's Reference") for a while and am reading the O'Reilly book now (literally, as in it's sitting here on my desk).
As chromatic said, the O'Reilly book includes a chapter on XPath and an XPath reference as an appendix, which is great. Additionally, XPath functions are covered (along with XSLT functions) in an "XSLT and XPath Functions Reference" appendix. While "XSLT : Programmer's Reference" is well-written and very useful (my copy is dog-eared), the absence of any separate discussion of XPath in Kay's book is, IMO, a significant flaw.
Kay tells readers near the outset that his book is written as though XSLT and XPath were one language. Since XPath acts as a sort of "sub-language" in an XSLT stylesheet, I can understand why he chose to cover the material in this way, but I still would have preferred to see a separate introductory discussion of XPath somewhere near the beginning of the book. XPath isn't rocket science, but covering XPath concepts as they arise in examples muddles things quite a bit. If you've read pretty far into the book and are wondering how/why a given XPath expression was written in a certain way, you can't easily "flip back" to the section on XPath to answer your question because the information that you need is scattered throughout the book.
Right now, I'm about halfway through Tidwell's "XSLT" (the O'Reilly book). Based on my impressions so far, I would definitely recommend it. Back to my book.
I think the best book I've read is Essential XML. Good coverage of XML, XSL, XML Schemas, SOAP. Examples in JScript for MS and Java for the rest. ISBN is 0-201-70914-7