Effective XML

Posted by timothy on Monday November 24, 2003 @07:15AM from the specificity dept.

milaf writes "Who doesn't know about XML nowadays? Quite a few people, actually: there has been so much hype around it that some people think that XML is a programming language, a database, or both at the same time. On the other hand, if you are a developer, chances are that you feel that -- no matter its usefulness -- there is not much to XML. After all, it may take just a few hours to get the hang of creating and parsing an XML document. Maybe this is why most of the many and voluminous books discuss numerous XML-related technologies, but say less about the usage of XML itself." Read on for milaf's review of a book that takes the opposite tack.

Effective XML: 50 Specific Ways to Improve Your XML
author: Elliotte Rusty Harold
pages: 336
publisher: Addison-Wesley
rating: 10/10
reviewer: milaf
ISBN: 0321150406
summary: Very well written collection of topics on XML Best Practices

In Effective XML: 50 Specific Ways to Improve Your XML, Elliotte Rusty Harold takes a different approach: know your elements and tags -- they are not the same thing! -- and weigh your choices in context, because any technology applied for the wrong reasons may fail to deliver on its promises.

Following Scott Meyers' groundbreaking Effective C++, the author invites us to re-evaluate seemingly trivial issues and discover that life in the world of XML is not as simple as it seems. In each of the 50 items (chapters), he gets into the inner workings of the language, its usage, and related standards, giving us specific advice on how to use XML correctly and efficiently. The 300-page book is divided into four parts: Syntax, Structure, Semantics, and Implementation. Before any of those, though, the introduction sets the tone by discussing such fundamental distinctions as "Element versus Tag," "Children versus Child Elements versus Content," and "Text versus Character Data versus Markup." In these first pages the author started earning my trust and admiration for his knowledge and his ability to get right to the point in clear and simple language.

The first part, Syntax, covers the microstructure of the language and best practices for writing legible, maintainable, and extensible XML documents. In it, over 19 pages are dedicated to the implications of the XML declaration! That seems like a lot for one line that most people cut and paste at the top of their XML documents without giving it much thought, doesn't it? Actually, it isn't, if you follow the author's reasoning and examples.
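To see why the declaration deserves that much attention, here is a minimal Python sketch of one of its jobs: telling the parser which character encoding the bytes use. It relies only on the standard library's xml.etree.ElementTree; the sample name is made up, and this is an illustration rather than code from the book.

```python
import xml.etree.ElementTree as ET

# "José" as ISO-8859-1 bytes: the é is the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
LATIN1_BODY = b"<name>Jos\xe9</name>"
DECLARED = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_BODY


def root_text(data: bytes) -> str:
    """Parse raw bytes and return the text content of the root element."""
    return ET.fromstring(data).text


if __name__ == "__main__":
    # With the declaration, the parser knows to decode the bytes as Latin-1.
    print(root_text(DECLARED))  # José
    # Without it, the parser falls back to UTF-8 and rejects the 0xE9 byte.
    try:
        root_text(LATIN1_BODY)
    except ET.ParseError as err:
        print("rejected:", err)
```

The same bytes are either a valid document or a well-formedness error depending solely on that one line at the top.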

The second part, Structure, discusses issues that arise when representing data in XML, i.e. mapping real-world information onto the trees, elements, and attributes of an XML document; it also covers tools and techniques for designing and documenting namespaces and schemas.

The third part, Semantics, explains the best ways to turn the structure represented in XML documents back into data with its intended meaning. It teaches us how to choose the appropriate API and tools for different kinds of processing. This part has a lot of good advice for creating solutions that are simple, effective, and robust.
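As an illustration of that choice (not an example from the book), the sketch below counts the same elements twice with Python's standard library: once with SAX, which streams parse events without keeping the document around, and once with DOM (xml.dom.minidom), which builds the whole tree first. The OrderCounter class and the sample document are hypothetical.

```python
import xml.sax
from xml.dom import minidom

SAMPLE = "<orders><order id='1'/><order id='2'/><note>ok</note></orders>"


class OrderCounter(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the document."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "order":
            self.count += 1


def count_with_sax(text: str) -> int:
    handler = OrderCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def count_with_dom(text: str) -> int:
    # DOM: build the whole tree in memory first, then query it.
    doc = minidom.parseString(text)
    return len(doc.getElementsByTagName("order"))


if __name__ == "__main__":
    print(count_with_sax(SAMPLE), count_with_dom(SAMPLE))  # 2 2
```

The SAX version keeps memory use flat however large the input grows; the DOM version is easier to navigate and query once the tree exists.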

The final part, Implementation, advises the reader on design and integration issues related to the use of XML; these issues include data integrity, verification, compression, authentication, caching, etc.

This book will be useful to professionals at any level of experience. It may be read from cover to cover as a tutorial, or one can enjoy reading selected items, depending on experience and taste. The book's very detailed index makes it an excellent reference on the subject as well. In the preface, the author writes, "Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime." I'm not sure about the "lifetime" -- that's an awfully long time to spend on one technology -- but for the most confident of us even that may not be enough :) . Your mileage may vary, but I suspect that you could shave a few months off that time by browsing through this book once in a while. Most importantly, it will make you a better professional and make you proud of the results of your work. Wouldn't that be worth your while?

You can purchase Effective XML: 50 Specific Ways to Improve Your XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

18 of 312 comments

The main issue with XML is performance by Anonymous Coward · 2003-11-24 07:23 · Score: 4, Informative

Others have said it before, but I'll say it again: XML is heavyweight and isn't free. The best example of this is SQLXML. Although it sounds nice to use SQLXML, most commercial databases see a huge drop in performance with it. This is because parsing XML blows and eats up copious amounts of CPU and memory. I've had people ask me how to solve problems with SOAP in Windows and Java applications. The bottom line is, unless you're using hardware XML accelerators, XML is a resource hog.
On a related note, more details on Microsoft Indigo are finally available. According to this article on XML Mania, Microsoft's future platform will use XML as much as possible. More details are available on Microsoft's site. The funniest part is they are claiming Indigo + Longhorn will be the best thing since sliced bread. Maybe they haven't learned the hard lesson that parsing XML kills performance.
1. Re:The main issue with XML is performance by Anonymous Coward · 2003-11-24 07:39 · Score: 1, Informative
  
  Sure, I've tried, and it's not an easy task. The original post is bitching, but have you seen anybody improve parser performance by 2-3x in the last 3 years? Is it even possible to improve XML parser performance beyond current parsers? Both .NET and Java have some nice parsers that use either SAX or stream-based parsing. In .NET it's the XmlTextReader. In Java there's XML PullParser v2. Think of it another way. If you only need a few nodes in an XML structure, you can definitely improve performance. On the other hand, if you're doing web services and using a DOM-centric approach, there's very little you can do outside of hardware acceleration. Put another way, your statement "or maybe they stopped bitching about the XML performance and found a faster/better way to parse it" is equivalent to saying "write a faster encryption library". There's only so much you can do in software to speed up XML parsing. IBM, Sarvega and a couple of other companies are making XML accelerators to get around the performance problem. Google for it and you'll see that hardware accelerators can provide a 10x improvement over the fastest SAX/stream parser today.
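The "only a few nodes" case can be sketched with the pull-style xml.etree.ElementTree.iterparse from Python's standard library (an analogue of the streaming readers named above, not one of them). The first_value helper and the <feed>/<status> document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def first_value(source, tag: str):
    """Stream through `source` and return the text of the first `tag` element.

    iterparse reads the input in chunks, so it stops shortly after the element
    is complete; the rest of the document is never parsed into elements.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            return elem.text
    return None


if __name__ == "__main__":
    body = "".join(f"<item>{i}</item>" for i in range(100_000))
    doc = f"<feed><status>ok</status>{body}</feed>"
    print(first_value(io.BytesIO(doc.encode("utf-8")), "status"))  # ok
```

A DOM-style call such as ET.fromstring(doc) would build all 100,000 <item> elements before the lookup could even start.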
2. Re:The main issue with XML is performance by starm_ · 2003-11-24 08:31 · Score: 3, Informative
  
  I have to agree with that. Last year I did a work term in a department where they were converting their software to XML and SOAP. When I came in they asked me to learn XML and SOAP (gSOAP for C++ and Java SOAP). We were building and converting distributed applications, usually with a user client written in Java and a C++ server (for performance). A few weeks into my work term I was still working on one of these SOAP servers when one group finally finished converting one of our main products. When they went to test it they discovered it was too slow to use. When the user on the client side wanted to visualize the results of a database query, it took 40 seconds to serialize the results, send them through the network (fiber optic network, top-of-the-line computers), deserialize them and display them. It took only 2 seconds with RPC.
  
  They just didn't know how they could explain this to the users. They could not see how users would accept that, in the new and improved program that looked exactly the same as the old one, clicking the "visualize" button now meant a 40-second wait.
  
  Also, XML is very cryptic. Has anyone tried to do XSLT? My god, I had to do it once and it made a simple task very difficult. There are many more efficient and intuitive ways of visualizing data than XML. XML makes development time very long and costly for some tasks.
  
  I think that XML has its uses, though, like standard word processor documents and things like that. But it shouldn't be used everywhere, as some people seem to think.
3. Re:The main issue with XML is performance by Ed+Avis · 2003-11-24 08:45 · Score: 3, Informative
  
  If you know your XML will conform to a particular DTD, FleXML can be used to generate a very fast parser for it in the style of lex/yacc. You don't have to mess with all that slow DOM or SAX stuff if you're concerned about speed. It may still be a resource hog compared with binary file formats and protocols but not nearly as sucky as often seen (my own code included).
  
  --
  -- Ed Avis ed@membled.com
Here's the list of 50 by FearUncertaintyDoubt · 2003-11-24 07:32 · Score: 5, Informative

Syntax:
Include an XML Declaration
Mark Up with ASCII if Possible
Stay with XML 1.0
Use Standard Entity References
Comment DTDs Liberally
Name Elements with Camel Case
Parameterize DTDs
Modularize DTDs
Distinguish Text from Markup
White Space Matters

Structure:
Make Structure Explicit through Markup
Store Metadata in Attributes
Remember Mixed Content
Allow All XML Syntax
Build on Top of Structures, Not Syntax
Prefer URLs to Unparsed Entities and Notations
Use Processing Instructions for Process-Specific Content
Include All Information in the Instance Document
Encode Binary Data Using Quoted Printable and/or Base64
Use Namespaces for Modularity and Extensibility
Rely on Namespace URIs, Not Prefixes
Don't Use Namespace Prefixes in Element Content and Attribute Values
Reuse XHTML for Generic Narrative Content
Choose the Right Schema Language for the Job
Pretend There's No Such Thing as the PSVI
Version Documents, Schemas, and Stylesheets
Mark Up According to Meaning

Semantics:
Use Only What You Need
Always Use a Parser
Layer Functionality
Program to Standard APIs
Choose SAX for Computer Efficiency
Choose DOM for Standards Support
Read the Complete DTD
Navigate with XPath
Serialize XML with XML
Validate Inside Your Program with Schemas

Implementation:
Write in Unicode
Parameterize XSLT Stylesheets
Avoid Vendor Lock-In
Hang On to Your Relational Database
Document Namespaces with RDDL
Preprocess XSLT on the Server Side
Serve XML+CSS to the Client
Pick the Correct MIME Media Type
Tidy Up Your HTML
Catalog Common Resources
Verify Documents with XML Digital Signatures
Hide Confidential Data with XML Encryption
Compress if Space Is a Problem
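The item "Rely on Namespace URIs, Not Prefixes" is easy to demonstrate. In this minimal Python sketch (standard-library ElementTree; the namespace URI http://example.com/ns/books is a made-up placeholder), one document uses an a: prefix and the other a default namespace, both bound to the same URI, and code that matches on the URI treats them identically.

```python
import xml.etree.ElementTree as ET

NS = "http://example.com/ns/books"  # hypothetical namespace URI

# Same namespace URI, different prefix choices.
DOC_A = f'<a:book xmlns:a="{NS}"><a:title>Effective XML</a:title></a:book>'
DOC_B = f'<book xmlns="{NS}"><title>Effective XML</title></book>'


def title_of(text: str) -> str:
    root = ET.fromstring(text)
    # ElementTree names elements as {uri}local-name; the prefix is discarded.
    return root.find(f"{{{NS}}}title").text


if __name__ == "__main__":
    print(title_of(DOC_A) == title_of(DOC_B))  # True
```

Code that looked for the literal string "a:title" would work on the first document and silently miss the second.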
Re:where are the open source XML repositories by Anonymous Coward · 2003-11-24 07:45 · Score: 5, Informative

Maybe here?
Re:where are the open source XML repositories by Arrgh · 2003-11-24 07:50 · Score: 2, Informative

What's the matter, is your Google finger broken?
Let's see... A <digital> element contains zero or more <frame>s, each of which can contain an <image> with a URL.
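The structure described in that last sentence can be read with a few lines of Python. This is a minimal sketch using the standard library's xml.etree.ElementTree; the sample album, and the choice to store the URL in a url attribute of <image>, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical album: <digital> holds zero or more <frame>s,
# and a <frame> may hold an <image> whose URL is in a url attribute.
ALBUM = """<digital>
  <frame><image url="http://example.com/beach.jpg"/></frame>
  <frame/>
  <frame><image url="http://example.com/dog.jpg"/></frame>
</digital>"""


def image_urls(text: str) -> list:
    """Return the URL of every image, skipping frames that have none."""
    root = ET.fromstring(text)
    urls = []
    for frame in root.findall("frame"):
        image = frame.find("image")
        if image is not None:
            urls.append(image.get("url"))
    return urls


if __name__ == "__main__":
    print(image_urls(ALBUM))  # ['http://example.com/beach.jpg', 'http://example.com/dog.jpg']
```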
i second.. by Hooya · 2003-11-24 07:53 · Score: 3, Informative

I use XML for a lot of things and it's been quite decent. On the other hand, we're using dual Pentium IIIs for trivial stuff that ran fine on a PII with a C/C++ app without XML.

The fact is that XML is just marshalling and unmarshalling of all computational data to and from strings, thereby negating the fast numerical performance that a CPU inherently has. You want to add two numbers? Create a string representation, pass it around through a bunch of parsers/transformers as strings, then finally convert it back to the number it really is, add, then convert it back to a string to pass it around all over again... what a waste.
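The round trip described above looks like this in a minimal Python sketch (standard-library ElementTree; the <nums>/<n> format is hypothetical). Every number becomes text on the way out and is converted back on the way in, and the formatting choice decides whether the value survives intact.

```python
import xml.etree.ElementTree as ET


def to_xml(values) -> str:
    root = ET.Element("nums")
    for v in values:
        # repr() gives the shortest string that converts back to the same float;
        # a fixed format such as f"{v:.2f}" would silently lose precision.
        ET.SubElement(root, "n").text = repr(v)
    return ET.tostring(root, encoding="unicode")


def add_from_xml(text: str) -> float:
    """Sum the <n> values of a document; every value arrives as a string."""
    root = ET.fromstring(text)
    return sum(float(n.text) for n in root.findall("n"))


if __name__ == "__main__":
    doc = to_xml([0.1, 0.2])
    print(doc)                # <nums><n>0.1</n><n>0.2</n></nums>
    print(add_from_xml(doc))  # 0.30000000000000004, same as 0.1 + 0.2
```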
Re:where are the open source XML repositories by GeckoX · 2003-11-24 08:08 · Score: 5, Informative

You have absolutely NO idea what you are talking about, and of course you've been modded +3 Insightful. Good one, mods.

XML is extensible by its very nature. By itself, an XML file is just that, an XML file; it means absolutely NOTHING without context and definition.

This is what DTDs do. They don't limit XML in any way; rather, they describe a particular use of XML. For example, SVG, MathML and XHTML are all languages that use XML. Each of these languages has a DTD that defines the format of a valid XML document FOR THAT LANGUAGE.

Just because a DTD for SVG exists doesn't mean that anything at all has changed in XML itself.

Next, XSLT is a technology with a very specific purpose, simply put: to take an XML file as input and create a new XML file as output, based on the rules written into the transform.

So, with all of that said, there is absolutely NO reason why there shouldn't be a DTD repository, and again, no reason why there shouldn't be a PhotoAlbum DTD in that repository. What problems would this cause? None. What benefits could be observed? Instead of everyone who needs an XML document describing photo albums rolling their own format, people might just reuse a standard DTD. And application writers just might too. And lo and behold, Application X on platform Y might be able, with no work involved, to open Album AA created by Application BB on platform CC.

Getting some of the big picture?

--
No Comment.
W3C by sielwolf · 2003-11-24 08:08 · Score: 4, Informative

Browse the Technical Reports, Recommendations and Proposed Recommendations at W3C as there are a lot of DTDs and Schemas there. I found a DTD for generic simulation representation there. There's quite a bit if you take the time to look.

--
What is music when you despise all sound?
ID and IDREF, meet the previous poster by holygoat · 2003-11-24 09:00 · Score: 2, Informative

e.g. IBM's take.

You can link between XML elements quite easily.

Also consider that RDF, which describes directed graphs, is quite easily expressed in XML; there's nothing to say that you can't describe a graph and reference actual elements with IDREFs. I don't think you've really thought about this.
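A minimal Python sketch of that kind of linking, under stated assumptions: the graph vocabulary is hypothetical, and because ElementTree does not read a DTD's ID/IDREF declarations, the code simply treats the id attribute as the identifier and from/to as references to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical graph: nodes carry an id, edges point at nodes by reference.
GRAPH = """<graph>
  <node id="a" label="start"/>
  <node id="b" label="end"/>
  <edge from="a" to="b"/>
</graph>"""


def edges_by_label(text: str) -> list:
    """Resolve each edge's from/to references to the labels of the target nodes."""
    root = ET.fromstring(text)
    # Index every element that has an id, as a DTD-declared ID would be.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    result = []
    for edge in root.findall("edge"):
        src, dst = by_id[edge.get("from")], by_id[edge.get("to")]
        result.append((src.get("label"), dst.get("label")))
    return result


if __name__ == "__main__":
    print(edges_by_label(GRAPH))  # [('start', 'end')]
```

A validating parser working from a DTD would additionally reject duplicate IDs and dangling references; this sketch would raise a KeyError on a dangling one.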
more reviews of this book by zontroll · 2003-11-24 09:14 · Score: 3, Informative

VeryGeekyBooks has more reviews of this book.
He missed a couple, IMHO by alispguru · 2003-11-24 09:17 · Score: 2, Informative

And one of them is Just Plain Wrong, also IMHO.

Here are two heuristics for good XML design that I dearly wish more people would take to heart:

1. If processing any text field requires parsing, Something Is Wrong, and you probably need to break it apart into more elements/subelements.

The only exceptions to this rule are fields that are numbers, or maybe date/time stamps that adhere to ISO standards.

2. If you're using attributes, You'll Wish You Hadn't In The Future.

Attributes are supposed to be the way XML separates metadata from data. The problem with them is that they are also "leaves" of the XML tree, and intended to be simple, flat text. If you ever need more complex structure in attribute metadata, you're screwed - you must either violate rule 1 above, or move the data out into elements, totally breaking your old structure. Just don't use them, OK?
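Rule 1 can be illustrated with a hypothetical <person> record, using Python's standard-library ElementTree: when the name is packed into one string, every consumer has to parse it; when it is split into elements, a path lookup is enough. The element names and the "Family, Given" convention are assumptions for this sketch.

```python
import xml.etree.ElementTree as ET

# Rule 1 violated: the consumer has to parse the text to get at its parts.
PACKED = "<person><name>Doe, Jane</name></person>"

# Rule 1 followed: each part is its own element.
SPLIT = """<person><name>
  <given>Jane</given><family>Doe</family>
</name></person>"""


def family_from_packed(text: str) -> str:
    name = ET.fromstring(text).find("name").text
    # Fragile: depends on an unwritten "Family, Given" convention.
    return name.split(",")[0].strip()


def family_from_split(text: str) -> str:
    # The structure carries the meaning; no string surgery needed.
    return ET.fromstring(text).find("name/family").text


if __name__ == "__main__":
    print(family_from_packed(PACKED), family_from_split(SPLIT))  # Doe Doe
```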

--

To a Lisp hacker, XML is S-expressions in drag.
Sure: by rodentia · 2003-11-24 09:22 · Score: 5, Informative

<?xml version="1.0" ?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg>
  <line x1="50" y1="50" x2="300" y2="300" style="stroke:#FF0000; stroke-width:4; stroke-opacity:0.3;"/>
  <line x1="50" y1="100" x2="300" y2="350" style="stroke:#FF0000; stroke-width:4; stroke-opacity:1;"/>
</svg>

--
illegitimii non ingravare
Re:What are you talking about? by helix_r · 2003-11-24 09:26 · Score: 2, Informative

Have you ever tried storing a picture in it?

Actually, yes.

It's called SVG, and it is a very nice way to represent graphics.
Re:apache has a project called Xindice by janbjurstrom · 2003-11-24 09:49 · Score: 2, Informative

True, Xindice (Apache license, has reached version 1.0) looks good (I've no experience with it), but some of the original developers (Tom Bradford - dbXML, see below, and Kimbro Staken - Syncato, also below) of the source donated to Apache think they (Apache) haven't made the most of it. I don't know if this is true, and I don't know nor have any connections with either Bradford or Staken, but they seem like competent developers; they certainly churn out code - positive sign, right?

There is choice :) Check out Kimbro Staken's weblog, Inspirational Technology (he also develops Syncato, an XML database weblog system using Berkeley DB XML).

Consider Berkeley DB XML (currently at v1.1.0). It is built on Berkeley DB and identically licensed (open source, free for non-commercial/development use, etc.), with tons of APIs. I can't get hold of the link, but one of the developers (at least I think so) maintains a weblog of 'all' things Berkeley DB XML. Google it.

Bradford recently released dbXML under the GPL (commercial licenses are available should you need one); there's a v2.0 beta available at the site.

Another native XML database is eXist, at version 0.9.2, java-based, LGPL licensed, I've only glanced at it, looks alright though I'm not the guy to say..

Then there're several commercial alternatives - X-Hive, Birdstep, Virtuoso, et al. - but this is Slashdot so..
Well, someone called Ron Bourret has compiled a full-bodied overview of XML databases, and has a big list of XML/DB links too (with some link rot).

--
668.5
Re:Why should XML be text? by Trejkaz · 2003-11-24 13:24 · Score: 2, Informative
From XML 1.0:

The design goals for XML are:
1. XML shall be straightforwardly usable over the Internet.
2. XML shall support a wide variety of applications.
3. XML shall be compatible with SGML.
4. It shall be easy to write programs which process XML documents.
5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
6. XML documents should be human-legible and reasonably clear.
7. The XML design should be prepared quickly.
8. The design of XML shall be formal and concise.
9. XML documents shall be easy to create.
10. Terseness in XML markup is of minimal importance.
I believe you're questioning point 5 while bitching about point 10.

If you want a binary tree representation, check out ASN.1. It has commonly been used as a binary interchange format for the same sort of data, and XML can be mapped to ASN.1 using a schema and a bit of patience.
--
Karma: It's all a bunch of tree-huggin' hippy crap!
Several chapters are online by elharo · 2003-11-25 03:50 · Score: 4, Informative

Nice review. Thanks! It's interesting how many of the comments here relate directly to chapters in the book. For instance, there's a lot of concern about XML's perceived verboseness. This is addressed directly in Item 50, Compress if space is a problem. This chapter and ten others are online at http://www.cafeconleche.org/books/effectivexml/ . Check it out.
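As a rough illustration of why compression answers much of the verbosity complaint, the Python sketch below gzips a made-up, record-style document with the standard gzip module. gzip is just one option, chosen here as an assumption, and the sample_document helper is hypothetical; actual ratios depend heavily on the data.

```python
import gzip


def sample_document(rows: int) -> str:
    """Build repetitive, record-oriented XML (made-up sample data)."""
    body = "".join(
        f"<row><id>{i}</id><status>active</status></row>" for i in range(rows)
    )
    return f"<table>{body}</table>"


def compression_ratio(text: str) -> float:
    """Return compressed size divided by original size for UTF-8 text."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    doc = sample_document(1000)
    print(f"{len(doc)} bytes, gzip ratio {compression_ratio(doc):.2f}")
```

Repeated tag names are exactly what general-purpose compressors handle best, so the markup overhead largely disappears on the wire, at the cost of some CPU on each end.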