Slashdot Mirror


Effective XML

milaf writes "Who doesn't know about XML nowadays? Quite a few people, actually: there has been so much hype around it that some people think that XML is a programming language, a database, or both at the same time. On the other hand, if you are a developer, chances are that you feel that -- no matter its usefulness -- there is not much to XML. After all, it may take just a few hours to get the hang of creating and parsing an XML document. Maybe this is why most of the many and voluminous books discuss numerous XML-related technologies, but say less about the usage of XML itself." Read on for milaf's review of a book that takes the opposite tack.

Effective XML: 50 Specific Ways to Improve Your XML
author: Elliotte Rusty Harold
pages: 336
publisher: Addison-Wesley
rating: 10/10
reviewer: milaf
ISBN: 0321150406
summary: Very well written collection of topics on XML Best Practices

In Effective XML: 50 Specific Ways to Improve Your XML, Elliotte Rusty Harold takes a different approach: know your elements and tags -- they are not the same thing! -- and weigh your choices in a context, because any technology applied for the wrong reasons may fail to deliver on its promises.

Following Scott Meyers' groundbreaking Effective C++, the author invites us to re-evaluate seemingly trivial issues to discover that life is not as simple as it seems in the world of XML. In each of the 50 items (chapters), he gets into the inner workings of the language, its usage and related standards, thus giving us specific advice on how to use XML correctly and efficiently. The 300-page book is divided into four parts: Syntax, Structure, Semantics, and Implementation. Yet in the introduction, the author sets the tone by discussing such fundamental issues as "Element versus Tag," "Children versus Child Elements versus Content," "Text versus Character Data versus Markup," etc. On these first pages the author started earning my trust and admiration for his knowledge and ability to get right to the point in clear and simple language.

The first part, Syntax, contains items covering issues related to the microstructure of the language, and best practices in writing legible, maintainable, and extensible XML documents. (In it, over 19 pages are dedicated to the implications of the XML declaration!) That seems a lot for one XML statement that most people cut-and-paste at the top of their XML documents without giving it much thought, doesn't it? Actually not, if you follow the author's reasoning and examples.

The second part, Structure, discusses issues that arise when creating data representation in XML, i.e. mapping real-world information into trees, elements, and attributes of an XML document; it also talks about tools and techniques for designing and documenting namespaces and schemas.

The third part, Semantics, explains the best ways to convert structural information represented in XML documents into data with its semantics. It teaches us how to choose the appropriate API and tools for different types of processing to achieve the best effect. This part has a lot of good advice for creating solutions that are simple, effective, and robust.

The final part, Implementation, advises the reader on design and integration issues related to the utilization of XML; these issues include data integrity, verification, compression, authentication, caching, etc.

This book will be useful to a professional with any level of experience. It may be used as a tutorial and read from cover to cover, or one can enjoy reading selected items, depending on experience and taste. The book's very detailed index makes it an excellent reference on the subject as well. In the preface to the book, the author writes, "Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime." I'm not sure about the "lifetime" -- that's an awfully long time for using one technology -- but for the most confident of us this still may not be enough :) . Your mileage may vary, but I suspect that you could shave a few months off that time by browsing through this book once in a while. Most importantly, it will make you a better professional and make you proud of the results of your work. Wouldn't this be worth your while?

You can purchase Effective XML: 50 Specific Ways to Improve Your XML from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

27 of 312 comments

  1. Why do I have the feeling... by ultrabot · · Score: 0, Insightful

    That the book won't mention the "s-exprs in drag" angle...

    --
    Save your wrists today - switch to Dvorak
    1. Re:Why do I have the feeling... by elharo · · Score: 3, Insightful

      You're right about that. It doesn't. Not all technologies that are isomorphic to each other are equally useful, any more than all Turing complete programming languages are the same. The representation matters, and the XML representation has proven more useful and accessible than the S-expression representation.

      I'm not fully convinced that S-expressions are isomorphic to XML either. The proper handling of Unicode and non-English, non-ASCII text presented in multiple encodings is a big advantage of XML compared to S-expressions. I suppose something like this could theoretically be added to S-expressions, but has it been?

  2. XML... by the+man+with+the+pla · · Score: 5, Insightful

    I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind the scenes 'option', people just are not going to use it.

    --
    The linux hacker
  3. milaf, if you could expand a bit... by Randolpho · · Score: 4, Insightful

    Does the book discuss the pros and cons of XML? Such as, when is it a good idea to use XML? When would a CSV, INI, or other structured text document be a better choice than XML?

    These are issues that need to be solved first, before one creates an effective XML structure. Does the book address them?

    --
    "Times have not become more violent. They have just become more televised."
    -Marilyn Manson
    1. Re:milaf, if you could expand a bit... by LetterJ · · Score: 4, Insightful

      Unfortunately, most Slashdot reviews are little more than book reports with pretty much no analysis. They end up just listing what the chapters contain.

      Incidentally, one of the main reasons to choose XML over either CSV or INI is that both of those formats are pretty much driven by rigid "column" type structures. In most INI files there's only room for pairs of names and single values. In CSV, each record is one row with a set number of fields.

      XML lets you expand the children fully and represent more complex data. For instance, a classical CSV file with address information for customers would have columns for street address, city and then start to have problems when you start having columns for State (when you actually consider the world outside the US), postal codes, etc. If this is in XML, you can have your schema be more flexible and say that each <customer> contains a <shippingaddress> element which can contain either a <state> or a <province> or neither.

      In other words, you can use trees to represent data instead of flat rows. I'm not saying that it's the be-all and end-all that the evangelists say it is. There are still lots of places that simpler text files and other data storage formats are better, but XML can be useful.
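A minimal sketch of the tree-versus-rows point above, using Python's standard library. The element names <customer>, <shippingaddress>, <state> and <province> come from the comment; the wrapping <customers> element, the other child elements and the sample data are hypothetical, invented only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three customers whose addresses carry
# a <state>, a <province>, or neither.
SAMPLE = """
<customers>
  <customer>
    <name>Alice</name>
    <shippingaddress>
      <street>1 Main St</street>
      <city>Springfield</city>
      <state>IL</state>
    </shippingaddress>
  </customer>
  <customer>
    <name>Bob</name>
    <shippingaddress>
      <street>2 Rue Principale</street>
      <city>Montreal</city>
      <province>QC</province>
    </shippingaddress>
  </customer>
  <customer>
    <name>Chen</name>
    <shippingaddress>
      <street>3 Orchard Road</street>
      <city>Singapore</city>
    </shippingaddress>
  </customer>
</customers>
"""


def regions(xml_text):
    """Return (name, region) pairs; region is None when the address
    has neither a <state> nor a <province> child."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for customer in root.findall("customer"):
        address = customer.find("shippingaddress")
        # findtext returns None when the child element is missing.
        region = address.findtext("state") or address.findtext("province")
        result.append((customer.findtext("name"), region))
    return result


if __name__ == "__main__":
    for name, region in regions(SAMPLE):
        print(name, region)
```

A flat CSV layout would need either an empty column per optional field or a convention for which column means what; here the missing element is simply absent.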

  4. My experience with XML by Valar · · Score: 4, Insightful

    It has been my experience with XML that it is like a lot of other things in development: the good developers understand it immediately and have native intuition towards best practices. The bad developers never really get it and spend their time reproducing tricks they saw in a cookbook. That's good and fine until you need something that doesn't quite fit into categories a, b or c. Another example of this is how high school and university data structure/algorithm classes never spend any time on the development of new data structures that exactly meet the problem specification. Instead they lay out half a dozen types of linear lists, a couple of trees, and some hashing functions and say, "Well, you can glue just about anything together from this." Perhaps this book takes what is, IMHO, the better approach -- laying out the tools and politely explaining what the implication of each is, rather than attempting to list out pages of cute examples of what each can do.

  5. Server load could be at the root of XML's problems by mrgoatCEO · · Score: 3, Insightful

    I know that as a student maintaining a website I am in the minority of XML users, but the main thing that stops me from moving my site (small-scale though it may be) over to using more XML is sheer server load. The fact of the matter is that we still don't have true low-bandwidth database solutions, and until this changes, I doubt that much will be done with technologies like XML (at least on smaller, non-corporate sites) no matter how much potential they have.

    --
    --Goat
    CEO, Goat Software
    Goatblog
  6. 5 years in the business... by pong · · Score: 5, Insightful

    ... and it is starting to dawn on me that trends like pervasive XMLization are going to haunt us forever. The combination of business-minded consultants that push a market to create demand for themselves and a huge number of clueless but enthusiastic developers that will jump on any new idea and push it where it doesn't want to go unsurprisingly leads to this kind of instability.

    I hate XML with a passion. Let me present you with three examples:

    1) Programming languages based on XML.

    Yes, it is true. Perverted minds, somewhere on this planet, actually seem to think that this is a neat idea! Since their initial conception the pivotal point of programming languages has been to raise the level of programming. To move from the computer's domain to the human domain - to make it more intuitive and natural for a human being to program a computer. With these new XML-based languages we are moving a step backwards, because truly the only benefit of XML in this context is that it is easier for computers to parse, while it is certainly harder for humans.

    2) XSLT

    Have you tried it? I rest my case.

    3) SOAP

    Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.

    1. Re:5 years in the business... by Anonymous Coward · · Score: 1, Insightful

      With these new XML-based languages we are moving a step backwards, because truly the only benefit of XML in this context is that it is easier for computers to parse, while it is certainly harder for humans.

      Flashback to the late 60s and early 70s...

      With these new "high-level" languages we are moving a step backwards, because truly the only benefit of HLLs in this context is that they are easier for humans to read, while they are certainly slower for computers to execute.

      XSLT. Have you tried it? I rest my case.

      It sounds like a piss-poor case then, as I've had no issues with XSLT. Yes, it takes some learning, but so does anything if you're encountering it for the first time.

      ...it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.

      It's exactly that -- a trade-off for increased performance and stability. Different tools for different purposes, my friend.

  7. Re:The main issue with XML is performance by I8TheWorm · · Score: 5, Insightful

    To put it another way...

    this single record

    Doe, John 1234567 12/1/2001

    took 31 bytes, while its XML companion (using short, simple tags) took 96 bytes.

    Not all XML files wind up being 3 times the size of their flatfile counterparts, but they are inherently larger. There really isn't a way to make loading/parsing that data any faster, by the nature of working with ASCII/ANSI files. XML will always be slower.
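The size overhead described above is easy to measure for any given record. The sketch below is one way to do it in Python; the tag names (rec, last, first, id, date) are hypothetical, and because the exact figures depend on tag names and whitespace it does not try to reproduce the 31 and 96 bytes quoted in the comment.

```python
import xml.etree.ElementTree as ET

# The flat record from the comment, with whitespace as it appears in this text.
FLAT = "Doe, John 1234567 12/1/2001"


def to_xml(last, first, ident, date):
    """Wrap the four fields in short, hypothetical tag names."""
    rec = ET.Element("rec")
    for tag, value in (("last", last), ("first", first),
                       ("id", ident), ("date", date)):
        ET.SubElement(rec, tag).text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(rec, encoding="unicode")


def overhead_ratio(flat, xml_text):
    """How many times larger the XML form is, measured in UTF-8 bytes."""
    return len(xml_text.encode("utf-8")) / len(flat.encode("utf-8"))


if __name__ == "__main__":
    xml_text = to_xml("Doe", "John", "1234567", "12/1/2001")
    print(xml_text)
    print(round(overhead_ratio(FLAT, xml_text), 2))
```

Longer tag names or an added XML declaration push the ratio up; larger field values push it down, since the markup cost per field is fixed.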

    --
    Saying Android is a family of phones is akin to saying Linux is a family of PCs.
  8. What are you talking about? by mellon · · Score: 5, Insightful

    XML is just text! If the XML parser is slow, write a faster one! Figure out where the bottlenecks are! Don't give me this XML is slow crap. This is slashdot - you're supposed to be a geek. If you don't like XML, fine, but come up with a geeky reason not to like it, not some problem whose solution is just to roll up your sleeves and do some hacking!

    Oy! :')

    1. Re:What are you talking about? by nat5an · · Score: 5, Insightful

      Okay, fine, XML isn't slow by nature. But it's a generalized solution. Not every set of data needs to be stored in a general tree, so putting every set into one will often create a lot of extra work. The benefit of XML is its portability, and the price is the performance hit you take from packing and unpacking all that data.

      --
      Head down, go to sleep to the rhythm of the war drums...
    2. Re:What are you talking about? by micromoog · · Score: 3, Insightful
      How about the fact that, by definition, it takes something like 10 times as much information to store/transfer data in XML as in a native binary format?

      Having a huge amount of metadata surround every piece of data is not always a good thing. XML is slow, parser issues notwithstanding.

    3. Re:What are you talking about? by larry+bagina · · Score: 3, Insightful

      parsing any text involves character-by-character analysis. No amount of geekdom code rewriting can change that. If an XML file is 3-times as large as a CSV file, it will take 3-times as long to parse. And both will be orders of magnitude slower than a binary record.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    4. Re:What are you talking about? by gbjbaanb · · Score: 2, Insightful

      wow, wait a minute... you want a geeky reason not to use it... well, how about rolling your own binary parsing data format is a) much, much more difficult for others to understand, b) way faster, c) far more bandwidth efficient.

      there you go - 3 classic geek reasons to do something the hard way instead of the standard, ordinary, easy but OK for mortals way.

      Incidentally, XML really is slow. Sure it looks nice, is easy to understand, easy to create with the simplest of text editors, interoperable, and an industry standard. But it is still a technology that doesn't cut it when you need your data stored in small, fast blobs. A case in point - my previous company used XML everywhere (it was cool, after all), but after a while performance (when scaled to many users) became an issue. Rewriting the XML-handling object to use a binary format made things much, much, much faster. The XML blobs were then only used for the browser front end, and for debugging on a developer machine. XML is good, but don't ever pretend it's all things to all men, in all cases. It isn't. It's slow.

    5. Re:What are you talking about? by GlassHeart · · Score: 2, Insightful
      my previous company used XML everywhere (it was cool, after all), but after a while performance (when scaled to many users) became an issue. Rewriting the XML-handling object to use a binary format made things much, much, much faster. The XML blobs were then only used for the browser front end, and for debugging on a developer machine.

      No, your company did the exact right thing in choosing XML. When the nascent system is still being actively debugged, you made the process much easier because XML is human readable. As you begin to scale up its use, you proved a performance problem and relied on the modularity of the code to simply replace the XML code with efficient binary formats. If you had not seen a performance problem (perhaps the bottleneck is elsewhere and inevitable anyway), then presumably you'd leave the XML code alone.

      You started with a general solution, and then optimized as necessary. I consider this an example of a job well done.

  9. Re:The main issue with XML is performance by vmfedor · · Score: 3, Insightful
    This is the main reason I personally don't believe XML can be used as a functioning database. I see it being used more as a way to transport data across the internet and across different platforms. If two companies merge and one uses mostly UNIX-based servers and the other uses Microsoft, the two can combine their databases easily using XML.


    I see XML as a nice way to transport data but (at least right now) it's not mature and/or fast enough to serve as a fully functioning database.

    --

    I like my women how I like my sugar.. granulated.

  10. speed kills... by Broadcatch · · Score: 2, Insightful

    ...resource hogs.

    While I'm not an XML zealot, I like the clarity it can bring to many domains of practice. Regarding the performance hit, get a faster computer! If you don't have a fast enough one yet, wait a year.

    Lisp was shunned in the past primarily for speed reasons, too. Now the main reason many don't like Lisp is because they don't understand advanced software engineering concepts and write poor Lisp code.

    --

    The antidote for misuse of freedom of speech is more freedom of speech.
    -- Molly Ivins

  11. Re:5 years in the business... WHERE??? by mikewolf · · Score: 1, Insightful

    i have been in the business for 4 years now, and i use XML on a daily basis.

    not only is it a powerful medium for representing (and caching) hierarchy/tree-based data, extensions like XSLT provide tremendous advantages in transforming data for a variety of other purposes (you probably hated lisp/scheme based languages, too).

    While programming languages based on XML at first sound a little strange, combining an XML based programming language with XSLT could be super powerful, especially with concepts like code generation...

  12. Re:The main issue with XML is performance by Boing · · Score: 4, Insightful
    Doe, John 1234567 12/1/2001

    took 31 bytes, while its XML companion (using short, simple tags) took 96 bytes.

    Uh huh. Now let me ask you, is that record space-delimited? Comma-delimited? Fixed-width [shudder]? If it's fixed width, and the first name is fixed at four characters, is the person's name "John" or "John-Paul"?

    31 bytes for your record, and 96 for equivalent XML... but how many extra bytes were spent on code to manage your particular flavor of data? How much time was spent in development of that code? How does that time (and associated cost) compare to the extra millisecond/record required to transmit and process the XML data?

    XML is standard. It can fit almost any type of data (though binary data is not currently the most effective thing in the world, but it can be incorporated). Since MS is integrating XML into all of their products, we won't have to worry about many people who don't have a good XML library installed on their systems. So instead of 50 programs with their own (limited and likely buggy) data formatting subsystems, we'll have 50 programs that each call one library on disk, in a standard, robust system with enough exposure to squash the show-stopping bugs.

    XML will always be slower.

    Depends on how you look at it. If the aforementioned widely-available XML parser gets enough of a beating, it will be optimized like you wouldn't believe. Yes, two data processors (one XML, one markupless) with equal amounts of work spent on them will perform in favor of the simpler format... but XML's simplicity and universality will make it so that the XML parsers will have more eyes.

    The same philosophy is why the well known open-source programs (linux, apache, etc) are functional and stable as hell:

    Wide use + Openness = Greatness.

  13. Re:The main issue with XML is performance by Not+The+Real+Me · · Score: 2, Insightful

    XML, when it comes to data and databases, is nothing more than a beefed-up alternative to CSV (comma separated values).

  14. One way to improve it. Don't use it. by wdavies · · Score: 2, Insightful
    Ok, maybe I'm missing a point, but the next time I see an XML file like this...
    <RECORD NAME=".." ADDRESS=".." AGE=".."/>
    <RECORD NAME=".." ADDRESS=".." AGE=".."/>
    <RECORD NAME=".." ADDRESS=".." AGE=".."/>
    <RECORD NAME=".." ADDRESS=".." AGE=".."/>
    instead of this
    ..\t..\t..
    ..\t..\t..
    ..\t..\t..
    I am going to go nuts. Yes, XML is an improvement for truly hierarchical or repeating data, but efficient it isn't, and it is a pain in the butt to use with AWK or any one of a million Unix utilities. The one downside I have on ESR's Art of Unix is that while espousing how clean Unix is with pipes and text, he then starts waxing lyrical about XML... Winton
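When a flat, attribute-only file like the one above does turn up, one workaround is to flatten it back to tab-separated lines before handing it to AWK and friends. The sketch below assumes the records sit inside a single root element (called RECORDS here, which is an assumption, as is the sample data) and that no value contains a tab or newline.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the RECORD/NAME/ADDRESS/AGE names follow the comment,
# the RECORDS wrapper and the values are invented for illustration.
SAMPLE = (
    '<RECORDS>'
    '<RECORD NAME="Ann" ADDRESS="1 Main St" AGE="41"/>'
    '<RECORD NAME="Raj" ADDRESS="9 Elm Rd" AGE="36"/>'
    '</RECORDS>'
)


def records_to_tsv(xml_text, fields=("NAME", "ADDRESS", "AGE")):
    """Emit one tab-separated line per RECORD, in the given field order.
    Missing attributes become empty columns."""
    root = ET.fromstring(xml_text)
    lines = []
    for rec in root.iter("RECORD"):
        lines.append("\t".join(rec.get(field, "") for field in fields))
    return "\n".join(lines)


if __name__ == "__main__":
    print(records_to_tsv(SAMPLE))
```

The output can then be piped into the usual line-oriented tools, at the cost of one extra conversion step.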
    1. Re:One way to improve it. Don't use it. by Anonymous Coward · · Score: 2, Insightful

      The whole idea with XML is that it will catch the error when a user or script writes "RECROD" in one place, or forgets a space. Your AWK script will likely just crash without an explanation or miss a record if the user e.g. forgets the carriage return between lines.

      And just assume that six months after releasing your program you realize it would be very useful with an "OCCUPATION" field too. What do you do now? Maintain a separate collection of databases for each generation of your software?

      The XML-database would be guaranteed to be backwards compatible, in contrast to your simple solution.

      The reason why XML is good in "the real world" is quite simple: Programmer time is expensive. Testing is expensive. Compatibility between versions is important, but expensive to maintain. Storage is cheap. CPU-power is cheap.
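Both claims above can be sketched with Python's standard parser. The sample documents are hypothetical, and there is an important limit: a plain well-formedness check only catches structural slips such as a mistyped closing tag. Catching an element that is misspelled consistently would need a DTD or schema, which xml.etree.ElementTree does not validate.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: an old-format file, a newer one with the extra
# OCCUPATION attribute, and one with a mistyped closing tag.
OLD = '<RECORDS><RECORD NAME="Ann" AGE="41"></RECORD></RECORDS>'
NEW = '<RECORDS><RECORD NAME="Ann" AGE="41" OCCUPATION="Editor"></RECORD></RECORDS>'
BROKEN = '<RECORDS><RECORD NAME="Ann" AGE="41"></RECROD></RECORDS>'


def read_records(xml_text):
    """Read records from either generation of the file; a missing
    OCCUPATION attribute falls back to a default."""
    root = ET.fromstring(xml_text)
    return [
        (rec.get("NAME"), rec.get("AGE"), rec.get("OCCUPATION", "unknown"))
        for rec in root.findall("RECORD")
    ]


def is_well_formed(xml_text):
    """True if the parser accepts the text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(read_records(OLD))
    print(read_records(NEW))
    print(is_well_formed(BROKEN))
```

The same reader handles both file generations because it asks for attributes by name instead of by position, which is the backwards-compatibility argument in a nutshell.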

  15. Re:The main issue with XML is performance by helix_r · · Score: 2, Insightful


    "Doe, John 1234567 12/1/2001 "

    If you think about it that is a useless piece of information without lots and lots of context surrounding it.

    * What is Doe?
    * What is " John"?
    * What is 1234567?
    * 12/1/2001 looks like a date. Is it Dec 1 or Jan 12?
    * How do I know if this record is complete?
    * Is my field separator a " " or ","?

    Problem: The year is 2023, we now use format "x" in our records, you need to convert all records to format "x" -- there are 233 different types of records. 7,220,134 records need to be translated in 2 weeks. Which formats will be the easiest to convert??

    XML allows you to beat the above problems by being a somewhat self-describing format. For a few extra bytes you get a lot more functionality, interoperability and future-proof-ness.
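As a rough illustration of "somewhat self-describing", here is the same record with its fields named and the date written in ISO 8601 form. The tag names, and the reading of 12/1/2001 as 1 December and of 1234567 as an ID, are assumptions made for the example; the original record does not say.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical markup for the record discussed above.
RECORD = """
<employee>
  <surname>Doe</surname>
  <givenname>John</givenname>
  <id>1234567</id>
  <hiredate>2001-12-01</hiredate>
</employee>
"""


def parse_employee(xml_text):
    """Look fields up by name, so neither column order nor the
    separator character matters."""
    root = ET.fromstring(xml_text.strip())
    return {
        "surname": root.findtext("surname"),
        "given_name": root.findtext("givenname"),
        "id": root.findtext("id"),
        # ISO 8601 (YYYY-MM-DD) removes the Dec 1 / Jan 12 ambiguity.
        "hire_date": date.fromisoformat(root.findtext("hiredate")),
    }


if __name__ == "__main__":
    print(parse_employee(RECORD))
```

The markup answers "what is Doe?" and the closing tag answers "is this record complete?"; the unambiguous date comes from the date format chosen, not from XML itself.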

  16. XML is just tagged s-lists. by BrittPark · · Score: 4, Insightful

    XML is highly overrated and generally over-used. Admittedly XML + CSS is better than HTML, but beyond that its only reasonable use is as a generalized syntax for configuration files, and as such does a good job, or at least I've had success using it that way in the past. Many (if not most) of its other uses are just poor program design. SOAP is an extremely silly idea. Why use XML for a marshalling syntax for RPC? It's slower, bulkier, and just a bad choice in comparison to a binary marshalling mechanism. Now as a syntax for an RPC's IDL XML makes a lot of sense, but not as a transport.

    Glad to get that off my chest. I have a bitter history with XML. I was the first person at my former company to bring XML in as a uniform configuration file format for our product, but then found myself a couple of years later forced into adding XML specific features to the filesystem that was the core of our company's product. I spent a week thinking about the idea, and concluded that it was a bad one. Thus followed a long (and fruitless) battle with management to scratch the plan. The end result was a technically nifty but useless set of features. The work remains unreleased for lack of customer interest. At least I get a bit of "I told you so." pleasure.

  17. Re:L is for Lousy... by gbrayut · · Score: 2, Insightful

    >And config files, simpler parsing like 'property=value' is easier and faster.
    A Gnome config file that has 4 tags, and 1 tag had 80! attributes is just stupid. Yet this is how people use XML.

    There are many cases where a simple property=value is much better than full scale XML, but when used correctly XML can be much more efficient.

    Take your everyday INI file, containing simple property=value strings. Sure it works, but all those properties have other information as well such as a description, data type, valid parameters, default settings... you get the point.

    Try adding that into an INI file and you will end up with a mess. XML can be used to incorporate all the additional information into one file and in doing so program configuration user interfaces can be dynamically created.

    Most programs add and remove features with every release and it is convenient to store settings in an XML file so that interfaces to those settings can be dynamically generated. Simply populate a list box or table with the name/value property pairs, have a text area display the description for a selected property, and have input data validated to the corresponding input parameters and data type.

    It might take longer to plan, but if implemented correctly it can save time and confusion. In the end, it will be a larger file, but if done correctly that data actually means something!
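A small sketch of the kind of configuration file described above, where each property carries its description, data type, default and valid parameters alongside the value. The file layout, attribute names and the two sample properties are hypothetical; a real program would pick its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: "port" overrides its default, "theme" does not.
CONFIG = """
<settings>
  <property name="port" type="int" default="8080">
    <description>TCP port the server listens on.</description>
    <value>9090</value>
  </property>
  <property name="theme" type="choice" default="light" choices="light dark">
    <description>Colour scheme for the interface.</description>
  </property>
</settings>
"""


def load_settings(xml_text):
    """Return {name: {"value", "description"}}, validating each value
    against the type information stored next to it."""
    settings = {}
    for prop in ET.fromstring(xml_text.strip()).findall("property"):
        # Fall back to the declared default when no <value> is given.
        raw = prop.findtext("value", default=prop.get("default"))
        kind = prop.get("type")
        if kind == "int":
            value = int(raw)  # raises ValueError on non-numeric input
        elif kind == "choice":
            if raw not in prop.get("choices", "").split():
                raise ValueError(f"{raw!r} is not allowed for {prop.get('name')}")
            value = raw
        else:
            value = raw
        settings[prop.get("name")] = {
            "value": value,
            "description": prop.findtext("description", default=""),
        }
    return settings


if __name__ == "__main__":
    for name, info in load_settings(CONFIG).items():
        print(name, info["value"], "-", info["description"])
```

A settings dialog could be generated from the same data: one row per property, the description shown for the selected row, and input checked against the declared type.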

  18. Re:The main issue with XML is performance by RetroGeek · · Score: 2, Insightful

    doesn't matter what the current/target system is

    Well yes it does matter. It must be .NET

    With XML, I can create it in DOS version 1 using an 8bit utility, put it onto a diskette and have a user read it on a Linux, Windows, OS/2, ... system.

    --

    - - - - - - - - - - -
    I am a programmer. I am paid to produce syntax not grammar. Deal with it.