Effective XML
In Effective XML: 50 Specific Ways to Improve Your XML, Elliotte Rusty Harold takes a different approach: know your elements and tags -- they are not the same thing! -- and weigh your choices in context, because any technology applied for the wrong reasons may fail to deliver on its promises.
Following Scott Meyers' groundbreaking Effective C++, the author invites us to re-evaluate seemingly trivial issues and discover that life is not as simple as it seems in the world of XML. In each of the 50 items (chapters), he gets into the inner workings of the language, its usage, and related standards, giving us specific advice on how to use XML correctly and efficiently. The 300-page book is divided into four parts: Syntax, Structure, Semantics, and Implementation. Yet in the introduction, the author sets the tone by discussing such fundamental issues as "Element versus Tag," "Children versus Child Elements versus Content," "Text versus Character Data versus Markup," etc. On these first pages the author started earning my trust and admiration for his knowledge and his ability to get right to the point in clear and simple language.
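To make a couple of those distinctions concrete, here is a minimal sketch using Python's standard xml.dom.minidom. It is not taken from the book; the sample document and variable names are made up for illustration.

```python
from xml.dom import minidom

# Hypothetical sample document: one <note> element with mixed content.
doc = minidom.parseString("<note>Call <name>John</name> today</note>")
note = doc.documentElement

# A single element is delimited by two tags: <note> and </note>.
# "Children" covers every child node, text nodes included ...
children = [child.nodeName for child in note.childNodes]

# ... while "child elements" are only the element nodes among them.
child_elements = [
    child.tagName
    for child in note.childNodes
    if child.nodeType == child.ELEMENT_NODE
]

print(children)        # ['#text', 'name', '#text']
print(child_elements)  # ['name']
```

The note element has three children but only one child element, which is the kind of difference the introduction asks the reader to keep straight.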
The first part, Syntax, contains items covering issues related to the microstructure of the language, and best practices for writing legible, maintainable, and extensible XML documents. (In it, over 19 pages are dedicated to the implications of the XML declaration!) That seems like a lot for one XML statement that most people cut-and-paste at the top of their XML documents without giving it much thought, doesn't it? Actually not, if you follow the author's reasoning and examples.
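One reason the declaration can matter is character encoding. The sketch below is an illustration under my own assumptions, not the author's example: it encodes the same made-up element in ISO-8859-1 with and without a declaration and feeds both to Python's standard ElementTree parser, which assumes UTF-8 when nothing is declared.

```python
import xml.etree.ElementTree as ET

# Hypothetical element containing one non-ASCII character (e with acute).
body = "<name>Caf\u00e9</name>"

# Same bytes on disk, once with and once without an XML declaration.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")
undeclared = body.encode("iso-8859-1")

# With the declaration, the parser knows how to decode the bytes.
declared_text = ET.fromstring(declared).text

# Without it, the parser falls back to UTF-8 and the lone 0xE9 byte is invalid.
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(declared_text, undeclared_ok)
```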
The second part, Structure, discusses issues that arise when creating a data representation in XML, i.e., mapping real-world information into the trees, elements, and attributes of an XML document; it also talks about tools and techniques for designing and documenting namespaces and schemas.
The third part, Semantics, explains the best ways to turn the structural information represented in XML documents into data with meaning. It teaches us how to choose the appropriate API and tools for different types of processing to achieve the best effect. This part has a lot of good advice for creating solutions that are simple, effective, and robust.
The final part, Implementation, advises the reader on design and integration issues related to the utilization of XML; these issues include data integrity, verification, compression, authentication, caching, etc.
This book will be useful to professionals at any level of experience. It can be used as a tutorial and read from cover to cover, or one can enjoy reading selected items, depending on experience and taste. The book's very detailed index makes it an excellent reference on the subject as well. In the preface to the book, the author writes, "Learning the fundamentals of XML might take a programmer a week. Learning how to use XML effectively might take a lifetime." I'm not sure about the "lifetime" -- that's an awfully long time for using one technology -- but for the most confident of us this still may not be enough :). Your mileage may vary, but I suspect that you could shave a few months off that time by browsing through this book once in a while. Most importantly, it will make you a better professional and make you proud of the results of your work. Wouldn't this be worth your while?
If you want to read any book for free, just ask your local library to order it and they will. Libraries guess at what books people want to read, so if anyone shows any interest in any book, they order it. They lose their federal funding if they don't spend the money they are allocated, so they are generally VERY willing to buy as much as possible.
----
Squirrel
I think one of the main problems with the embedding of XML architecture into office productivity software is unfortunately the end user. I mean, how long have programmes like MS Word had "document properties" contained in them, and how many people are actually using them? I'm currently working on a project to retrieve documents across a company's backed-up data from the past 10 years, and there is very, very little metadata available for us to do any searching on. Unless the embedded XML contained within office suites is brought more "to the fore" and in the face of users, instead of being a behind-the-scenes 'option', people just are not going to use it.
The linux hacker
Syntax:
Include an XML Declaration
Mark Up with ASCII if Possible
Stay with XML 1.0
Use Standard Entity References
Comment DTDs Liberally
Name Elements with Camel Case
Parameterize DTDs
Modularize DTDs
Distinguish Text from Markup
White Space Matters
Structure:
Make Structure Explicit through Markup
Store Metadata in Attributes
Remember Mixed Content
Allow All XML Syntax
Build on Top of Structures, Not Syntax
Prefer URLs to Unparsed Entities and Notations
Use Processing Instructions for Process-Specific Content
Include All Information in the Instance Document
Encode Binary Data Using Quoted Printable and/or Base64
Use Namespaces for Modularity and Extensibility
Rely on Namespace URIs, Not Prefixes
Don't Use Namespace Prefixes in Element Content and Attribute Values
Reuse XHTML for Generic Narrative Content
Choose the Right Schema Language for the Job
Pretend There's No Such Thing as the PSVI
Version Documents, Schemas, and Stylesheets
Mark Up According to Meaning
Semantics:
Use Only What You Need
Always Use a Parser
Layer Functionality
Program to Standard APIs
Choose SAX for Computer Efficiency
Choose DOM for Standards Support
Read the Complete DTD
Navigate with XPath
Serialize XML with XML
Validate Inside Your Program with Schemas
Implementation:
Write in Unicode
Parameterize XSLT Stylesheets
Avoid Vendor Lock-In
Hang On to Your Relational Database
Document Namespaces with RDDL
Preprocess XSLT on the Server Side
Serve XML+CSS to the Client
Pick the Correct MIME Media Type
Tidy Up Your HTML
Catalog Common Resources
Verify Documents with XML Digital Signatures
Hide Confidential Data with XML Encryption
Compress if Space Is a Problem
... and it is starting to dawn on me that trends like pervasive XMLization are going to haunt us forever. The combination of business-minded consultants who push a market to create demand for themselves and a huge number of clueless but enthusiastic developers who will jump on any new idea and push it where it doesn't want to go unsurprisingly leads to this kind of instability.
I hate XML with a passion. Let me present you with three examples:
1) Programming languages based on XML.
Yes, it is true. Perverted minds, somewhere on this planet, actually seem to think that this is a neat idea! Since their initial conception, the pivotal point of programming languages has been to raise the level of programming: to move from the computer's domain to the human domain, to make it more intuitive and natural for a human being to program a computer. With these new XML-based languages we are moving a step backwards, because truly the only benefit of XML in this context is that it is easier for computers to parse, while it is certainly harder for humans.
2) XSLT
Have you tried it? I rest my case.
3) SOAP
Okay, initially this actually seemed like a good idea to me, but having thought about it, I really think it sucks. Okay, so it is easier to implement SOAP for a particular platform or programming language, but a wire protocol is like a compiler or an OS kernel in a certain sense - it is okay that it is very hard to write, as long as it is stable and high performance, because it is such a central component.
Maybe here?
To put it another way...
this single record
Doe, John 1234567 12/1/2001
took 31 bytes, while its XML companion (using short, simple tags) took 96 bytes.
Not all XML files wind up being 3 times the size of their flatfile counterparts, but they are inherently larger. There really isn't a way to make loading/parsing that data any faster, by the nature of working with ASCII/ANSI files. XML will always be slower.
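For anyone who wants to check the overhead themselves, a rough sketch in Python follows. The XML version was not shown above, so the tag names here (rec, name, id, date) are assumptions, and the byte counts this prints depend on them and on how the flat record is padded; they will not necessarily match the 31 and 96 quoted.

```python
# The flat record as it appears in the comment above.
flat = "Doe, John 1234567 12/1/2001"

# Hypothetical markup with short tags; the original XML was not shown.
xml_record = (
    "<rec><name>Doe, John</name><id>1234567</id>"
    "<date>12/1/2001</date></rec>"
)

# Both strings are pure ASCII, so one character is one byte.
flat_bytes = len(flat.encode("ascii"))
xml_bytes = len(xml_record.encode("ascii"))
ratio = xml_bytes / flat_bytes

print(flat_bytes, xml_bytes, round(ratio, 2))
```

Longer tag names or more nesting push the ratio up, and records with long field values push it down.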
Saying Android is a family of phones is akin to saying Linux is a family of PCs.
XML is just text! If the XML parser is slow, write a faster one! Figure out where the bottlenecks are! Don't give me this XML is slow crap. This is slashdot - you're supposed to be a geek. If you don't like XML, fine, but come up with a geeky reason not to like it, not some problem whose solution is just to roll up your sleeves and do some hacking!
:')
Oy!
You have absolutely NO idea what you are talking about, and of course have been modded +3 insightful. Good one mods.
XML is extensible by its very nature. By itself, an xml file is just that, an xml file; it means absolutely NOTHING without context and definition.
This is what DTDs do. They don't limit xml in any way; rather, they describe a particular use of xml. For example: SVG, MathML and XHTML are all languages that use xml. Each one of these languages has a DTD that defines the format of a valid xml document FOR THAT LANGUAGE.
Just because a DTD for SVG exists doesn't mean that anything at all has changed with xml itself.
Next, XSLT is a technology with a very specific purpose, simply put: To take an xml file as input and create a new xml file for output based on the rules written into the transform.
So, with all of that said, there is absolutely NO reason why there shouldn't be a DTD repository, and again, there is no reason why there shouldn't be a PhotoAlbum DTD in that repository. What problems would this cause? None. What benefits could be observed? Instead of everyone who needs an xml document to describe photo albums rolling their own format, people might just reuse a standard DTD to do so. And application writers just might too. And lo and behold, Application X on platform Y might be able, with no work involved, to open Album AA created by Application BB on platform CC.
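As a rough sketch of what that could look like: the DTD below is entirely made up (no such PhotoAlbum standard is being quoted), and the reader function stands in for "Application X". One caveat is that Python's standard ElementTree parser reads the DOCTYPE but does not validate against it, so this only shows the shared-format idea, not validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical album document carrying a made-up PhotoAlbum DTD inline.
ALBUM = """<?xml version="1.0"?>
<!DOCTYPE album [
  <!ELEMENT album (photo+)>
  <!ATTLIST album title CDATA #REQUIRED>
  <!ELEMENT photo (caption?)>
  <!ATTLIST photo src CDATA #REQUIRED>
  <!ELEMENT caption (#PCDATA)>
]>
<album title="Holiday">
  <photo src="beach.jpg"><caption>The beach</caption></photo>
  <photo src="hills.jpg"/>
</album>"""


def list_photos(xml_text):
    """Return (src, caption) pairs; caption is '' when the element is absent."""
    root = ET.fromstring(xml_text)
    return [
        (photo.get("src"), photo.findtext("caption", default=""))
        for photo in root.iter("photo")
    ]


photos = list_photos(ALBUM)
print(photos)  # [('beach.jpg', 'The beach'), ('hills.jpg', '')]
```

Any application that agrees on the same element and attribute names can read the file without knowing which program wrote it.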
Getting some of the big picture?
No Comment.
illegitimii non ingravare