Domain: w3.org
Stories and comments across the archive that link to w3.org.
Stories · 458
-
W3C Releases XForms
An anonymous reader submits: "On the heels of several other releases, the W3C has released XForms as a Candidate Recommendation. Coverage here and here. XForms is the way-better version of HTML forms -- it's XML-based and includes built-in client-side validation and calculations, without scripting. It is expected to replace old-fashioned HTML forms in XHTML 2.0. It's also being viewed by many as the standards-based alternative to Microsoft's XDocs. Now's your chance to try it out and submit your comments, before the official Recommendation comes out in a few months." -
W3C Releases Drafts For DOM L2 And More
TobiasSodergren writes "People at W3C seem to have had a busy Friday, according to their website. They have released no fewer than 4 working drafts (Web Ontology Language (OWL) Guide, the QA Working group - Introduction, Process and Operational Guidelines, Specification Guidelines) and 2 proposed recommendations: XML-Signature XPath Filter 2.0 and HTML DOM 2. Does this mean that one can expect browsers to behave in a predictable manner when playing around with HTML documents? Hope is the last thing to leave optimistic people, right?" -
Is W3C's P3P Good Privacy?
nileshch asks: "A very important recent development for website users' privacy is the W3C's introduction of the Platform for Privacy Preferences (P3P). P3P allows websites to create and maintain XML-based privacy policies for the entire website or for subsections of the site. These machine-readable policies document what information is collected from users and how it is going to be used. Today, a few browsers like Mozilla/Netscape & Internet Explorer are committed to supporting P3P (Mozilla here, IE here), although that support seems only skin-deep. I also find very few big sites adopting P3P seriously. Isn't it the classic chicken-and-egg situation? Websites wait for full P3P support in browsers, and browsers go slow on development because there isn't much demand for the feature. Do you have P3P policies for your website? If not, what stops you from creating one? We all create hoopla over tiny privacy issues, user profiling and doubleclick.net. Then why isn't there much enthusiasm for P3P support in browsers?" -
XML 1.1 Spec Hits Some Snags
oever writes "News.com reports that the new XML 1.1 specification defines a new newline character, making it incompatible with the 1.0 specification. Apparently, IBM has been pushing the new character to avoid having to modify their software, thereby invalidating everybody else's XML software." -
W3C Patent Board Recommends Royalty-Free Policy
Bruce Perens writes "A year ago, the World Wide Web Consortium proposed a policy to allow royalty-generating patents to be embedded in web standards. This would have been fatal to the ability of Free Software to implement those standards. There was much protest, including over 2000 emails to the W3C Patent Policy Board spurred on by a call to arms published on Slashdot. As a result of the complaints, I was invited to join W3C's patent policy board, representing Software in the Public Interest (Debian's corporation) -- but really the entire Free Software community. I was later joined in this by Eben Moglen, for FSF, and Larry Rosen, for the Open Source Initiative." Bruce has written more below - it's well worth reading.
After a year of argument and see-sawing, W3C's patent policy board has voted to recommend a royalty-free patent policy. This recommendation will be put in the form of a draft and released for public comment. There will probably be a dissenting minority report from some of the large patent holders. Tim Berners-Lee and the W3C Advisory Committee, composed of representatives from all of the consortium's members, will eventually make the final decision on the policy. My previous interaction with the Advisory Committee and Berners-Lee leads me to feel that they will approve the royalty-free policy.
The policy will require working group members to make a commitment to royalty-free licensing of essential claims - those which you cannot help infringing if you are to implement the standard at all. There is also language prohibiting discriminatory patent licenses. The royalty-free grant is limited to the purpose of implementing the standard, and does not extend to any other application of the patent. And there is a requirement to disclose whether any patent used, even a non-essential one, is available under royalty-free terms, so that troublesome patents can be written out of a standard. The limitation of the scope-of-use on patents, and some other aspects of the policy, are less than I would like but all that I believed we could reasonably get. Eben Moglen may have some discussion regarding how GPL developers should cope with scope-of-use-limited patent grants from other parties. For now, it should suffice to say that while this is less than desirable, it will not block GPL development.
I'm not allowed to disclose how individual members voted, but I'll note that the vote did not follow "friends-vs-enemies" lines that the more naive among us might expect - so don't make assumptions.
Now, we must take this fight elsewhere. Although IETF has customarily been held up as the paragon of openness, they currently allow royalty-bearing patents to be embedded in their standards. This must change, and IETF has just initiated a policy discussion to that effect. We must pursue similar policies at many other standards bodies, and at the governments and treaty organizations that persist in writing bad law.
For me, this process has included two trips to France (no fun if you have to work every day) and an appearance at a research meeting in Washington, a week in Cupertino, innumerable conference calls and emails, and upcoming meetings in New York and Boston. That's a lot of time away from my family. Larry Rosen has shouldered a similar burden while nobody has been paying him for his time and trouble, and Eben Moglen put in a lot of time as well. Much of the time was spent listening to royalty-bearing proposals being worked out in excruciating detail, which fortunately did not carry in the final vote. We also had help from a number of people behind the scenes, notably John Gilmore, and the officers and members of the organizations we represent.
I'd like to give credit to HP. Because I was representing SPI, and HP had someone else representing them at W3C, I made it clear to my HP managers that they would not be allowed to influence my role at W3C - that would have created a conflict-of-interest for me, as well as giving HP unfair double-representation. HP managers understood this, and were supportive. During all but the very end of the process, HP paid my salary and travel expenses while they knew that I was functioning as an independent agent who would explicitly reject their orders. Indeed, HP allowed me to influence their policy, rather than the reverse. This was the result of enlightened leadership by Jim Bell, Scott K. Peterson, Martin Fink, and Scott Stallard.
For most of the existence of Free Software, technology has been of primary importance. It will remain so, but the past several years have seen the emergence of the critical supporting role of political involvement simply so that we can continue to have the right to use and develop Free Software. I do not believe that we will consistently be able to code around bad law - we must represent what is important about our work and involve ourselves in policy-making worldwide, or what we do will not survive. I hope to continue to serve the Free Software Community in this role.
Respectfully Submitted
Bruce Perens
" -
Perl & LWP
When direct database access to the information you need isn't available, but web pages with the right data are, you might pursue "screen-scraping" -- fetching a web page and scanning its text for the appropriate pieces in order to do further processing. LWP (Library for WWW access in Perl) is a collection of modules to help you do this. mir writes: "Perl & LWP is a solid, no-nonsense book that will teach you how to do screen-scraping using Perl. It describes how to automatically retrieve and use information from the web. An introduction to LWP and related modules, from simple to advanced uses, and various ways to extract information from the returned HTML."
Perl & LWP
author: Sean M. Burke
pages: 264
publisher: O'Reilly and Associates
rating: 9
reviewer: mir
ISBN: 0596001789
summary: Excellent introduction to extracting and processing information from web sites.
The good: The book has a nice style and good coverage of the subject, includes an introduction to all the modules used and reference material, and includes good, well-developed examples. I really liked the way the author describes the basic methodology for developing screen-scraping code, from analyzing an HTML page to extracting and displaying only what you are interested in.
The bad: Not much is bad, really. Some chapters are a little dry, though, and sometimes the reference material could be better separated from the rest of the text. The book covers only simple access to web sites; I would have liked to see an example where the application engages in more dialogue with the server. In addition, the appendixes are not really useful.
More Info: If it had not been published by O'Reilly, Perl and LWP could have been titled Leveraging the Web: Object-Oriented techniques for information re-purposing, or Web Services, Generation 0. An even better title would have been Screen-scraping for fun and profit: one day we might all use Web Services and easily get the information we need from various providers using SOAP or REST, but in the meantime the common way to achieve this goal is just to write code to connect to a web server, retrieve a page and extract the information from the HTML. In short, "screen-scraping." This book will teach you all about using Perl to get Web pages and extract their "substantifique moëlle" (the pith essence, the essentials) for your own usage. It showcases the power of Perl for that kind of job, from regular expressions to powerful CPAN modules.
At 200 pages, plus 40 pages of appendices and index, this one is part of that line of compact O'Reilly books which covers only a narrow topic in each volume but which covers those topics well. Just like Perl & XML, its target audience is Perl programmers who need to tackle a new domain. It gives them a toolbox and basic techniques that provide a jump start and help avoid many mistakes.
Perl & LWP starts from the basics: installing LWP, using LWP::Simple to retrieve a file from a URL, then goes on to a more complete description of the advanced LWP methods for dealing with forms and munging URLs. It continues with five chapters on how to process the HTML you get, using regular expressions, an HTML tokenizer and HTML::TreeBuilder, a powerful module that builds a tree from the HTML. It goes on with an explanation of how to allow your programs to access sites that require cookies, authentication or the use of a specific browser. The final chapter wraps it all up in a bigger example: a web-spider.
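As a rough illustration of the progression described above, from regular expressions to a real HTML parser, here is a minimal sketch. It is written in Python with the standard library rather than with the Perl modules the book covers, and the names `SAMPLE_HTML`, `links_by_regex`, `LinkCollector` and `links_by_parser` are hypothetical, chosen for this example only.

```python
import re
from html.parser import HTMLParser

# A tiny hand-made page (an assumption for this sketch, not from the book).
SAMPLE_HTML = """
<html><body>
<a href="/one.html">First</a>
<!-- <a href="/hidden.html">commented out</a> -->
<A HREF='/two.html'>Second</A>
</body></html>
"""


def links_by_regex(html):
    """Regex approach: quick to write, but it cannot tell markup from comments."""
    return re.findall(r'<a\s+href=["\']([^"\']+)["\']', html, flags=re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Parser approach: the parser reports real start tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_by_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex: ", links_by_regex(SAMPLE_HTML))
    print("parser:", links_by_parser(SAMPLE_HTML))
```

On this sample the regular expression also picks up the link inside the HTML comment, while the parser skips it, which is one reason a tokenizer or tree builder tends to be the safer choice.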
The book is well-written and to-the-point. It is structured in a way that mimics what a programmer new to the field would do: start from the docs for a module, play with it, write snippets of code that use the various functions of the module, then go on to coding real-life examples. I particularly liked the fact that the author often explains the whys, and not only the hows, of the various pieces of code he shows us.
It is interesting to note that going from regular expressions to ever more powerful modules is a path followed also by most Perl programmers, and even by the language itself: when Perl starts being applied to a new domain first there are no modules, then low-level ones start appearing, then, as the understanding of the problem grows, easier-to-use modules are written.
Finally I would like to thank the author for following his own advice by including interesting examples and above all for not including anything about retrieving stock-quotes.
Another recommended book on the subject is Network Programming with Perl by Lincoln D. Stein, which covers a wider subject but devotes 50 pages to this topic and is also very good.
Breakdown by chapter:
- Introduction to Web Automation (15 pages): an overview of what this book will teach you, how to install Gisle Aas' LWP, some interesting words of caution about the brittleness of screen-scraping code, copyright issues and respect for the servers you are about to hammer, and finally a very simple example that shows the basic process of web automation.
-
Web Basics (16p): describes how to use LWP::Simple, an easy way to do some simple processing.
-
The LWP Class Model (17p): a slightly steeper read, closer to a reference than to a real introduction that lays out the ground work for the good stuff ahead.
-
URLs (10p): another reference chapter, this one will teach you all you can do with URLs using the URI module. Although the chapter is clear and complete it includes little explanation as to why you will need to process URLs and it is not even mentioned in the introduction roadmap.
-
Forms (28p): a complete and easy to read chapter. It includes a long description of HTML form fields that can be used as a reference, 2 fun examples (how to get the number of people living in any city in the US from the Census web site and how to check that your dream vanity plate is available in California) and how to use LWP to upload files to a server. It also describes the limits of the technique. I appreciated a very educative section showing how to go from a list of fields in a form to more and more useful code that queries that form.
-
Simple HTML processing with Regular Expressions (15p): how to extract info from an HTML page using regexps. The chapter starts with short sections about various useful regexp features, then presents excellent advice on troubleshooting them, the limits of the technique and a series of examples. An interesting chapter, but read on for more powerful ways to process HTML. On the down side, I found the discussion of the s and m regexp modifiers a little confusing.
-
HTML processing with Tokens (19p): using a real HTML parser is a better (safer) way to process HTML than regexps. This chapter uses HTML::TokeParser. It starts with a short, reference-type intro, then a detailed example. Another reference section describes the methods and an alternate way of using the module, with short examples. This is the kind of reference I find the most useful; it is the simplest way to understand how to use a module.
-
Tokenizing walkthrough (13p): a long example showing step-by-step how to write a program that extracts data from a web site, using HTML::TokeParser. The explanations are very good, showing _why_ the code is built this way and including alternatives (both good and bad ones). This chapter describes really well the method readers can use to build their code.
-
HTML processing with Trees (16p): even more powerful than an HTML tokenizer: HTML::TreeBuilder (written by the author of the book) builds a tree from the HTML. This chapter starts with a short reference section, then revisits 2 previous examples of extracting information from HTML using HTML::TreeBuilder.
-
Modifying HTML with Trees (17p): More on the power of HTML::TreeBuilder: a reference/howto on the modification functions of HTML::TreeBuilder, with snippets of code for each function. I really like HTML::TreeBuilder BTW, it is simple yet powerful.
-
Cookies, Authentication and Advanced Requests (13p): Back to that LWP business... this chapter is simple and to-the-point: how to use cookies, authentication and referer to access even more web-sites. I just found that it lacked a description on how to code a complete session with cookies.
-
Spiders (20p): a long example describing how to build a link-checking spider. It uses most of the techniques previously described in the book, plus some additional ones to deal with redirection and robots.txt files.
-
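The robots.txt handling mentioned for the spider chapter can be sketched roughly as follows. This uses Python's standard `urllib.robotparser` rather than the Perl modules the book covers; the rules in `ROBOTS_TXT`, the `ExampleSpider` agent name, the example.com URLs, and the helper names `build_rules` and `may_fetch` are all assumptions made for this illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt (assumption): everything under /private/ is off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text that has already been fetched."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_fetch(rules, url, agent="ExampleSpider"):
    """Ask the parsed rules whether this (hypothetical) agent may request url."""
    return rules.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_rules(ROBOTS_TXT)
    for url in ("http://example.com/index.html",
                "http://example.com/private/report.html"):
        print(url, "allowed" if may_fetch(rules, url) else "blocked")
```

A spider would run a check like this before every request, so that it never asks for pages the site owner has excluded.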
Appendices
I think the Appendices are actually the weakest part of the book, most of them are not really useful, apart from the ASCII table (every computer book should have an ASCII table IMHO ;--).
- A. LWP modules (4p): the list and one line description of all modules in the LWP library, long and impressive! But not very useful,
- B. HTTP status (2p): available elsewhere but still pretty useful,
- C. Common MIME types (2p): lists both the usual extension and the MIME type,
- D. Language Tags (2p): the author is a linguist ;--)
- E. Common Content Encodings (2p): character set codes,
- F. ASCII Table (13p): a very complete table, includes the ascii/unicode code, the corresponding HTML entity, description and glyph,
- G. User's View of Object-Oriented Modules (11p): this is a very good idea. A lot of Perl programmers are not very familiar with OO, and in truth they don't need to be. They just need the basics of how to create an object in an existing class and call methods on it. I found the text to be slightly confusing though; in fact I believe it is a little too detailed and might confuse the reader.
- Index (8p): I did not think the index was great (code is listed with references to 5 seemingly random pieces of code, type=file, HTML input element is listed twice, with and without the comma...), but this is not the kind of book where the index is the primary way to access the information. The Table of Content is complete and the chapters are focused enough that I have never needed to use the index.
-
Ibiblio Director Paul Jones Answers
Okay, here are answers from Paul Jones, director of ibiblio.org. You asked, and he responded -- and not always as seriously as you'd expect from someone who can ask us to call him "Professor Jones" or "Doctor Jones." But he's really "Just Paul," he says, "even in class." We hope a whole lot of you have a chance to meet Paul in person one day, because he's not only a warm and friendly guy, but one who has done a whole lot of good for Linux -- and for the Internet in general.
Paul:
Let me start out with a little overview of sunsite.unc.edu/metalab.unc.edu. Or better yet to point you to our annotated timeline. Then say that ibiblio.org began and has continued to be a way for the University of North Carolina (the original and still the best) to explore information sharing in the context of our missions of education, research and outreach. You folks using and contributing are the outreach part. In particular, we "acquire, discover, preserve, synthesize, and transmit knowledge" with all of your help.
We are a joint project of the School of Information and Library Science (there we are involved in digital archives and digital libraries), The School of Journalism and Mass Communication (there we are involved in electronic publishing and multimedia sharing), and the Vice Chancellor for Information Technology.
Except for one and occasionally two full time employees, our entire staff consists of students or in my case part time (as I have faculty responsibilities). So be nice to all of us, we're always learning. No matter what Robin said in the article introducing me, none of this would have happened without some very good people on staff and contributing content.
But that brings us to:
Question of Money
by too_bad
One of the things that people frequently ask about sites like ibiblio.org is "They are great. But how long will they be around?" Do you see this as a concern (esp. after the LWN announcement) and do you have any comments regarding this? Are there any good approaches you suggest (like augmenting free usership with voluntary subscriptions, etc) for such free sites in general?
Paul:
We have been very lucky, since our beginning, to have generous and understanding support from The University of North Carolina and from sponsors large and small including Sun, IBM, Red Hat, VA Linux^h^h^h^h^hSoftware, Mandrake, Cisco and others.
We also do get some research contracts and grants, but most importantly for us in the past two years has been a large gift from the founders of Red Hat and the Center for the Public Domain.
We have some top secret international funding sources as well. At the moment, we actually have a small endowment that if spent wisely should last several years. It is my hope that we will never have to charge the patrons of our digital archives.
BUT this brings me to my favorite question, which only got a rating of 4:
Donations?
by Anonymous Coward
Where do I send the cheque?
Paul:
Send your or your organization's tax-deductible contributions to:
Ibiblio.org
Campus Box 3456
University of North Carolina
Chapel Hill, NC 27599-3456
Moving on to:
Typical Questions
by suwain_2
I've downloaded my share of things, and find that the 3 Mbps cap on my cable modem is almost always my bottleneck. So my question is fairly simple (albeit broad) -- can you describe your setup a bit, in terms of bandwidth (both what you have for an Internet connection, and how much traffic you actually use), servers, storage (I'd venture to guess it's to the tune of several terabytes?), etc.
Paul:
We're on UNC's network. Our connections to the commodity and Internet2 networks are served by UNC's OC-48 network connection. We maintain a constant throughput of network traffic outbound in the 160-180 Mbits/sec range.
Our current main servers were donated by IBM and serve content from a central fileserver with 2TB of disk attached. In our racks, we have approximately 5TB of space (with system disks, Sourceforge and an Internet2/Distributed Storage Initiative node). We do some load balancing between streaming services, web services, and large downloads like distros. On a typical day, we move over 1.5 terabytes of data off our servers. (Thanks to Fred Stutzman for much of this info.)
Backups
by Chris Pimlott
What's your backup strategy? I imagine it's hard to deal with both so much data as well as being under constant bombardment from clients around the world. How often is data archived? Have you had any major data loss incidents and, if so, how well were you able to deal with them?
Paul:
Like everyone else we rely on Archive.org, but seriously... (Fred answers this since he did the restore).
We run managed backups on UNC's enterprise storage facilities. We run them every night and have incremental backups for three months. UNC uses StorageTek machines and Tivoli Distributed Storage Manager for enterprise backups. We have had major data loss incidents, in which a RAID card failed and lost the array's configuration. One of the disks in the array died simultaneously and we were unable to re-import the configuration to the new card, so we had to restore from backup, which took a number of days.
I, Paul, can only say that in the past things were much worse and we did have one famous meltdown in 1995 that was not pretty. Since then the UNC enterprise backup has been our friend - and for the most part disks and RAID arrays have become increasingly reliable.
What's your biggest area?
by Otter
I know ibiblio (I still think of it as SunSite) as a) a repository of Unix software, especially useful for pre-Freshmeat apps and b) a mirror provider. "Free online publisher" wouldn't have made the list, but looking at your main page I see all sorts of things I didn't realize you hosted. Which ones get the most traffic?
Paul:
For sheer bytes, ISOs rule. But then it doesn't take too many downloads to get a lot of bytes for an ISO. Source-based distros like Gentoo have seen a lot of activity lately.
One of our most visited sites is also one of our oldest, Nicolas Pioch's WebMuseum (originally WebLouvre). An amusing reason may be that, as Nicolas writes:
"I've just found out that Microsoft Encarta Deluxe 2001 (the copy I just happened to find out and install) has direct links ('Web Links') from each artist's article to the webmuseum (on metalab.unc.edu at the time) and that's actually the only weblink provided in that 2001 edition."
Among other favorites are:
- The Linux Documentation Project, which began on sunsite
- Documenting the American South
- Hong Kong Picture Archive
- Henriette's Herbal Homepage
- Hyperwar: a hypertext history of the Second World War
What about content producers?
by Fluid Donkey
In general how supportive have you found the producers of such content to be of your services? Do many if any really believe that something like this will cause them to starve to death?
Paul:
First, they are all with us voluntarily and can leave any time, taking their stuff with them. That alone pretty much says that they believe in what we are helping them do.
I should say also that not all material is copyleft. But all of it is free to view, listen to and to reference. We are working with Creative Commons, which we also host, to develop a small but viable set of licenses for folks including our contributors who want to share their work on various terms (attribution, home or personal use, educational use, etc).
One important contributor, Roger McGuinn, has been making one folk song a month available for download since November 1995 on his Folk Den. He also sells CDs and performs concerts. He seems to be doing pretty well. Many contributors are scholars or students who understand the importance of sharing information.
Dave Farley, who does the wonderful Dr Fun, has a book contract with Plan 9, and we're looking forward to seeing what we've seen in electrons in print.
Relative importance of different material?
by kafka93
What is the center's view on the publishing of material that might be considered "offensive" or "dangerous", and does the center make subjective judgements upon the importance of one piece of intellectual property over another on the basis of 'artistic worth', 'decency', etc.? With only limited resources available to promote the archiving of data, is there the risk that important fringe documents may be left by the wayside, or ignored due to political/social concerns?
Paul:
Like non-digital archives and libraries, we have a Collection Policy. You'll note that we do not explicitly ban materials for content nor do we plan to. We do not maintain materials that are illegal, slanderous, libelous, or otherwise prohibited by law. Ultimately the contributors are responsible for their content and we do not review the content once a project is taken on.
Most rejections of content come about because the content is too commercial, just personal, or relies on advertising.
Metadata and easy searching
by RyanMuldoon
iBiblio stands out as an excellent repository for a wide range of culturally valuable resources. As it and other sites grow in size, the importance of good searching and indexing becomes extremely relevant. Have you given any thought to how you might want to cope with this? Specifically, are there any metadata schemata that you are considering using? I would love to see iBiblio be used more like a content feed to research/cross-referencing applications.
Paul:
Interesting that you asked about this as this is an area that we've been working in for the past couple of years. Actually we go way back to pre-Web metadata to the Internet Anonymous FTP Archive (IAFA) files which were the model for the Linux Software Map (LSM). Thanks to Jonathan Magid for this innovation and for suggesting that we host Linux in the very beginning.
When we designed our contributor-maintained Collection Index, we designed it to create and display metadata that could be shared via the Open Archives Initiative (OAI). Please note that this metadata is at the collection level - not at the item level. Item level metadata is for future work. Also since you asked: Miles Efron and I will be presenting a paper at the Digital Resource in the Humanities conference in September on the Problem of Access in Contributor-Run Digital Libraries. Serena Fenton is co-author of this paper.
On the Linux Documentation Project front, we worked with several others to create the Open Source Metadata Framework (OMF).
The OMF aims to collect data about Open Source documentation, or metadata, that will be used to describe the documentation. The idea is that the OMF will act as a sophisticated card catalog type of system for the numerous Open Source documentation projects that exist. The OMF offers a number of advantages over standard card catalog type systems, however. Chief among these is the fact that the OMF has been designed from the ground up to be completely open, standards based, and sharable. We will accomplish this by using pre-defined standards (XML and the Dublin Core description for metadata) and allowing all metadata generated to be accessed by anyone that wants it. Because the metadata itself is to be stored in XML files, anyone should be able to use it.
OMF support is included in the Scrollkeeper project. Note that none of these metadata designs are overly complex. That is by design. The idea is to keep the metadata simple enough to be understood by the creator of the digital item or collection that it describes. If I could make one strong point about metadata design it is that simplicity is the key - and the hardest thing to pull off.
Trust metric and online publishing
by Creosote
I heard you talk at the Southern Presses conference last year about the use of trust metrics (like Slashdot's karma and Advogato's peer certification) as a possible alternative to the "top-down" means of filtering that scholarly and commercial publishers use, namely formal peer review and mass marketing, respectively. Are you more or less optimistic about the long-term viability of this model than you were then? (Especially in light of the powerful efforts to keep control of the gates we're seeing these days from Hollywood, the recording industry, and their political allies...)
Paul:
Beginning here I am speaking personally and not on behalf of ibiblio.org or any of its sponsors or supporters including but not limited to the University of North Carolina.
The Blog is one example of creator-empowerment that has gotten more attention since that talk and I think there will be plenty more examples to come. I still believe that people in constant communications will result in "Smart Mobs" (thank you, Howard Rheingold, for naming and noticing and writing on this). This is not just about music or movies or about one country or even one age group. While I don't think that we will completely replace our reliance, however reluctant, on Mickey Mouse, I do think that we are entering a time in which there are new opportunities for us to share information and to work together. The slew of misguided efforts by media and information cartels, especially the RIAA, which demonize their customers and clients, will make things tough but they also are signs that the old solutions are not working well and that newer, and I hope more inclusive and more open, solutions are on the horizon.
GeekPAC and "When Congress Attacks"
by lunenburg
I noticed that you are one of the founders of the American Open Technology Consortium and/or GeekPAC - the lobbying group that got a bit of fanfare a few months back when it was formed, but has been pretty quiet since then. With Congress launching seemingly daily attacks on our technological freedom in order to support the revenue models of a few huge businesses, the need for a voice in Washington is growing urgent. Is the AOTC/GeekPAC working to get our voices heard? Is there a need for an umbrella group to tie together various groups like GeekPAC, Public Knowledge, Digital Consumer, etc.?
Paul:
Yes, (again speaking only as Paul) I am an officer of the American Open Technology Consortium (AOTC). But for various complex reasons, I am not a member of GeekPAC. As you might have guessed, getting these projects going has been no simple matter. Jeff Gerhard has been doing a wonderful job of making sure the legal and procedural steps are properly taken. So far, what you are seeing is some very motivated but very busy people learning how to work together to get the projects off the ground. The good news is that folks like Jeff, Doc Searls and others on the boards are smart, dedicated and experienced people who can and will play well with others (including Public Knowledge and Digital Consumer and EFF). We hope to represent slightly different voices than those already represented. If you are reading this, you know who you are and we need your help.
About the umbrella group, I think that a summit conference (or at least a summit listserv) would make more sense. This kind of looser structure, often called an Action Committee or Organizing Committee, has been very successfully used by both ends of the political spectrum in the past half century.
Two words...
by Anonymous Coward
DRM? Palladium?
What's your take on these two technologies?
Are you afraid they'll ultimately destroy what you have been working for, for the past 10 years? If not, why?
Optional question: What about the copyright extension we have seen?
Another optional question: Linux... or BSD? =)
Paul:
Not Linux vs BSD, but Digital Rights Management and Microsoft's Palladium. DRM is the general term for the groups of solutions to the need for creators to be compensated for their work while allowing their audience to easily access those works. Or at least that would be ideally what DRM should do.
When DRM goes wrong, it tramples on the rights of the citizens to have access to information that they have legally purchased, want to criticize, parody, legally reuse or share.
When DRM goes wrong, it creates barriers to innovation and creativity. It biases access and reproduction of information to only certain technologies.
When DRM goes wrong, it creates and perpetuates closed markets and monopolies.
When DRM goes wrong, everyone suffers. It takes us back to the Stationers Guild, a response to the printing press. "The Stationers Guild obtained monopoly rights in the printing and probably distribution of all books, a monopoly codified by the Tudors in a licensing system aimed at censoring religious dissent" which lasted until the early 1700s.
When DRM goes wrong, it is called Palladium.
The good news is that Palladium is vaporware - so far.
What is your greatest success/failure?
by burgburgburg
Simple enough question in two parts:
Looking back on 10 years of doing this, what would classify as your greatest success, and your greatest failure?
Paul:
The simplest question is the hardest, of course. Luckily, you've narrowed the success/failure question to deal only with sunsite/metalab/ibiblio and not the past 10 years of my life.
One mark of great success is that we are still here hosting some of the original collections of information to be shared on the Net including the first 7/24 radio simulcast on the net, WXYC. We've been a part of many innovations and I, personally, have been able to work with some brilliant folks who often surprised themselves with what they had accomplished. We're also funded and we enjoy support from some wonderful and diverse faculties at UNC.
There is no question in my mind that the most significant decision that I made in those ten years was to listen to Jonathan Magid when he suggested that we become the US site for an operating system that didn't even work yet - Linux. If you are reading this far and are happy, you owe Jonathan. If you are unhappy, blame me.
In research, there is no such thing as failure. As I was explaining to our Interim Vice Chancellor, we are supposed to make mistakes. As Ms. Frizzle says, "Take chances, get messy and EXPLORE! Wahoo!".
Still, I do wish that we had found a way to use WAIS or another distributed search engine in a way that is still useful. There still seems to me to be something unfinished in that area. Killing gopher. That was more fun than Wack-a-mole.
And one final answer:
Slack.
by dsb3
You host a slew of subgenius content, so it must be asked ... do you have slack?
Paul:
While I do not profess to completely comprehend slack, I have been assured by members of the Church that I do have it. -
XHTML 2.0 Working Draft
Rytsarsky writes: "W3C has released the first public working draft of XHTML 2.0. 'XHTML 2 is a markup language intended for rich, portable web-based applications. While the ancestry of XHTML 2 comes from HTML 4, XHTML 1.0, and XHTML 1.1, it is not intended to be backward compatible with its earlier versions.' Some notable changes are the introduction of navigational lists (<nl>), sectional hierarchy with <section>, and the long-awaited deprecation of <br> in favor of <line>." -
W3C Ponders RAND Again
simonstl writes "Three unnamed W3C participants have suggested a new RAND policy that would let the W3C into the business of charging royalties for patent-encumbered specs. No consensus yet, but they sure seem to keep trying." -
Return of the WaSP
No_Weak_Heart writes "After a brief hiatus, the Web Standards Project (WaSP) has returned. Here's the story at Wired about this grassroots coalition which works to promote the adoption of web standards by authors, tool makers and in browsers. In a related vein, the Boston Globe has a comfy chat with Tim Berners-Lee, the guiding force behind many of those standards." -
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
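The effect of undeclaring the default namespace can be checked with a short script. The following is a minimal sketch in Python rather than the C# used later in this article; it assumes the standard library's xml.etree.ElementTree parser, and the helper names split_name and names_in are made up for illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"
INVENTORY_NS = "urn:xmlns:25hoursaday-com:inventory-tracking"

# The document from the text: the default namespace is declared on
# bookstore and then undeclared on book with xmlns="".
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
"""

def split_name(tag):
    """Split an ElementTree tag into (namespace name, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # element has no namespace name

def names_in(xml_text):
    """Return (namespace name, local name) for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [split_name(el.tag) for el in root.iter()]

for uri, local in names_in(DOC):
    print(f"{local}: {uri or '(no namespace)'}")
```

Only bookstore and inv:inventory should be reported with a namespace name; book, title, and author lose theirs once the default namespace is undeclared.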
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
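Some parsers expose universal names directly. As a rough illustration (in Python, not the C# used later in this article), the standard library's xml.etree.ElementTree reports every element tag in the curly-brace form described above, so the three schema documents can be compared by name. The helper universal_names is a made-up name for this sketch.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The three documents from the text: same names, different prefixes.
DOCS = [
    f'<xs:schema xmlns:xs="{XSD}"><xs:complexType id="123" name="fooType"/></xs:schema>',
    f'<xsd:schema xmlns:xsd="{XSD}"><xsd:complexType id="123" name="fooType"/></xsd:schema>',
    f'<schema xmlns="{XSD}"><complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the universal name of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

results = [universal_names(doc) for doc in DOCS]
for names in results:
    print(names)

# The prefixes differ, but the universal names do not.
print(results[0] == results[1] == results[2])
```

The prefixes xs and xsd, and the absence of a prefix, leave no trace in the parsed names, which is the behavior a namespace-aware consumer generally wants.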
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
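Both attribute rules can be observed in one parse. The sketch below is in Python (standard library xml.etree.ElementTree) rather than C#, and it makes an assumption worth flagging: it merges the two examples above into a single document that declares the bookstore namespace both as the default and under the bk prefix. The helper book_names is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# Assumption for illustration: one document combining both examples,
# with the same namespace name bound as the default and as bk.
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bookstore>
"""

def book_names(xml_text):
    """Return the book element's tag and a dict of its attributes."""
    book = ET.fromstring(xml_text)[0]
    return book.tag, dict(book.attrib)

tag, attrs = book_names(DOC)
print(tag)
for name, value in attrs.items():
    print(name, "=", value)
```

The unprefixed book element picks up the default namespace, the unprefixed title attribute does not, and the two title attributes coexist because their expanded names differ.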
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
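The DOM's treatment of declarations as attribute nodes is easy to see in practice. The sketch below uses Python's xml.dom.minidom, which is a lightweight DOM implementation and not a complete DOM Level 3 one, so take it as an approximation of the behavior described above. The helper namespace_declarations is a made-up name.

```python
from xml.dom import minidom

# Namespace name the DOM assigns to namespace declaration attributes.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

DOC = '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'

def namespace_declarations(xml_text):
    """Return (qualified name, namespace name, value) for each attribute
    node on the root element that lives in the xmlns namespace."""
    root = minidom.parseString(xml_text).documentElement
    attrs = root.attributes
    found = []
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NS:
            found.append((attr.name, attr.namespaceURI, attr.value))
    return found

for name, ns, value in namespace_declarations(DOC):
    print(f"{name} (in {ns}) -> {value}")
```

The declaration shows up in the element's ordinary attribute list, which is exactly what the XPath data model, described next, does not do.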
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
    // Concatenates the message of an exception with those of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }
    public static void Main(string[] args){
        // Expect a source file and a query, then zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i=2; i < args.Length; i+= 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: These examples are the same as the previous examples, rewritten to be namespace aware.
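The same prefix-to-namespace idea can be sketched outside of .NET. The block below is a minimal illustration that assumes Python's standard xml.etree.ElementTree module (not the xpathquery.exe program used in this article) and a trimmed copy of the namespaced bookstore document. The prefix b is an arbitrary choice made for the query; it does not appear in the document at all.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed copy of the namespaced bookstore document (assumption: shortened
# for illustration, authors and prices omitted).
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Unprefixed names in the query only match elements with no namespace,
# so nothing is found in this document.
unqualified = root.findall("./book/title")

# Map a query-side prefix to the namespace name, then use it in the query.
ns = {"b": BOOKSTORE_NS}
qualified = [t.text for t in root.findall("./b:book/b:title", ns)]

# Namespaced attributes are looked up by their expanded name.
fiction = [b.get("{%s}genre" % BOOKSTORE_NS) for b in root.findall("./b:book", ns)]

print(unqualified, qualified, fiction)
```

Both title elements are found even though one is written with the default namespace and the other with the bk prefix, because only the namespace name takes part in the match.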
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
    'Concatenates the messages of an exception and all its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError
    'Entry point. args holds only the command-line arguments, not the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main
    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + _
                PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO
WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO
WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes, so that existing processors should stop working with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
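How a processor might combine the two mechanisms can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard xml.etree.ElementTree module, the two namespace names and the function classify are hypothetical, and the default version of "1.0" is an arbitrary choice rather than anything required by a specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names for two incompatible generations of a format.
KNOWN_NAMESPACES = {
    "http://my.domain.example.org/product/2001/10": "generation-1",
    "http://my.domain.example.org/product/2002/06": "generation-2",
}

def classify(xml_text):
    """Return (generation, version) for a document, or raise ValueError."""
    root = ET.fromstring(xml_text)
    # ElementTree stores element names as "{namespace-name}local-name".
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    namespace = root.tag[1:].split("}", 1)[0]
    if namespace not in KNOWN_NAMESPACES:
        # A new namespace name signals changed semantics: refuse to guess.
        raise ValueError("unsupported namespace: " + namespace)
    # The version attribute marks backwards-compatible revisions, so an
    # unfamiliar value is passed through rather than rejected.
    return KNOWN_NAMESPACES[namespace], root.get("version", "1.0")

sample = ('<p:product version="1.1" '
          'xmlns:p="http://my.domain.example.org/product/2001/10"/>')
print(classify(sample))
```

The design choice mirrors the text: an unknown namespace name is a hard failure, while an unknown version value is tolerated.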
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the root element's namespace URI as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
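The effect of undeclaring the default namespace can be observed with any namespace-aware parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each element name as the namespace name in curly braces followed by the local name, and it uses a trimmed copy of the document above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document above (assumption: the author and inventory
# elements are omitted to keep the example short).
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# Walk the tree in document order and collect each element's name.
names = [element.tag for element in root.iter()]
for name in names:
    print(name)
```

Only the first name printed carries the bookstore namespace; book and title come out with no namespace at all, even though all three elements are written without a prefix.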
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
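Some APIs expose universal names directly. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which happens to use the curly-brace notation for element names), the three schema documents above can be shown to carry identical names once the prefixes are resolved. The helper universal_names is a hypothetical name introduced for this illustration.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The three schema documents from the text, differing only in prefix choice.
DOCS = [
    '<xs:schema xmlns:xs="%s"><xs:complexType id="123" name="fooType"/></xs:schema>' % XS,
    '<xsd:schema xmlns:xsd="%s"><xsd:complexType id="123" name="fooType"/></xsd:schema>' % XS,
    '<schema xmlns="%s"><complexType id="123" name="fooType"/></schema>' % XS,
]

def universal_names(xml_text):
    """Return the {namespace-name}local-name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

results = [universal_names(doc) for doc in DOCS]
print(results[0])
print(results[0] == results[1] == results[2])
```

The prefixes xs and xsd, and the absence of a prefix in the third document, leave no trace in the parsed names.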
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
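Both rules can be checked in one pass. The sketch below assumes Python's standard xml.etree.ElementTree module and a document that combines the two examples above (a default namespace plus a bk prefix bound to the same namespace name); the combined document is an assumption made for illustration and does not appear in the original column.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# Combined example: a default namespace and the bk prefix both map to NS.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bookstore>"""

book = ET.fromstring(DOC)[0]

# The element picks up the default namespace...
print(book.tag)
# ...but only the prefixed attribute is in a namespace.
for name, value in book.attrib.items():
    print(name, "=", value)
```

The unprefixed title attribute is reported with a bare local name, while bk:title is reported with the bookstore namespace name in front of it.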
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM predates both data models but is also similar to both of them in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
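One DOM implementation where this can be observed is Python's standard xml.dom.minidom module. The sketch below is an illustration under that assumption; it is not the .NET DOM used elsewhere in this article, and other implementations may expose namespace declarations differently.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The namespace declaration is reachable as an ordinary attribute node.
decl = root.getAttributeNode("xmlns:bk")

print(decl.name, "=", decl.value)
# Namespace name of the declaration attribute itself.
print(decl.namespaceURI)
# Namespace name of the element, fixed when the node was created.
print(root.namespaceURI)
```

The declaration attribute reports the reserved xmlns namespace name, while the element reports the bookstore namespace name.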
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
    //Concatenate the messages of an exception and all its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }
    public static void Main(string[] args){
        //Expect a source file and a query, followed by prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //Print output.
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
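The same behavior can be reproduced outside of .NET. What follows is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module in place of the XmlDocument and XmlNamespaceManager classes used by the program above. The trimmed-down document and the variable names are assumptions made for illustration, and because ElementTree implements only a subset of XPath, only the element queries are shown.

```python
import xml.etree.ElementTree as ET

# A shortened version of the namespaced bookstore document.
BOOKSTORE = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# Unprefixed names only match elements that have no namespace,
# so this query finds nothing in the namespaced document.
unqualified = root.findall("book/title")

# The prefix "b" is our own choice; only the namespace name has to
# match the one used in the document.
ns = {"b": "urn:xmlns:25hoursaday-com:bookstore"}
qualified = root.findall("b:book/b:title", ns)
titles = [t.text for t in qualified]

print(len(unqualified), titles)
```

Both title elements are found by the second query even though one is unprefixed and the other uses the bk prefix in the document, because matching is done on the namespace name and local name rather than on the prefix.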
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO
Class Transformer
Public Shared Function PrintError(e As Exception, errStr As String) As String
If e Is Nothing Then
Return errStr
Else
Return PrintError(e.InnerException, errStr + e.Message)
End If
End Function 'PrintError
'Entry point, which delegates to a C-style Main function.
Public Overloads Shared Sub Main()
Run(System.Environment.GetCommandLineArgs())
End Sub 'Main
Overloads Public Shared Sub Run(args() As String)
'GetCommandLineArgs() places the program name at index 0,
'so the source and stylesheet are at indexes 1 and 2.
If args.Length <> 3 Then
Console.WriteLine("Usage: xslt source stylesheet")
Return
End If
Try
'Create the XslTransform object.
Dim xslt As New XslTransform()
'Load the stylesheet.
xslt.Load(args(2))
'Transform the file.
Dim doc As New XmlDocument()
doc.Load(args(1))
xslt.Transform(doc, Nothing, Console.Out)
Catch xmle As XmlException
Console.WriteLine("ERROR: XML Parse error occurred because " & _
PrintError(xmle, Nothing))
Catch fnfe As FileNotFoundException
Console.WriteLine("ERROR: " & PrintError(fnfe, Nothing))
Catch xslte As XsltException
Console.WriteLine("ERROR: The following error occurred while " & _
"transforming the document: " & PrintError(xslte, Nothing))
Catch e As Exception
Console.WriteLine("UNEXPECTED ERROR: " & PrintError(e, Nothing))
End Try
End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet"><![CDATA[<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>]]></xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that it is desirable for existing processors to refuse the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
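To make the trade-off concrete, here is a rough sketch in Python (standard library only) of how a processor might combine the two techniques. The namespace name, the version limit, the order element, and the names check_document and namespace_of are all hypothetical; they illustrate the idea and do not describe how any real processor behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and version limit for an imaginary format.
KNOWN_NS = "http://my.domain.example.org/product/2002/06"
KNOWN_VERSION = (1, 1)

def namespace_of(element):
    """Return the namespace name of an element, or '' if it has none."""
    tag = element.tag  # ElementTree stores names as "{namespace}local"
    return tag[1:tag.index("}")] if tag.startswith("{") else ""

def check_document(xml_text):
    root = ET.fromstring(xml_text)
    # A different namespace name signals changed semantics: refuse it.
    if namespace_of(root) != KNOWN_NS:
        return "reject"
    # A newer version attribute signals backwards-compatible additions.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    return "process" if version <= KNOWN_VERSION else "process leniently"

print(check_document('<order xmlns="%s" version="1.2"/>' % KNOWN_NS))
```

A document in an unknown namespace is rejected outright, while a document in the known namespace with a higher version number is still accepted, on the assumption that the additions can be safely ignored.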
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
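The effect of undeclaring the default namespace is easy to observe with a namespace-aware parser. The following is a small sketch using Python's standard xml.etree.ElementTree module (my choice for illustration, not something the article's examples depend on) against a shortened, well-formed version of the document above. ElementTree reports each element name as "{namespace name}local name", or just the local name when the element has no namespace.

```python
import xml.etree.ElementTree as ET

doc = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
  </book>
</bookstore>"""

# Collect the name of every element in document order.
tags = [element.tag for element in ET.fromstring(doc).iter()]
for tag in tags:
    print(tag)
```

Only the first name printed carries the bookstore namespace; book and title come back as bare local names, which is exactly the inconsistency that makes such documents confusing to read.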
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
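One way to see universal names in practice is to load the three schema documents into a parser that exposes them directly. Python's standard xml.etree.ElementTree module happens to store element names in the same curly-brace notation, so the sketch below (an illustration in Python, with variable names of my own choosing) simply compares the names it reports for each document.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
docs = [
    '<xs:schema xmlns:xs="%s"><xs:complexType id="123" name="fooType"/></xs:schema>' % XSD,
    '<xsd:schema xmlns:xsd="%s"><xsd:complexType id="123" name="fooType"/></xsd:schema>' % XSD,
    '<schema xmlns="%s"><complexType id="123" name="fooType"/></schema>' % XSD,
]

# For each document, list the element names in document order.
names = [[element.tag for element in ET.fromstring(d).iter()] for d in docs]

print(names[0])
print(names[0] == names[1] == names[2])
```

The prefixes xs and xsd, and the absence of a prefix in the third document, leave no trace in the parsed names; all three documents yield the same pair of universal names.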
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
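The two cases can be checked together with a short sketch in Python's standard xml.etree.ElementTree module, which I am using here purely for illustration. The document below is a hypothetical combination of the two examples: it declares the bookstore namespace both as the default namespace and under the bk prefix.

```python
import xml.etree.ElementTree as ET

doc = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bookstore>"""

book = ET.fromstring(doc)[0]  # first child of the bookstore element

print(book.tag)
for name, value in book.attrib.items():
    print(name, "=", value)
```

The book element picks up the default namespace, the unprefixed title attribute is reported with no namespace at all, and only the bk:title attribute carries the bookstore namespace name.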
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
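The character-for-character rule has a practical consequence: two namespace names that a Web server might treat as the same address are still different namespaces. Here is a brief sketch in Python's standard xml.etree.ElementTree module; the example.org URIs and the shelf and book element names are made up for the purpose.

```python
import xml.etree.ElementTree as ET

# The two namespace names differ only in the case of one letter.
doc = """<shelf xmlns:a="http://example.org/Books"
       xmlns:b="http://example.org/books">
  <a:book/>
  <b:book/>
</shelf>"""

first, second = list(ET.fromstring(doc))

print(first.tag)
print(second.tag)
print(first.tag == second.tag)
```

The final line prints False, since the two book elements have different namespace names and therefore different expanded names.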
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
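As a rough illustration of the DOM view, the sketch below uses xml.dom.minidom from Python's standard library, a lightweight DOM implementation that I am assuming follows the convention described above; it is not the .NET DOM used elsewhere in this article. It looks up the namespace declaration as an ordinary attribute node and inspects its namespace name.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>')
root = doc.documentElement

# The declaration is reachable like any other attribute node.
decl = root.getAttributeNode("xmlns:bk")

print(decl.namespaceURI)   # namespace name of the declaration itself
print(decl.value)          # namespace name that the prefix bk is mapped to
print(root.namespaceURI)   # namespace name of the bk:bookstore element
```

The declaration attribute lives in the reserved xmlns namespace, while the element it appears on lives in the bookstore namespace that the declaration introduces.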
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System.Xml.XPath;
using System.Xml;
using System;
using System.IO;
class XPathQuery{
public static string PrintError(Exception e, string errStr){
if(e == null)
return errStr;
else
return PrintError(e.InnerException, errStr + e.Message );
}
public static void Main(string[] args){
if((args.Length == 0) || (args.Length % 2)!= 0){
Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
return;
}
try{
//Load the file.
XmlDocument doc = new XmlDocument();
doc.Load(args[0]);
//create prefix<->namespace mappings (if any)
XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
for(int i=2; i < args.Length; i+= 2)
nsMgr.AddNamespace(args[i], args[i + 1]);
//Query the document
XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
//print output
foreach(XmlNode node in nodes)
Console.WriteLine(node.OuterXml + "\n\n");
}catch(XmlException xmle){
Console.WriteLine("ERROR: XML Parse error occured because " +
PrintError(xmle, null));
}catch(FileNotFoundException fnfe){
Console.WriteLine("ERROR: " + PrintError(fnfe, null));
}catch(XPathException xpath){
Console.WriteLine("ERROR: The following error occured while querying
the document: "
+ PrintError(xpath, null));
}catch(Exception e){
Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
}
}
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman" and returns:
  <title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author>
      <bk:first-name>Herman</bk:first-name>
      <bk:last-name>Melville</bk:last-name>
    </bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix-to-namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefixes mapped to those namespaces in the target document, but they must be non-empty.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman" and returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: These last examples are the same queries as in the previous examples, rewritten to be namespace aware.
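For readers without the .NET toolchain, the same behavior can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration under assumptions, not the article's tooling: ElementTree supports only a subset of XPath, the document is a trimmed copy of the namespaced bookstore file, and the variable names are made up for this sketch.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed copy of the namespaced bookstore document from the article.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Unprefixed names only match elements in no namespace, so nothing is found.
unqualified_titles = root.findall("book/title")

# The query prefix "b" is arbitrary; only the URI it maps to matters.
ns = {"b": BOOKSTORE_NS}
qualified_titles = [t.text for t in root.findall("b:book/b:title", ns)]

# The unprefixed genre attribute has no namespace; bk:genre is namespaced.
books = root.findall("b:book", ns)
plain_genres = [b.get("genre") for b in books]
namespaced_genres = [b.get("{%s}genre" % BOOKSTORE_NS) for b in books]

print(unqualified_titles)
print(qualified_titles)
print(plain_genres, namespaced_genres)
```

Both book elements are matched by b:book even though one is written with the default namespace and the other with the bk prefix, because matching is done on the namespace name rather than on the prefix.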
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO

Class Transformer

  'Concatenates the messages of an exception and all of its inner exceptions
  Public Shared Function PrintError(e As Exception, errStr As String) As String
    If e Is Nothing Then
      Return errStr
    Else
      Return PrintError(e.InnerException, errStr + e.Message)
    End If
  End Function 'PrintError

  'Entry point; args holds only the command-line arguments,
  'not the program name that GetCommandLineArgs() would include
  Public Shared Sub Main(args() As String)
    Run(args)
  End Sub 'Main

  Public Shared Sub Run(args() As String)
    If args.Length <> 2 Then
      Console.WriteLine("Usage: xslt source stylesheet")
      Return
    End If
    Try
      'Create the XslTransform object.
      Dim xslt As New XslTransform()
      'Load the stylesheet.
      xslt.Load(args(1))
      'Transform the file.
      Dim doc As New XmlDocument()
      doc.Load(args(0))
      xslt.Transform(doc, Nothing, Console.Out)
    Catch xmle As XmlException
      Console.WriteLine("ERROR: XML Parse error occurred because " + _
                        PrintError(xmle, Nothing))
    Catch fnfe As FileNotFoundException
      Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
    Catch xslte As XsltException
      Console.WriteLine("ERROR: The following error occurred while " + _
                        "transforming the document: " + PrintError(xslte, Nothing))
    Catch e As Exception
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
    End Try
  End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
  <xsl:template match="b:bookstore">
    <book-titles>
      <xsl:apply-templates select="b:book/b:title"/>
    </book-titles>
  </xsl:template>
  <xsl:template match="b:title">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
             xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
            xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Note also that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
                 xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
  <xslt:output method="text"/>
  <xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created as escaped text, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:variable name="stylesheet">
    &lt;xslt:stylesheet version="1.0"
        xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
    &lt;xslt:output method="text"/&gt;
    &lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
    &lt;/xslt:stylesheet&gt;
  </xsl:variable>
  <xsl:template match="/">
    <xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
                 xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
  <xslt:output method="xml" encoding="utf-8"/>
  <xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
  <xslt:template match="/">
    <alias:stylesheet version="1.0">
      <alias:output method="text"/>
      <alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
    </alias:stylesheet>
  </xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
            xmlns="http://www.w3.org/1999/XSL/Transform"
            xmlns:msxsl="urn:schemas-microsoft-com:xslt"
            xmlns:newfunc="urn:my-newfunc">
  <output method="text"/>
  <template match="/">
    <value-of select="newfunc:SayHello()" />
  </template>
  <msxsl:script language="JavaScript" implements-prefix="newfunc">
    function SayHello() {
      return "Hello World";
    }
  </msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with them, and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that existing processors should refuse the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
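As a rough illustration of combining the two techniques, the Python sketch below treats a changed namespace name as an incompatible revision and a higher version attribute as a backwards-compatible one. Everything specific in it is hypothetical: the namespace URI, the order element, the SUPPORTED_VERSION value, and the check_document function are invented for this sketch and do not come from any real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and supported version for an imaginary format.
KNOWN_NAMESPACE = "http://my.domain.example.org/product/2002/05"
SUPPORTED_VERSION = (1, 0)

def split_universal_name(tag):
    """Split an ElementTree tag of the form {uri}local into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

def check_document(xml_text):
    root = ET.fromstring(xml_text)
    uri, _ = split_universal_name(root.tag)
    # A different namespace name signals changed semantics: refuse to process.
    if uri != KNOWN_NAMESPACE:
        return "reject: unknown namespace"
    # A higher version attribute signals backwards-compatible additions.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    if version > SUPPORTED_VERSION:
        return "accept: newer compatible version"
    return "accept"

same = check_document(
    '<p:order xmlns:p="http://my.domain.example.org/product/2002/05" version="1.0"/>')
newer = check_document(
    '<p:order xmlns:p="http://my.domain.example.org/product/2002/05" version="1.1"/>')
changed = check_document(
    '<p:order xmlns:p="http://my.domain.example.org/product/2003/01" version="1.0"/>')
print(same, newer, changed, sep="\n")
```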
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix-namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine-readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and whose value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
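The effect of undeclaring the default namespace can be checked with a short Python sketch. It assumes Python's standard xml.etree.ElementTree parser, which reports each element name as {namespace-name}local-name, or as the bare local name when the element has no namespace; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Bookstore document in which the book element undeclares the default namespace.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# Element names in document order, as the parser sees them.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Only the bookstore and inv:inventory elements are reported with a namespace name; book, title, and author come back as bare local names.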
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType id="123" name="fooType"/>
</xs:schema>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType id="123" name="fooType"/>
</xsd:schema>

<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>

<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>

<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, and not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
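Python's standard xml.etree.ElementTree module happens to expose element names in this curly-brace notation, which makes the point easy to check. The sketch below is an illustration rather than part of the original column; the helper name universal_names is made up. It parses the three schema documents and compares the names the parser reports.

```python
import xml.etree.ElementTree as ET

# The three schema documents, differing only in the prefix they use.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the {namespace-name}local-name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

names = [universal_names(doc) for doc in DOCS]
for per_document in names:
    print(per_document)
```

All three lists are identical, since the prefixes are discarded once the names have been expanded.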
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>
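The two attribute cases above can be verified with a small Python sketch, assuming the standard xml.etree.ElementTree parser, which keys each attribute by its expanded name. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BK = "urn:xmlns:25hoursaday-com:bookstore"

# Prefixed case: title has no namespace, bk:title is in the bookstore namespace.
prefixed = ET.fromstring(
    '<bk:bookstore xmlns:bk="%s">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>' % BK)

# Default namespace case: the element is namespaced, the attribute is not.
defaulted = ET.fromstring(
    '<bookstore xmlns="%s">'
    '<book title="Lord of the Rings, Book 3"/>'
    '</bookstore>' % BK)

prefixed_attrs = dict(prefixed[0].attrib)
defaulted_book = defaulted[0]

print(prefixed_attrs)
print(defaulted_book.tag, dict(defaulted_book.attrib))
```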
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
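The character-for-character comparison can be seen in a minimal Python sketch. It assumes the standard xml.etree.ElementTree parser, and the example.org URIs are hypothetical namespace names chosen only for illustration. The two URIs would reach the same host if dereferenced over HTTP, yet they name different namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in letter case.
lower = ET.fromstring('<a xmlns="http://example.org/ns"/>')
upper = ET.fromstring('<a xmlns="http://EXAMPLE.org/ns"/>')

# Namespace names are compared as plain strings, so these are different names.
same_namespace = lower.tag == upper.tag
print(lower.tag, upper.tag, same_namespace)
```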
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM predates both data models but is also similar to both in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language also known as XPath is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery {

    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr) {
        if (e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args) {
        // Expect a source file, a query, and zero or more prefix/namespace pairs.
        if ((args.Length == 0) || (args.Length % 2) != 0) {
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try {
            // Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            // Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for (int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            // Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            // Print output.
            foreach (XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        } catch (XmlException xmle) {
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        } catch (FileNotFoundException fnfe) {
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        } catch (XPathException xpath) {
            Console.WriteLine("ERROR: The following error occurred while querying " +
                "the document: " + PrintError(xpath, null));
        } catch (Exception e) {
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
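The same prefix-mapping idea can be sketched outside .NET. The following is a minimal illustration using Python's standard xml.etree.ElementTree module, which supports only a subset of XPath, so the paths are written relative to the root element and the "Herman" filter is done in plain Python instead of an XPath predicate. The prefix b and the variable names are arbitrary choices made for this sketch.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# The namespaced bookstore document from the article.
DOCUMENT = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name><bk:last-name>Melville</bk:last-name></bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>
"""

root = ET.fromstring(DOCUMENT)

# Prefix-to-namespace mapping handed to the query engine. The prefix "b"
# does not have to match any prefix used inside the document.
prefixes = {"b": BOOKSTORE_NS}

# Unprefixed names only match elements that have no namespace.
unqualified = root.findall("./book/title")

# Prefixed names match on namespace name plus local name.
qualified = [t.text for t in root.findall("./b:book/b:title", prefixes)]

# Unprefixed attributes have no namespace; bk:genre is stored under its
# universal name "{namespace}genre".
books = root.findall("./b:book", prefixes)
plain_genres = [book.get("genre") for book in books]
qualified_genres = [book.get("{%s}genre" % BOOKSTORE_NS) for book in books]

# Titles of books whose author's first name is "Herman".
herman_titles = [
    book.findtext("b:title", namespaces=prefixes)
    for book in books
    if book.findtext("b:author/b:first-name", namespaces=prefixes) == "Herman"
]

print(unqualified, qualified, plain_genres, qualified_genres, herman_titles)
```

As in the command-line examples, the unprefixed query finds nothing, while the prefixed query finds both titles regardless of whether the document used the default namespace or the bk prefix.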
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer

    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point. The args array holds only the command-line arguments;
    'unlike Environment.GetCommandLineArgs(), it excludes the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " + _
                "transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes. In that case it is desirable for existing processors to refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
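To make the combination concrete, here is a minimal sketch in Python (standard library only) of a processor that treats the namespace name as the hard compatibility boundary and the version attribute as a soft one. The namespace name, the order element, the version limit, and the function names are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and version limit for an imaginary format.
SUPPORTED_NAMESPACE = "http://my.domain.example.org/product/2002/03"
HIGHEST_KNOWN_VERSION = (1, 1)


def split_tag(tag):
    """Split an ElementTree tag such as '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def classify(xml_text):
    """Decide how a processor built for SUPPORTED_NAMESPACE treats a document."""
    root = ET.fromstring(xml_text)
    namespace, _local = split_tag(root.tag)
    if namespace != SUPPORTED_NAMESPACE:
        # A different namespace name signals changed semantics: refuse.
        return "reject"
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    if version > HIGHEST_KNOWN_VERSION:
        # Same namespace, newer version: changes are assumed to be
        # backwards compatible, so ignore what is not understood.
        return "process-leniently"
    return "process"


print(classify('<order xmlns="%s" version="1.0"/>' % SUPPORTED_NAMESPACE))
print(classify('<order xmlns="%s" version="1.2"/>' % SUPPORTED_NAMESPACE))
print(classify('<order xmlns="http://my.domain.example.org/product/2003/01"/>'))
```

Under these assumptions, a newer version in the same namespace is still processed, whereas a new namespace name stops older processors outright.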
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample; notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference the contents of a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
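The effect of declaring and undeclaring a default namespace can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree module, which reports each element name as a namespace name in curly braces followed by the local name, or as a bare local name when the element has no namespace. The trimmed-down documents and the helper name tags are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Bookstore document that relies on a default namespace declaration.
DEFAULTED = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <inv:inventory status="in-stock"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
"""

# Same document, but the book element undeclares the default namespace.
UNDECLARED = DEFAULTED.replace("<book>", '<book xmlns="">')


def tags(xml_text):
    """Return the name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


for name in tags(DEFAULTED):
    print(name)
print("--- after xmlns=\"\" on book ---")
for name in tags(UNDECLARED):
    print(name)
```

In the first listing every unprefixed element carries the bookstore namespace name; in the second, book and title come back as bare names, which is exactly the mixed situation the text warns about.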
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
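One way to see that prefixes are incidental is to parse the three schema documents with a namespace-aware parser and compare the names it reports. The sketch below uses Python's standard xml.etree.ElementTree module, which happens to expose element names in the curly-brace universal name form; the variable and function names are arbitrary.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The three documents from the text: two different prefixes and a default namespace.
VARIANTS = [
    '<xs:schema xmlns:xs="%s"><xs:complexType id="123" name="fooType"/></xs:schema>' % XSD,
    '<xsd:schema xmlns:xsd="%s"><xsd:complexType id="123" name="fooType"/></xsd:schema>' % XSD,
    '<schema xmlns="%s"><complexType id="123" name="fooType"/></schema>' % XSD,
]


def universal_names(xml_text):
    """Return the universal name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names = [universal_names(variant) for variant in VARIANTS]
for listing in names:
    print(listing)
```

All three listings are identical, since the parser discards the prefixes once they have been resolved to namespace names.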
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
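Both attribute cases can be confirmed with a few lines of Python using the standard xml.etree.ElementTree module, which keys each attribute by its universal name. This is only a sketch; the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Prefixed case: title has no namespace, bk:title is in the bookstore namespace.
prefixed = ET.fromstring(
    '<bk:bookstore xmlns:bk="%s">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>' % BOOKSTORE_NS
)
prefixed_attributes = dict(prefixed[0].attrib)

# Default namespace case: the element picks up the namespace, the attribute does not.
defaulted = ET.fromstring(
    '<bookstore xmlns="%s">'
    '<book title="Lord of the Rings, Book 3"/>'
    '</bookstore>' % BOOKSTORE_NS
)
defaulted_tag = defaulted[0].tag
defaulted_attributes = dict(defaulted[0].attrib)

print(prefixed_attributes)
print(defaulted_tag, defaulted_attributes)
```

The two title attributes in the first document end up under different keys, and in the second document the book element is namespace qualified while its title attribute is not.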
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
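As a rough illustration of the DOM view, here is a minimal sketch that uses Python's standard xml.dom.minidom module rather than the .NET classes used elsewhere in this article. The sample document and the helper name attribute_namespaces are assumptions made up for the illustration. It lists each attribute node of an element with its namespace name, so the namespace declaration shows up as an ordinary attribute.

```python
from xml.dom import minidom

# Sample document (an assumption for this sketch): the bk prefix is
# declared on the root element, and the book element carries one
# unprefixed attribute.
DOC = (
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings"/>'
    '</bk:bookstore>'
)

def attribute_namespaces(element):
    """Return (qualified name, namespace name) for each attribute node."""
    attrs = element.attributes
    return [(attrs.item(i).name, attrs.item(i).namespaceURI)
            for i in range(attrs.length)]

root = minidom.parseString(DOC).documentElement
book = root.firstChild

# The namespace declaration appears as a regular attribute node.
print(attribute_namespaces(root))
# The unprefixed title attribute has no namespace name.
print(attribute_namespaces(book))
# The element itself carries the namespace name the prefix maps to.
print(root.namespaceURI)
```

The declaration attribute reports the reserved xmlns namespace name, while the unprefixed title attribute reports no namespace at all.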
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
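The idea that every element has its own set of in-scope namespaces can be sketched with Python's standard xml.etree.ElementTree module. This is only an approximation of the XPath and infoset models: ElementTree has no namespace axis, so the sketch tracks declarations as they open and close during parsing. The function name in_scope_namespaces and the trimmed sample document are assumptions, and the implicit xml prefix binding is left out.

```python
import io
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed sample (an assumption): a default namespace on the root and a
# bk prefix declared only on the second book element.
DOC = (
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book><title>The Autobiography of Benjamin Franklin</title></book>'
    '<bk:book xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:title>The Confidence Man</bk:title>'
    '</bk:book>'
    '</bookstore>'
)

def in_scope_namespaces(xml_text):
    """Return (expanded element name, {prefix: namespace name}) per element."""
    open_declarations = []  # stack of (prefix, uri) pairs currently in scope
    scopes = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start", "start-ns", "end-ns")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            open_declarations.append(payload)   # declaration comes into scope
        elif event == "end-ns":
            open_declarations.pop()             # declaration goes out of scope
        else:
            # Each element gets its own copy of the bindings in scope for it.
            scopes.append((payload.tag, dict(open_declarations)))
    return scopes

for name, bindings in in_scope_namespaces(DOC):
    print(name, bindings)
```

Each element receives a separate dictionary, which loosely mirrors the point that namespace nodes belonging to different elements are not identical even when they represent the same declaration.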
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
  //Concatenates the messages of an exception and all of its inner exceptions.
  public static string PrintError(Exception e, string errStr){
    if(e == null)
      return errStr;
    else
      return PrintError(e.InnerException, errStr + e.Message);
  }
  public static void Main(string[] args){
    //Expect a source file and a query, followed by zero or more prefix/namespace pairs.
    if((args.Length == 0) || (args.Length % 2) != 0){
      Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
      return;
    }
    try{
      //Load the file.
      XmlDocument doc = new XmlDocument();
      doc.Load(args[0]);
      //Create prefix<->namespace mappings (if any).
      XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
      for(int i = 2; i < args.Length; i += 2)
        nsMgr.AddNamespace(args[i], args[i + 1]);
      //Query the document.
      XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
      //Print output.
      foreach(XmlNode node in nodes)
        Console.WriteLine(node.OuterXml + "\n\n");
    }catch(XmlException xmle){
      Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, null));
    }catch(FileNotFoundException fnfe){
      Console.WriteLine("ERROR: " + PrintError(fnfe, null));
    }catch(XPathException xpath){
      Console.WriteLine("ERROR: The following error occurred while querying the document: "
        + PrintError(xpath, null));
    }catch(Exception e){
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
    }
  }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman" and returns:
  <title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman" and returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples is the same as the previous examples but rewritten to be namespace aware.
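The same effect can be reproduced outside of .NET. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath, against a trimmed copy of the namespaced bookstore document. The prefix b in the mapping is an arbitrary choice made by the caller and does not appear in the document.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed copy of the namespaced bookstore document.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Unprefixed names only match elements that have no namespace.
unqualified = root.findall("book/title")

# Supplying a prefix-to-namespace mapping makes the query namespace aware.
qualified = root.findall("b:book/b:title", {"b": NS})
titles = [title.text for title in qualified]

# Unprefixed attributes have no namespace; prefixed ones are looked up
# by their expanded name.
plain_genres = [book.get("genre") for book in root]
bookstore_genres = [book.get("{%s}genre" % NS) for book in root]

print(len(unqualified), titles)
print(plain_genres, bookstore_genres)
```

Both title elements are found by the prefixed query even though one uses the default namespace and the other uses the bk prefix, because only the namespace name takes part in the match.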
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
  'Concatenates the messages of an exception and all of its inner exceptions.
  Public Shared Function PrintError(e As Exception, errStr As String) As String
    If e Is Nothing Then
      Return errStr
    Else
      Return PrintError(e.InnerException, errStr + e.Message)
    End If
  End Function 'PrintError
  'Entry point. The args array holds only the arguments, not the program name
  '(Environment.GetCommandLineArgs() would include the program name first).
  Public Shared Sub Main(args() As String)
    Run(args)
  End Sub 'Main
  Public Shared Sub Run(args() As String)
    If args.Length <> 2 Then
      Console.WriteLine("Usage: xslt source stylesheet")
      Return
    End If
    Try
      'Create the XslTransform object.
      Dim xslt As New XslTransform()
      'Load the stylesheet.
      xslt.Load(args(1))
      'Transform the file.
      Dim doc As New XmlDocument()
      doc.Load(args(0))
      xslt.Transform(doc, Nothing, Console.Out)
    Catch xmle As XmlException
      Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing))
    Catch fnfe As FileNotFoundException
      Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
    Catch xslte As XsltException
      Console.WriteLine("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing))
    Catch e As Exception
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
    End Try
  End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also to note is the fact that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created as escaped text, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which tend to become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that existing processors should refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element, as well as a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
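To make the distinction concrete, here is a minimal sketch of how a processor might read both version signals from a document, written with Python's standard xml.etree.ElementTree module. The function names version_info and can_process are hypothetical, and the policy of accepting any document from a known namespace regardless of its version attribute is an assumption made for illustration, not a rule taken from any specification.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

def version_info(xml_text):
    """Return (namespace name, local name, version attribute) of the root."""
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = None, root.tag
    # The version attribute is unprefixed, so it has no namespace.
    return namespace, local, root.get("version")

def can_process(xml_text, known_namespace):
    """Assumed policy: a changed namespace name signals a breaking change."""
    namespace, _, _ = version_info(xml_text)
    return namespace == known_namespace

STYLESHEET = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
)

print(version_info(STYLESHEET))
print(can_process(STYLESHEET, XSLT_NS))
```

Under this policy a backwards-compatible revision would only change the version attribute and still be accepted, while a revision published under a new namespace name would be rejected until the processor is upgraded.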
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample; notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
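The effect of undeclaring the default namespace can be checked with a short sketch using Python's standard xml.etree.ElementTree module, which reports element names in expanded form. The document is a trimmed version of the example above, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the example: the default namespace is undeclared
# on the book element with xmlns="".
DOC = (
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book xmlns="">'
    '<title>Lord of the Rings</title>'
    '<author>J.R.R. Tolkien</author>'
    '</book>'
    '</bookstore>'
)

# ElementTree writes a namespaced name as {namespace name}local-name and
# a name with no namespace as the bare local name.
expanded_names = [element.tag for element in ET.fromstring(DOC).iter()]

for name in expanded_names:
    print(name)
```

Only the bookstore element is reported with a namespace name; the book, title, and author elements come back as bare local names even though they look identical to their namespaced counterparts in the earlier example.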
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
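Python's standard xml.etree.ElementTree module happens to expose element names in this curly-brace form, which makes the equivalence of the three schema documents easy to check. The sketch below is illustrative only; the helper name universal_names is an assumption.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the earlier example, differing only
# in the prefix (or default namespace) they use.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the {namespace name}local-name form of every element."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

names_per_document = [universal_names(doc) for doc in DOCS]

for names in names_per_document:
    print(names)
```

All three documents produce the same list of names, since the prefixes are discarded once the names have been expanded.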
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
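Both attribute examples can be checked with a small sketch that uses Python's standard xml.etree.ElementTree module, where attribute names with a namespace are stored in {namespace name}local-name form and attribute names without one are stored as plain local names. The variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

PREFIXED_DOC = (
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>'
)

DEFAULT_NS_DOC = (
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book title="Lord of the Rings, Book 3" />'
    '</bookstore>'
)

# First child of each root element is the book element.
prefixed_book = ET.fromstring(PREFIXED_DOC)[0]
default_book = ET.fromstring(DEFAULT_NS_DOC)[0]

# Two attributes with the same local name coexist because their
# expanded names differ.
print(prefixed_book.attrib)

# The element picks up the default namespace, but its attribute does not.
print(default_book.tag, default_book.attrib)
```

In the second document the book element is reported with the bookstore namespace name while its title attribute is reported with none.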
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
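The character-for-character rule is easy to overlook because URLs that differ only in the case of the host name, or in a trailing slash, often resolve to the same resource. A minimal sketch with Python's standard xml.etree.ElementTree module follows; the example.org namespace names and the helper namespace_of are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def namespace_of(xml_text):
    """Return the namespace name of the root element, or None."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None

# Three hypothetical namespace names that a Web browser might treat as
# the same address.
lower = namespace_of('<item xmlns="http://www.example.org/ns"/>')
mixed = namespace_of('<item xmlns="http://www.EXAMPLE.org/ns"/>')
slash = namespace_of('<item xmlns="http://www.example.org/ns/"/>')

# Namespace names are compared as plain strings, so all three differ.
print(lower == mixed, lower == slash)
```

A namespace-aware processor would therefore treat the three item elements as belonging to three unrelated vocabularies.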
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created, regardless of whether their location within the document changes.
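As a rough illustration of the DOM view, the Python sketch below uses the standard library's minidom, which is one DOM implementation among many; other implementations may expose declarations differently. It looks up a namespace declaration as an ordinary attribute node and prints its namespace name.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The declaration is reachable like any other attribute node.
decl = root.getAttributeNode("xmlns:bk")

print(decl.name, "=", decl.value)
# The attribute node itself lives in the reserved xmlns namespace.
print(decl.namespaceURI)
# The element it sits on is in the namespace the declaration introduces.
print(root.namespaceURI)
```

Here XMLNS_NAMESPACE is the constant the Python library defines for http://www.w3.org/2000/xmlns/.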
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
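Python's standard library has no XPath namespace axis, so the following is only an approximation of the idea: it walks the parser's namespace events and builds, for every element, its own dictionary of the prefixes in scope. The function name in_scope_namespaces and the sample document are assumptions for the illustration.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def in_scope_namespaces(xml_bytes):
    """Yield (tag, {prefix: uri}) for every element, in document order."""
    scopes = [{"xml": XML_NS}]   # stack of inherited prefix mappings
    pending = {}                 # declarations seen just before a start tag
    events = ("start-ns", "start", "end")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            merged = {**scopes[-1], **pending}
            # An empty URI means the default namespace was undeclared.
            scope = {p: u for p, u in merged.items() if u}
            pending = {}
            scopes.append(scope)
            yield payload.tag, scope
        else:
            # "end": leave the scope of the element that just closed.
            scopes.pop()

SAMPLE = (
    b'<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    b'<book/>'
    b'<bk:book xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
    b'</bookstore>'
)

results = list(in_scope_namespaces(SAMPLE))
for tag, scope in results:
    print(tag, sorted(scope))
```

Each element receives a separate dictionary even when the entries come from the same declaration, which loosely mirrors the point that namespace nodes belonging to different elements are not identical.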
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System.Xml.XPath;
using System.Xml;
using System;
using System.IO;
class XPathQuery{
  // Concatenates the message of an exception with those of its inner exceptions.
  public static string PrintError(Exception e, string errStr){
    if(e == null)
      return errStr;
    else
      return PrintError(e.InnerException, errStr + e.Message);
  }
  public static void Main(string[] args){
    // Expected arguments: source, query, then zero or more prefix/namespace pairs.
    if((args.Length == 0) || (args.Length % 2) != 0){
      Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
      return;
    }
    try{
      //Load the file.
      XmlDocument doc = new XmlDocument();
      doc.Load(args[0]);
      //create prefix<->namespace mappings (if any)
      XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
      for(int i = 2; i < args.Length; i += 2)
        nsMgr.AddNamespace(args[i], args[i + 1]);
      //Query the document
      XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
      //print output
      foreach(XmlNode node in nodes)
        Console.WriteLine(node.OuterXml + "\n\n");
    }catch(XmlException xmle){
      Console.WriteLine("ERROR: XML Parse error occurred because " +
        PrintError(xmle, null));
    }catch(FileNotFoundException fnfe){
      Console.WriteLine("ERROR: " + PrintError(fnfe, null));
    }catch(XPathException xpath){
      Console.WriteLine("ERROR: The following error occurred while querying the document: "
        + PrintError(xpath, null));
    }catch(Exception e){
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
    }
  }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman" and returns:
  <title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
      xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author>
      <bk:first-name>Herman</bk:first-name>
      <bk:last-name>Melville</bk:last-name>
    </bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix-to-namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefixes mapped to those namespaces in the target document, but they must be non-empty.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman" and returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
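The same prefix-mapping idea applies outside .NET. As a sketch only, the Python standard library accepts a prefix-to-namespace dictionary alongside its limited XPath subset; the document below is a trimmed copy of the namespaced bookstore file, and paths are written relative to the root element because that is how this library evaluates them.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the namespaced bookstore document from the text.
BOOKSTORE = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE)

# The prefix "b" exists only in this mapping, not in the document.
ns = {"b": "urn:xmlns:25hoursaday-com:bookstore"}

# Unprefixed names match only elements that have no namespace.
unprefixed = root.findall("book/title")
# Prefixed names are resolved through the mapping before matching.
prefixed = root.findall("b:book/b:title", ns)
# Namespaced attributes are looked up by their {namespace}local name.
fiction = [book.get("{%s}genre" % ns["b"]) for book in root.findall("b:book", ns)]

print(len(unprefixed))
print([title.text for title in prefixed])
print(fiction)
```

The query written with the b prefix finds both title elements, although one uses the default namespace and the other the bk prefix in the document.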
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO
Class Transformer
  'Concatenates the message of an exception with those of its inner exceptions.
  Public Shared Function PrintError(e As Exception, errStr As String) As String
    If e Is Nothing Then
      Return errStr
    Else
      Return PrintError(e.InnerException, errStr + e.Message)
    End If
  End Function 'PrintError
  'Entry point which delegates to the C-style Run method.
  'The args array passed to Main does not include the program name.
  Public Shared Sub Main(args() As String)
    Run(args)
  End Sub 'Main
  Overloads Public Shared Sub Run(args() As String)
    If args.Length <> 2 Then
      Console.WriteLine("Usage: xslt source stylesheet")
      Return
    End If
    Try
      'Create the XslTransform object.
      Dim xslt As New XslTransform()
      'Load the stylesheet.
      xslt.Load(args(1))
      'Transform the file.
      Dim doc As New XmlDocument()
      doc.Load(args(0))
      xslt.Transform(doc, Nothing, Console.Out)
    Catch xmle As XmlException
      Console.WriteLine(("ERROR: XML Parse error occurred because " + _
        PrintError(xmle, Nothing)))
    Catch fnfe As FileNotFoundException
      Console.WriteLine(("ERROR: " + PrintError(fnfe, Nothing)))
    Catch xslte As XsltException
      Console.WriteLine(("ERROR: The following error occurred while transforming the document: " + _
        PrintError(xslte, Nothing)))
    Catch e As Exception
      Console.WriteLine(("UNEXPECTED ERROR" + PrintError(e, Nothing)))
    End Try
  End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
  <xsl:template match="b:bookstore">
    <book-titles>
      <xsl:apply-templates select="b:book/b:title"/>
    </book-titles>
  </xsl:template>
  <xsl:template match="b:title">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
      xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:variable name="stylesheet">
    &lt;xslt:stylesheet version="1.0"
        xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
    &lt;xslt:output method="text"/&gt;
    &lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
    &lt;/xslt:stylesheet&gt;
  </xsl:variable>
  <xsl:template match="/">
    <xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
    xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
  <xslt:output method="xml" encoding="utf-8"/>
  <xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
  <xslt:template match="/">
    <alias:stylesheet version="1.0">
      <alias:output method="text"/>
      <alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
    </alias:stylesheet>
  </xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace-prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:newfunc="urn:my-newfunc">
  <output method="text"/>
  <template match="/">
    <value-of select="newfunc:SayHello()" />
  </template>
  <msxsl:script language="JavaScript" implements-prefix="newfunc">
    function SayHello() {
      return "Hello World";
    }
  </msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This behavior is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that existing processors should stop working with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
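One way a processor might combine the two mechanisms is sketched below in Python. Everything specific here is an assumption made up for the illustration: the order vocabulary, its namespace name http://example.org/orders/2002/06, the major.minor format of the version attribute, and the three outcome labels.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the namespace name carries the breaking version,
# while the version attribute carries backwards-compatible revisions.
SUPPORTED_NS = "http://example.org/orders/2002/06"
KNOWN_VERSION = (1, 1)

def classify(xml_text):
    """Decide how a processor built for KNOWN_VERSION should treat a document."""
    root = ET.fromstring(xml_text)
    # A different namespace name signals changed semantics: refuse to guess.
    if not root.tag.startswith("{" + SUPPORTED_NS + "}"):
        return "reject"
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    # A newer minor version only adds elements or attributes, so the
    # document can still be processed while skipping unknown additions.
    return "process" if version <= KNOWN_VERSION else "process-lenient"

same = classify('<o:order xmlns:o="http://example.org/orders/2002/06" version="1.0"/>')
newer = classify('<o:order xmlns:o="http://example.org/orders/2002/06" version="1.2"/>')
other = classify('<o:order xmlns:o="http://example.org/orders/2003/01" version="1.0"/>')

print(same, newer, other)
```

The design choice mirrors the text: a namespace change is treated as a hard boundary, while a version attribute change is treated as something an older processor can tolerate.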
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample; notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
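The effect of undeclaring the default namespace is easy to observe with a namespace-aware parser. The following Python sketch (standard library only, using a trimmed version of the document above) lists every element name in {namespace}local form.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the example: the book element undeclares the default namespace.
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
  </book>
</bookstore>
"""

root = ET.fromstring(DOC)

# Element names in document order; a name without braces has no namespace.
tags = [element.tag for element in root.iter()]
print(tags)
```

Only the bookstore element keeps its namespace name; book and title, although they look no different from it in the markup, come back with no namespace at all.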
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
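Some parsers expose universal names directly. As a sketch, the Python standard library reports every element name in the same {namespace}local notation, so the three schema documents shown earlier can be compared without looking at their prefixes; the helper name universal_names is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the text, one per list entry.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the {namespace}local name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

names = [universal_names(doc) for doc in DOCS]

print(names[0])
# The prefixes differ, but the universal names do not.
print(names[0] == names[1] == names[2])
```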
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
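The two attribute rules can be checked with a short sketch, again assuming Python's standard xml.etree.ElementTree module rather than any API discussed in this article. In that module an attribute that has a namespace is keyed by its universal name, while an attribute without a namespace is keyed by its bare local name.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

prefixed_doc = (
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>'
)
default_doc = (
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book title="Lord of the Rings, Book 3"/>'
    '</bookstore>'
)

book = ET.fromstring(prefixed_doc)[0]
# Unprefixed attribute: no namespace, so the key is just the local name.
plain_title = book.get("title")
# Prefixed attribute: the key carries the namespace name in curly braces.
qualified_title = book.get("{%s}title" % BOOKSTORE_NS)

default_book = ET.fromstring(default_doc)[0]
# The element inherits the default namespace, but its attribute does not.
default_tag = default_book.tag
default_keys = sorted(default_book.attrib)

print(plain_title, "|", qualified_title)
print(default_tag, default_keys)
```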
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
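The character-for-character rule is easy to trip over with HTTP-style names, where the host part would be case-insensitive if the URI were actually dereferenced. The following minimal sketch assumes Python's standard xml.etree.ElementTree module, and the two example.org namespace names are made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in the case of the host name.
lower = ET.fromstring('<a xmlns="http://example.org/ns"/>')
upper = ET.fromstring('<a xmlns="http://EXAMPLE.org/ns"/>')

# Namespace names are compared as plain strings, so these are two
# different namespaces and the two elements have different universal names.
same_namespace = lower.tag == upper.tag
print(lower.tag, upper.tag, same_namespace)
```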
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM predates both data models but is also similar to both in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
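To see what this looks like in a DOM implementation, here is a small sketch that assumes Python's standard xml.dom.minidom module. It is only one implementation of the DOM interfaces and is not the DOM used by the .NET programs in this article, so treat the output as illustrative rather than normative.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The namespace declaration is reachable as an ordinary attribute node.
decl = root.getAttributeNode("xmlns:bk")

print(decl.namespaceURI)   # namespace name of the declaration attribute itself
print(decl.prefix, decl.localName, decl.value)
print(root.namespaceURI, root.localName)
```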
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{
    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        //Expect a source file and a query followed by zero or more
        //prefix/namespace pairs, so the argument count must be even.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while " +
                "querying the document: " + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
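For readers without the .NET program at hand, the same three lookups can be approximated with Python's standard xml.etree.ElementTree module. This is only a sketch under that assumption: ElementTree implements a small subset of XPath, so the second and third queries are expressed as ordinary Python filters instead of the exact XPath expressions used above.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# Roughly /bookstore/book/title, evaluated relative to the root element.
all_titles = [t.text for t in root.findall("book/title")]

# Roughly //@genre: every unqualified genre attribute in the document.
genres = [e.get("genre") for e in root.iter() if "genre" in e.attrib]

# Roughly //title[../author/first-name = 'Herman'] for titles inside book.
herman_titles = [
    book.findtext("title")
    for book in root.iter("book")
    if book.findtext("author/first-name") == "Herman"
]

print(all_titles, genres, herman_titles)
```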
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples is the same as the previous set, rewritten to be namespace aware.
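The idea of handing the query engine its own prefix to namespace mapping is not specific to .NET. As a hedged sketch, here is the same pattern with Python's standard xml.etree.ElementTree module, where the mapping is a plain dictionary passed to findall. The prefix b is arbitrary and deliberately differs from the bk prefix used inside the document, and the document is a trimmed copy of the namespaced bookstore file.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"
BOOKSTORE = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)
prefixes = {"b": NS}  # query-side mapping; need not match the document

# Unprefixed names only match elements that have no namespace.
no_match = root.findall("book/title")

# Prefixed names match on namespace name plus local name.
titles = [t.text for t in root.findall("b:book/b:title", prefixes)]

# Attributes: the qualified and unqualified genre are different names.
qualified_key = "{%s}genre" % NS
qualified_genres = [e.get(qualified_key) for e in root.iter() if qualified_key in e.attrib]
unqualified_genres = [e.get("genre") for e in root.iter() if "genre" in e.attrib]

print(no_match, titles, qualified_genres, unqualified_genres)
```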
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr & e.Message)
        End If
    End Function 'PrintError

    'Entry point. The args array holds only the command-line arguments, unlike
    'Environment.GetCommandLineArgs(), which also includes the program name.
    Public Shared Sub Main(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " & _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " & PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " & _
                "transforming the document: " & PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" & PrintError(e, Nothing))
        End Try
    End Sub 'Main
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which tend to become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO
WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the newer versions and will have to be upgraded. This behavior is actually beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes. In that case it is better for existing processors to stop working with the newer versions than to risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
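To make the combination of the two techniques concrete, here is a minimal sketch of how a consumer might react to each signal: refuse documents whose root namespace it does not know, and treat the version attribute as informational. It assumes Python's standard xml.etree.ElementTree module, and everything named in it (the two namespace URIs, the generation labels, the classify function, and the default of "1.0") is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names following the year/month convention above.
KNOWN_NAMESPACES = {
    "http://my.domain.example.org/product/2002/01": "generation-1",
    "http://my.domain.example.org/product/2003/06": "generation-2",
}


def classify(xml_text):
    """Return (generation, version), or raise ValueError for an unknown namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree tags look like "{namespace}local" when a namespace is present.
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    if namespace not in KNOWN_NAMESPACES:
        # A changed namespace name signals changed semantics: stop processing.
        raise ValueError("unsupported namespace: %r" % namespace)
    # A changed version attribute signals a backwards-compatible revision.
    return KNOWN_NAMESPACES[namespace], root.get("version", "1.0")


known = classify(
    '<order xmlns="http://my.domain.example.org/product/2002/01" version="1.1"/>'
)
print(known)
```

The design choice in this sketch is that a namespace mismatch is a hard failure while a version mismatch is not, which mirrors the division of labor described above.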
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed exactly so that developers could mix and match XML vocabularies.
A succinct explanation of why the namespace URI of the root element should not be equated with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
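A quick way to see which namespace each element in this document ends up in is to let a namespace-aware parser resolve the prefixes. The sketch below assumes Python's standard xml.etree.ElementTree module, and split_name is a helper of my own invention that separates the namespace name from the local name.

```python
import xml.etree.ElementTree as ET

DOC = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>"""


def split_name(tag):
    """Split an ElementTree tag into (namespace name, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


resolved = [split_name(elem.tag) for elem in ET.fromstring(DOC).iter()]
for namespace, local in resolved:
    print(local, "->", namespace)
```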
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
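The effect of undeclaring the default namespace is easy to miss when reading the markup, so here is a small sketch that parses that document (written with matching start and end tags) and lists the resulting universal names. It assumes Python's standard xml.etree.ElementTree module, in which an element without a namespace is reported by its bare local name.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

# Only bookstore keeps the default namespace; xmlns="" removes it for
# book and everything unprefixed underneath it.
tags = [elem.tag for elem in ET.fromstring(DOC).iter()]
for tag in tags:
    print(tag)
```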
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
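Some APIs expose universal names directly. As a small sketch, Python's standard xml.etree.ElementTree module reports every element name in the '{namespace}local' notation, so the three schema documents from the earlier example can be compared mechanically. The helper name expanded_names is my own.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
DOCS = [
    f'<xs:schema xmlns:xs="{XSD}"><xs:complexType id="123" name="fooType"/></xs:schema>',
    f'<xsd:schema xmlns:xsd="{XSD}"><xsd:complexType id="123" name="fooType"/></xsd:schema>',
    f'<schema xmlns="{XSD}"><complexType id="123" name="fooType"/></schema>',
]

def expanded_names(xml_text):
    """List every element name in '{namespace}local' form, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

results = [expanded_names(doc) for doc in DOCS]
for names in results:
    print(names)
```

All three lists are identical, because the prefixes xs and xsd and the default namespace all resolve to the same namespace name.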
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace, while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
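Both attribute rules can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree module, which stores attribute names in '{namespace}local' form when they have a namespace and as plain local names when they do not. The variable names are my own.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# First document: one unprefixed and one prefixed attribute with the same local name.
prefixed = ET.fromstring(
    f'<bk:bookstore xmlns:bk="{NS}">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    "</bk:bookstore>"
)
# Second document: a default namespace and an unprefixed attribute.
defaulted = ET.fromstring(
    f'<bookstore xmlns="{NS}">'
    '<book title="Lord of the Rings, Book 3"/>'
    "</bookstore>"
)

book_a = prefixed[0]
book_b = defaulted[0]
print(book_a.attrib)             # two distinct attribute names
print(book_b.tag, book_b.attrib) # element is in the namespace, attribute is not
```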
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
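The character-for-character rule is easy to trip over. In the sketch below, which uses Python's standard xml.etree.ElementTree module and made-up example.org namespace names, three URIs that a Web server might treat as the same address produce three different expanded names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in case or a trailing slash.
a = ET.fromstring('<x xmlns="http://example.org/Books"/>')
b = ET.fromstring('<x xmlns="http://example.org/books"/>')
c = ET.fromstring('<x xmlns="http://example.org/books/"/>')

print(a.tag)
print(b.tag)
print(c.tag)
print(a.tag == b.tag, b.tag == c.tag)  # no two of these names are equal
```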
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
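This treatment can be observed in DOM implementations. The following sketch uses xml.dom.minidom from Python's standard library, which is only one DOM implementation and is assumed here to follow the behavior described above; other implementations may expose the declarations differently.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The namespace declaration is reachable as an ordinary attribute node.
decl = root.getAttributeNode("xmlns:bk")

print(XMLNS_NAMESPACE)
print(decl.namespaceURI, decl.prefix, decl.localName, decl.value)
print(root.namespaceURI, root.prefix, root.localName)
```

The declaration's own namespace name is http://www.w3.org/2000/xmlns/ and its prefix is xmlns, while the element carries the namespace name that the declaration maps to.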
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{

    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        // Expect a source file and a query, followed by zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }

        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);

            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);

            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);

            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");

        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman" and returns:
  <title>The Confidence Man</title>
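For readers without the .NET tool at hand, the same three lookups can be approximated with Python's standard xml.etree.ElementTree module. This is only a sketch: ElementTree implements a small subset of XPath, so the attribute query and the predicate are rewritten as ordinary Python, and the variable names are my own.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """\
<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>
"""
root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# /bookstore/book/title: paths here are relative to the root element.
titles = [t.text for t in root.findall("book/title")]

# //@genre: ElementTree cannot select attribute nodes, so collect the values by hand.
genres = [el.get("genre") for el in root.iter() if el.get("genre") is not None]

# //title[../author/first-name = 'Herman'], rewritten around the book element.
herman = [
    book.findtext("title")
    for book in root.findall("book")
    if book.findtext("author/first-name") == "Herman"
]

print(titles)
print(genres)
print(herman)
```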
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, but they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman" and returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
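The same pattern of supplying your own prefix-to-namespace mapping to the query engine shows up in other APIs. Below is a minimal sketch with Python's standard xml.etree.ElementTree module, run against a shortened version of the namespaced bookstore document (the author and price elements are omitted for brevity). The prefix b and the variable names are my own choices.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"
BOOKSTORE = f"""\
<bookstore xmlns="{NS}">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction" xmlns:bk="{NS}">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>
"""
root = ET.fromstring(BOOKSTORE)

# Unprefixed names in the path only match elements that have no namespace.
unqualified = root.findall("book/title")

# The prefix "b" is arbitrary; only the namespace name has to match the document.
prefixes = {"b": NS}
qualified = [t.text for t in root.findall("b:book/b:title", prefixes)]

# Attributes: the namespaced genre and the plain genre are different names.
NS_GENRE = "{%s}genre" % NS
ns_genres = [el.get(NS_GENRE) for el in root.iter() if el.get(NS_GENRE) is not None]
plain_genres = [el.get("genre") for el in root.iter() if el.get("genre") is not None]

print(len(unqualified))
print(qualified)
print(ns_genres, plain_genres)
```

The query using the b prefix finds both title elements, even though one is written with the default namespace and the other with the bk prefix in the document.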
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer

    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point. args holds only the command-line arguments, not the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If

        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, Nothing))
        End Try
    End Sub 'Run

End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that existing processors should refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element, as well as a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
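To make the combined approach concrete, here is a sketch of how a processor might use both signals. Everything in it is hypothetical: the namespace names simply follow the year/month URI format shown earlier, and the function name classify, the version limit, and the returned strings are invented for illustration. It uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, one per incompatible revision of the format.
KNOWN_NAMESPACES = {
    "http://my.domain.example.org/product/2001/06": "v1 semantics",
    "http://my.domain.example.org/product/2002/03": "v2 semantics",
}
MAX_VERSION = (1, 1)  # highest version attribute this processor fully understands

def classify(xml_text):
    """Decide how to treat a document from its root namespace and version attribute."""
    root = ET.fromstring(xml_text)
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    if namespace not in KNOWN_NAMESPACES:
        return "reject: unknown namespace"  # the semantics may have changed
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    if version > MAX_VERSION:
        return "accept, ignoring newer additions"  # backwards-compatible change
    return "accept"

NS2 = "http://my.domain.example.org/product/2002/03"
print(classify(f'<doc xmlns="{NS2}" version="1.0"/>'))
print(classify(f'<doc xmlns="{NS2}" version="1.2"/>'))
print(classify('<doc xmlns="http://my.domain.example.org/product/2003/01" version="1.0"/>'))
```

A new namespace name stops an older processor outright, while a higher version attribute within a known namespace lets it continue on a best-effort basis.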
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, which have their root element from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
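One way to watch these declarations come into scope is to ask a streaming parser to report them. The sketch below feeds the same bookstore document to iterparse from Python's standard xml.etree.ElementTree module and records each declaration alongside the elements that follow it. The log list and its tuple layout are my own invention.

```python
import io
import xml.etree.ElementTree as ET

DOC = """\
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
"""

log = []
source = io.BytesIO(DOC.encode("utf-8"))
for event, payload in ET.iterparse(source, events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = payload  # a declaration coming into scope
        log.append(("declare", prefix, uri))
    else:
        log.append(("element", payload.tag))  # name already resolved to '{namespace}local'

for entry in log:
    print(entry)
```

The bk declaration is reported once, before the root element, and the inv declaration appears only immediately before the inventory element that carries it.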
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace, while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</bk:title>
<author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
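The effect of undeclaring a default namespace can be checked with a few lines of code. The following is only a sketch in Python (not the .NET tooling used elsewhere in this article), assuming the standard library's xml.etree.ElementTree parser, which reports every element name in the {namespace}local-name form. The inventory element is left out to keep the sample short.

```python
import xml.etree.ElementTree as ET

# The default namespace is declared on the root and undeclared on <book>.
doc = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
  </book>
</bookstore>"""

root = ET.fromstring(doc)

# Collect the name of every element in document order.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

Only the bookstore element is reported with a namespace name; book, title, and author come back as plain local names, which is exactly the split the text warns about.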
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
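One way to convince yourself that the three schema documents really are equivalent is to ask a namespace-aware parser for the element names. The sketch below is in Python rather than .NET and assumes the standard library's xml.etree.ElementTree, which happens to report names in the curly-brace universal name notation. The helper name universal_names is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The three documents from the QName example, differing only in prefixes.
documents = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the universal name of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

names = [universal_names(d) for d in documents]
for n in names:
    print(n)
```

All three lists come out identical, because the prefixes xs and xsd and the default namespace all resolve to the same namespace name.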
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>
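The two attribute examples above can be verified with a namespace-aware parser. This is a minimal sketch in Python, assuming the standard library's xml.etree.ElementTree, where attribute names with a namespace appear as {namespace}local-name keys and attribute names without one appear as plain local names.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Prefixed and unprefixed attributes with the same local name.
prefixed = ET.fromstring(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>'
)
# An unprefixed attribute inside the scope of a default namespace.
defaulted = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book title="Lord of the Rings, Book 3"/>'
    '</bookstore>'
)

book = prefixed[0]
print(book.get("title"))                        # attribute with no namespace
print(book.get("{%s}title" % BOOKSTORE_NS))     # attribute in the bookstore namespace

default_book = defaulted[0]
print(default_book.tag)                         # the element picks up the default namespace
print(list(default_book.attrib))                # its attribute does not
```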
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
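The character-for-character rule is easy to trip over. As a small sketch (Python, standard library xml.etree.ElementTree; the second namespace name with a capital B is invented purely for this illustration), two namespace names that differ only in case yield two different element names:

```python
import xml.etree.ElementTree as ET

# The two declarations differ only in the case of one letter.
doc = ET.fromstring(
    '<root xmlns:a="urn:xmlns:25hoursaday-com:bookstore" '
    'xmlns:b="urn:xmlns:25hoursaday-com:Bookstore">'
    '<a:book/><b:book/></root>'
)

first, second = doc[0], doc[1]
same_name = first.tag == second.tag
print(first.tag)
print(second.tag)
print(same_name)
```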
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP-based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
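To see the DOM view of a namespace declaration, here is a rough sketch using Python's xml.dom.minidom. That module is a partial DOM implementation rather than a DOM Level 3 one, so treat it only as an illustration of the convention described above: the declaration shows up as an ordinary attribute node whose namespace name is http://www.w3.org/2000/xmlns/.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The namespace declaration is reachable like any other attribute node.
decl = root.getAttributeNode("xmlns:bk")
print(decl.name)          # qualified name of the declaration attribute
print(decl.namespaceURI)  # namespace name of the declaration itself
print(decl.value)         # the namespace name being declared
print(root.namespaceURI)  # namespace name of the element
```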
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{
    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        // Expect a source file, a query, and zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying " +
                "the document: " + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix-to-namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
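The same behavior can be reproduced outside of .NET. The following is a sketch in Python, assuming the standard library's xml.etree.ElementTree, which supports only a small subset of XPath and cannot select attribute nodes, so attributes are read from the matched elements instead. The bookstore document is trimmed to the parts that matter, and the prefix b in the query mapping is arbitrary and deliberately different from the bk prefix used in the document.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

BOOKSTORE_XML = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# Prefix-to-namespace mapping handed to the query engine.
ns = {"b": BOOKSTORE_NS}

# Unprefixed names only match elements that have no namespace.
unprefixed = root.findall("book/title")

# Prefixed names match on the namespace name, whatever prefix the document used.
titles = [t.text for t in root.findall("b:book/b:title", ns)]

# Attributes: the unprefixed genre has no namespace; bk:genre has one.
books = root.findall("b:book", ns)
plain_genres = [b.get("genre") for b in books]
qualified_genres = [b.get("{%s}genre" % BOOKSTORE_NS) for b in books]

print(unprefixed)
print(titles)
print(plain_genres)
print(qualified_genres)
```

The unprefixed query finds nothing, while the prefixed query finds both titles even though one is written with a default namespace and the other with the bk prefix.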
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL Transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point which delegates to the C-style Run method.
    'Environment.GetCommandLineArgs() is avoided here because its first
    'element is the program name rather than the first argument.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " & _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " & PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " & _
                "transforming the document: " & PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " & PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Note also that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<!-- The stylesheet to generate is held as escaped text so that the
     processor does not treat it as stylesheet directives. -->
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hack, and such hacks typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that existing processors should stop working with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
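To make the combination concrete, here is a small sketch of how a processor might apply both checks. It is written in Python with the standard library's xml.etree.ElementTree. The namespace names, the product element, the describe_version helper, and the default of "1.0" for a missing version attribute are all hypothetical and follow the URI pattern shown above only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, one per incompatible generation of the format.
KNOWN_NAMESPACES = {
    "http://my.domain.example.org/product/2002/01": "first-generation semantics",
    "http://my.domain.example.org/product/2003/06": "second-generation semantics",
}

def describe_version(xml_text):
    """Return (semantics, minor version) for a document, or raise ValueError."""
    root = ET.fromstring(xml_text)
    # Split a universal name such as "{uri}local" into its two parts.
    if root.tag.startswith("{"):
        namespace, _, _local = root.tag[1:].partition("}")
    else:
        namespace = None
    # An unknown namespace name signals a change in semantics: refuse to guess.
    if namespace not in KNOWN_NAMESPACES:
        raise ValueError("unsupported namespace: %r" % namespace)
    # The version attribute only signals backwards-compatible additions.
    minor = root.get("version", "1.0")
    return KNOWN_NAMESPACES[namespace], minor

sample = ('<product xmlns="http://my.domain.example.org/product/2002/01" '
          'version="1.1"/>')
result = describe_version(sample)
print(result)
```

A processor built this way keeps working when only the version attribute changes, and fails loudly when the namespace name changes.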
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the root element's namespace URI as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine-readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
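The effect of undeclaring a default namespace can be checked with a few lines of code. The following is a minimal sketch in Python (chosen only for illustration; it is not one of the article's own tools) using the standard xml.etree.ElementTree module, which reports each element name with its namespace name in curly braces. The sample document is a trimmed copy of the one above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bookstore document in which <book> undeclares
# the default namespace with xmlns="".
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# ElementTree stores names as "{namespace-name}local-name"; elements
# without a namespace keep their bare local name.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)
```

Only the first name printed carries the bookstore namespace name; book, title, and author come out as bare names, which is exactly the split the text warns about.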
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
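The equivalence of the three schema documents can be demonstrated directly. Python's standard xml.etree.ElementTree module happens to expose element names in the same curly-brace universal name form, so the sketch below (an illustration, not part of the original column) parses the three documents and compares the names it gets back. The helper name universal_names is made up for this example.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the text, differing only in prefix choice.
SCHEMA_DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the universal name of every element, in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

names = [universal_names(doc) for doc in SCHEMA_DOCS]
print(names[0])
print(names[0] == names[1] == names[2])
```

The prefixes xs and xsd, and the absence of a prefix in the third document, leave no trace in the parsed names.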
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
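Both attribute cases can be verified with a short sketch. This one uses Python's standard xml.etree.ElementTree module purely as an illustration; it keys attributes by universal name, so a namespaced attribute shows up with its namespace name in curly braces and an unprefixed one shows up bare.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>"""

# First child of the root is the book element in both documents.
prefixed_attrs = dict(ET.fromstring(PREFIXED)[0].attrib)
defaulted_attrs = dict(ET.fromstring(DEFAULTED)[0].attrib)

print(sorted(prefixed_attrs))   # one bare name, one namespaced name
print(sorted(defaulted_attrs))  # bare name only: no default namespace applied
```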
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
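The DOM view of a namespace declaration can be inspected with any namespace-aware DOM implementation. The sketch below uses Python's standard xml.dom.minidom as a stand-in (an assumption made for illustration; the column itself works with the .NET DOM) and looks up the declaration as an ordinary attribute node.

```python
from xml.dom import minidom

# Namespace name the DOM assigns to namespace declaration attributes.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The declaration is reachable like any other attribute node.
decl = root.getAttributeNode("xmlns:bk")
print(decl.name, decl.namespaceURI, decl.value)

# The element itself carries the namespace name the declaration maps to.
print(root.namespaceURI, root.localName)
```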
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
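The idea that every element carries its own set of in-scope namespaces can be approximated in code. The following sketch is an assumption-laden illustration rather than an infoset or XPath implementation: it uses the start-ns and end-ns events of Python's standard xml.etree.ElementTree.iterparse to keep a stack of declarations and records a separate prefix-to-namespace mapping for each element. The function name in_scope_namespaces is invented for this example.

```python
import io
import xml.etree.ElementTree as ET

# The xml prefix is predeclared for every document.
XML_NS = "http://www.w3.org/XML/1998/namespace"

BOOKSTORE = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>"""

def in_scope_namespaces(xml_text):
    """Return (universal name, {prefix: namespace name}) for each element."""
    declared = []  # stack of (prefix, uri) pairs currently in scope
    report = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns", "end-ns")):
        if event == "start-ns":
            declared.append(payload)   # reported before the declaring element starts
        elif event == "end-ns":
            declared.pop()             # reported after the declaring element ends
        else:
            scope = {"xml": XML_NS}
            scope.update(declared)     # a fresh mapping per element
            report.append((payload.tag, scope))
    return report

scopes = dict(in_scope_namespaces(BOOKSTORE))
for name, scope in scopes.items():
    print(name, sorted(scope))
```

Each element gets its own mapping object even when two elements see the same declarations, which mirrors the point that namespace nodes belonging to different elements are not identical.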
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{
    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        //Expect a source, a query, and zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all genre attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, but they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
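The same prefix-mapping idea appears in other XPath-capable libraries. As a rough cross-check of the queries above, here is a minimal sketch using Python's standard xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element, so the queries are phrased slightly differently from the xpathquery.exe invocations. The prefix b and the variable names are choices made for this example only.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name><bk:last-name>Melville</bk:last-name></bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>"""

# Prefix-to-namespace mapping handed to the query engine. The prefix need
# not match any prefix used in the document itself.
NS = {"b": "urn:xmlns:25hoursaday-com:bookstore"}
root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# Unprefixed names only match elements that have no namespace.
unqualified = root.findall("book/title")

# Prefixed names are resolved through NS before matching.
qualified = root.findall("b:book/b:title", NS)

# Titles of books whose author's first name is Herman.
herman = [
    book.find("b:title", NS)
    for book in root.findall("b:book", NS)
    if book.findtext("b:author/b:first-name", namespaces=NS) == "Herman"
]

# Namespaced genre attribute of each book (None where it is absent).
fiction = [book.get("{%s}genre" % NS["b"]) for book in root.findall("b:book", NS)]

print(len(unqualified))
print([t.text for t in qualified])
print([t.text for t in herman])
print(fiction)
```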
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL Transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point which delegates to Run. The args array passed to Main
    'excludes the program name, unlike Environment.GetCommandLineArgs().
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes; in that case it is desirable that existing processors refuse to work with the newer versions rather than misinterpret them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element, as well as a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, which have their root element from the W3C XML Schema namespace.
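To make the combination of the two techniques concrete, here is a small sketch in Python of how an application might apply both checks. Everything in it is hypothetical: the namespace names follow the year/month pattern shown earlier but are invented, as are the processor labels and the function names. A changed namespace name is treated as an incompatible format, while the version attribute is simply reported for the chosen processor to interpret.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names, one per incompatible revision of a format.
PROCESSORS = {
    "http://my.domain.example.org/product/2002/05": "v1-processor",
    "http://my.domain.example.org/product/2003/01": "v2-processor",
}

def split_universal_name(tag):
    """Split '{namespace-name}local' into (namespace name, local name)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag

def choose_processor(xml_text):
    """Pick a processor from the root namespace and report the version attribute."""
    root = ET.fromstring(xml_text)
    namespace, _local = split_universal_name(root.tag)
    if namespace not in PROCESSORS:
        # Unknown namespace name: refuse rather than risk misinterpreting it.
        raise ValueError("unsupported namespace: %r" % namespace)
    # Backwards-compatible revisions are signalled by the version attribute.
    version = root.get("version", "1.0")
    return PROCESSORS[namespace], version

choice = choose_processor(
    '<doc xmlns="http://my.domain.example.org/product/2002/05" version="1.1"/>'
)
print(choice)
```

A document in a namespace the application has never seen raises an error instead of being processed, which is the deliberate breakage described for namespace-based versioning.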
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named
<template>,<output>, and<stylesheet>can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name (http://www.w3.org/XML/1998/namespace) and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
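As a rough illustration of how a namespace-aware parser sees an undeclared default namespace, the following Python sketch parses a document shaped like the one above using the standard library's xml.etree.ElementTree module. The choice of Python and of ElementTree is an assumption made for illustration only; it is not part of the original column.

```python
import xml.etree.ElementTree as ET

# Same shape as the document above: the default namespace is undeclared on <book>.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

# ElementTree reports each tag as {namespace-name}local-name,
# or as the bare local name when the element has no namespace.
tags = [element.tag for element in ET.fromstring(DOC).iter()]
for tag in tags:
    print(tag)
```

Only the bookstore and inventory elements carry a namespace name in the printed tags; book, title, and author appear as bare local names, which is exactly the split that makes such documents confusing to read.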
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
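The equivalence of the three schema documents can be checked with a short Python sketch. It relies on the standard library's xml.etree.ElementTree module, which happens to expose element names in the same curly-brace universal name notation; the helper name expanded_names is made up for this illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The three documents from the example above, differing only in prefix choice.
DOCS = [
    '<xs:schema xmlns:xs="%s"><xs:complexType id="123" name="fooType"/></xs:schema>' % XSD,
    '<xsd:schema xmlns:xsd="%s"><xsd:complexType id="123" name="fooType"/></xsd:schema>' % XSD,
    '<schema xmlns="%s"><complexType id="123" name="fooType"/></schema>' % XSD,
]

def expanded_names(xml_text):
    """List (universal name, sorted attributes) for every element in document order."""
    root = ET.fromstring(xml_text)
    return [(element.tag, sorted(element.attrib.items())) for element in root.iter()]

views = [expanded_names(doc) for doc in DOCS]
print(views[0] == views[1] == views[2])
print(views[0][0][0])
```

Once the prefixes are resolved, all three documents yield the same sequence of names, and the unprefixed id and name attributes are reported without any namespace in every case.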
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
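The two attribute rules can be observed with a small Python sketch, again using xml.etree.ElementTree as an assumed stand-in for any namespace-aware parser. The documents mirror the two examples in this section, with the book element nested inside the bookstore element.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

PREFIXED_DOC = (
    '<bk:bookstore xmlns:bk="%s">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>' % NS
)
DEFAULT_DOC = (
    '<bookstore xmlns="%s">'
    '<book title="Lord of the Rings, Book 3" />'
    '</bookstore>' % NS
)

# First child of each root is the book element.
prefixed_book = ET.fromstring(PREFIXED_DOC)[0]
default_book = ET.fromstring(DEFAULT_DOC)[0]

# Two distinct attributes: one with no namespace, one in the bookstore namespace.
prefixed_attrs = dict(prefixed_book.attrib)
# The element picks up the default namespace, but its attribute does not.
default_attr_names = list(default_book.attrib)

print(prefixed_attrs)
print(default_book.tag, default_attr_names)
```

In the second document the element name is reported in the bookstore namespace while the attribute name stays a bare title, which matches the statement that attributes do not inherit the default namespace.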
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
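To make the character-for-character rule concrete, here is a minimal Python sketch. The two namespace names differ only in the case of the host name, so many URL-handling tools would treat them as the same address, yet they are distinct namespace names. The example.org names and the root_namespace helper are invented for this illustration.

```python
import xml.etree.ElementTree as ET

def root_namespace(xml_text):
    """Return the namespace name of the root element ('' if it has none)."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].partition("}")[0] if tag.startswith("{") else ""

# Hypothetical namespace names that differ only in letter case.
lower = root_namespace('<doc xmlns="http://example.org/bookstore"/>')
mixed = root_namespace('<doc xmlns="http://EXAMPLE.org/bookstore"/>')

# Namespace names are compared as plain strings, with no URI normalization.
same_name = (lower == mixed)
print(same_name)  # False
```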
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
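The DOM view of namespace declarations can be inspected with Python's xml.dom.minidom, used here only as one readily available DOM-style implementation; other DOM implementations may differ in detail. The sketch lists the attribute nodes on a root element that carries one default and one prefixed declaration.

```python
from xml.dom import minidom

DOC = (
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore" '
    'xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking"/>'
)

root = minidom.parseString(DOC).documentElement

# Collect every attribute node on the root: qualified name -> (namespace name, value).
declarations = {}
for index in range(root.attributes.length):
    attr = root.attributes.item(index)
    declarations[attr.name] = (attr.namespaceURI, attr.value)

for name, (namespace_name, value) in sorted(declarations.items()):
    print(name, namespace_name, value)
```

Both declarations show up as ordinary attribute nodes whose namespace name is http://www.w3.org/2000/xmlns/, in line with the DOM treatment described above.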
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
    //Concatenates the message of an exception with those of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }
    public static void Main(string[] args){
        //Expect a source file and a query followed by prefix/namespace pairs,
        //so the number of arguments must be even and non-zero.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: " +
                PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
Example 2
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
Example 3
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: The queries in this last example are the same as those in the previous examples, rewritten to be namespace aware.
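The same behavior can be reproduced outside .NET. The sketch below uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath, so it is an approximation of the queries above and not a port of the xpathquery program. The document is a trimmed version of the namespaced bookstore file, and the prefix b is chosen by the query author, independent of the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed version of the namespaced bookstore document shown earlier.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)
prefixes = {"b": BOOKSTORE_NS}  # query-side prefix mapping

# Unprefixed names only match elements that have no namespace: nothing is found.
unqualified = root.findall("book/title")

# Prefixed names are resolved through the mapping and match both books.
titles = [title.text for title in root.findall("b:book/b:title", prefixes)]

# Attributes: the unprefixed genre has no namespace, bk:genre has one.
books = root.findall("b:book", prefixes)
plain_genres = [book.get("genre") for book in books]
namespaced_genres = [book.get("{%s}genre" % BOOKSTORE_NS) for book in books]

print(unqualified)
print(titles)
print(plain_genres, namespaced_genres)
```

Note that both title elements are found with the single prefix b even though one uses the default namespace and the other the bk prefix in the document, since matching is done on expanded names.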
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
    'Concatenates the message of an exception with those of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError
    'Entry point. The args array holds only the command-line arguments;
    'Environment.GetCommandLineArgs() would also include the program name.
    Public Overloads Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main
    Public Overloads Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " + _
                "transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, in which case it is desirable that existing processors refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
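A processor that honors both mechanisms might be sketched as follows in Python. Everything specific here is hypothetical: the inventory vocabulary, its namespace name (built on the URL format shown above), the version numbers, and the three result labels are invented purely to illustrate the decision logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name in the product/year/month format shown above.
KNOWN_NAMESPACE = "http://my.domain.example.org/inventory/2002/03"
# Highest format version this hypothetical processor was written for.
KNOWN_VERSION = (1, 0)

def classify(xml_text):
    """Decide how to treat a document based on its namespace name and version attribute."""
    root = ET.fromstring(xml_text)
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    # A different namespace name signals changed semantics: refuse to process.
    if namespace != KNOWN_NAMESPACE:
        return "unsupported"
    # A newer version attribute signals backwards-compatible additions only.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    return "forwards-compatible" if version > KNOWN_VERSION else "supported"

same = classify('<inventory xmlns="%s" version="1.0"/>' % KNOWN_NAMESPACE)
newer = classify('<inventory xmlns="%s" version="1.1"/>' % KNOWN_NAMESPACE)
changed = classify(
    '<inventory xmlns="http://my.domain.example.org/inventory/2003/01" version="1.0"/>')
print(same, newer, changed)
```

The design choice mirrors the division of labor described above: the namespace name acts as a hard compatibility gate, while the version attribute only tells the processor that it may meet elements or attributes it does not know and can safely ignore.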
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents, whose root element is from the XHTML namespace, and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML NamespacesAs XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
      xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
      xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
      xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
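The effect of undeclaring the default namespace can be checked with any namespace-aware parser. What follows is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not something the original column uses), which reports each element name as the namespace name in curly braces followed by the local name. The inv:inventory element is left out for brevity.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the text, with the default namespace
# undeclared on the book element (inv:inventory omitted for brevity).
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# Element.iter() walks the tree in document order, starting at the root.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Only the first name printed carries a namespace name; book, title, and author come back as bare local names, which is exactly the mixed situation the text warns about.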
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
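The equivalence of the three schema documents can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree module, which happens to report element names in the same curly-brace notation as universal names; the helper name universal_names is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the text: two different prefixes
# and a default namespace, all bound to the same namespace name.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """Return the universal name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names_per_document = [universal_names(doc) for doc in DOCS]
all_identical = all(names == names_per_document[0] for names in names_per_document)

print(names_per_document[0])
print(all_identical)
```

Once the documents are parsed, the prefixes xs and xsd no longer appear anywhere in the element names; only the namespace name and the local name remain.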
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>
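Both attribute cases can be verified with a namespace-aware parser. This is a small sketch assuming Python's standard xml.etree.ElementTree module, in which an attribute with a namespace is looked up by its curly-brace universal name and an attribute without one by its plain local name.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Prefixed declaration: only the bk:title attribute is in the namespace.
PREFIXED = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

# Default namespace: the element is in the namespace, its attribute is not.
DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3"/>
</bookstore>"""

prefixed_book = ET.fromstring(PREFIXED)[0]
defaulted_book = ET.fromstring(DEFAULTED)[0]

plain_title = prefixed_book.get("title")
qualified_title = prefixed_book.get("{%s}title" % BOOKSTORE_NS)

print(plain_title)                  # attribute with no namespace
print(qualified_title)              # attribute in the bookstore namespace
print(defaulted_book.tag)           # element picks up the default namespace
print(list(defaulted_book.attrib))  # its attribute does not
```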
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
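The character-for-character rule has a practical consequence: two namespace names that a Web browser would treat as the same address are still different namespaces. The sketch below assumes Python's standard xml.etree.ElementTree module, and the two example.org namespace names are hypothetical values chosen only because they differ in letter case.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in letter case.
LOWER = '<b:book xmlns:b="http://www.example.org/books"/>'
MIXED = '<b:book xmlns:b="http://www.EXAMPLE.org/books"/>'

lower_name = ET.fromstring(LOWER).tag
mixed_name = ET.fromstring(MIXED).tag

# The parser compares namespace names as plain strings, so the two
# book elements end up with different universal names.
same_namespace = lower_name == mixed_name

print(lower_name)
print(mixed_name)
print(same_namespace)
```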
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
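The DOM view of namespace declarations can be observed in any DOM implementation that is namespace aware. The following sketch assumes Python's standard xml.dom.minidom module, which is one such implementation and not the one used elsewhere in this article; other DOM implementations may differ in details.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

DOC = (
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings"/>'
    '</bk:bookstore>'
)

root = minidom.parseString(DOC).documentElement

# The namespace declaration is reachable as an ordinary attribute node.
declaration = root.getAttributeNode("xmlns:bk")

print(declaration.name)          # qualified name of the declaration
print(declaration.namespaceURI)  # namespace name of the declaration itself
print(declaration.value)         # namespace name being declared
print(root.namespaceURI)         # namespace name of the element
```

XMLNS_NAMESPACE is the constant minidom uses for http://www.w3.org/2000/xmlns/, the namespace name the DOM assigns to declaration attributes.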
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{

  // Concatenates the messages of an exception and all of its inner exceptions.
  public static string PrintError(Exception e, string errStr){
    if(e == null)
      return errStr;
    else
      return PrintError(e.InnerException, errStr + e.Message );
  }

  public static void Main(string[] args){

    // Expect a source file and a query, optionally followed by
    // prefix/namespace pairs, so the argument count must be even.
    if((args.Length == 0) || (args.Length % 2) != 0){
      Console.WriteLine("Usage: xpathquery source query <zero or more " +
                        "prefix and namespace pairs>");
      return;
    }

    try{

      //Load the file.
      XmlDocument doc = new XmlDocument();
      doc.Load(args[0]);

      //create prefix<->namespace mappings (if any)
      XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
      for(int i=2; i < args.Length; i+= 2)
        nsMgr.AddNamespace(args[i], args[i + 1]);

      //Query the document
      XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);

      //print output
      foreach(XmlNode node in nodes)
        Console.WriteLine(node.OuterXml + "\n\n");

    }catch(XmlException xmle){
      Console.WriteLine("ERROR: XML Parse error occurred because " +
                        PrintError(xmle, null));
    }catch(FileNotFoundException fnfe){
      Console.WriteLine("ERROR: " + PrintError(fnfe, null));
    }catch(XPathException xpath){
      Console.WriteLine("ERROR: The following error occurred while querying " +
                        "the document: " + PrintError(xpath, null));
    }catch(Exception e){
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
    }
  }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
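The same behavior shows up in other XPath engines. As a rough cross-check, the sketch below assumes Python's standard xml.etree.ElementTree module, which implements only a subset of XPath (paths are relative to the element being searched, and there is no attribute axis), so it is an approximation of the queries above rather than a port of xpathquery.exe. The sample document is a trimmed copy of the namespaced bookstore file.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Unprefixed names only match elements that have no namespace.
unprefixed = root.findall("book/title")

# The prefix "b" is chosen by the query, not by the document.
prefixes = {"b": BOOKSTORE_NS}
books = root.findall("b:book", prefixes)
titles = [title.text for title in root.findall("b:book/b:title", prefixes)]

# Unprefixed attributes have no namespace; bk:genre is in the namespace.
plain_genres = [book.get("genre") for book in books]
qualified_genres = [book.get("{%s}genre" % BOOKSTORE_NS) for book in books]

print(unprefixed)
print(titles)
print(plain_genres)
print(qualified_genres)
```

The query prefix b matches both the default-namespace book and the bk:book element, because matching is done on the namespace name and not on the prefix text.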
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer

  'Concatenates the messages of an exception and all of its inner exceptions.
  Public Shared Function PrintError(e As Exception, errStr As String) As String
    If e Is Nothing Then
      Return errStr
    Else
      Return PrintError(e.InnerException, errStr + e.Message)
    End If
  End Function 'PrintError

  'Entry point; args holds the command-line arguments without the program name.
  Public Shared Sub Main(args() As String)
    Run(args)
  End Sub 'Main

  Public Shared Sub Run(args() As String)

    If args.Length <> 2 Then
      Console.WriteLine("Usage: xslt source stylesheet")
      Return
    End If

    Try

      'Create the XslTransform object.
      Dim xslt As New XslTransform()

      'Load the stylesheet.
      xslt.Load(args(1))

      'Transform the file.
      Dim doc As New XmlDocument()
      doc.Load(args(0))
      xslt.Transform(doc, Nothing, Console.Out)

    Catch xmle As XmlException
      Console.WriteLine("ERROR: XML Parse error occurred because " + _
        PrintError(xmle, Nothing))
    Catch fnfe As FileNotFoundException
      Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
    Catch xslte As XsltException
      Console.WriteLine("ERROR: The following error occurred while " + _
        "transforming the document: " + PrintError(xslte, Nothing))
    Catch e As Exception
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
    End Try
  End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also to note is the fact that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:variable name="stylesheet">
    &lt;xslt:stylesheet version="1.0"
    xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
    &lt;xslt:output method="text"/&gt;
    &lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
    &lt;/xslt:stylesheet&gt;
  </xsl:variable>
  <xsl:template match="/">
    <xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
  xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
  <xslt:output method="xml" encoding="utf-8"/>
  <xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
  <xslt:template match="/">
    <alias:stylesheet version="1.0">
      <alias:output method="text"/>
      <alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
    </alias:stylesheet>
  </xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that it means XML namespace-aware applications that process the documents will no longer work with the documents, and will have to be upgraded. This is primarily beneficial with document formats whose versions change infrequently, but upon changing alter the semantics of elements and attributes, thus requiring that all processors no longer work with the newer versions for fear of misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element, as well as a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
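A processor that honors both mechanisms has to look at two separate pieces of information on the root element. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper names split_universal_name and version_info are invented for this illustration.

```python
import xml.etree.ElementTree as ET

STYLESHEET = (
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>'
)


def split_universal_name(tag):
    """Split '{namespace}local' into (namespace, local); namespace may be None."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


def version_info(xml_text):
    """Return (namespace name, local name, version attribute) of the root."""
    root = ET.fromstring(xml_text)
    namespace, local = split_universal_name(root.tag)
    return namespace, local, root.get("version")


namespace, local, version = version_info(STYLESHEET)
print(namespace)  # changes only for semantically incompatible revisions
print(local)
print(version)    # changes for incremental, compatible revisions
```

A real processor would then typically refuse documents whose namespace name it does not recognize, and use the version attribute only to enable or disable optional features.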
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document; however, this is hardly a general rule and stating it as such violates the spirit of XML namespaces, as they were designed exactly so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
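To make the point concrete, the sketch below collects every namespace name used by the elements of a document and compares that with the namespace name of the root element alone. It assumes Python's standard xml.etree.ElementTree module; the sample document is a made-up, RDDL-like fragment, the RDDL namespace name is assumed to be http://www.rddl.org/, and the helper names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
RDDL_NS = "http://www.rddl.org/"  # assumed namespace name for RDDL

# Hypothetical document: XHTML root element with an embedded RDDL element.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/">
  <head><title>Example namespace document</title></head>
  <body>
    <rddl:resource>
      <p>A related resource would be described here.</p>
    </rddl:resource>
  </body>
</html>"""


def namespace_of(tag):
    """Return the namespace name of a '{namespace}local' tag, or None."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


root = ET.fromstring(DOC)
root_namespace = namespace_of(root.tag)
namespaces_used = {namespace_of(element.tag) for element in root.iter()}

print(root_namespace)
print(sorted(namespaces_used))
```

Looking only at root_namespace, the document appears to be plain XHTML; the second vocabulary is visible only when the whole tree is examined.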
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
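As an illustration added to this section (not part of the original column), the following Python sketch parses two variants of the bookstore document with the standard library's xml.etree.ElementTree, which reports each element name in the form {namespace name}local name. The helper name element_names and the trimmed-down documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The bookstore document with a default namespace in scope everywhere.
DEFAULT_DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
  </book>
</bookstore>"""

# The same document, with xmlns="" undeclaring the default namespace on <book>.
UNDECLARED_DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
  </book>
</bookstore>"""


def element_names(xml_text):
    """Return every element name in document order, as ElementTree reports it."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


for label, doc in (("default", DEFAULT_DOC), ("undeclared", UNDECLARED_DOC)):
    print(label, element_names(doc))
```

In the second document only bookstore keeps a namespace name, while book and title come back as plain, unqualified names even though they look the same as in the first document.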
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
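The equivalence of the three schema documents can be checked with a few lines of Python. This is a minimal sketch that assumes the standard library's xml.etree.ElementTree, which happens to expose element names in the curly-brace universal name form; the helper name universal_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# The three schema documents from this section, differing only in prefix choice.
SCHEMA_DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """Return the universal name of every element in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


results = [universal_names(doc) for doc in SCHEMA_DOCS]
for names in results:
    print(names)
```

All three lists are identical because the prefixes xs and xsd, and the default namespace, resolve to the same namespace name before the application ever sees the element names.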
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
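The behaviour of prefixed and unprefixed attributes can be observed directly. The sketch below is an added illustration that assumes Python's xml.etree.ElementTree; it combines both examples from this section into one document that declares the bookstore namespace as the default namespace and also maps it to the bk prefix.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# One document with a default namespace and a bk prefix for the same namespace.
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bookstore>"""

book = ET.fromstring(DOC)[0]

# The element picks up the default namespace; the unprefixed attribute does not.
print(book.tag)
attribute_names = sorted(book.attrib)
print(attribute_names)
print(book.get("title"))
print(book.get("{%s}title" % BOOKSTORE_NS))
```

The two attributes are reported under different names, title and {urn:xmlns:25hoursaday-com:bookstore}title, which is why the document is well formed despite the shared local name.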
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
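A small sketch of the character-for-character rule, assuming Python's xml.etree.ElementTree: the two URLs below are made up for the example and would typically lead to the same place if dereferenced, yet they are different namespace names. The helper root_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


# Hypothetical namespace names that differ only in the case of the host name.
lower = root_namespace('<doc xmlns="http://www.example.org/books"/>')
upper = root_namespace('<doc xmlns="http://www.EXAMPLE.org/books"/>')

# Namespace names are compared as plain strings, so these do not match.
same_namespace = (lower == upper)
print(lower, upper, same_namespace)
```

No URI normalization is applied when comparing namespace names, so a processor looking for elements in one of these namespaces will not find elements declared in the other.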
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
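The DOM view of a namespace declaration can be inspected with Python's xml.dom.minidom, used here only as a convenient DOM implementation for illustration; other DOM implementations may differ in detail.

```python
from xml.dom import minidom, XMLNS_NAMESPACE

DOC = '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'

root = minidom.parseString(DOC).documentElement

# The declaration is an ordinary attribute node in the xmlns namespace.
decl = root.getAttributeNodeNS(XMLNS_NAMESPACE, "bk")
declaration = (decl.name, decl.prefix, decl.namespaceURI, decl.value)
print(declaration)

# The element itself carries the namespace name it was created with.
element_name = (root.prefix, root.localName, root.namespaceURI)
print(element_name)
```

The declaration shows up with the qualified name xmlns:bk, the prefix xmlns, and http://www.w3.org/2000/xmlns/ as its namespace name, in line with the description above.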
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
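Python's standard library does not expose the XPath namespace axis or infoset namespace information items, but the idea that every element has its own set of in-scope namespaces can be approximated. The sketch below is an assumption-laden illustration: it tracks the start-ns and end-ns events of xml.etree.ElementTree.iterparse, and the function name in_scope_namespaces is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

DOC = """\
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <inv:inventory status="in-stock"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking"/>
  </bk:book>
  <bk:book/>
</bk:bookstore>"""


def in_scope_namespaces(xml_text):
    """Pair each element name (document order) with the prefix bindings in scope for it."""
    bindings = [("xml", XML_NS)]  # the xml prefix is bound in every document
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end-ns")):
        if event == "start-ns":
            bindings.append(payload)  # payload is a (prefix, namespace name) pair
        elif event == "end-ns":
            bindings.pop()            # the declaring element has been closed
        else:
            result.append((payload.tag, dict(bindings)))
    return result


scopes = in_scope_namespaces(DOC)
for name, prefixes in scopes:
    print(name, sorted(prefixes))
```

Only the inv:inventory element sees the inv binding; the second bk:book element, which follows it, does not. Each element gets its own copy of the bindings, loosely mirroring the per-element namespace nodes described above.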
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
  // Concatenates the messages of an exception and all of its inner exceptions.
  public static string PrintError(Exception e, string errStr){
    if(e == null)
      return errStr;
    else
      return PrintError(e.InnerException, errStr + e.Message);
  }
  public static void Main(string[] args){
    // Expect a source file and a query, followed by zero or more prefix/namespace pairs.
    if((args.Length == 0) || (args.Length % 2) != 0){
      Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
      return;
    }
    try{
      //Load the file.
      XmlDocument doc = new XmlDocument();
      doc.Load(args[0]);
      //create prefix<->namespace mappings (if any)
      XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
      for(int i = 2; i < args.Length; i += 2)
        nsMgr.AddNamespace(args[i], args[i + 1]);
      //Query the document
      XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
      //print output
      foreach(XmlNode node in nodes)
        Console.WriteLine(node.OuterXml + "\n\n");
    }catch(XmlException xmle){
      Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, null));
    }catch(FileNotFoundException fnfe){
      Console.WriteLine("ERROR: " + PrintError(fnfe, null));
    }catch(XPathException xpath){
      Console.WriteLine("ERROR: The following error occurred while querying the document: "
        + PrintError(xpath, null));
    }catch(Exception e){
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
    }
  }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: These last examples are the same queries as the previous examples, rewritten to be namespace aware.
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
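The same behaviour can be reproduced outside .NET. The following sketch assumes Python's xml.etree.ElementTree, which implements only a subset of XPath but follows the same rule: unprefixed names in a path match names with no namespace, and prefixes are resolved through a mapping supplied by the caller. The document is a trimmed copy of the namespaced bookstore file, and paths are relative to the bookstore root element.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Unprefixed names in the path match only elements with no namespace: no results.
unqualified = root.findall("book/title")

# The prefix b is chosen by the caller; it does not have to appear in the document.
qualified = root.findall("b:book/b:title", {"b": BOOKSTORE_NS})
titles = [title.text for title in qualified]

# Unprefixed attributes have no namespace; bk:genre is in the bookstore namespace.
plain_genres = [book.get("genre") for book in root]
qualified_genres = [book.get("{%s}genre" % BOOKSTORE_NS) for book in root]

print(unqualified, titles, plain_genres, qualified_genres)
```

The query with the b prefix finds both titles even though the document uses the default namespace for one book and the bk prefix for the other, since only the namespace name matters.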
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
  'Concatenates the messages of an exception and all of its inner exceptions.
  Public Shared Function PrintError(e As Exception, errStr As String) As String
    If e Is Nothing Then
      Return errStr
    Else
      Return PrintError(e.InnerException, errStr + e.Message)
    End If
  End Function 'PrintError
  'Entry point which delegates to the C-style Run function
  Public Overloads Shared Sub Main()
    Run(System.Environment.GetCommandLineArgs())
  End Sub 'Main
  Overloads Public Shared Sub Run(args() As String)
    'GetCommandLineArgs() puts the program name in args(0), so a source
    'and a stylesheet make three entries in total.
    If args.Length <> 3 Then
      Console.WriteLine("Usage: xslt source stylesheet")
      Return
    End If
    Try
      'Create the XslTransform object.
      Dim xslt As New XslTransform()
      'Load the stylesheet.
      xslt.Load(args(2))
      'Transform the file.
      Dim doc As New XmlDocument()
      doc.Load(args(1))
      xslt.Transform(doc, Nothing, Console.Out)
    Catch xmle As XmlException
      Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing))
    Catch fnfe As FileNotFoundException
      Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
    Catch xslte As XsltException
      Console.WriteLine("ERROR: The following error occurred while transforming the document: " _
        + PrintError(xslte, Nothing))
    Catch e As Exception
      Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
    End Try
  End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hack, and such hacks typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, upon changing, alter the semantics of elements and attributes, so that existing processors should refuse the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element, as well as a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
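A processor that combines both techniques might look roughly like the sketch below. Everything in it is hypothetical: the namespace name follows the [year/month] pattern shown earlier, the version numbers are invented, and the three outcomes are just one way to express the policy described above, written with Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for an imaginary "product" vocabulary.
KNOWN_NAMESPACE = "http://my.domain.example.org/product/2002/01"
KNOWN_VERSION = (1, 0)


def processing_mode(xml_text):
    """Decide how a processor built for KNOWN_NAMESPACE should treat a document."""
    root = ET.fromstring(xml_text)
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    if namespace != KNOWN_NAMESPACE:
        # A different namespace name signals changed semantics: refuse the document.
        return "reject"
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    # A newer version attribute signals backwards-compatible additions only.
    return "process" if version <= KNOWN_VERSION else "process-leniently"


same = '<product xmlns="http://my.domain.example.org/product/2002/01" version="1.0"/>'
newer = '<product xmlns="http://my.domain.example.org/product/2002/01" version="1.1"/>'
renamed = '<product xmlns="http://my.domain.example.org/product/2003/05" version="1.0"/>'

for doc in (same, newer, renamed):
    print(processing_mode(doc))
```

The document with the newer version attribute is still processed, while the one with the new namespace name is refused outright, which is the trade-off between the two mechanisms.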
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, as they were designed exactly so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, which have their root element from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference the contents of a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine-readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
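The effect of a default namespace, and of undeclaring it, can be checked with a short script. The following is only a sketch in Python (not the C# used elsewhere in this article), using the standard library's ElementTree parser, which reports each element name with its namespace name in curly braces. The two documents are trimmed-down versions of the bookstore examples above.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# The default namespace applies to every unprefixed element in scope.
with_default = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    "<book><title>Lord of the Rings</title></book>"
    "</bookstore>"
)

# xmlns="" on <book> undeclares the default namespace for book and its children.
undeclared = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book xmlns=""><title>Lord of the Rings</title></book>'
    "</bookstore>"
)

def element_names(root):
    """Return the name of every element in document order."""
    return [element.tag for element in root.iter()]

default_names = element_names(with_default)
undeclared_names = element_names(undeclared)

print(default_names)     # all three names carry the bookstore namespace
print(undeclared_names)  # only bookstore carries it; book and title have none
```

In the second document the same unprefixed spelling, title, means something different depending on where it appears, which is the confusion the text warns about.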
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
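As a rough illustration of these rules, the hypothetical helper below (the name expand_qname and its parameters are made up for this sketch) splits an element QName into prefix and local name and looks the prefix up in a table of in-scope declarations. It only enforces the "no extra colon" part of the NCName production, not the full set of name character rules.

```python
def expand_qname(qname, prefix_map, default_ns=None):
    """Resolve an element QName to a (namespace name, local name) pair.

    prefix_map maps in-scope prefixes to namespace URIs; default_ns is the
    in-scope default namespace, if any (it applies to elements only).
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        # No prefix: the whole QName is the local name.
        return (default_ns, qname)
    if not prefix or not local or ":" in local:
        raise ValueError("prefix and local name must be non-empty and colon-free")
    if prefix not in prefix_map:
        raise KeyError("prefix %r has no in-scope namespace declaration" % prefix)
    return (prefix_map[prefix], local)


in_scope = {"bk": "urn:xmlns:25hoursaday-com:bookstore"}
print(expand_qname("bk:title", in_scope))
print(expand_qname("title", in_scope, default_ns="urn:xmlns:25hoursaday-com:bookstore"))
```

Both calls resolve to the same pair, which is why the choice of prefix matters so little to processors.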
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
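Python's standard ElementTree module happens to expose element names in exactly this curly-brace notation, which makes the equivalence of the three schema documents easy to confirm. This is a small sketch rather than a statement about how any particular schema validator works.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the example above, differing only in prefix.
documents = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the universal name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

names = [universal_names(text) for text in documents]
for name_list in names:
    print(name_list)

# The prefixes are gone after parsing; only namespace name + local name remain.
print(names[0] == names[1] == names[2])
```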
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
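Both attribute cases can be verified with a few lines of Python. This sketch again relies on the standard ElementTree parser, which stores attribute names in the same curly-brace form it uses for elements; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Prefixed document: one attribute without a namespace, one with.
prefixed_doc = ET.fromstring(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    "</bk:bookstore>"
)
prefixed_attributes = sorted(prefixed_doc[0].attrib)

# Default-namespace document: the element is in the namespace, the attribute is not.
default_doc = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book title="Lord of the Rings, Book 3" />'
    "</bookstore>"
)
default_book = default_doc[0]
default_attributes = sorted(default_book.attrib)

print(prefixed_attributes)  # two distinct names sharing the local name "title"
print(default_book.tag)     # the element picks up the default namespace
print(default_attributes)   # the unprefixed attribute does not
```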
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
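The character-for-character comparison rule mentioned earlier in this section has a practical consequence: two URIs that a Web browser would treat as the same address are still different namespace names. The sketch below uses two made-up example.org URIs that differ only in the case of the host name.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in letter case.
lower = ET.fromstring('<x:item xmlns:x="http://example.org/inventory"/>')
upper = ET.fromstring('<x:item xmlns:x="http://EXAMPLE.org/inventory"/>')

def namespace_of(element):
    """Extract the namespace name from an ElementTree tag, or '' if none."""
    tag = element.tag
    return tag[1:].partition("}")[0] if tag.startswith("{") else ""

same_namespace = namespace_of(lower) == namespace_of(upper)
print(namespace_of(lower))
print(namespace_of(upper))
print(same_namespace)  # False: no URI normalization is applied
```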
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
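The DOM treatment of namespace declarations can be observed with Python's xml.dom.minidom, a lightweight DOM implementation in the standard library. This is only a sketch against that one implementation; other DOM implementations may differ in details.

```python
from xml.dom import minidom

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"

document = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore" '
    'xmlns="urn:xmlns:25hoursaday-com:inventory-tracking"/>'
)
root = document.documentElement

# Namespace declarations show up as ordinary attribute nodes on the element.
declarations = {}
attributes = root.attributes
for index in range(attributes.length):
    attribute = attributes.item(index)
    declarations[attribute.name] = attribute.namespaceURI

print(root.namespaceURI)  # namespace name of the element itself
print(declarations)       # each declaration attribute is in the xmlns namespace
```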
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }
    public static void Main(string[] args){
        //Expect a source file and a query, followed by zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //Print output.
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same as the previous examples but rewritten to be namespace aware.
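The same behavior can be reproduced outside of .NET. The sketch below uses Python's standard ElementTree module, which supports only a limited subset of XPath (no attribute axis, for example), so the attribute lookups are written as plain loops. The document is a shortened version of the namespaced bookstore file, and the prefix b is chosen arbitrarily, just as in the command lines above.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

# Prefix mapping supplied to the query engine; it need not match the document.
prefixes = {"b": "urn:xmlns:25hoursaday-com:bookstore"}

# Unprefixed names only match elements that have no namespace.
unprefixed_titles = root.findall("book/title")

# Prefixed names match on namespace name + local name, whatever prefix
# (or default namespace) the document itself used.
prefixed_titles = [t.text for t in root.findall("b:book/b:title", prefixes)]

# Attributes: "genre" has no namespace, "{...}genre" is in the bookstore namespace.
qualified_genre = "{%s}genre" % prefixes["b"]
genres_without_namespace = [e.attrib["genre"] for e in root.iter() if "genre" in e.attrib]
genres_in_namespace = [e.attrib[qualified_genre] for e in root.iter() if qualified_genre in e.attrib]

print(unprefixed_titles)
print(prefixed_titles)
print(genres_without_namespace)
print(genres_in_namespace)
```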
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError
    'Entry point; args holds the command-line arguments without the program name.
    Public Shared Sub Main(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub 'Main
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet"><![CDATA[<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>]]></xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically tend to become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO
WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This behavior is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes. In that case it is desirable that existing processors refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
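As a rough illustration of how the two mechanisms can work together, the Python sketch below treats an unrecognized namespace name as a hard stop and a newer version attribute as a signal to continue leniently. The namespace names, the version numbers, the order element, and the classify function are all hypothetical, chosen only to follow the URI pattern shown above; this is one possible policy, not how any particular processor behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and version, following the year/month URI pattern.
KNOWN_NAMESPACE = "http://my.domain.example.org/product/2002/05"
KNOWN_VERSION = (1, 0)


def classify(xml_text):
    """Decide how a processor written for KNOWN_NAMESPACE might treat a document."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced element names as "{namespace}local".
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace != KNOWN_NAMESPACE:
        # A different namespace name signals changed semantics: refuse to guess.
        return "reject"
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    if version <= KNOWN_VERSION:
        return "process"
    # Same namespace, newer version: additions are expected to be compatible.
    return "process-leniently"


same = classify('<order version="1.0" xmlns="http://my.domain.example.org/product/2002/05"/>')
newer = classify('<order version="1.1" xmlns="http://my.domain.example.org/product/2002/05"/>')
changed = classify('<order version="1.0" xmlns="http://my.domain.example.org/product/2003/01"/>')
```

Under these assumptions, only the document with a different namespace name is turned away; the document with the higher version attribute is still accepted.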
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why the namespace URI of the root element should not be equated with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, which have their root element from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
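To check what a default namespace declaration does in practice, the short Python sketch below parses the prefixed and the default-namespace versions of the bookstore document with the standard library's xml.etree.ElementTree and groups the element names by namespace. The helper name names_by_namespace is an illustrative choice, and ElementTree is only one of many namespace-aware parsers.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>"""

DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""


def names_by_namespace(xml_text):
    """Map each namespace name to the local names of its elements, in document order."""
    grouped = {}
    for element in ET.fromstring(xml_text).iter():
        if element.tag.startswith("{"):
            # ElementTree reports namespaced names as "{namespace}local".
            namespace, _, local = element.tag[1:].partition("}")
        else:
            namespace, local = "", element.tag
        grouped.setdefault(namespace, []).append(local)
    return grouped


prefixed_groups = names_by_namespace(PREFIXED)
defaulted_groups = names_by_namespace(DEFAULTED)
```

Both documents produce the same grouping: four elements in the bookstore namespace and one in the inventory-tracking namespace, regardless of whether a prefix or a default namespace was used.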
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
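The effect of undeclaring the default namespace can be seen by listing the element names a namespace-aware parser reports. The sketch below uses Python's xml.etree.ElementTree, which writes a namespaced name as "{namespace}local" and a name with no namespace as the bare local name; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

# Collect the name of every element in document order.
names = [element.tag for element in ET.fromstring(DOCUMENT).iter()]

# Elements whose reported name carries no namespace at all.
without_namespace = [name for name in names if not name.startswith("{")]
```

Only bookstore and inv:inventory keep a namespace name here; book, title, and author end up with none, which is exactly the mixed situation the text warns about.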
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
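Python's xml.etree.ElementTree happens to report element names in this curly-brace form, which makes the equivalence easy to verify. The sketch below parses the three schema documents from the earlier example and compares the names it gets back; the function name universal_names is an illustrative choice.

```python
import xml.etree.ElementTree as ET

WITH_XS = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType id="123" name="fooType"/>
</xs:schema>"""

WITH_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType id="123" name="fooType"/>
</xsd:schema>"""

WITH_DEFAULT = """<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType id="123" name="fooType"/>
</schema>"""


def universal_names(xml_text):
    """Return every element name as "{namespace name}local name"."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names_xs = universal_names(WITH_XS)
names_xsd = universal_names(WITH_XSD)
names_default = universal_names(WITH_DEFAULT)
```

All three lists are identical, because the prefixes xs and xsd and the default namespace declaration all resolve to the same namespace name.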
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
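Both attribute rules can be confirmed with a namespace-aware parser. The following sketch uses Python's xml.etree.ElementTree, where attribute names with a namespace appear as "{namespace}local" and attribute names with no namespace appear unchanged; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

PREFIXED = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>"""

# The first child of each root element is the book element.
prefixed_book = ET.fromstring(PREFIXED)[0]
defaulted_book = ET.fromstring(DEFAULTED)[0]

# Attribute dictionaries keyed by namespace-qualified attribute name.
prefixed_attributes = dict(prefixed_book.attrib)
defaulted_attributes = dict(defaulted_book.attrib)
```

In the first document the two title attributes are distinct keys, one with a namespace and one without. In the second, the book element itself is in the bookstore namespace while its title attribute is not.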
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resource. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
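The DOM treatment of namespace declarations can be observed with Python's xml.dom.minidom, a lightweight DOM implementation that follows the same convention. The sketch below is an illustration under that assumption: it walks the attributes of the root element and keeps those whose namespace name is the xmlns namespace, which the standard library exposes as xml.dom.XMLNS_NAMESPACE.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

DOCUMENT = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

root = minidom.parseString(DOCUMENT).documentElement

# Namespace declarations show up as ordinary attribute nodes on the element.
declarations = {}
attributes = root.attributes
for index in range(attributes.length):
    attribute = attributes.item(index)
    if attribute.namespaceURI == XMLNS_NAMESPACE:
        declarations[attribute.name] = attribute.value
```

Here the only attribute on the root element is the declaration itself, reported under the qualified name xmlns:bk, while the element's own namespace name is available separately as root.namespaceURI.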
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Namespace nodes are not shared between elements, so the namespace nodes of two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }
    public static void Main(string[] args){
        //Expect a source file and a query, followed by prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying " +
                "the document: " + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all genre attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reason the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix-to-namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefixes used in the target document, but they must be non-empty.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document, which returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman", which returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: These last examples are the same as the previous examples but rewritten to be namespace aware.
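The same behavior can be reproduced outside .NET. The sketch below uses Python's xml.etree.ElementTree, which supports only a subset of XPath and evaluates paths relative to the element it is called on, so the queries are rough equivalents rather than literal translations of the ones above. The prefix b in the mapping is an arbitrary choice, just as in the command-line examples.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
      xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name><bk:last-name>Melville</bk:last-name></bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>"""

URI = "urn:xmlns:25hoursaday-com:bookstore"
NS = {"b": URI}  # prefix-to-namespace mapping handed to the query engine

root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# Unprefixed names match only elements with no namespace: nothing is found.
unprefixed_titles = root.findall("book/title")

# Prefixed names resolved through the mapping match both book elements.
prefixed_titles = [t.text for t in root.findall("b:book/b:title", NS)]

# Titles of books whose author's first name is "Herman".
herman_titles = [
    book.findtext("b:title", namespaces=NS)
    for book in root.findall("b:book", NS)
    if book.findtext("b:author/b:first-name", namespaces=NS) == "Herman"
]

# genre attributes with no namespace versus those in the bookstore namespace.
plain_genres = [e.get("genre") for e in root.iter() if "genre" in e.attrib]
qualified_genres = [
    e.get("{%s}genre" % URI) for e in root.iter() if "{%s}genre" % URI in e.attrib
]
```

As with the command-line tool, the prefix used in the query does not have to match the bk prefix used inside the document; only the namespace name it maps to matters.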
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError
    'Entry point. The args array holds the command line arguments without the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main
    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " + _
                "transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes, so that existing processors should stop working with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
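As a rough illustration of where the two version markers live, the Python sketch below reads both the namespace name and the version attribute from the root element of a small XSLT stylesheet. The helper name root_version_info is made up for this example, and the sketch assumes the standard library's xml.etree.ElementTree parser, which reports element names in the form {namespace-name}local-name.

```python
import xml.etree.ElementTree as ET

# A cut-down XSLT stylesheet: versioned both by namespace name and by attribute.
STYLESHEET = """<xslt:stylesheet version="1.0"
    xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
  <xslt:output method="text"/>
</xslt:stylesheet>"""


def root_version_info(xml_text):
    """Return (namespace name, local name, version attribute) for the root element.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        namespace, local = root.tag[1:].split("}", 1)
    else:
        namespace, local = None, root.tag
    # The version attribute is unprefixed, so it is looked up without a namespace.
    return namespace, local, root.get("version")


print(root_version_info(STYLESHEET))
```

A processor could compare the first value against the namespace names it understands and treat the version attribute as the finer-grained, backwards-compatible marker.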
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why the namespace URI of the root element should not be equated with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
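To see the effect of undeclaring a default namespace, here is a small Python sketch that parses such a document and prints the name of every element. It assumes the standard library's xml.etree.ElementTree parser, which writes a namespaced name as {namespace-name}local-name and leaves names without a namespace bare.

```python
import xml.etree.ElementTree as ET

# The default namespace is declared on bookstore and undeclared on book.
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

root = ET.fromstring(DOC)

# Collect element names in document order.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Only bookstore and inv:inventory come back with a namespace name attached; book, title, and author are reported as plain local names, which is exactly the mixed situation the text warns about.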
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
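The equivalence of the three schema documents can be checked with a short Python sketch. It assumes the standard library's xml.etree.ElementTree parser, which happens to expose element names in the same curly-brace form as universal names; the helper name universal_names is made up for this example.

```python
import xml.etree.ElementTree as ET

# The same schema written with the xs prefix, the xsd prefix, and a default namespace.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """Return the {namespace-name}local-name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names = [universal_names(doc) for doc in DOCS]
print(names[0])
# The prefixes differ, but the universal names do not.
print(names[0] == names[1] == names[2])
```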
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
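The following Python sketch shows how a namespace-aware parser reports these attributes. It assumes the standard library's xml.etree.ElementTree, where a namespaced attribute is keyed as {namespace-name}local-name and an unprefixed attribute is keyed by its bare name.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

PREFIXED_DOC = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

DEFAULT_DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>"""

# First child of the root element in each document.
prefixed_book = ET.fromstring(PREFIXED_DOC)[0]
default_book = ET.fromstring(DEFAULT_DOC)[0]

# Two distinct attributes that share the local name "title".
print(prefixed_book.get("title"))
print(prefixed_book.get("{%s}title" % BOOKSTORE_NS))

# The element picks up the default namespace, the attribute does not.
print(default_book.tag)
print(sorted(default_book.attrib))
```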
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM predates both data models but is also similar to both in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
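As a quick way to observe the DOM view of namespace declarations, the Python sketch below walks the attributes of a root element and keeps those whose namespace name is http://www.w3.org/2000/xmlns/. It assumes the standard library's xml.dom.minidom, a lightweight DOM implementation, so other DOM implementations may expose details differently.

```python
from xml.dom import minidom

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"

DOC = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings"/>
</bk:bookstore>"""

root = minidom.parseString(DOC).documentElement

# Namespace declarations show up as ordinary attribute nodes on the element.
declarations = []
attrs = root.attributes
for i in range(attrs.length):
    attr = attrs.item(i)
    if attr.namespaceURI == XMLNS_NAMESPACE:
        declarations.append((attr.name, attr.value))

print(declarations)
print(root.namespaceURI)
```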
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System.Xml.XPath;
using System.Xml;
using System;
using System.IO;
class XPathQuery{
public static string PrintError(Exception e, string errStr){
if(e == null)
return errStr;
else
return PrintError(e.InnerException, errStr + e.Message );
}
public static void Main(string[] args){
if((args.Length == 0) || (args.Length % 2)!= 0){
Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
return;
}
try{
//Load the file.
XmlDocument doc = new XmlDocument();
doc.Load(args[0]);
//create prefix<->namespace mappings (if any)
XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
for(int i=2; i < args.Length; i+= 2)
nsMgr.AddNamespace(args[i], args[i + 1]);
//Query the document
XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
//print output
foreach(XmlNode node in nodes)
Console.WriteLine(node.OuterXml + "\n\n");
}catch(XmlException xmle){
Console.WriteLine("ERROR: XML Parse error occurred because " +
PrintError(xmle, null));
}catch(FileNotFoundException fnfe){
Console.WriteLine("ERROR: " + PrintError(fnfe, null));
}catch(XPathException xpath){
Console.WriteLine("ERROR: The following error occurred while querying the document: "
+ PrintError(xpath, null));
}catch(Exception e){
Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
}
}
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, but they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: These examples are the same as the previous examples but rewritten to be namespace aware.
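The same behavior can be reproduced outside of .NET. The Python sketch below is only an approximation of the queries above: it assumes the standard library's xml.etree.ElementTree, which supports a limited subset of XPath (no attribute axis, for example), so the attribute lookup is done with get() instead of //@b:genre. The document is a trimmed version of the namespaced bookstore file.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
      xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Prefix-to-namespace mapping supplied to the query engine; the prefix "b"
# does not have to match any prefix used in the document itself.
NS = {"b": BOOKSTORE_NS}

# Unprefixed names only match elements that have no namespace.
unqualified = root.findall("book/title")

# Prefixed names match on namespace name plus local name.
titles = [title.text for title in root.findall("b:book/b:title", NS)]

# Only the second book carries a genre attribute in the bookstore namespace.
genres = [book.get("{%s}genre" % BOOKSTORE_NS) for book in root.findall("b:book", NS)]

print(unqualified)
print(titles)
print(genres)
```

Note that both book elements are matched by b:book even though one uses the default namespace and the other the bk prefix, since only the namespace name and local name take part in the comparison.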
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO
Class Transformer
Public Shared Function PrintError(e As Exception, errStr As String) As String
If e Is Nothing Then
Return errStr
Else
Return PrintError(e.InnerException, errStr + e.Message)
End If
End Function 'PrintError
'Entry point which delegates to Run; args excludes the program name
Public Overloads Shared Sub Main(args() As String)
Run(args)
End Sub 'Main
Overloads Public Shared Sub Run(args() As String)
If args.Length <> 2 Then
Console.WriteLine("Usage: xslt source stylesheet")
Return
End If
Try
'Create the XslTransform object.
Dim xslt As New XslTransform()
'Load the stylesheet.
xslt.Load(args(1))
'Transform the file.
Dim doc As New XmlDocument()
doc.Load(args(0))
xslt.Transform(doc, Nothing, Console.Out)
Catch xmle As XmlException
Console.WriteLine(("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing)))
Catch fnfe As FileNotFoundException
Console.WriteLine(("ERROR: " + PrintError(fnfe, Nothing)))
Catch xslte As XsltException
Console.WriteLine(("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing)))
Catch e As Exception
Console.WriteLine(("UNEXPECTED ERROR" + PrintError(e, Nothing)))
End Try
End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the elements meant for output from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet. The markup held in the variable has to be stored as text (here in a CDATA section) so that the processor does not treat it as stylesheet directives.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:variable name="stylesheet"><![CDATA[<xslt:stylesheet version="1.0"
    xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
  <xslt:output method="text"/>
  <xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>]]></xsl:variable>
  <xsl:template match="/">
    <xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which tend to become unmanageable in any situation requiring flexibility. For instance, when creating the new stylesheet involves lots of dynamically generated text intertwined with stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
    xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
  <xslt:output method="xml" encoding="utf-8"/>
  <xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
  <xslt:template match="/">
    <alias:stylesheet version="1.0">
      <alias:output method="text"/>
      <alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
    </alias:stylesheet>
  </xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    xmlns:newfunc="urn:my-newfunc">
  <output method="text"/>
  <template match="/">
    <value-of select="newfunc:SayHello()" />
  </template>
  <msxsl:script language="JavaScript" implements-prefix="newfunc">
    function SayHello() {
      return "Hello World";
    }
  </msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes. In that case it is desirable for existing processors to refuse newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
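As a rough illustration of combining the two techniques, here is a minimal sketch in Python's standard library (not the .NET classes used elsewhere in this article). The namespace name, the version numbers, the `doc` element, and the `classify` function are all hypothetical, and the policy shown is only one reasonable reading of the guidance above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and version, following the URI pattern above.
KNOWN_NAMESPACE = "http://my.domain.example.org/product/2002/06"
KNOWN_VERSION = "1.0"

def classify(xml_text):
    """Decide how a processor built for KNOWN_NAMESPACE treats a document."""
    root = ET.fromstring(xml_text)
    # ElementTree spells element names as {namespace-name}local-name.
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    if namespace != KNOWN_NAMESPACE:
        # A new namespace name signals changed semantics: do not guess.
        return "reject"
    if root.get("version") != KNOWN_VERSION:
        # Same namespace, different version attribute: compatible additions only.
        return "process, ignoring unknown additions"
    return "process"

current = classify(
    '<p:doc version="1.0" xmlns:p="http://my.domain.example.org/product/2002/06"/>')
newer = classify(
    '<p:doc version="1.1" xmlns:p="http://my.domain.example.org/product/2002/06"/>')
changed = classify(
    '<p:doc version="1.0" xmlns:p="http://my.domain.example.org/product/2003/01"/>')

print(current, "|", newer, "|", changed)
```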
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why the namespace URI of the root element should not be equated with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
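The effect of a default namespace, and of undeclaring it, can be checked with a short script. This is a minimal sketch using Python's standard xml.etree.ElementTree module, not the .NET classes used elsewhere in this article. ElementTree reports every element name as {namespace-name}local-name, so elements without a namespace show up as bare names. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Default namespace declared on the root and inherited by its descendants.
inherited = ET.fromstring(
    f'<bookstore xmlns="{BOOKSTORE_NS}">'
    '<book><title>Lord of the Rings</title></book>'
    '</bookstore>'
)

# Same document, but the default namespace is undeclared on <book>.
undeclared = ET.fromstring(
    f'<bookstore xmlns="{BOOKSTORE_NS}">'
    '<book xmlns=""><title>Lord of the Rings</title></book>'
    '</bookstore>'
)

# Element.iter() walks the element itself and its descendants in document order.
inherited_tags = [el.tag for el in inherited.iter()]
undeclared_tags = [el.tag for el in undeclared.iter()]

print(inherited_tags)
print(undeclared_tags)
```

In the second document only the bookstore element keeps its namespace, which is exactly the situation the text above recommends avoiding.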
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
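To make the resolution step concrete, here is a minimal sketch in Python of how an element QName might be resolved against the namespace declarations in scope. The function name expand_element_qname and the dictionary layout (the key "" holding the default namespace) are assumptions made for this illustration, not part of any specification or library.

```python
def expand_element_qname(qname, in_scope):
    """Resolve an element QName to a (namespace name, local name) pair.

    in_scope maps prefixes to namespace names; the key "" holds the
    default namespace, if one is declared.
    """
    prefix, _, local = qname.rpartition(":")
    if prefix:
        if prefix not in in_scope:
            raise ValueError(f"prefix {prefix!r} is not bound to a namespace")
        return (in_scope[prefix], local)
    # An unprefixed element name takes the default namespace, if any.
    return (in_scope.get("", ""), local)

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

prefixed = expand_element_qname("bk:title", {"bk": BOOKSTORE_NS})
defaulted = expand_element_qname("title", {"": BOOKSTORE_NS})
unqualified = expand_element_qname("title", {"bk": BOOKSTORE_NS})

print(prefixed, defaulted, unqualified)
```

The first two calls yield the same pair even though the QNames differ, while the third yields a name with no namespace.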
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType id="123" name="fooType"/>
</xs:schema>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType id="123" name="fooType"/>
</xsd:schema>

<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>

<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>

<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
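One way to see this for yourself is with a parser that exposes universal names directly. The sketch below uses Python's standard xml.etree.ElementTree module, which happens to report element names in the curly-brace notation described above. It is offered as an illustration only, and the helper name universal_names is made up for this example.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the text, differing only in prefix choice.
documents = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the universal name of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

names = [universal_names(doc) for doc in documents]
for name_list in names:
    print(name_list)
```

All three lists come out identical, since the prefixes are discarded once the names have been resolved.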
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>
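The two documents above can be inspected with a few lines of Python. This is a minimal sketch using the standard xml.etree.ElementTree module as a stand-in for the .NET classes used elsewhere in this article. Attribute names are reported in the same {namespace-name}local-name form as element names, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

prefixed = ET.fromstring(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>'
)
defaulted = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book title="Lord of the Rings, Book 3"/>'
    '</bookstore>'
)

# The first child of each root is the book element.
prefixed_attrs = dict(prefixed[0].attrib)
defaulted_book = defaulted[0]
defaulted_attrs = dict(defaulted_book.attrib)

print(prefixed_attrs)
print(defaulted_book.tag, defaulted_attrs)
```

In the second document the book element picks up the default namespace, but its title attribute does not.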
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
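The character-for-character rule means that two URIs a Web server might treat as equivalent are still different namespace names. The following sketch, in Python's standard library with made-up example.org URIs, shows three such near-identical names producing three distinct element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in case or a trailing slash.
lower = ET.fromstring('<a:item xmlns:a="http://example.org/ns"/>')
upper = ET.fromstring('<a:item xmlns:a="http://EXAMPLE.org/ns"/>')
slash = ET.fromstring('<a:item xmlns:a="http://example.org/ns/"/>')

# No URI normalization is applied, so every spelling is its own namespace.
distinct_names = {lower.tag, upper.tag, slash.tag}

print(sorted(distinct_names))
```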
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM predates both data models but is also similar to them in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
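The DOM view of namespace declarations can be observed with any namespace-aware DOM implementation. The sketch below uses xml.dom.minidom from Python's standard library rather than the .NET XmlDocument class used elsewhere in this article. The default namespace name urn:example:default is made up for the example.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore" '
    'xmlns="urn:example:default"/>'
)
root = doc.documentElement

# Namespace declarations are exposed as ordinary attribute nodes.
prefix_decl = root.getAttributeNode("xmlns:bk")
default_decl = root.getAttributeNode("xmlns")

for decl in (prefix_decl, default_decl):
    print(decl.name, decl.prefix, decl.namespaceURI, decl.value)

# The element keeps the namespace name it was created with.
print(root.prefix, root.localName, root.namespaceURI)
```

Both declarations report http://www.w3.org/2000/xmlns/ as their namespace name. The prefixed declaration has xmlns as its prefix, while the default declaration has xmlns as its qualified name.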
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Namespace nodes are not shared between elements, so the namespace nodes of two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
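The idea that every element carries its own set of in-scope namespaces can be approximated with a streaming parser. The sketch below uses iterparse from Python's standard xml.etree.ElementTree module. It only tracks prefix bindings as elements open and close, so it is an illustration of the concept and not an implementation of XPath namespace nodes or infoset namespace information items. The function name is made up, and undeclared default namespaces are not treated specially.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

SAMPLE = (
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book>'
    '<inv:inventory status="in-stock" '
    'xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking"/>'
    '</bk:book>'
    '</bk:bookstore>'
)

def in_scope_namespaces(xml_text):
    """Return (element name, {prefix: namespace name}) pairs in document order."""
    scopes = [{"xml": XML_NS}]  # the xml prefix is always predeclared
    pending = {}                # declarations read since the last start tag
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = payload
            pending[prefix] = uri
        elif event == "start":
            # Each element gets its own copy of the bindings in scope.
            scope = {**scopes[-1], **pending}
            pending = {}
            scopes.append(scope)
            result.append((payload.tag, scope))
        else:  # "end": the element's own declarations go out of scope
            scopes.pop()
    return result

for tag, scope in in_scope_namespaces(SAMPLE):
    print(tag, sorted(scope))
```

Only the inv:inventory element has the inv prefix in scope, while the xml and bk prefixes are in scope for all three elements.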
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System.Xml.XPath;
using System.Xml;
using System;
using System.IO;

class XPathQuery {

    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr) {
        if (e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args) {
        // Expect a source file, a query, and zero or more prefix/namespace pairs.
        if ((args.Length == 0) || (args.Length % 2) != 0) {
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try {
            // Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            // Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for (int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            // Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            // Print output.
            foreach (XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        } catch (XmlException xmle) {
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        } catch (FileNotFoundException fnfe) {
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        } catch (XPathException xpath) {
            Console.WriteLine("ERROR: The following error occurred while querying " +
                "the document: " + PrintError(xpath, null));
        } catch (Exception e) {
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document, which does not declare any namespaces, queries are fairly straightforward, as the examples after the document show.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml "//title[(../author/first-name = 'Herman')]"
  Selects all the titles where the author's first name is "Herman" and returns:
  <title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
      xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author>
      <bk:first-name>Herman</bk:first-name>
      <bk:last-name>Melville</bk:last-name>
    </bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml "//title[(../author/first-name = 'Herman')]"
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml "//bk:title[(../bk:author/bk:first-name = 'Herman')]" bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman" and returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same query as in the previous examples, rewritten to be namespace aware.
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
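The same behavior can be reproduced outside .NET. The sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath: paths are relative to the root element, and attribute nodes cannot be selected directly, so attributes are read off the matched elements instead. The document is a trimmed copy of the namespaced bookstore file, and the prefix b is chosen by the caller, not taken from the document.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE)

# Unprefixed names in the path match only elements with no namespace.
unqualified = root.findall("book/title")

# Only the namespace name has to match; the prefix is local to the query.
ns = {"b": "urn:xmlns:25hoursaday-com:bookstore"}
qualified = [title.text for title in root.findall("b:book/b:title", ns)]

# Unprefixed attributes have no namespace; the prefixed one is in the bookstore namespace.
books = root.findall("b:book", ns)
plain_genres = [book.get("genre") for book in books]
qualified_genres = [book.get("{urn:xmlns:25hoursaday-com:bookstore}genre")
                    for book in books]

print(unqualified, qualified, plain_genres, qualified_genres)
```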
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO

Class Transformer

    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point; args holds the command-line arguments without the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " + _
                "transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which tend to become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes, so that existing processors should refuse the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
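As a rough illustration of combining the two techniques, the sketch below is written in Python rather than the article's .NET code. The namespace name urn:example:orders, the supported version number, and the helper names are all hypothetical, chosen only for this example. A processor refuses documents from an unknown namespace but tolerates a newer minor version signalled by the version attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and supported version, chosen for illustration.
ORDERS_NS = "urn:example:orders"
SUPPORTED_MAJOR = 1


def split_universal_name(tag):
    """Split an ElementTree tag of the form '{namespace}local' into its parts."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


def check_version(xml_text):
    """Classify a document as 'ok', 'newer-minor', or 'unsupported'."""
    root = ET.fromstring(xml_text)
    namespace, _ = split_universal_name(root.tag)
    # A different namespace name signals changed semantics: refuse to guess.
    if namespace != ORDERS_NS:
        return "unsupported"
    # The version attribute signals incremental, backwards-compatible changes.
    major, minor = root.get("version", "1.0").split(".")
    if int(major) != SUPPORTED_MAJOR:
        return "unsupported"
    return "ok" if int(minor) == 0 else "newer-minor"


print(check_version('<orders xmlns="urn:example:orders" version="1.2"/>'))
```

A processor that receives "newer-minor" could continue and simply ignore elements it does not recognize, whereas "unsupported" is a signal to stop.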
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
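A small Python sketch of the point above: the helper element_namespaces (a name invented here) collects every namespace name used by elements in a document. The sample document is a made-up, RDDL-like fragment and not a real RDDL document. Its root element is in the XHTML namespace, yet it carries a second vocabulary that a check of the root element alone would miss.

```python
import xml.etree.ElementTree as ET


def element_namespaces(xml_text):
    """Return the set of namespace names used by elements in the document."""
    names = set()
    for element in ET.fromstring(xml_text).iter():
        # ElementTree stores qualified elements as '{namespace}local'.
        if element.tag.startswith("{"):
            names.add(element.tag[1:].split("}", 1)[0])
    return names


# Made-up sample: XHTML root element with an embedded element from another
# vocabulary (the namespace name is written as it appears in the article).
doc = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org">
  <body>
    <rddl:resource/>
  </body>
</html>"""

print(sorted(element_namespaces(doc)))
```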
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
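The effect of undeclaring the default namespace can be checked with a short sketch. It uses Python's standard xml.etree.ElementTree module, which is an assumption of this example and not part of the article's .NET code. ElementTree reports each element name as {namespace}local, or as a bare local name when the element has no namespace.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the bookstore document with the default namespace
# undeclared on the book element.
doc = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
  </book>
</bookstore>"""

root = ET.fromstring(doc)
tags = [element.tag for element in root.iter()]

# Only the first name carries the bookstore namespace.
for tag in tags:
    print(tag)
```

Only bookstore is printed with a namespace name in braces; book and title appear as plain local names, which is exactly the mixed situation the text warns about.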
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
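Python's standard xml.etree.ElementTree module happens to expose element names in this curly-brace universal name form, so it makes a convenient way to check the claim. The sketch below (Python is an assumption of this example, not the article's own tooling) parses the three schema documents from above and compares the names it sees.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the text, differing only in prefix usage.
documents = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """List the element names of a document in '{namespace}local' form."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names = [universal_names(text) for text in documents]
print(names[0])
print(names[0] == names[1] == names[2])
```

The prefixes xs and xsd, and the absence of a prefix in the third document, leave no trace in the parsed names.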
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
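The behavior of attributes can be confirmed with a brief sketch in Python's standard xml.etree.ElementTree module (an assumption of this example, not the article's own code). The two book elements from the examples above are parsed on their own, and their attribute names are listed in universal name form.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# A prefixed element carrying one unprefixed and one prefixed attribute.
prefixed = ET.fromstring(
    '<bk:book xmlns:bk="%s" title="Lord of the Rings, Book 3" '
    'bk:title="Return of the King"/>' % BOOKSTORE_NS)

# An element in the default namespace with an unprefixed attribute.
defaulted = ET.fromstring(
    '<book xmlns="%s" title="Lord of the Rings, Book 3"/>' % BOOKSTORE_NS)

prefixed_keys = sorted(prefixed.attrib)
defaulted_keys = sorted(defaulted.attrib)

print(prefixed_keys)
print(defaulted.tag, defaulted_keys)
```

In both cases the unprefixed title attribute shows up without a namespace, while the element that owns it is reported in the bookstore namespace.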
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
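The character-for-character rule is easy to trip over with HTTP URLs, where host names are otherwise case-insensitive. The sketch below uses Python's standard xml.etree.ElementTree module and two hypothetical namespace names under example.org that differ only in case; both the module choice and the URLs are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in the case of the host name.
lower = ET.fromstring('<book xmlns="http://example.org/books"/>')
upper = ET.fromstring('<book xmlns="http://EXAMPLE.org/books"/>')

# As URLs these might well resolve to the same resource, but as namespace
# names they are compared as plain strings and therefore differ.
same_name = lower.tag == upper.tag
print(lower.tag, upper.tag, same_name)
```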
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
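The DOM view of namespace declarations can be observed with Python's standard xml.dom.minidom module, a lightweight DOM implementation used here purely for illustration (the article itself targets the .NET DOM). In this sketch the xmlns:bk declaration is fetched like any other namespaced attribute node.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>')
root = doc.documentElement

# The declaration is an ordinary attribute node whose namespace name is
# http://www.w3.org/2000/xmlns/ and whose local name is the declared prefix.
declaration = root.getAttributeNodeNS(XMLNS_NAMESPACE, "bk")

print(declaration.name)          # qualified name of the attribute node
print(declaration.namespaceURI)  # namespace name of the declaration itself
print(declaration.value)         # namespace name bound to the bk prefix
```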
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Namespace nodes are never shared between elements. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
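Python's standard library has no XPath namespace axis, so the following is only an emulation of the idea and not an XPath implementation. The hypothetical helper in_scope_namespaces uses the start-ns and end-ns events of xml.etree.ElementTree.iterparse to compute, for every element, a separate copy of the prefix bindings in scope at that element.

```python
import io
import xml.etree.ElementTree as ET


def in_scope_namespaces(xml_text):
    """Return (element name, {prefix: namespace name}) for each element."""
    declarations = []  # stack of (prefix, uri) pairs currently in scope
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns", "end-ns")):
        if event == "start-ns":
            declarations.append(payload)   # reported before the element starts
        elif event == "end-ns":
            declarations.pop()             # reported after the element ends
        else:
            # Each element gets its own dictionary, mirroring the way namespace
            # nodes are never shared between elements.
            result.append((payload.tag, dict(declarations)))
    return result


doc = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <inv:inventory status="in-stock"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking"/>
  </bk:book>
</bk:bookstore>"""

scopes = dict(in_scope_namespaces(doc))
for name, bindings in scopes.items():
    print(name, sorted(bindings))
```

The sketch leaves out the implicit xml prefix, which the XPath data model also exposes as a namespace node on every element.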
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery {
    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr) {
        if (e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args) {
        // Expect a source file, a query, and zero or more prefix/namespace pairs.
        if ((args.Length == 0) || (args.Length % 2) != 0) {
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try {
            // Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            // Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for (int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            // Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            // Print the output.
            foreach (XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        } catch (XmlException xmle) {
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        } catch (FileNotFoundException fnfe) {
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        } catch (XPathException xpath) {
            Console.WriteLine("ERROR: The following error occurred while querying the document: " +
                PrintError(xpath, null));
        } catch (Exception e) {
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman" and returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples repeats the previous queries, rewritten to be namespace aware.
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
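The same behaviour can be reproduced outside .NET. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module rather than the article's xpathquery.exe, and using an abbreviated copy of the namespaced bookstore document. ElementTree supports only a subset of XPath (paths are relative to an element and there is no attribute axis), so attributes are read with get() instead of //@genre.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# Abbreviated version of the namespaced bookstore document from the article.
BOOKSTORE = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# Unprefixed names match only elements with no namespace, so nothing is found.
unprefixed = root.findall("book/title")

# Supplying a prefix-to-namespace mapping makes the query namespace aware.
# The prefix "b" is arbitrary and need not match the prefixes in the document.
prefixes = {"b": NS}
titles = [t.text for t in root.findall("b:book/b:title", prefixes)]

# Unprefixed attributes have no namespace; bk:genre is stored under its
# expanded name, written {namespace}local by ElementTree.
books = root.findall("b:book", prefixes)
plain_genres = [b.get("genre") for b in books]
qualified_genres = [b.get("{%s}genre" % NS) for b in books]

print(unprefixed, titles, plain_genres, qualified_genres)
```

The prefix mapping passed to findall plays the same role as the prefix and namespace pairs given on the xpathquery.exe command line.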
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point. args holds only the command-line arguments, not the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    'Expects args(0) to be the source document and args(1) the stylesheet.
    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML parse error occurred because " + PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet"><![CDATA[
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
]]></xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, upon changing, alter the semantics of elements and attributes, so that existing processors should refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
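How a processor might combine the two mechanisms can be sketched as follows. This is an illustration only, assuming Python's standard xml.etree.ElementTree module; the namespace name, the supported version number, the classify function, and its three return values are all hypothetical and not taken from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical values, chosen only to follow the URI format shown above.
KNOWN_NAMESPACE = "http://my.domain.example.org/product/2002/06"
SUPPORTED_VERSION = (1, 1)

def classify(xml_text):
    """Decide how a processor built for KNOWN_NAMESPACE should treat a document."""
    root = ET.fromstring(xml_text)
    # ElementTree writes element names as {namespace}local.
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace != KNOWN_NAMESPACE:
        # A different namespace name signals changed semantics: do not guess.
        return "reject"
    # The unprefixed version attribute signals backwards-compatible revisions.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    if version <= SUPPORTED_VERSION:
        return "process"
    return "process-ignoring-unknown"

same = classify('<product xmlns="%s" version="1.0"/>' % KNOWN_NAMESPACE)
newer = classify('<product xmlns="%s" version="1.2"/>' % KNOWN_NAMESPACE)
other = classify('<product xmlns="http://my.domain.example.org/product/2003/01" version="1.0"/>')
print(same, newer, other)
```

The design choice mirrors the text: a changed namespace name stops processing outright, while a higher version number within the same namespace is tolerated.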
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its children. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
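The effect of a default namespace, and of undeclaring it, can be checked with a short script. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the tags helper is a name introduced here for illustration. ElementTree reports each element name as {namespace}local, or as the bare local name when the element has no namespace.

```python
import xml.etree.ElementTree as ET

# The bookstore example with a default namespace.
DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

# The same document with the default namespace undeclared on the book element.
UNDECLARED = DEFAULTED.replace("<book>", '<book xmlns="">')

def tags(xml_text):
    """Return the names of all elements in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

defaulted_tags = tags(DEFAULTED)
undeclared_tags = tags(UNDECLARED)
print(defaulted_tags)
print(undeclared_tags)
```

In the second list, book and title lose their namespace while bookstore keeps it, which is exactly the confusing situation the text warns about.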
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
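The equivalence of the three schema documents can be demonstrated with a parser that exposes universal names directly. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which happens to report element names in the same curly-brace notation; the universal_names helper is a name introduced here for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The three documents from the example above, differing only in prefix.
DOCS = [
    '<xs:schema xmlns:xs="%s"><xs:complexType id="123" name="fooType"/></xs:schema>' % XSD,
    '<xsd:schema xmlns:xsd="%s"><xsd:complexType id="123" name="fooType"/></xsd:schema>' % XSD,
    '<schema xmlns="%s"><complexType id="123" name="fooType"/></schema>' % XSD,
]

def universal_names(xml_text):
    """Return the {namespace}local name of every element in document order."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]

names = [universal_names(doc) for doc in DOCS]
print(names[0])
print(names[0] == names[1] == names[2])
```

The prefixes xs and xsd and the default namespace all disappear after parsing; only the namespace name and local name remain.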
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below, the title attribute belongs to the bk:book element and has no namespace, while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name, the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
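Both attribute cases can be verified by inspecting the parsed attribute names. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which stores namespaced attributes under {namespace}local keys and attributes with no namespace under their bare names.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# Prefixed case: title has no namespace, bk:title is in the bookstore namespace.
prefixed = ET.fromstring(
    '<bk:bookstore xmlns:bk="%s">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>' % NS
)

# Default namespace case: the element is namespaced, the attribute is not.
defaulted = ET.fromstring(
    '<bookstore xmlns="%s"><book title="Lord of the Rings, Book 3"/></bookstore>' % NS
)

prefixed_attrs = dict(prefixed[0].attrib)
defaulted_attrs = dict(defaulted[0].attrib)
defaulted_tag = defaulted[0].tag

print(prefixed_attrs)
print(defaulted_tag, defaulted_attrs)
```

The book element picks up the default namespace, but its title attribute keeps a bare name in both documents.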
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
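The DOM treatment of namespace declarations can be observed in a DOM implementation. This is a minimal sketch, assuming Python's standard xml.dom.minidom module as a stand-in for the DOM implementations the article has in mind; other implementations may expose the same information differently.

```python
from xml.dom import minidom

# Namespace name that the DOM assigns to namespace declaration attributes.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>'
)
root = doc.documentElement

# The declaration appears as an ordinary attribute node on the element.
declarations = [
    (attr.name, attr.namespaceURI, attr.value)
    for attr in root.attributes.values()
]
element_namespace = root.namespaceURI

print(declarations)
print(element_namespace)
```

Here the xmlns:bk declaration is reachable through the element's attribute map, which is the behaviour the XPath data model described next deliberately does not share.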
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System.Xml.XPath;
using System.Xml;
using System;
using System.IO;
class XPathQuery{
public static string PrintError(Exception e, string errStr){
if(e == null)
return errStr;
else
return PrintError(e.InnerException, errStr + e.Message );
}
public static void Main(string[] args){
if((args.Length == 0) || (args.Length % 2)!= 0){
Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
return;
}
try{
//Load the file.
XmlDocument doc = new XmlDocument();
doc.Load(args[0]);
//create prefix<->namespace mappings (if any)
XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
for(int i=2; i < args.Length; i+= 2)
nsMgr.AddNamespace(args[i], args[i + 1]);
//Query the document
XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
//print output
foreach(XmlNode node in nodes)
Console.WriteLine(node.OuterXml + "\n\n");
}catch(XmlException xmle){
Console.WriteLine("ERROR: XML Parse error occurred because " +
PrintError(xmle, null));
}catch(FileNotFoundException fnfe){
Console.WriteLine("ERROR: " + PrintError(fnfe, null));
}catch(XPathException xpath){
Console.WriteLine("ERROR: The following error occurred while querying the document: "
+ PrintError(xpath, null));
}catch(Exception e){
Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
}
}
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document and returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman" and returns:
<title>The Confidence Man</title>
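For readers without the .NET tool, the three queries can be approximated as follows. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which implements only a subset of XPath: paths are evaluated relative to an element, there is no attribute axis, and predicates cannot contain parent steps, so the second and third queries are expressed partly in Python.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the example above, with no namespaces.
BOOKSTORE = """<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author><first-name>Herman</first-name><last-name>Melville</last-name></author>
    <price>11.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# Counterpart of /bookstore/book/title, written relative to the root element.
titles = [t.text for t in root.findall("book/title")]

# Counterpart of //@genre: read the attribute from each book element.
genres = [b.get("genre") for b in root.iter("book")]

# Counterpart of //title[../author/first-name = 'Herman'], filtered in Python.
herman_titles = [
    b.findtext("title")
    for b in root.iter("book")
    if b.findtext("author/first-name") == "Herman"
]

print(titles, genres, herman_titles)
```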
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reason as the first query.
The way to perform namespace-aware XPath queries is to provide a prefix-to-namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefixes mapped to those namespaces in the target document, but they must be non-empty.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document, which returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman", which returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples is the same as the previous examples, but rewritten to be namespace aware.
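The same behavior can be reproduced outside .NET. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module rather than the xpathquery.exe program used in this article. ElementTree supports only a subset of XPath, so the attribute and "Herman" queries are approximated with ordinary Python filtering; the prefix b is an arbitrary choice, just as in the command lines above.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# The namespaced bookstore document from the article (prices omitted).
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name><bk:last-name>Melville</bk:last-name></bk:author>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)  # root is the bookstore element

# Unprefixed steps only match names with no namespace, so nothing is found.
unqualified = root.findall("book/title")

# Supply our own prefix-to-namespace mapping; only the URI has to match.
prefixes = {"b": BOOKSTORE_NS}
qualified = [t.text for t in root.findall("b:book/b:title", prefixes)]

# Unprefixed attributes have no namespace; prefixed ones carry the URI.
plain_genres = [el.get("genre") for el in root.iter() if "genre" in el.attrib]
ns_key = "{%s}genre" % BOOKSTORE_NS
ns_genres = [el.get(ns_key) for el in root.iter() if ns_key in el.attrib]

# Titles of books whose author's first name is "Herman".
herman_titles = [
    book.findtext("b:title", namespaces=prefixes)
    for book in root.findall("b:book", prefixes)
    if book.findtext("b:author/b:first-name", namespaces=prefixes) == "Herman"
]

print(unqualified, qualified, plain_genres, ns_genres, herman_titles)
```

Note that both title elements are found with the single prefix b, even though the document itself uses the default namespace for one and the bk prefix for the other.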
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL Transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, use XPath patterns to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl
Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr & e.Message)
        End If
    End Function 'PrintError
    'Entry point. args holds the command-line arguments without the program name.
    Public Shared Sub Main(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " & _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " & PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " & _
                "transforming the document: " & PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " & PrintError(e, Nothing))
        End Try
    End Sub 'Main
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Note also that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which tend to become unmanageable when faced with any situation requiring flexibility. For instance, when creating the new stylesheet involves a lot of dynamically generated text intertwined with stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace-prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes. In that case it is desirable for existing processors to stop working with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
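To make the combined approach concrete, here is a minimal sketch in Python of how a processor might apply both checks. The namespace name, the version numbers, and the check_document function are hypothetical and exist only for illustration; they are not taken from any real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical values: the namespace name this processor understands and
# the highest version attribute value it was written against.
SUPPORTED_NS = "http://my.domain.example.org/product/2002/08"
HIGHEST_KNOWN_VERSION = (1, 1)


def split_tag(tag):
    """Split an ElementTree tag into (namespace name, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def check_document(xml_text):
    """Decide how to treat a document based on namespace name and version."""
    root = ET.fromstring(xml_text)
    uri, _ = split_tag(root.tag)
    # A different namespace name signals changed semantics: refuse to guess.
    if uri != SUPPORTED_NS:
        return "reject"
    # A newer version attribute signals backwards-compatible additions.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    if version > HIGHEST_KNOWN_VERSION:
        return "process, ignoring unrecognized additions"
    return "process"


print(check_document(
    '<order xmlns="http://my.domain.example.org/product/2002/08" version="1.2"/>'))
```

The design choice mirrors the text above: a changed namespace name stops old processors outright, while a higher version number lets them continue on a best-effort basis.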
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, which have a root element from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI, which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine-readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
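One way to see the effect of undeclaring the default namespace is to print the name each element ends up with after parsing. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports element names as a namespace name in curly braces followed by the local name; it is only an illustration and not part of the original tooling.

```python
import xml.etree.ElementTree as ET

# The document from the example above: the default namespace is declared on
# bookstore and then undeclared again on book with xmlns="".
DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

# Collect the name of every element in document order.
tags = [element.tag for element in ET.fromstring(DOC).iter()]

for tag in tags:
    print(tag)
```

Only the bookstore and inventory names are printed with a namespace name; book, title, and author come out as bare local names, which is exactly the inconsistency that makes such documents confusing.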
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
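Universal names are not just a thought experiment. As a small sketch, Python's standard xml.etree.ElementTree module happens to expose element names in this curly-brace form, so it can be used to confirm that the three schema documents above are indistinguishable once prefixes are resolved. The universal_names helper is a name made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the example, differing only in prefix.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """Return the universal name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names = [universal_names(doc) for doc in DOCS]

# All three documents produce the same list of universal names.
print(names[0])
print(names[0] == names[1] == names[2])
```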
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
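The two attribute examples can be checked with a short sketch, again assuming Python's standard xml.etree.ElementTree module, which stores attribute names in the same curly-brace universal name form it uses for elements.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>"""

# The first (and only) child of each root element is the book element.
prefixed_book = ET.fromstring(PREFIXED)[0]
defaulted_book = ET.fromstring(DEFAULTED)[0]

# Two distinct attributes: one with no namespace, one in the bookstore namespace.
prefixed_attrs = dict(prefixed_book.attrib)

# The element picks up the default namespace, but its attribute does not.
defaulted_attrs = dict(defaulted_book.attrib)

print(prefixed_attrs)
print(defaulted_book.tag, defaulted_attrs)
```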
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
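The DOM's treatment of namespace declarations as attributes can be observed in any namespace-aware DOM implementation. The following is a minimal sketch that assumes Python's standard xml.dom.minidom module rather than the .NET XmlDocument class used elsewhere in this article.

```python
from xml.dom import minidom

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>')
root = doc.documentElement

# The namespace declaration is reachable as an ordinary attribute node.
declaration = root.getAttributeNode("xmlns:bk")

# Its namespace name is the reserved xmlns namespace, not the bookstore one.
declaration_ns = declaration.namespaceURI
declaration_value = declaration.value

# The element itself carries the namespace name the declaration maps to.
element_ns = root.namespaceURI

print(declaration.name, declaration_ns, declaration_value)
print(root.tagName, element_ns)
```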
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Namespace nodes are never shared between elements. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
class XPathQuery{
    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }
    public static void Main(string[] args){
        //Expect a source file and a query, followed by prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //Print output.
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml "//title[(../author/first-name = 'Herman')]"
  Selects all the titles where the author's first name is "Herman" and returns:
  <title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document and returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml "//title[(../author/first-name = 'Herman')]"
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, and they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element and returns the following:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document and returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml "//bk:title[(../bk:author/bk:first-name = 'Herman')]" bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman" and returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples is the same as the previous examples but rewritten to be namespace aware.
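The same behaviour can be reproduced outside .NET. What follows is a minimal sketch, not the xpathquery program, using Python's standard xml.etree.ElementTree module. That module implements only a subset of XPath and cannot select attribute nodes, so the attribute values are read with get instead of a //@genre query. The names BOOKSTORE_NS and DOC and the prefix b are chosen here purely for illustration, and the document is a trimmed copy of the namespaced bookstore file.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed copy of the namespaced bookstore document (prices and last names omitted).
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name></author>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name></bk:author>
  </bk:book>
</bookstore>
"""

root = ET.fromstring(DOC)  # root is the bookstore element

# Unprefixed names only match elements with no namespace, so nothing is found.
unprefixed = root.findall("book/title")

# A prefix-to-namespace mapping makes the query namespace aware. The prefix
# "b" is arbitrary and does not have to match the prefixes in the document.
ns = {"b": BOOKSTORE_NS}
titles = [t.text for t in root.findall("b:book/b:title", ns)]

# Titles of books whose author's first name is "Herman".
books = root.findall("b:book", ns)
herman = [book.findtext("b:title", namespaces=ns)
          for book in books
          if book.findtext("b:author/b:first-name", namespaces=ns) == "Herman"]

# Unprefixed attributes have no namespace; bk:genre is addressed by its
# expanded name, written here as {namespace}local-name.
plain_genre = books[1].get("genre")
qualified_genre = books[1].get("{%s}genre" % BOOKSTORE_NS)
```

Both book elements are matched by the single prefix b even though one is written with the default namespace and the other with the bk prefix, because only the namespace name is compared.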
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO
Class Transformer
    'Concatenates the message of an exception with those of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError
    'Entry point; args holds the command-line arguments without the program name
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main
    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " + _
                "transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, Nothing))
        End Try
    End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO
WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that it means XML namespace-aware applications that process the documents will no longer work with the documents, and will have to be upgraded. This is primarily beneficial with document formats whose versions change infrequently, but upon changing alter the semantics of elements and attributes, thus requiring that all processors no longer work with the newer versions for fear of misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
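One way a processor might combine the two mechanisms is sketched below in Python. This is only an illustration under stated assumptions: the "orders" format, the namespace name KNOWN_NS, the version KNOWN_VERSION, and the function classify are all hypothetical and not taken from any real vocabulary. A change of namespace name is treated as a change of semantics, while a newer version attribute in the same namespace is tolerated as a backwards-compatible revision.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace name and version of an imaginary "orders" format.
KNOWN_NS = "http://my.domain.example.org/orders/2002/06"
KNOWN_VERSION = (1, 1)

def classify(xml_text):
    """Decide how a processor written for KNOWN_NS should treat a document."""
    root = ET.fromstring(xml_text)
    if not root.tag.startswith("{" + KNOWN_NS + "}"):
        # A different namespace name signals that the semantics may have changed.
        return "unsupported"
    # A missing version attribute is assumed to mean "1.0" in this sketch.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    # Same namespace: changes are assumed to be backwards compatible, so a
    # newer version attribute is accepted and unknown additions are ignored.
    return "newer-compatible" if version > KNOWN_VERSION else "supported"

same = classify('<orders xmlns="%s" version="1.0"/>' % KNOWN_NS)
newer = classify('<orders xmlns="%s" version="1.2"/>' % KNOWN_NS)
other = classify(
    '<orders xmlns="http://my.domain.example.org/orders/2003/01" version="1.0"/>')
```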
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents in which actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
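The effect of undeclaring the default namespace can be observed with any namespace-aware parser. The sketch below uses Python's standard xml.etree.ElementTree module, which reports each element name as {namespace-name}local-name, or as the bare local name when the element has no namespace. The variable names are illustrative, and the document is a shortened copy of the example above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example: the default namespace is undeclared on book.
DOC = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <inv:inventory status="in-stock"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
"""

root = ET.fromstring(DOC)

# Element names in document order, as the parser sees them.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Only bookstore and inv:inventory carry a namespace name; book and title are reported without one, which is exactly the inconsistency within a single document that makes this practice confusing.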
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
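Some parsers expose universal names directly. As a small sketch, Python's standard xml.etree.ElementTree module reports every element name in the curly-brace form described above, so the three schema documents from the earlier example can be compared without regard to their prefixes. The list names DOCS and names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the example, differing only in prefix usage.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

# For each document, collect the universal names of its elements.
names = [[element.tag for element in ET.fromstring(doc).iter()] for doc in DOCS]

# The prefixes xs and xsd and the default namespace all yield the same names.
identical = names[0] == names[1] == names[2]
print(names[0])
```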
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
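Both attribute cases can be checked with a short sketch using Python's standard xml.etree.ElementTree module, which stores attribute names in the same {namespace-name}local-name form it uses for elements. The variable names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# First example: one unprefixed and one prefixed attribute with the same local name.
prefixed_doc = ET.fromstring(
    '<bk:bookstore xmlns:bk="%s">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>' % NS)

# Second example: a default namespace is in scope, but the attribute is unprefixed.
defaulted_doc = ET.fromstring(
    '<bookstore xmlns="%s">'
    '<book title="Lord of the Rings, Book 3" />'
    '</bookstore>' % NS)

prefixed_book = prefixed_doc[0]    # the bk:book element
defaulted_book = defaulted_doc[0]  # the book element in the default namespace

print(prefixed_book.attrib)   # two distinct attributes, only one with a namespace
print(defaulted_book.tag)     # the element picks up the default namespace
print(defaulted_book.attrib)  # the attribute does not
```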
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
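The character-for-character comparison rule is easy to trip over. In the sketch below, which uses Python's standard xml.etree.ElementTree module, two made-up namespace names (urn:example:Books and urn:example:books, both hypothetical) differ only in the case of one letter and therefore name two unrelated namespaces.

```python
import xml.etree.ElementTree as ET

# Two hypothetical namespace names that differ only in letter case.
doc = ET.fromstring(
    '<root xmlns:a="urn:example:Books" xmlns:b="urn:example:books">'
    '<a:item/><b:item/>'
    '</root>')

first, second = doc  # the two child elements, in document order

# The local names match, but the expanded names do not.
same_name = first.tag == second.tag
print(first.tag, second.tag, same_name)
```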
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases due to historic reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
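As an illustration of the DOM view, the sketch below uses xml.dom.minidom from Python's standard library, a lightweight DOM implementation rather than the .NET classes used elsewhere in this article. It looks up the namespace declaration on a root element as an ordinary attribute node and inspects its namespace name and prefix. The variable names are illustrative.

```python
from xml.dom import minidom, XMLNS_NAMESPACE

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore"/>')
root = doc.documentElement

# The namespace declaration is reachable as a regular attribute node.
declaration = root.getAttributeNode("xmlns:bk")

print(declaration.namespaceURI)  # the reserved xmlns namespace name
print(declaration.prefix)        # the xmlns prefix
print(declaration.value)         # the namespace name being declared
print(root.namespaceURI)         # the namespace name of the element itself
```

XMLNS_NAMESPACE is the constant the xml.dom package defines for http://www.w3.org/2000/xmlns/.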
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
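Python's standard xml.etree.ElementTree module has no namespace axis, but the set of declarations in scope for each element, which is what the namespace nodes represent, can be approximated from parser events. The following is a rough sketch under that assumption: the function in_scope_namespaces and the SAMPLE document are made up for illustration, and the implicit xml prefix that XPath also reports is not included.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = """\
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <inv:inventory xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking"/>
    <bk:title>Lord of the Rings</bk:title>
  </bk:book>
</bk:bookstore>
"""

def in_scope_namespaces(xml_text):
    """Return (tag, {prefix: namespace name}) for every element, in document order."""
    declarations = []  # stack of (prefix, uri) pairs currently in scope
    result = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns", "end-ns")):
        if event == "start-ns":
            declarations.append(payload)  # reported before the element's start event
        elif event == "end-ns":
            declarations.pop()            # the declaring element has ended
        else:
            # Each element gets its own copy of the mapping, just as each
            # element has its own namespace nodes in the XPath data model.
            result.append((payload.tag, dict(declarations)))
    return result

scopes = in_scope_namespaces(SAMPLE)
for tag, mapping in scopes:
    print(tag, mapping)
```

In this sample only inv:inventory sees the inv declaration; the bk:title element that follows it is back to the single bk mapping.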
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{
    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        //Expect a source file and a query, followed by prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //Create prefix<->namespace mappings (if any).
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document.
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //Print output.
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while " +
                "querying the document: " + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and of one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply only to elements or attributes with no namespace, and there are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all genre attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reason as the first.
The way to perform namespace-aware XPath queries is to provide a prefix-to-namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided must be non-empty, but they do not need to match the prefixes used in the target document.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document, which returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman", which returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples is the same as the previous examples but rewritten to be namespace aware.
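The same behaviour can be reproduced outside .NET. The sketch below assumes Python's xml.etree.ElementTree, which supports only a subset of XPath, and a trimmed copy of the namespaced bookstore document. The prefix b exists only in the mapping handed to the query engine, not in the document itself.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>"""

root = ET.fromstring(DOC)

# Unprefixed names in the path match only elements that have no namespace.
unqualified = root.findall("book/title")

# Supply a prefix-to-namespace mapping and use the prefix in the path.
prefixes = {"b": BOOKSTORE_NS}
titles = [title.text for title in root.findall("b:book/b:title", prefixes)]

# Unprefixed attributes have no namespace; bk:genre is a different attribute.
books = root.findall("b:book", prefixes)
plain_genres = [book.get("genre") for book in books]
qualified_genres = [book.get("{%s}genre" % BOOKSTORE_NS) for book in books]

print(unqualified, titles, plain_genres, qualified_genres)
```

As with the command-line tool, the unqualified path finds nothing, while the prefixed path finds both titles regardless of whether the document used the default namespace or the bk prefix.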
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr & e.Message)
        End If
    End Function 'PrintError

    'Entry point which delegates to a C-style main function.
    Public Overloads Shared Sub Main()
        Run(System.Environment.GetCommandLineArgs())
    End Sub 'Main

    'GetCommandLineArgs() returns the program name as its first element,
    'so the source is args(1) and the stylesheet is args(2).
    Public Overloads Shared Sub Run(args() As String)
        If args.Length <> 3 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(2))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(1))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " & _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " & PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " & _
                "transforming the document: " & PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " & PrintError(e, Nothing))
        End Try
    End Sub 'Run
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the escaped text of the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
&lt;xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
&lt;xslt:output method="text"/&gt;
&lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
&lt;/xslt:stylesheet&gt;
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creating the new stylesheet involves a lot of dynamically generated text that is intertwined with the stylesheet directives, the following method is preferable.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO
WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace-prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element, as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that namespace-aware applications that process the documents will no longer work with the new versions and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but whose changes alter the semantics of elements and attributes. In that case it is desirable that existing processors refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
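How a processor might combine the two mechanisms can be sketched as follows. This is only an illustration under stated assumptions: the namespace URI, the product element, the supported version, and the returned messages are all hypothetical, and the code is written in Python with xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: the namespace name marks incompatible changes, while
# the root "version" attribute marks backwards-compatible revisions.
SUPPORTED_NAMESPACE = "http://my.domain.example.org/product/2002/05"
SUPPORTED_VERSION = (1, 1)

def check_document(xml_text):
    """Decide whether this (hypothetical) processor should handle a document."""
    root = ET.fromstring(xml_text)
    # ElementTree reports element names as {namespace}local.
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace != SUPPORTED_NAMESPACE:
        return "reject: unknown namespace"
    major, _, minor = root.get("version", "1.0").partition(".")
    if (int(major), int(minor or 0)) > SUPPORTED_VERSION:
        return "accept: newer compatible revision"
    return "accept"

current = '<product xmlns="http://my.domain.example.org/product/2002/05" version="1.0"/>'
revised = '<product xmlns="http://my.domain.example.org/product/2002/05" version="1.2"/>'
changed = '<product xmlns="http://my.domain.example.org/product/2003/01" version="1.0"/>'

for doc in (current, revised, changed):
    print(check_document(doc))
```

A changed namespace name stops the processor outright, whereas a higher version attribute lets it continue on the assumption that the additions are backwards compatible.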
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (in the sample, notice that the root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix-namespace mappings in an instance document.
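The current restriction is easy to observe. The sketch below assumes Python's xml.etree.ElementTree, whose expat-based parser follows the 1.0 namespace rules; the urn:example:a namespace and the is_accepted helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_accepted(xml_text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Undeclaring the default namespace is allowed by the 1.0 rules...
default_undeclared = '<a xmlns="urn:example:a"><b xmlns=""/></a>'
# ...but undeclaring a prefix is not.
prefix_undeclared = '<p:a xmlns:p="urn:example:a"><b xmlns:p=""/></p:a>'

print(is_accepted(default_undeclared), is_accepted(prefix_undeclared))
```

A parser that implemented the v1.1 working draft would be expected to accept the second document as well.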
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is the element on which the namespace declaration occurs, together with all of its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI, which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore, and its child element contains an inventory element with a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of urn:xmlns:25hoursaday-com:inventory-tracking is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine-readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name, and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and whose value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace, while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
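The effect of undeclaring the default namespace can be checked mechanically. The following sketch assumes Python's xml.etree.ElementTree, which reports every element name in {namespace}local form and omits the braces when the element has no namespace.

```python
import xml.etree.ElementTree as ET

DOC = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>"""

# Collect the expanded name of every element in document order.
tags = [element.tag for element in ET.fromstring(DOC).iter()]
for tag in tags:
    print(tag)
```

Only bookstore and inventory carry a namespace name; book, title, and author are reported as bare local names.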
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
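The rules above can be sketched as two small Python helpers. Both function names are hypothetical, and the check is deliberately partial: it enforces only the colon rule, not the full NCName character classes.

```python
def split_qname(qname):
    """Split a QName into (prefix, local name); the prefix is None if absent."""
    prefix, colon, local = qname.rpartition(":")
    if not colon:
        return None, qname
    # Neither part may be empty, and neither may itself contain a colon.
    if not prefix or not local or ":" in prefix:
        raise ValueError("not a QName: %r" % qname)
    return prefix, local

def expand(qname, in_scope, is_attribute=False):
    """Resolve a QName to (namespace name, local name) using in-scope bindings."""
    prefix, local = split_qname(qname)
    if prefix is None:
        # Unprefixed attributes never pick up the default namespace.
        namespace = None if is_attribute else in_scope.get("")
        return namespace, local
    if prefix not in in_scope:
        raise ValueError("prefix %r is not declared" % prefix)
    return in_scope[prefix], local

bindings = {"bk": "urn:xmlns:25hoursaday-com:bookstore"}
print(expand("bk:title", bindings))
```

The in_scope dictionary stands in for the namespace declarations in scope at the point where the name is used, with the empty string as the key for a default namespace.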
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what matter, not the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
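Some APIs expose universal names directly. As a sketch, Python's xml.etree.ElementTree reports element names in the curly-brace notation, so the three schema documents shown earlier can be compared after parsing.

```python
import xml.etree.ElementTree as ET

DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

def universal_names(xml_text):
    """Return the {namespace}local name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]

names = [universal_names(doc) for doc in DOCS]
print(names[0])
```

All three lists are equal: once parsed, the choice of prefix is no longer visible in the element names.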
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below, the title attribute belongs to the bk:book element and has no namespace, while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name, the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
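Both documents can be inspected with a short sketch, again assuming Python's xml.etree.ElementTree, which stores attribute names in {namespace}local form.

```python
import xml.etree.ElementTree as ET

PREFIXED = """<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>"""

DEFAULTED = """<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book title="Lord of the Rings, Book 3" />
</bookstore>"""

# The first child of each root element is the book element.
prefixed_book = ET.fromstring(PREFIXED)[0]
defaulted_book = ET.fromstring(DEFAULTED)[0]

print(prefixed_book.attrib)
print(defaulted_book.tag, defaulted_book.attrib)
```

In the second document the book element is in the bookstore namespace, yet its title attribute is stored under the bare name title.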
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
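The character-for-character rule means that URIs which would resolve to the same resource can still be different namespace names. A small sketch, assuming Python's xml.etree.ElementTree and made-up example.org URIs:

```python
import xml.etree.ElementTree as ET

def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].partition("}")[0] if tag.startswith("{") else None

# Three spellings that a Web browser would likely treat as the same address.
lower = root_namespace('<doc xmlns="http://www.example.org/ns"/>')
upper = root_namespace('<doc xmlns="http://WWW.EXAMPLE.ORG/ns"/>')
escaped = root_namespace('<doc xmlns="http://www.example.org/%6Es"/>')

print(lower == upper, lower == escaped)
```

No normalization is applied, so the three root elements belong to three different namespaces.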
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP-based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Versioning and Namespaces for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. The treatment of XML namespaces and namespace declarations is one such edge case: it differs across the three primary data models that exist as W3C recommendations, namely the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created regardless of whether their location within the document changes or not.
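As a rough illustration, the DOM view of namespace declarations can be inspected with Python's standard xml.dom.minidom module (a stand-in here, not the .NET DOM used elsewhere in this article). The urn:example:default namespace is a made-up value added only so that the sketch shows both a prefixed and a default declaration.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

# "urn:example:default" is a hypothetical namespace used only for illustration.
DOC = (
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore" '
    'xmlns="urn:example:default">'
    '<bk:book genre="novel"/>'
    '</bk:bookstore>'
)

root = minidom.parseString(DOC).documentElement


def describe_attributes(element):
    """Return (qualified name, prefix, namespace URI) for each attribute node."""
    attrs = element.attributes
    return [
        (attrs.item(i).name, attrs.item(i).prefix, attrs.item(i).namespaceURI)
        for i in range(attrs.length)
    ]


# Both declarations show up as ordinary attribute nodes in the xmlns namespace.
declarations = describe_attributes(root)
for name, prefix, namespace in declarations:
    print(name, prefix, namespace)

# An element keeps its namespace name even after it is detached from its parent.
detached = root.removeChild(root.firstChild)
print(detached.tagName, detached.namespaceURI)
```

The constant XMLNS_NAMESPACE exported by xml.dom holds the same http://www.w3.org/2000/xmlns/ value quoted above.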
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that particular element. Namespace nodes are never shared between elements. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{
    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message );
    }

    public static void Main(string[] args){
        // Expect a source file and a query, followed by zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2)!= 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);
            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i=2; i < args.Length; i+= 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);
            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);
            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");
        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefixes mapped to those namespaces in the target document, but they must be non-empty.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document, which returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman", which returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last example is the same query as in the previous examples, rewritten to be namespace aware.
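The same behaviour can be reproduced outside .NET. The sketch below uses Python's standard xml.etree.ElementTree module, which implements only a subset of XPath, against a trimmed copy of the namespaced bookstore document. The prefix b is an arbitrary choice made for the query and does not appear in the document itself.

```python
import xml.etree.ElementTree as ET

NS = "urn:xmlns:25hoursaday-com:bookstore"

# Trimmed copy of the namespaced bookstore document (authors and prices omitted).
DOC = """
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>
"""

root = ET.fromstring(DOC)

# Unprefixed names match only elements with no namespace, so nothing is found.
unprefixed = root.findall("book/title")

# Map a prefix of our own choosing to the namespace name, then use it in the query.
prefixes = {"b": NS}
titles = [t.text for t in root.findall("b:book/b:title", prefixes)]

# Attributes are keyed by universal name: only bk:genre carries the namespace.
books = root.findall("b:book", prefixes)
plain_genres = [b.get("genre") for b in books]
namespaced_genres = [b.get("{%s}genre" % NS) for b in books]

print(unprefixed, titles, plain_genres, namespaced_genres)
```

Both title elements are found with the b prefix even though the document uses the default namespace for one and the bk prefix for the other, which is the point made above: only the namespace name matters to the query.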
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer
    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point which delegates to Run; args does not include the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + _
                PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, Nothing))
        End Try
    End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet">
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
</xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamically created text that is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes, so that existing processors should refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
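To make the division of labour between the two techniques concrete, here is a minimal Python sketch of how a processor might combine them. Everything in it is hypothetical: the namespace URI follows the format shown earlier but is made up, as are the version numbers, the classify function, and its three return values.

```python
import xml.etree.ElementTree as ET

# Hypothetical values: neither the namespace URI nor the version numbers
# come from a real vocabulary.
KNOWN_NAMESPACE = "http://my.domain.example.org/product/2002/04"
KNOWN_VERSION = (1, 1)


def classify(xml_text):
    """Decide how a processor written for KNOWN_NAMESPACE should treat a document."""
    root = ET.fromstring(xml_text)
    # ElementTree spells element names as "{namespace-uri}local-name".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None
    if namespace != KNOWN_NAMESPACE:
        # A different namespace name signals changed semantics, so refuse.
        return "reject"
    # Assumes a "major.minor" version attribute, defaulting to 1.0 when absent.
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    # Same namespace: newer versions are assumed to only add elements and attributes.
    return "process" if version <= KNOWN_VERSION else "process-ignoring-additions"


print(classify('<product xmlns="http://my.domain.example.org/product/2002/04" version="1.2"/>'))
```

A document from a later namespace such as a hypothetical http://my.domain.example.org/product/2003/01 would be rejected outright, while a later version number within the known namespace is still accepted.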
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed exactly so that developers could mix and match XML vocabularies.
A succinct explanation of why the namespace URI of the root element should not be equated with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book>
<bk:title>Lord of the Rings</bk:title>
<bk:author>J.R.R. Tolkien</bk:author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and its value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book>
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book xmlns="">
<title>Lord of the Rings</title>
<author>J.R.R. Tolkien</author>
<inv:inventory status="in-stock" isbn="0345340426"
xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
</book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
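For readers who want to check the effect of xmlns="" for themselves, the following Python sketch parses the document above with the standard xml.etree.ElementTree module, which reports each element name as {namespace-name}local-name, or as a bare local name when the element has no namespace.

```python
import xml.etree.ElementTree as ET

DOC = """
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
        xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
"""

# Collect the name of every element in document order.
names = [element.tag for element in ET.fromstring(DOC).iter()]
for name in names:
    print(name)
```

Only the bookstore and inventory names come back with a namespace in braces; book, title, and author are printed bare.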
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType id="123" name="fooType"/>
</xs:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:complexType id="123" name="fooType"/>
</xsd:schema>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
<complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}schema>
<{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
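Some XML libraries expose universal names directly. As one illustration, Python's standard xml.etree.ElementTree module names every element in the curly-brace form, so the three schema documents from the earlier example can be compared without regard to the prefixes they use. The helper name universal_names is my own.

```python
import xml.etree.ElementTree as ET

# The three schema documents from the QName example, differing only in prefix.
DOCS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]


def universal_names(xml_text):
    """Return the {namespace}local name of every element in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


results = [universal_names(doc) for doc in DOCS]
print(results[0])
print(results[0] == results[1] == results[2])
```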
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below the title attribute belongs to the bk:book element and has no namespace while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
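The treatment of attributes in the two documents above can be confirmed with a short Python sketch using the standard xml.etree.ElementTree module, which stores attributes in a dictionary keyed by universal name.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

prefixed = ET.fromstring(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>'
)
defaulted = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book title="Lord of the Rings, Book 3" />'
    '</bookstore>'
)

# Index 0 is the single book child of each root element.
prefixed_attrs = dict(prefixed[0].attrib)
defaulted_attrs = dict(defaulted[0].attrib)

print(prefixed_attrs)
# The book element picks up the default namespace, but its attribute does not.
print(defaulted[0].tag, defaulted_attrs)
```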
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
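The character-for-character rule has a consequence that surprises people used to thinking of these names as Web addresses: two URLs that a browser would treat as the same location are still different namespace names. A small Python sketch, using made-up example.org URIs and a helper of my own called root_namespace, shows this.

```python
import xml.etree.ElementTree as ET


def root_namespace(xml_text):
    """Return the namespace name of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


# Hypothetical namespace names that differ only in case or a trailing slash.
lower = root_namespace('<doc xmlns="http://www.example.org/ns"/>')
upper = root_namespace('<doc xmlns="http://www.EXAMPLE.org/ns"/>')
slash = root_namespace('<doc xmlns="http://www.example.org/ns/"/>')

# Namespace names are compared as plain strings, with no URL normalization.
print(lower == upper, lower == slash)
```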
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resource. I remember one user who likened it to being given a fake phone number in a social situation.
One solution that avoids confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created, regardless of whether their location within the document changes or not.
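As a rough illustration of the DOM view, the sketch below uses Python's standard xml.dom.minidom, which is only one DOM implementation and is assumed here for demonstration (it is not the DOM Level 3 implementation discussed above, although it treats declarations the same way). The namespace declaration is retrieved like any other attribute node.

```python
from xml.dom import XMLNS_NAMESPACE, minidom

doc = minidom.parseString(
    '<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">'
    '<bk:book title="Lord of the Rings, Book 3"/>'
    '</bk:bookstore>'
)
root = doc.documentElement

# The namespace declaration is reachable as an ordinary attribute node.
decl = root.getAttributeNode("xmlns:bk")
print(decl.name, decl.prefix, decl.namespaceURI, decl.value)

# An unprefixed attribute, by contrast, has no namespace name at all.
title = root.firstChild.getAttributeNode("title")
print(title.name, title.namespaceURI)
```

Here the declaration node reports xmlns as its prefix and http://www.w3.org/2000/xmlns/ as its namespace name, while the title attribute reports no namespace.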
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has a unique set of namespace nodes for each namespace declaration in scope for that particular element. Namespace nodes are unique to each element in that namespace. Thus namespace nodes for two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class XPathQuery{

    //Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message );
    }

    public static void Main(string[] args){

        //Expect a source file, a query, and zero or more prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query <zero or more prefix and namespace pairs>");
            return;
        }

        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);

            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i=2; i < args.Length; i+= 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);

            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);

            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");

        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: " + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre="novel">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title>The Autobiography of Benjamin Franklin</title>
  <title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document, which returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml "//title[(../author/first-name = 'Herman')]"
  Selects all the titles where the author's first name is "Herman", which returns:
  <title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
    <bk:author>
      <bk:first-name>Herman</bk:first-name>
      <bk:last-name>Melville</bk:last-name>
    </bk:author>
    <bk:price>11.99</bk:price>
  </bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
- xpathquery.exe bookstore.xml /bookstore/book/title
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
  Selects all the genre attributes in the document, which returns:
  genre="autobiography"
  genre="novel"
- xpathquery.exe bookstore.xml "//title[(../author/first-name = 'Herman')]"
  Selects all the titles where the author's first name is "Herman," which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all genre attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the namespace to prefix mappings in the target document, but they must be non-empty prefixes.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
  Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
  Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document, which returns:
  bk:genre="fiction"
- xpathquery.exe bookstore.xml "//bk:title[(../bk:author/bk:first-name = 'Herman')]" bk urn:xmlns:25hoursaday-com:bookstore
  Selects all the titles where the author's first name is "Herman", which returns:
  <bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples is the same as the previous examples but rewritten to be namespace aware.
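The same prefix-mapping idea appears in other XPath engines. The sketch below uses Python's standard xml.etree.ElementTree, which is an assumption made for illustration and not the program used in this article. It supports only a subset of XPath and evaluates paths relative to the root element, so the queries are shortened accordingly, and the document is a trimmed copy of the namespaced bookstore file.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_NS = "urn:xmlns:25hoursaday-com:bookstore"

BOOKSTORE_XML = """\
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
  </book>
  <bk:book genre="novel" bk:genre="fiction"
           xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
    <bk:title>The Confidence Man</bk:title>
  </bk:book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# The prefix "b" exists only in this mapping; the document never uses it.
ns = {"b": BOOKSTORE_NS}

# Unprefixed names look for elements with no namespace and find nothing.
unqualified = [t.text for t in root.findall("book/title")]

# Prefixed names are resolved through the mapping and match both books.
qualified = [t.text for t in root.findall("b:book/b:title", ns)]

# Attributes: the plain genre has no namespace, only bk:genre is qualified.
plain_genres = [b.get("genre") for b in root.findall("b:book", ns)]
qualified_genres = [b.get("{%s}genre" % BOOKSTORE_NS) for b in root.findall("b:book", ns)]

print(unqualified, qualified, plain_genres, qualified_genres)
```

As with the command-line tool, the prefix chosen for the query is independent of the prefixes (or default namespace) used inside the document.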
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Class Transformer

    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point, which delegates to Run. The args array excludes the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()
            'Load the stylesheet.
            xslt.Load(args(1))
            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)
        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR: " + PrintError(e, Nothing))
        End Try
    End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
  <xsl:template match="b:bookstore">
    <book-titles>
      <xsl:apply-templates select="b:book/b:title"/>
    </book-titles>
  </xsl:template>
  <xsl:template match="b:title">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
             xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
  <title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
  <bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
            xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root element of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
                 xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
  <xslt:output method="text"/>
  <xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created as escaped text, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:variable name="stylesheet">
    &lt;xslt:stylesheet version="1.0"
        xmlns:xslt="http://www.w3.org/1999/XSL/Transform"&gt;
    &lt;xslt:output method="text"/&gt;
    &lt;xslt:template match="/"&gt;&lt;xslt:text&gt;HELLO WORLD&lt;/xslt:text&gt;&lt;/xslt:template&gt;
    &lt;/xslt:stylesheet&gt;
  </xsl:variable>
  <xsl:template match="/">
    <xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hacks, which typically become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
                 xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
  <xslt:output method="xml" encoding="utf-8"/>
  <xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
  <xslt:template match="/">
    <alias:stylesheet version="1.0">
      <alias:output method="text"/>
      <alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
    </alias:stylesheet>
  </xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace-prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
            xmlns="http://www.w3.org/1999/XSL/Transform"
            xmlns:msxsl="urn:schemas-microsoft-com:xslt"
            xmlns:newfunc="urn:my-newfunc">
  <output method="text"/>
  <template match="/">
    <value-of select="newfunc:SayHello()" />
  </template>
  <msxsl:script language="JavaScript" implements-prefix="newfunc">
    function SayHello() {
      return "Hello World";
    }
  </msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, which has used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with them and will have to be upgraded. This approach is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes. In that case it is desirable that existing processors stop working with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element and a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
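To make the interplay of the two mechanisms concrete, here is a small Python sketch of how a processor might combine them. Everything in it is hypothetical and for illustration only: the set of known namespace names, the highest known version, the check_document function, and the three result labels are not taken from any real vocabulary or library.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the namespace names this processor understands.
KNOWN_NAMESPACES = {"urn:xmlns:25hoursaday-com:bookstore"}
# Hypothetical: the highest format version this processor was written for.
MAX_KNOWN_VERSION = (1, 1)


def split_tag(tag):
    """Split an ElementTree tag into (namespace name, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def check_document(xml_text):
    """Decide how to treat a document from its root namespace and version."""
    root = ET.fromstring(xml_text)
    uri, _ = split_tag(root.tag)
    if uri not in KNOWN_NAMESPACES:
        # A new namespace name signals changed semantics, so refuse to guess.
        return "reject"
    version = tuple(int(part) for part in root.get("version", "1.0").split("."))
    # A newer version attribute signals backwards-compatible additions only.
    if version <= MAX_KNOWN_VERSION:
        return "process"
    return "process-ignoring-unknown"


print(check_document('<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore" version="1.0"/>'))
print(check_document('<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore" version="1.2"/>'))
print(check_document('<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore:v2" version="1.0"/>'))
```

The design choice mirrors the text: an unknown namespace name stops processing outright, while an unknown but higher version number only asks the processor to tolerate additions.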
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML-related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed exactly so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to equate the namespace URI of the root element with a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample; notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current version of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference the contents of a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
XML Namespaces and How They Affect XPath and XSLT
Dare Obasanjo writes: "XML namespaces are an integral aspect of most of the W3C's XML recommendations and working drafts, including XPath, XML Schema, XSLT, XQuery, SOAP, RDF, DOM, and XHTML. Understanding how namespaces work and how they interact with a number of other W3C technologies that are dependent on them is important for anyone working with XML to any significant degree." Some heavy reading below, as Dare completes the thought.
This article explores the ins and outs of XML namespaces and their ramifications on a number of XML technologies that support namespaces. What follows is a shortened version of my first Extreme XML column.
Overview of XML Namespaces
As XML usage on the Internet became more widespread, the benefits of being able to create markup vocabularies that could be combined and reused similarly to how software modules are combined and reused became increasingly important. If a well defined markup vocabulary for describing coin collections, program configuration files, or fast food restaurant menus already existed, then reusing it made more sense than designing one from scratch. Combining multiple existing vocabularies to create new vocabularies whose whole was greater than the sum of its parts also became a feature that users of XML began to require.
However, the likelihood of identical markup, specifically XML elements and attributes, from different vocabularies with different semantics ending up in the same document became a problem. The very extensibility of XML and the fact that its usage had already become widespread across the Internet precluded simply specifying reserved elements or attribute names as the solution to this problem.
The goal of the W3C XML namespaces recommendation was to create a mechanism in which elements and attributes within an XML document that were from different markup vocabularies could be unambiguously identified and combined without processing problems ensuing. The XML namespaces recommendation provided a method for partitioning various items within an XML document based on processing requirements without placing undue restrictions on how these items should be named. For instance, elements named <template>, <output>, and <stylesheet> can occur in an XSLT stylesheet without there being ambiguity as to whether they are transformation directives or potential output of the transformation.
An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element and attribute names.
Namespace Declarations
A namespace declaration is typically used to map a namespace URI to a specific prefix. The scope of the prefix-namespace mapping is that of the element that the namespace declaration occurs on as well as all its descendants. An attribute declaration that begins with the prefix xmlns: is a namespace declaration. The value of such an attribute declaration should be a namespace URI, which is the namespace name.
Here is an example of an XML document where the root element contains a namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore and its child element contains an inventory element that contains a namespace declaration that maps the prefix inv to the namespace name urn:xmlns:25hoursaday-com:inventory-tracking.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
  <bk:book>
    <bk:title>Lord of the Rings</bk:title>
    <bk:author>J.R.R. Tolkien</bk:author>
    <inv:inventory status="in-stock" isbn="0345340426"
                   xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </bk:book>
</bk:bookstore>
In the above example, the scope of the namespace declaration for the urn:xmlns:25hoursaday-com:bookstore namespace name is the entire bk:bookstore element, while that of the urn:xmlns:25hoursaday-com:inventory-tracking namespace name is the inv:inventory element. Namespace-aware processors can process items from both namespaces independently of each other, which leads to the ability to do multi-layered processing of XML documents. For instance, RDDL documents are valid XHTML documents that can be rendered by a Web browser but also contain information using elements from the http://www.rddl.org namespace that can be used to locate machine-readable resources about the members of an XML namespace.
It should be noted that by definition the prefix xml is bound to the XML namespace name, and this special namespace is automatically predeclared with document scope in every well-formed XML document.
Default Namespaces
The previous section on namespace declarations is not entirely complete because it leaves out default namespaces. A default namespace declaration is an attribute declaration that has the name xmlns and whose value is the namespace URI that is the namespace name.
A default namespace declaration specifies that every unprefixed element name in its scope be from the declaring namespace. Below is the bookstore example utilizing a default namespace instead of a prefix-namespace mapping.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book>
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
                   xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
All the elements in the above example except for the inv:inventory element belong to the urn:xmlns:25hoursaday-com:bookstore namespace. The primary purpose of default namespaces is to reduce the verbosity of XML documents that utilize namespaces. However, using default namespaces instead of utilizing explicitly mapped prefixes for element names can be confusing because it is not obvious that the elements in the document are namespace scoped.
Also, unlike regular namespace declarations, default namespace declarations can be undeclared by setting the value of the xmlns attribute to the empty string. Undeclaring default namespace declarations is a practice that should be avoided because it may lead to a document that has unprefixed names that belong to a namespace in one part of the document, but don't in another. For example, in the document below only the bookstore element is from the urn:xmlns:25hoursaday-com:bookstore namespace, while the other unprefixed elements have no namespace name.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
  <book xmlns="">
    <title>Lord of the Rings</title>
    <author>J.R.R. Tolkien</author>
    <inv:inventory status="in-stock" isbn="0345340426"
                   xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking" />
  </book>
</bookstore>
This practice should be avoided because it leads to extremely confusing situations for readers of the XML document. For more information on undeclaring namespace declarations, see the section on Namespaces Future.
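The effect of undeclaring the default namespace is easy to observe from code. The following minimal sketch uses Python's standard xml.etree.ElementTree (an assumption for illustration; any namespace-aware parser would do) and a shortened copy of the document above.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">'
    '<book xmlns="">'
    '<title>Lord of the Rings</title>'
    '<inv:inventory status="in-stock" '
    'xmlns:inv="urn:xmlns:25hoursaday-com:inventory-tracking"/>'
    '</book>'
    '</bookstore>'
)

# List every element name in document order, as the parser sees it.
tags = [elem.tag for elem in doc.iter()]
for tag in tags:
    print(tag)
```

Only bookstore and inv:inventory come back with a namespace name; book and title are reported as bare local names even though they look just like the namespaced bookstore element in the markup.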
Qualified and Expanded Names
A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character. The XML names used as the prefix and the local name must match the NCName production, which means that they must not contain a colon character. The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.
Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML-aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML Schema validators.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType id="123" name="fooType"/>
</xs:schema>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType id="123" name="fooType"/>
</xsd:schema>

<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType id="123" name="fooType"/>
</schema>
The W3C XML Path Language recommendation describes an expanded name as a pair consisting of a namespace name and a local name. A universal name is an alternate term coined by James Clark to describe the same concept. A universal name consists of a namespace name in curly braces and a local name. Namespaces tend to make more sense to people when viewed through the lens of universal names. Here are the three XML documents from the previous example with the QNames replaced by universal names. Note that the syntax below is not valid XML syntax.
<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>

<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>

<{http://www.w3.org/2001/XMLSchema}schema>
  <{http://www.w3.org/2001/XMLSchema}complexType id="123" name="fooType"/>
</{http://www.w3.org/2001/XMLSchema}schema>
To many XML applications, the universal names of the elements and attributes in an XML document are what is important, and not the values of the prefixes used in specific QNames. The primary reason the Namespaces in XML recommendation does not take the expanded name approach to specifying namespaces is its verbosity. Instead, prefix mappings and default namespaces are provided to save us all from developing carpal tunnel syndrome from typing namespace URIs endlessly.
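Some XML libraries expose universal names directly. As a minimal sketch, Python's standard xml.etree.ElementTree (assumed here only for illustration) reports every element name in the curly-brace form, which makes it easy to confirm that the three schema documents from the earlier example are equivalent.

```python
import xml.etree.ElementTree as ET

documents = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    '<schema xmlns="http://www.w3.org/2001/XMLSchema">'
    '<complexType id="123" name="fooType"/></schema>',
]

# Collect the universal name of every element in each document.
names = [[elem.tag for elem in ET.fromstring(d).iter()] for d in documents]
for row in names:
    print(row)
```

All three rows are identical, because the prefixes xs and xsd and the default namespace all resolve to the same namespace name.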
Namespaces and Attributes
Namespace declarations do not apply to attributes unless the attribute's name is prefixed. In the XML document shown below, the title attribute belongs to the bk:book element and has no namespace, while the bk:title attribute has urn:xmlns:25hoursaday-com:bookstore as its namespace name. Note that even though both attributes have the same local name, the document is well formed.
<bk:bookstore xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>
</bk:bookstore>
In the following example, the title attribute still has no namespace and belongs to the book element even though there is a default namespace specified. In other words, attributes cannot inherit the default namespace.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book title="Lord of the Rings, Book 3" />
</bookstore>
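The behavior of unprefixed attributes can be checked with a short Python sketch. It assumes the standard xml.etree.ElementTree module, which stores attribute names as universal names; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = "urn:xmlns:25hoursaday-com:bookstore"

# Document using a prefix: one unprefixed and one prefixed title attribute.
prefixed = ET.fromstring(
    f'<bk:bookstore xmlns:bk="{BOOKSTORE}">'
    '<bk:book title="Lord of the Rings, Book 3" bk:title="Return of the King"/>'
    '</bk:bookstore>'
)

# Document using a default namespace: the attribute stays unqualified.
defaulted = ET.fromstring(
    f'<bookstore xmlns="{BOOKSTORE}">'
    '<book title="Lord of the Rings, Book 3"/>'
    '</bookstore>'
)

prefixed_attrs = dict(prefixed[0].attrib)
defaulted_attrs = dict(defaulted[0].attrib)

print(prefixed_attrs)    # one key is 'title', the other is '{urn:...}title'
print(defaulted[0].tag)  # the element picks up the default namespace
print(defaulted_attrs)   # the attribute does not
```

In both documents the plain title attribute is keyed by its local name alone, while the book element in the second document carries the default namespace.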
Namespace URIs
A namespace name is a Uniform Resource Identifier (URI) as specified in RFC 2396. A URI is either a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). URLs are used to specify the location of resources on the Internet, while URNs are supposed to be persistent, location-independent identifiers for information resources. Namespace names are considered to be identical only if they are the same character for character (case-sensitive). The primary justification for using URIs as namespace names is that they already provide a mechanism for specifying globally unique identities.
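A minimal Python sketch of the character-for-character comparison follows. The two namespace names are hypothetical and differ only in the case of one letter; the sketch assumes the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names that differ only in letter case.
lower = ET.fromstring('<item xmlns="urn:example:catalog"/>')
upper = ET.fromstring('<item xmlns="urn:example:Catalog"/>')

print(lower.tag)
print(upper.tag)

# The namespace names are compared as plain strings, so the
# two item elements have different universal names.
same_name = lower.tag == upper.tag
print(same_name)  # False
```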
The XML namespaces recommendation states that namespace names are only to act as unique identifiers and do not have to actually identify network retrievable resources. This has led to much confusion amongst authors and users of XML documents, especially since the usage of HTTP based URLs as namespace names has grown in popularity. Because many applications convert such URIs to hyperlinks, it is irritating to many users that these "links" do not lead to Web pages or other network retrievable resources. I remember one user who likened it to being given a fake phone number in a social situation.
One solution to avoid confusing users is to use a namespace-naming scheme that does not imply network retrievability of the resource. I personally use the urn:xmlns: scheme for this purpose and create namespace names similar to urn:xmlns:25hoursaday-com when authoring XML documents for personal use. The problem with homegrown namespace URIs is that they may run counter to the intent of the Namespaces in XML recommendation by not being globally unique. I get around the globally unique requirement by using my personal domain name http://www.25hoursaday.com as part of the namespace URI.
Another solution is to leave a network retrievable resource at the URI that is the namespace name, such as is done with the XSLT and RDDL namespaces. Typically, such URIs are actually HTTP URLs. A good way to name such URLs is by using the format favored by the W3C, which is as follows:
http://my.domain.example.org/product/[year/month][/area]
See the section on Namespaces and Versioning for more information on using similarly structured namespace names as a versioning mechanism.
DOM, XPath, and the XML Information Set on Namespaces
The W3C has defined a number of technologies that provide a data model for XML documents. These data models are generally in agreement, but sometimes differ in how they treat various edge cases for historical reasons. Treatment of XML namespaces and namespace declarations is an example of an edge case that is treated differently in the three primary data models that exist as W3C recommendations. The three data models are the XPath data model, the Document Object Model (DOM), and the XML information set.
The XML information set (XML infoset) is an abstract description of the data in an XML document and can be considered to be the primary data model for an XML document. The XPath data model is a tree-based model that is traversed when querying an XML document and is similar to the XML information set. The DOM precedes both data models but is also similar to both data models in a number of ways. Both the DOM and the XPath data model can be considered to be interpretations of the XML infoset.
Namespaces in the Document Object Model (DOM)
The XML namespace section of the DOM Level 3 specification considers namespace declarations to be regular attribute nodes that have http://www.w3.org/2000/xmlns/ as their namespace name and xmlns as their prefix or qualified name.
Elements and attributes in the DOM have a namespace name that cannot be altered after they have been created, regardless of whether their location within the document changes.
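The DOM view of namespace declarations can be seen with a small Python sketch. It assumes the standard xml.dom.minidom module, which is only one DOM implementation and is used here purely for illustration; other implementations may differ in detail.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"
BOOKSTORE = "urn:xmlns:25hoursaday-com:bookstore"

doc = minidom.parseString(
    f'<bk:bookstore xmlns:bk="{BOOKSTORE}">'
    '<bk:book title="Lord of the Rings, Book 3"/>'
    '</bk:bookstore>'
)
root = doc.documentElement

# The declaration xmlns:bk is reachable as an ordinary attribute node
# whose namespace name is the reserved xmlns namespace.
decl = root.getAttributeNodeNS(XMLNS_NS, "bk")
print(decl.name, decl.namespaceURI, decl.value)

# The element itself carries the bookstore namespace name, while the
# unprefixed title attribute on the child has no namespace.
title_attr = root.firstChild.getAttributeNode("title")
print(root.namespaceURI, title_attr.namespaceURI)
```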
Namespaces in the XPath Data Model
The W3C XPath recommendation does not consider namespace declarations to be attribute nodes and does not provide access to them in that capacity. Instead, in XPath every element in an XML document has a number of namespace nodes that can be retrieved using the XPath namespace navigation axis.
Each element in the document has its own set of namespace nodes, one for each namespace declaration in scope for that element. Namespace nodes are never shared between elements, so the namespace nodes of two different elements that represent the same namespace declaration are not identical.
Namespaces in the XML Information Set
The XML infoset recommendation considers namespace declarations to be attribute information items.
In addition, similar to the XPath data model, each element information item in an XML document's information set has a namespace information item for each namespace that is in scope for the element.
XPath, XSLT and Namespaces
The W3C XML Path Language, also known as XPath, is used to address parts of an XML document and is used in a number of W3C XML technologies including XSLT, XPointer, XML Schema, and DOM Level 3. XPath uses a hierarchical addressing mechanism similar to that used in file systems and URLs to retrieve pieces of an XML document. XPath supports rudimentary manipulation of strings, numbers, and Booleans.
XPath and Namespaces
The XPath data model treats an XML document as a tree of nodes, such as element, attribute, and text nodes, where the name of each node is a combination of its local name and its namespace name (that is, its universal or expanded name).
For element and attribute nodes without namespaces, performing XPath queries is fairly straightforward. The following program, which can be used to query XML documents using the command line, shall be used to demonstrate the impact of namespaces on XPath queries.
using System.Xml.XPath;
using System.Xml;
using System;
using System.IO;

class XPathQuery{

    // Concatenates the messages of an exception and all of its inner exceptions.
    public static string PrintError(Exception e, string errStr){
        if(e == null)
            return errStr;
        else
            return PrintError(e.InnerException, errStr + e.Message);
    }

    public static void Main(string[] args){
        // Expect a source file and a query, optionally followed by prefix/namespace pairs.
        if((args.Length == 0) || (args.Length % 2) != 0){
            Console.WriteLine("Usage: xpathquery source query " +
                "<zero or more prefix and namespace pairs>");
            return;
        }
        try{
            //Load the file.
            XmlDocument doc = new XmlDocument();
            doc.Load(args[0]);

            //create prefix<->namespace mappings (if any)
            XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
            for(int i = 2; i < args.Length; i += 2)
                nsMgr.AddNamespace(args[i], args[i + 1]);

            //Query the document
            XmlNodeList nodes = doc.SelectNodes(args[1], nsMgr);

            //print output
            foreach(XmlNode node in nodes)
                Console.WriteLine(node.OuterXml + "\n\n");

        }catch(XmlException xmle){
            Console.WriteLine("ERROR: XML Parse error occurred because " +
                PrintError(xmle, null));
        }catch(FileNotFoundException fnfe){
            Console.WriteLine("ERROR: " + PrintError(fnfe, null));
        }catch(XPathException xpath){
            Console.WriteLine("ERROR: The following error occurred while querying the document: "
                + PrintError(xpath, null));
        }catch(Exception e){
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, null));
        }
    }
}
Given the following XML document that does not declare any namespaces, queries are fairly straightforward as seen in the examples following the code.
<?xml version="1.0" encoding="utf-8" ?>
<bookstore>
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<book genre="novel">
<title>The Confidence Man</title>
<author>
<first-name>Herman</first-name>
<last-name>Melville</last-name>
</author>
<price>11.99</price>
</book>
</bookstore>
Example 1
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title>The Autobiography of Benjamin Franklin</title>
<title>The Confidence Man</title>
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns:
<title>The Confidence Man</title>
However, once namespaces are added to the mix, things are no longer as simple. The file below is identical to the original file except for the addition of namespaces and one attribute to one of the book elements.
<bookstore xmlns="urn:xmlns:25hoursaday-com:bookstore">
<book genre="autobiography">
<title>The Autobiography of Benjamin Franklin</title>
<author>
<first-name>Benjamin</first-name>
<last-name>Franklin</last-name>
</author>
<price>8.99</price>
</book>
<bk:book genre="novel" bk:genre="fiction"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">
<bk:title>The Confidence Man</bk:title>
<bk:author>
<bk:first-name>Herman</bk:first-name>
<bk:last-name>Melville</bk:last-name>
</bk:author>
<bk:price>11.99</bk:price>
</bk:book>
</bookstore>
Note that the default namespace is in scope for the whole XML document, while the namespace declaration that maps the prefix bk to the namespace name urn:xmlns:25hoursaday-com:bookstore is in scope for the second book element only.
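One way to see where each declaration takes effect is to record the namespace declarations a parser reports just before each start tag. The Python sketch below assumes the standard xml.etree.ElementTree module and a cut-down version of the bookstore document; the helper name declarations_by_element is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

BOOKSTORE = "urn:xmlns:25hoursaday-com:bookstore"

SAMPLE = (
    f'<bookstore xmlns="{BOOKSTORE}">'
    '<book genre="autobiography"/>'
    f'<bk:book genre="novel" xmlns:bk="{BOOKSTORE}"/>'
    '</bookstore>'
).encode("utf-8")

def declarations_by_element(xml_bytes):
    """Pair each element's local name with the prefixes declared on it."""
    result = []
    pending = []  # prefixes declared since the previous start tag
    events = ("start-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            prefix, _uri = payload  # the default namespace has prefix ''
            pending.append(prefix)
        else:
            local_name = payload.tag.rsplit("}", 1)[-1]
            result.append((local_name, pending))
            pending = []
    return result

print(declarations_by_element(SAMPLE))
```

The default namespace is declared once on bookstore and so covers everything inside it, while the bk prefix is declared on the second book element and is available only there and in its descendants.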
- xpathquery.exe bookstore.xml /bookstore/book/title
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns NO RESULTS.
- xpathquery.exe bookstore.xml //@genre
Selects all the genre attributes in the document, which returns:
genre="autobiography"
genre="novel"
- xpathquery.exe bookstore.xml //title[(../author/first-name = 'Herman')]
Selects all the titles where the author's first name is "Herman", which returns NO RESULTS.
The first query returns no results because unprefixed names in an XPath query apply to elements or attributes with no namespace. There are no bookstore, book, or title elements in the target document that have no namespace. The second query returns all attribute nodes that have no namespace. Although namespace declarations are in scope for both attribute nodes returned by the query, they have no namespace because namespace declarations do not apply to attributes with unprefixed names. The third query returns no results for the same reasons the first query returns no results.
The way to perform namespace-aware XPath queries is to provide a prefix to namespace mapping to the XPath engine, then use those prefixes in the query. The prefixes provided do not need to be the same as the prefixes mapped to those namespaces in the target document, but they must be non-empty.
- xpathquery.exe bookstore.xml /b:bookstore/b:book/b:title b urn:xmlns:25hoursaday-com:bookstore
Selects all the title elements that are children of the book element whose parent is the bookstore element, which returns:
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of Benjamin Franklin</title>
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
- xpathquery.exe bookstore.xml //@b:genre b urn:xmlns:25hoursaday-com:bookstore
Selects all the genre attributes from the "urn:xmlns:25hoursaday-com:bookstore" namespace in the document, which returns:
bk:genre="fiction"
- xpathquery.exe bookstore.xml //bk:title[(../bk:author/bk:first-name = 'Herman')] bk urn:xmlns:25hoursaday-com:bookstore
Selects all the titles where the author's first name is "Herman", which returns:
<bk:title xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence Man</bk:title>
Note: This last set of examples repeats the previous queries, rewritten to be namespace aware.
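The same idea of supplying a prefix to namespace mapping at query time can be sketched in Python. This assumes the standard xml.etree.ElementTree module, which implements only a subset of XPath: paths are evaluated relative to the root element, and attributes are read with get() here instead of the attribute axis. The prefix b and the variable names are arbitrary choices for the illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = "urn:xmlns:25hoursaday-com:bookstore"

DOCUMENT = f"""
<bookstore xmlns="{BOOKSTORE}">
  <book genre="autobiography">
    <title>The Autobiography of Benjamin Franklin</title>
    <author><first-name>Benjamin</first-name><last-name>Franklin</last-name></author>
  </book>
  <bk:book genre="novel" bk:genre="fiction" xmlns:bk="{BOOKSTORE}">
    <bk:title>The Confidence Man</bk:title>
    <bk:author><bk:first-name>Herman</bk:first-name><bk:last-name>Melville</bk:last-name></bk:author>
  </bk:book>
</bookstore>
""".strip()

root = ET.fromstring(DOCUMENT)

# Prefix mapping supplied by the caller; it need not match the document's prefixes.
NS = {"b": BOOKSTORE}
books = root.findall("b:book", NS)

# Unprefixed names match only elements with no namespace, so nothing is found.
unqualified = root.findall("book/title")

# Prefixed names are resolved through NS and match both title elements.
qualified = [t.text for t in root.findall("b:book/b:title", NS)]

# Unqualified and namespace-qualified genre attributes are distinct names.
plain_genres = [b.get("genre") for b in books]
qualified_genres = [b.get("{%s}genre" % BOOKSTORE) for b in books]

# Titles of books whose author's first name is Herman.
herman = [
    b.findtext("b:title", namespaces=NS)
    for b in books
    if b.findtext("b:author/b:first-name", namespaces=NS) == "Herman"
]

print(unqualified, qualified, plain_genres, qualified_genres, herman)
```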
For more information on using XPath, read Aaron Skonnard's article Addressing Infosets with XPath and view the examples at the ZVON.org XPath tutorial.
XSLT and Namespaces
The W3C XSL transformations (XSLT) recommendation describes an XML-based language for transforming XML documents into other XML documents. XSLT transformations, also known as XML style sheets, utilize patterns (XPath) to match aspects of the target document. Upon matching nodes in the target document, templates that specify the output of a successful match can be instantiated and used to transform the document.
Support for namespaces is tightly integrated into XSLT, especially since XPath is used for matching nodes in the source document. Using namespaces in your XPath expressions inside XSLT is much easier than using the DOM.
The example that follows contains:
- A program for use in executing transforms from the command line.
- An XSLT stylesheet that prints all the title elements from the urn:xmlns:25hoursaday-com:bookstore namespace in the source XML document when run against the bookstore document from the urn:xmlns:25hoursaday-com:bookstore namespace.
- The resulting output.
Imports System.Xml
Imports System.Xml.Xsl
Imports System
Imports System.IO

Class Transformer

    'Concatenates the messages of an exception and all of its inner exceptions.
    Public Shared Function PrintError(e As Exception, errStr As String) As String
        If e Is Nothing Then
            Return errStr
        Else
            Return PrintError(e.InnerException, errStr + e.Message)
        End If
    End Function 'PrintError

    'Entry point which delegates to Run; args does not include the program name.
    Public Shared Sub Main(args() As String)
        Run(args)
    End Sub 'Main

    Public Shared Sub Run(args() As String)
        If args.Length <> 2 Then
            Console.WriteLine("Usage: xslt source stylesheet")
            Return
        End If
        Try
            'Create the XslTransform object.
            Dim xslt As New XslTransform()

            'Load the stylesheet.
            xslt.Load(args(1))

            'Transform the file.
            Dim doc As New XmlDocument()
            doc.Load(args(0))
            xslt.Transform(doc, Nothing, Console.Out)

        Catch xmle As XmlException
            Console.WriteLine("ERROR: XML Parse error occurred because " + _
                PrintError(xmle, Nothing))
        Catch fnfe As FileNotFoundException
            Console.WriteLine("ERROR: " + PrintError(fnfe, Nothing))
        Catch xslte As XsltException
            Console.WriteLine("ERROR: The following error occurred while " + _
                "transforming the document: " + PrintError(xslte, Nothing))
        Catch e As Exception
            Console.WriteLine("UNEXPECTED ERROR" + PrintError(e, Nothing))
        End Try
    End Sub
End Class 'Transformer
XSLT stylesheet
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<xsl:template match="b:bookstore">
<book-titles>
<xsl:apply-templates select="b:book/b:title"/>
</book-titles>
</xsl:template>
<xsl:template match="b:title">
<xsl:copy-of select="." />
</xsl:template>
</xsl:stylesheet>
Output
<?xml version="1.0" ?>
<book-titles xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:ext="urn:my_extensions" xmlns:b="urn:xmlns:25hoursaday-com:bookstore">
<title xmlns="urn:xmlns:25hoursaday-com:bookstore">The Autobiography of
Benjamin Franklin</title>
<bk:title xmlns="urn:xmlns:25hoursaday-com:bookstore"
xmlns:bk="urn:xmlns:25hoursaday-com:bookstore">The Confidence
Man</bk:title>
</book-titles>
Note that the namespace declarations from the stylesheet end up on the root node of the output XML document. Also note that the XSLT namespace is not included in the output XML document.
Generating XSLT stylesheets as the output of your XSLT transforms is slightly cumbersome because the processor has to be able to distinguish the output elements from the actual stylesheet directives. There are two ways I have found to deal with this issue, both of which I'll illustrate by showing stylesheets that generate the following XSLT stylesheet as output.
<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>
The first method involves creating a variable containing the stylesheet to be created, and then using value-of in combination with the disable-output-escaping attribute to create the stylesheet.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="utf-8"/>
<xsl:variable name="stylesheet"><![CDATA[<xslt:stylesheet version="1.0"
xmlns:xslt="http://www.w3.org/1999/XSL/Transform">
<xslt:output method="text"/>
<xslt:template match="/"><xslt:text>HELLO WORLD</xslt:text></xslt:template>
</xslt:stylesheet>]]></xsl:variable>
<xsl:template match="/">
<xsl:value-of select="$stylesheet" disable-output-escaping="yes" />
</xsl:template>
</xsl:stylesheet>
This first method works best if the stylesheet being created can be easily partitioned so that it can be placed in variables. While this technique is quick and easy, it also falls into the category of gross hack, and gross hacks tend to become unmanageable when faced with any situation requiring flexibility. For instance, when creation of the new stylesheet involves lots of dynamic creation of text and is intertwined with the stylesheet directives, the following method is preferable to the aforementioned gross hack.
<xslt:stylesheet version="1.0" xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
xmlns:alias="http://www.w3.org/1999/XSL/Transform-alias">
<xslt:output method="xml" encoding="utf-8"/>
<xslt:namespace-alias stylesheet-prefix="alias" result-prefix="xslt"/>
<xslt:template match="/">
<alias:stylesheet version="1.0">
<alias:output method="text"/>
<alias:template match="/"><alias:text>HELLO WORLD</alias:text></alias:template>
</alias:stylesheet>
</xslt:template>
</xslt:stylesheet>
The above document uses the namespace-alias directive to substitute the alias prefix and the namespace name it is bound to with the xslt prefix and the namespace name to which it is bound.
Namespaces are also used to specify mechanisms for the extension of XSLT. Namespace prefixed functions can be created that are executed in the same manner as XSLT functions. Similarly, elements from certain namespaces can be treated as extensions to XSLT and executed as if they were transformation directives like template, copy, value-of, and so on. Below is an example of a Hello World program that uses namespace-based extension functions to print the signature greeting.
<stylesheet version="1.0"
xmlns="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:newfunc="urn:my-newfunc">
<output method="text"/>
<template match="/">
<value-of select="newfunc:SayHello()" />
</template>
<msxsl:script language="JavaScript" implements-prefix="newfunc">
function SayHello() {
return "Hello World";
}
</msxsl:script>
</stylesheet>
XML Namespace Caveats
Namespaces in XML, like any useful tool, can be used improperly and have various subtleties that may cause problems if users are unaware of them. This section focuses on areas where users of XML namespaces typically have problems or face misconceptions.
Versioning and Namespaces
There are two primary mechanisms used in practice to create different versions of an XML instance document. One method is to use a version attribute on the root element as is done in XSLT, while the other method is to use the namespace name of the elements as the versioning mechanism. Versioning based on namespaces is currently very popular, especially with the W3C, who have used this mechanism for various XML technologies including SOAP, XHTML, XML Schema, and RDF. The namespace URI for documents that are versioned using the namespace is typically in the following format:
http://my.domain.example.org/product/[year/month][/area]
The primary problem with versioning XML documents by altering the namespace name in subsequent versions is that XML namespace-aware applications that process the documents will no longer work with the new versions, and will have to be upgraded. This behavior is primarily beneficial for document formats whose versions change infrequently but, when they do change, alter the semantics of elements and attributes; in that case it is desirable that existing processors refuse to work with the newer versions rather than risk misinterpreting them.
On the other hand, there are a number of scenarios where an XML document versioning mechanism based on a version attribute on the root element is sufficient. A version attribute is primarily beneficial when changes in the document's structure are backwards compatible. The following situations are all areas where using a version attribute is a wise choice:
- Semantics of elements and attributes will not be altered.
- Changes to the document involve the addition of elements and attributes, but rarely removal.
- Interoperability between applications with various versions of the processing software is necessary.
The two versioning techniques are not mutually exclusive and can be used simultaneously. For instance, XSLT uses both a version attribute on the root element, as well as a versioned namespace URI. The version attribute is used for incremental, backwards-compatible changes to the XML document's format, while altering the namespace name is done for significant changes in the semantics of the document.
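How a processor might combine the two techniques can be sketched in Python. Everything specific here is an assumption made for illustration: the orders vocabulary, the two namespace names (which merely follow the product/year/month pattern shown earlier), and the function names. The sketch assumes the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names; only the first is understood by this processor.
ORDERS_2001 = "http://my.domain.example.org/orders/2001/06"
ORDERS_2003 = "http://my.domain.example.org/orders/2003/02"
SUPPORTED_NAMESPACES = {ORDERS_2001}

def split_universal_name(tag):
    """Split '{uri}local' into (uri, local); uri is '' when there is no namespace."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

def describe(xml_text):
    """Reject unknown namespace names; tolerate any version attribute value."""
    root = ET.fromstring(xml_text)
    uri, local = split_universal_name(root.tag)
    if uri not in SUPPORTED_NAMESPACES:
        # A new namespace name signals changed semantics, so refuse to guess.
        return "rejected: unknown namespace " + uri
    # A version attribute signals backwards-compatible additions only.
    version = root.get("version", "1.0")
    return "accepted %s version %s" % (local, version)

print(describe('<orders xmlns="%s" version="1.1"/>' % ORDERS_2001))
print(describe('<orders xmlns="%s" version="1.0"/>' % ORDERS_2003))
```

The first document is accepted even though its version attribute is newer than the processor may have seen, while the second is refused outright because its namespace name is unknown.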
Document Types
The term document type is misleading, as discussed in several philosophical debates on various XML related mailing lists. In many cases, the namespace name of the root element can be used to determine how to process the document. However, this is hardly a general rule, and stating it as such violates the spirit of XML namespaces, which were designed precisely so that developers could mix and match XML vocabularies.
A succinct explanation of why it is a mistake to treat the namespace URI of the root element as equivalent to a notion of document type is this post by Rick Jelliffe on XML-DEV. The essence of the post is that there are many different types that an XML document could have, including its document type as specified by its Document Type Definition (DTD), its MIME media type, its schema definition as specified by the xsi:schemaLocation attribute, its file extension, as well as the namespace name of its root element. Thus it is quite likely that in many cases a document will have many different types depending on what perspective one decides to take when examining the document.
Two examples of XML documents whose actual document types can be misconstrued by simply looking at the namespace URI of the root element are RDDL documents (sample, notice that its root element is from the XHTML namespace) and annotated mapping schemas, whose root element is from the W3C XML Schema namespace.
In a nutshell, the type of a document cannot conclusively be determined by looking at the namespace URI of its root element. Thinking otherwise is folly.
Namespaces Future
There are a number of developments in the XML world focused on tackling some of the issues that have developed around XML namespaces. Firstly, the current draft of the W3C XML namespaces recommendation does not provide a mechanism for undeclaring namespaces that have been mapped to a prefix. The W3C XML namespaces v1.1 working draft is intended to rectify this oversight by providing a mechanism for undeclaring prefix namespace mappings in an instance document.
The question of what should be returned on an attempt to dereference the contents of a namespace URI has led to contentious debate in the XML world and is currently the focus of deliberations by the W3C's Technical Architecture Group. The current version of the XML namespaces recommendation does not require the namespace URI to actually be resolvable because a namespace URI is supposed to merely be a namespace name that is used as a unique identifier, and not the location of a resource on the Internet.
Tim Bray (one of the original editors of both the XML Language and XML namespaces recommendations) has written an exhaustive treatise on the issues around namespace URIs and the namespace documents that may or may not be retrieved from them. This document contains much of the reasoning that was behind his creation of the Resource Directory Description Language (RDDL), which is designed to be used for creating namespace documents.
-
-
Slashback: Favoritism, Alternacy, Moo
Slashback with more on handheld everything-boxes, a softer review of the new Sharp Zaurus, raising money for open technologies, Gateway's singing cow, and getting around with alternative root servers -- all below. Enjoy. Update: 04/12 06:41 GMT by T : There's an update below in the part on alternate root servers, too.
A double-barrel of Mossberg. Dave Aiello (author of our recent review of Handspring's Treo all-in-one handheld) writes with a nice update for anyone thinking of shelling out for one: "Walter Mossberg did a comparative overview of the Handspring, Kyocera, Samsung, and RIM integrated PDAs and phones in the first edition of 'The Mossberg Solution' (a new column he is writing)."
Speak of the devil -- Arrgh writes: "PC Magazine has posted a more favourable review (4 out of 5 stars) of the Zaurus--they had none of the sync problems Walt Mossberg wrote about."
Give money to these guys, please. Jeff Gerhardt of the American Open Technology Consortium writes after the post about this "GeekPAC" on Slashdot.
"Although the last 24 hours was one hell of a pain in the ass, at 4:00 am we were through with that second draft and in large measure due to the constructive comments from the /. community. Yes I got a lot of nutty emails about how I should be working on more important issues like global warming and ending "greed" (can you believe that one??? how the hell can we do that.), but for the most part the comments were well thought out. As a whole I think that the whole /. community should be proud.
In particular I have pages of operational suggestions and contact names across the US. The suggestion that has tickled me the most is a suggestion for a fund raising methodology for the "PAC" organization. This came from a couple guys who were debating the idea between the two of them, until it really solidified into a plan. And, we are going to do it. The plan is simple and uses the thing we love so much, technology.
We will set up a series of paypal account links, having created a category for every House or Senate member that appeals to our overall goals and objectives. If then there is a news item about an issue and one of these "good guy" politicos does something to help the cause, the PAC will write a 2-3 sentence quote that will happen to have the paypal link included inside the quote. Media sites will then be able to include the link as a part of the quote, because afterall its news right (wink wink)!!!!
This would then facilitate the people _out there_ to throw a buck at the good guy as a impulse purchase to show gratitude. It need some refinement, but I think it provides portals an opportunity to provide a political opportunity to their communities, without looking too overtly political in the process."
No more Portable Monopoly. Dr.Jones writes "...well, not really. It seems Portable Monopoly is being forced to give up their web address 'Due to legal issues with Hasbro over the usage of the word "monopoly"'. Fortunately, they will have a new site up next week (Triton Labs), and they're still on target to ship the lighting kit next month. Seems like a bit of a stretch on Hasbro's part though."
Not as much of a stretch maybe as Parker Brothers claiming the word clue.com.;)
Do cows wake up and smell the Rosen? prostoalex writes: "Newsfactor has a story on Hillary Rosen expressing dissatisfaction with Gateway's ad campaign. Who would have thought?"
... and routing around it. With a nice detailed followup to a recent Ask Slashdot post, Dr. Zowie writes: "For those who want to use alternative DNS roots but are stuck behind port-80 proxies, a simple solution may exist, thanks to several folks who wrote in to suggest it. Section 5 of RFC 2068 gently deprecates using relative URI's in HTTP requests, and in fact most web clients generate absolute URI's even though relative URI's are allowed by the standard. My ISP's not-quite-transparent proxy directs outbound port 80 packets correctly if (and only if) there's a relative URI in the request. A little 10-line local proxy that munges absolute URI's into relative URI's before emitting them to the ISP seems to solve the problem for now: I can retrieve all the nice goodies that most of you can't at www.dev.null, , www.computer.geek, and paradox.null.
Oh, and if you live near the Colorado front range and aren't a purist about routing, Peak to Peak is a pretty good outfit for dialup and DSL service. Their tech support is extremely accessible and quite good (though our views differ on the correctness of payload-switched routing)."
Update: 04/12 06:41 GMT by T : Richard Sexton writes: "While it's great to see your continued coverage of Open Roots can I just put in a quick plug for ORSC? We're older and have way more tlds.
The coordination amongst Open Roots takes place at IRON; for lack of a better term, it's the Open IANA."
Kissing and making nice. panker writes "Sun had previously given JavaRanch a cease and desist order because of a trademark issue. Sun is now backing down and being friends. Slashdot covered the first half of this issue earlier."
-
SOAP Security Problems
LarryWest42 writes: "This article lists a number of sobering security problems with SOAP (not only the avoidable one of tunneling through HTTP). I found it thanks to Bruce Schneier's latest Crypto-Gram newsletter." -
Sites Wary of Adopting P3P
technogamy writes: "CNN is reporting on the industry's take on P3P, the W3C's Platform for Privacy Preferences. According to the article, the W3C is expected by April to formally adopt P3P -- of course, as many of you are aware, Microsoft's IE6 already includes an implementation of the client side of P3P. 'Because Microsoft's browser checks for P3P, sites risk getting flagged if they don't adopt it.' P3Pizing (or 'pethripizing') a complex site can evolve into a Herculean task...! (See also EPIC's critique of P3P.)" -
Java RMI
Reader amoon writes: "With the rise of XML-based RPC (e.g. SOAP, XML-RPC, APEX), the distributed computing world is starting to really unsettle from the CORBA-RMI-DCOM oligopoly of the 1980s and 1990s. Yet, XML-based RPC is not a panacea (though it is quite cool), especially for those of us involved in the legacy and client-server worlds. Now, what is fascinating: the publishing world is revving up the engines on not only the XML-based RPC stuff, but also the RMI and CORBA stuff -- while rarely applied to the tech industry, the old adage, "what was old is new again," seems to fit well here. This review describes this über-cool trend from the RMI perspective, with a focus on Java RMI (O'Reilly) by William Grosso." Read on for the rest of the review.
Java RMI
author: William Grosso
pages: 545
publisher: O'Reilly
rating: 8
reviewer: amoon
ISBN: 1-56592-452-5
summary: Solid practical insight into the nitty-gritty details of RMI.
The Scoop
Remote Method Invocation (RMI) has been the object-oriented remote procedure call (ORPC) facility for distributed programming in Java since the 1.1 days. RMI also served as motivation and a proof-of-concept for jini, javaspaces, and numerous other solid distributed networking technologies. Of course, anyone from the academic distributed programming world knows Wollrath, Waldo, and Riggs.
Yet, despite a myriad of books over the past five years on network programming, RMI always seemed to be the stepchild: relegated to a single chapter (buried on page 496, of course) that always said that RMI was "better" than sockets and "worse" than CORBA. Now, granted that RMI is operationally rather trivial compared with CORBA and was (prior to RMI/IIOP) a unilanguage distributed ORPC technology -- but still. For those of us who have to interoperate with RMI (whether welcome from the Java world or not), the lack of in-depth technical analysis (beyond the spec) has been a hindrance.
Fortunately, this trend is finally starting to buckle with the release of several in-depth RMI books including: Java RMI, Java.rmi, and Mastering RMI: Developing Enterprise Applications in Java and EJB. As evidence of this problem, Grosso states the same in his introduction – and actually pulls it off without sounding self-serving.
I chose Grosso's text because of the cute squirrel (aka the O'Reilly brand), Grosso's recent series of articles on the hashbelt algorithm, and his unadulterated academic knowledge management and mathematics bent. Fortunately, I was rewarded: this animal returns to O'Reilly's pre-bubble quality. Kudos to both Grosso and his editors (Knudsen, Loukides, and Eckstein) for getting the train back on the track.
What's to Like
Bottom line is that Grosso simply covers the topics and does so with solid conceptual and code coherence – even by O'Reilly standards (over 40 animals grace my shelves). His prose and explanatory patterns make it clear that he has actually gotten into the real world of RMI, and doesn't hesitate to highlight both good and bad parts. You cannot be dozing off when you read this (at least not if you expect to understand it) -- this is written by someone with solid analytic thinking skills and it shows. After too many years of "there are no caveats" journalism and publishing, this is a nice reversion. Further, I can only imagine that his current employment is a testament to his real-world knowledge of RMI.
Grosso hits on a vein which is not well-appreciated: when not smoothed over by marketing people, RMI is actually a mostly-capable ORPC technology. Certainly activation and RMI/IIOP really began to make things interesting, from Java 2 and EJB respectively. Discussion of reference-counted distributed garbage collection, a feature missing from CORBA and other popular ORPC standards, also contributes a nice bonus (although Grosso's ardent attempt to debunk the "RMI doesn't scale" argument is rather weak, even going so far as to rehash the definition of threads and thread pools – this complexity mismatch is an ugly giveaway that a well-intentioned editor went astray).
What sets this text apart is the tight focus on nitty-gritty implementation details of RMI itself. After all, these RMI texts are way too late to the game to reteach how to write "baby RMI" code: 5 years after the original spec, you either know how to write RMI or you don't. Grosso simply gives you a solid in-depth analysis of all the obscurities of the RMI runtime, custom sockets, dynamic classloading, activation, MarshalledObjects, and HTTP tunneling. In other words, all the interesting real-world topics whose official documentation is poor and which the various RMI tutorials (written many years ago) ignored.
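As a taste of one of those obscurities, here is a minimal sketch of java.rmi.MarshalledObject, which holds an object in its serialized form and hands back a fresh copy on demand. This is an illustration only, not code from the book; the class MarshalledSketch and the method roundTrip are hypothetical names.

```java
import java.io.IOException;
import java.rmi.MarshalledObject;
import java.util.ArrayList;
import java.util.List;

public class MarshalledSketch {
    // Wraps a value in its serialized form, then unwraps a new copy of it.
    public static List<String> roundTrip(List<String> original) {
        try {
            // The constructor serializes the argument immediately.
            MarshalledObject<List<String>> wrapped = new MarshalledObject<>(original);
            // get() deserializes and returns a new object each time it is called.
            return wrapped.get();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> accounts = new ArrayList<>();
        accounts.add("checking");
        accounts.add("savings");
        List<String> copy = roundTrip(accounts);
        System.out.println(copy.equals(accounts)); // equal contents
        System.out.println(copy != accounts);      // but a distinct object
    }
}
```

Both lines print true: the copy has the same contents but is a separate object, which is the same copy semantics an argument gets when it is passed by value across an RMI call.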
While canonical, the single banking example followed through the text was well-executed, although authors continue to underestimate the prevalence of readers who consume textbooks non-linearly.
What's Not to Like
RMI/IIOP is shaping up to be a fascinating contributor to the "clean up the EJB mess" discussion. Dedicating a measly 13 pages (beginning on page 503, no less) to this critical topic seems a bit of an oversight – but maybe that is just my CORBA sentiments speaking. Either way, the mechanics of CORBA are sufficiently intricate in real-world deployments that saying "if you can build an RMI system, you can build a CORBA system" (p. 511) is a bit brazen (or naïve) for my tastebuds. I can only chalk up this oversight to deadline pressure, which is probably a Good Thing, since the book was supposedly in production for almost 2 years.
A minor point: the top-level organization of the book (Part I, II, III) is arbitrary; ignore it and use the chapter organization instead.
The Summary
Quality: solid practical insight into the nitty-gritty operational implementation details of RMI in the real world. You simply are not going to find solid O'Reilly-quality coverage of the topics elsewhere.
Relevance: If you are responsible for making RMI actually work in production systems, this might well be the next animal on your shelf – either now or later. If you want a breezy afternoon saunter around RMI, skip this. Instead, google one of the many free tutorials online.
You can purchase Java RMI from Fatbrain. Want to see your own review here? Just read the book review guidelines, then use Slashdot's handy submission form.