Do XML-based Databases Live Up to the Hype?
douthitb asks: "I have recently started work as a contractor with a company developing/improving an application for exchanging large amounts of data. The current solution exchanges data via XML, but the data itself is stored in a SQL Server database. There is a concern about the overhead involved with wrapping and unwrapping the XML to get the data in and out of a relational database. The proposed solution is to use Tamino, an XML-based database. Neither I nor any of the other developers have any experience with Tamino, but the desired result is to remove the bottleneck of converting the XML back and forth. Does anyone have experience using Tamino (or any other XML-based database)? What benefits and/or difficulties did you have in using an XML database, as opposed to its relational counterpart? How large of a learning curve should be expected with a product like this? Do XML databases really live up to the hype? A similar topic was discussed on Slashdot way back when, so I was hoping to get some more up-to-date feedback on the subject."
"Sales reps from Software AG, the makers of Tamino, were brought in to discuss the benefits of their product with us. They, of course, presented Tamino as the end all, cure all database system (it will even clear your acne and make you popular with the girls!). The management of the company I'm contracting with were basically eating out of the sales reps' hands, without asking any of the "tough" questions about what the product can do; I was less convinced. Doing some initial searching on the Internet, I have had trouble finding much information about Tamino outside of the Software AG website."
The first question to answer is, why is this data in a relational database to begin with? More to the point, is this application the only one that accesses the data, or are there other, non-XML centric databases that make use of the same data? The relational model gives you flexibility that XML does not for dealing with the data in arbitrary and unforeen ways (XML can be quite flexible with XSLT, but a programmer must still intervene for each and every new way you want to use tha data, with a much bigger performance hit). The normalised relational database stores your data in a mathematically sound way that puts the priority on integrity of data independently from its past, present or future structure; XML preserves data structure based on its present use while leaving the door open to moving from that to any arbitrary future use... which of the two ideals is more attractive depends on the nature of the data and how many applications need to use it.
Relational databases with good XML support (my background is DB2 but most major databases should be able to do this) reach a good compromise by giving you acces to normalised relational data as XML (which you can compliment with XSLT it if that's what needs to be done), while preserving it internally reduced to its bare essence as data (according to relational calculus' idea of what constitutes the bare essence of data, anyway.)
On the other hand, for single-app applications, or data that is more file oriented than datum-oriented (databases of XML documents where the document rarely or never needs to be abstracted from the data it contains), XML databases offer simplicity and efficiency by removing the need to work out a relational data model. Why break up your structured documents into a DBA's hand-tuned data model when 99.9% of your queries will just build these data sets back into XML documents (even when DB2, Oracle, and I assume SQL Server can automate this last task)? An XML database can give you more flexibility in querying than an all-XSLT solution, while saving a lot of unnecessary work over an SQL-to-XML solution for what is really an XML-to-XML application.
As I see it, that's the big picture. The actual decision has to come down to your applications. An XML database will be less efficient for non-XML applictions, plain and simple. Querying XML cannot be made as fast as querying relational tables, meaning extra overhead for non-XML apps. But *your* application encurs overhead in turning relational tables into XML (probably via the RDBMS's internal facility), and in transforming it if necessary. The question is therefore: who makes more queries on the database, this application or other non-XML ones? Who will make more queries in 5 years?
If you answer 'others' to either question, use a relational database--their XML support is decent now and will only get better, and they're far more popular in business which is an important CYA factor. If you answer 'your app' or 'other XML-based apps' for both questions, it's time to check out what XML databases have to offer right now. I expect other posts to comment on the current state of the art right now, but you can expect things to only get better as industry support for XQuery et al. improves--but don't expect them to *ever* pass up the relational databases in terms of raw performance, it's impossible. But as the evolution from Assembler to C to Java has shown in programming languages, the day may come when raw performance takes a back seat to other concerns.