Choosing the Right XML Database?
Saqib Ali asks: "Later this year I will be starting a project that will involve storing XML data in a database. I understand why a relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"
Berkeley DB XML is a new product. I have not tested it, though, so this is not a recommendation.
Even if the data you're storing is XML-formatted, it might be better to map certain tags to relational columns and just store the XML doc itself as part of a normal relational table. Searches on those columns will usually be more efficient, especially with decent indexing. This won't work if you really need to do searches involving parent/child/sibling relationships between nodes.
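A minimal sketch of that hybrid layout, assuming SQLite through Python's standard `sqlite3` module and `xml.etree.ElementTree`; the `reports` table, its `author`/`year` columns, the `store` helper and the sample documents are all hypothetical. A couple of frequently searched tags are copied into indexed columns, and the whole document is kept alongside them as text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report documents; the tag names are illustrative only.
REPORTS = [
    "<report><author>smith</author><year>2003</year><body>Q1 figures</body></report>",
    "<report><author>jones</author><year>2004</year><body>Q2 figures</body></report>",
]

conn = sqlite3.connect(":memory:")
# Frequently searched tags become ordinary columns; the full XML is kept as text.
conn.execute("""
    CREATE TABLE reports (
        id     INTEGER PRIMARY KEY,
        author TEXT,
        year   INTEGER,
        doc    TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX reports_author ON reports(author)")

def store(text):
    """Parse once on the way in, copy the mapped tags, keep the original document."""
    root = ET.fromstring(text)  # raises ET.ParseError on malformed XML
    year = root.findtext("year")
    conn.execute(
        "INSERT INTO reports (author, year, doc) VALUES (?, ?, ?)",
        (root.findtext("author"), int(year) if year else None, text),
    )

for report in REPORTS:
    store(report)

# Searches on the mapped columns are plain SQL and can use the index;
# the original document comes back whole.
rows = conn.execute("SELECT doc FROM reports WHERE author = ?", ("jones",)).fetchall()
print(ET.fromstring(rows[0][0]).findtext("body"))  # prints Q2 figures
```

Queries that depend on the document's tree shape (parent/child/sibling) still require parsing the stored `doc` column, which is the limitation noted above.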
At a minimum, make sure there's good XQuery support. XPath just won't cut it if you need to scale.
DB2 has decent XML support currently, with great XML support coming down the pipe at some point, AFAIK. My experiences with it have been very positive.
For more opinions to make you think:
http://www.dbazine.com/pascal9.html
http://www.dbazine.com/pascal8.html
And here, C.J. Date argues that a truly relational DBMS should be able to support an XML data type:
http://www.dbdebunk.com/lauri1.htm
(PostgreSQL is an example of a DBMS with extensible types)
It's interesting that you bring this up. I just finished writing an article for an online magazine on object databases and .NET. You might want to look into Matisse. It's got bindings for all the popular languages, it's an object database, and it's got SQL interfaces. Nice.
And I'll point everybody to my article when it's published.
>wouldn't that just be storing a string?
Oh no. I mean, yes, you could just store XML as a string in a BLOB column, but that's no better than just storing as a file.
A custom XML datatype would not treat the XML as a blob, but would actually parse the XML on input into the table, storing an internal representation (probably as an associative array) which would allow custom operators to traverse the tree, visit nodes, etc.
But it would also allow you to perform relational queries and place integrity constraints on your XML documents.
To explain further, I will use a specific example: PostgreSQL allows you to create custom datatypes, even importing C functions to handle the input and output of these types. Thus, theoretically, you should be able to create a custom datatype called "xmldoc" and use a standard C library that handles XML, such as libxml or expat, to parse the XML string into an internal data structure on input, and vice versa on output.
(I must stress that this is theoretical. I haven't had the time or need for such a thing, but according to the documentation, it should be possible.)
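In PostgreSQL itself this would mean C input/output functions registered with `CREATE TYPE`. The sketch below is not that; it is only a client-side analogue of the same idea, assuming Python's standard `sqlite3` adapters/converters and `xml.etree.ElementTree`. The column type name `xmldoc` follows the post; the `XmlDoc` class, the `xml_text` SQL function and the sample documents are hypothetical. Values are parsed on input (malformed XML is rejected), handed back as a tree on output, and remain queryable relationally.

```python
import sqlite3
import xml.etree.ElementTree as ET

class XmlDoc:
    """Hypothetical 'xmldoc' value: the XML is parsed once and kept as a tree."""

    def __init__(self, text):
        # Plays the role of the type's input function: malformed XML
        # raises ET.ParseError here, before anything reaches the table.
        self.root = ET.fromstring(text)

    def find_text(self, path):
        # A simple traversal operator (ElementTree supports an XPath subset).
        return self.root.findtext(path)

# Write side: serialize the tree back to text when it is stored.
sqlite3.register_adapter(XmlDoc, lambda doc: ET.tostring(doc.root, encoding="unicode"))
# Read side: values from columns declared as "xmldoc" are re-parsed into XmlDoc.
sqlite3.register_converter("xmldoc", XmlDoc)

def xml_text(raw, path):
    # SQL-callable helper so relational queries can filter on document content.
    return ET.fromstring(raw).findtext(path)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.create_function("xml_text", 2, xml_text)
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, body xmldoc NOT NULL)")

conn.execute("INSERT INTO reports (body) VALUES (?)",
             (XmlDoc("<report><author>smith</author><title>Q1</title></report>"),))
conn.execute("INSERT INTO reports (body) VALUES (?)",
             (XmlDoc("<report><author>jones</author><title>Q2</title></report>"),))

# A relational query over document content; the result comes back as an XmlDoc.
(doc,) = conn.execute(
    "SELECT body FROM reports WHERE xml_text(body, 'author') = ?", ("jones",)
).fetchone()
print(doc.find_text("title"))  # prints Q2

try:
    XmlDoc("<report><author>broken</report>")
except ET.ParseError:
    print("rejected malformed XML")
```

Here the storage is still text and the parsing happens in the client; a real server-side type, as described in the post, would keep the parsed representation inside the database and could enforce constraints there.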