Choosing the Right XML Database?

Posted by Cliff on Wednesday March 12, 2003 @06:39AM from the storing-data-with-new-tech dept.

Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"

your xml by Anonymous Coward · 2003-03-12 06:42 · Score: 5, Funny

<post> first </post> 
why an xml database? by jeffdill · 2003-03-12 07:00 · Score: 5, Insightful

To pick the right database, you need to analyze the structure of your data and the operations you intend to perform on it. XML is a useful general format for interchange of serialized data, but just because you have some data represented in XML doesn't mean you should store it in XML. What is the structure of the data? What will you do with it? Why is a relational database or an object database a bad choice for your application?
1. Re:why an xml database? by Anonymous Coward · 2003-03-12 07:44 · Score: 5, Informative
  
  Even if the data you're storing is XML formatted, it might be better to map certain tags to relational columns and just store the XML doc itself as part of a normal relational table. The searches are guaranteed to be more efficient, especially with decent indexing. This won't work if you really need to do searches involving parent/child/sibling relationships between nodes.
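
The hybrid layout described above can be sketched roughly as follows. This is only an illustration using Python's bundled `sqlite3` and `xml.etree.ElementTree` modules; the `<report>`, `<author>`, and `<date>` tags, the `reports` table, and the helper names are all assumptions made for the example, not anything from the original question.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical report format; the tag names are assumptions for this sketch.
REPORTS = [
    "<report><author>alice</author><date>2003-03-01</date><body>Q1 summary</body></report>",
    "<report><author>bob</author><date>2003-03-10</date><body>Disk usage</body></report>",
]

def store_report(conn, xml_text):
    """Copy selected tags into ordinary columns and keep the full XML as text."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO reports (author, report_date, xml) VALUES (?, ?, ?)",
        (root.findtext("author"), root.findtext("date"), xml_text),
    )

def reports_by_author(conn, author):
    """Search on the indexed column, then hand back the stored XML documents."""
    rows = conn.execute(
        "SELECT xml FROM reports WHERE author = ? ORDER BY report_date", (author,)
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, author TEXT, report_date TEXT, xml TEXT)"
)
conn.execute("CREATE INDEX idx_reports_author ON reports (author)")
for doc in REPORTS:
    store_report(conn, doc)

# The XML is only parsed again after the relational lookup has narrowed things down.
for xml_text in reports_by_author(conn, "bob"):
    print(ET.fromstring(xml_text).findtext("body"))
```

Searches on the mapped columns never touch the XML text, which is the point of the approach. Anything that depends on node relationships inside the document would still need the document to be parsed.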
  
  At the minimum make sure there's good XQuery support. XPath just won't cut it if you need to scale.
  
  DB2 has decent XML support currently, and great XML support coming along the pipe at some point afaik. My experiences with it have been very positive.
Re:Berkeley DB XML also an option by Anonymous Coward · 2003-03-12 07:00 · Score: 5, Informative

Yup I was going to mention that one. I've tested it and it works great. Basically regular Berkeley DB which rocks the house already, with an XML-aware layer on top.

If you have lots of small XML documents this is definitely the best choice. Dunno about big reports. Berkeley scales to any size, but maybe he should split his big documents into "metadata.xml" and "report.xml".. then store and index metadata.xml in the database and put report.xml on disk. I believe there is a standard for XML Includes now, so he could have the metadata.xml actually point to the report.
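
The "XML Includes" standard mentioned above is presumably W3C XInclude. A minimal sketch of the metadata-points-to-report idea, using Python's standard `xml.etree.ElementInclude` module, might look like this. The document contents, the `FILES` dictionary standing in for files on disk, and the `load_from_dict` loader are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Stand-in for the file system; a real setup would read report.xml from disk.
FILES = {
    "report.xml": "<report><section>Full report text goes here.</section></report>",
}

# Small, indexable metadata document that only points at the big report.
METADATA = """
<metadata xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>March status</title>
  <xi:include href="report.xml"/>
</metadata>
"""

def load_from_dict(href, parse, encoding=None):
    """Loader callback for ElementInclude: resolve href against FILES."""
    text = FILES[href]
    if parse == "xml":
        return ET.fromstring(text)
    return text

def expand(metadata_xml):
    """Parse the metadata document and replace xi:include with the report."""
    root = ET.fromstring(metadata_xml)
    ElementInclude.include(root, loader=load_from_dict)
    return root

full = expand(METADATA)
print(full.findtext("title"))
print(full.findtext("report/section"))
```

Only `METADATA` would need to live in the database. The include is expanded on demand, when someone actually wants the whole report.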

Lots of ideas. Check out Berkeley DB though, it beats Xindice (especially since it's not written in Java, which pretty much ruled it out for my purposes.)
Take the easiest way by angel'o'sphere · 2003-03-12 07:02 · Score: 5, Insightful

These are the things I would question first:

a) Does it use XQuery/XPath or another standard way to access the DB, or is it proprietary?
b) Does it support your programming language of choice?
c) Where do you get a running prototype fastest?

c) is the most important point IMHO. You only know whether you have chosen the right DB AFTER you have implemented your application. (Well, you can try to find test cases and try to predict whether the DB is the right one by trying to scale the tests up.) Note: I used the word "try" several times, because such an approach is only trial and error.

OK, if you can just start coding (that was point c) and a standard like the one in a) is supported, then you should easily be able to hide the actual DB behind a suitable interface.
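
One way to read "hide the actual DB behind an interface" is sketched below. The `ReportStore` interface, the `InMemoryStore` prototype backend, and the `titles` helper are hypothetical names invented for this example. The in-memory backend relies on the limited XPath subset in Python's `xml.etree.ElementTree`, so it is a stand-in for prototyping rather than a real XML database.

```python
from abc import ABC, abstractmethod
from typing import Dict, List
import xml.etree.ElementTree as ET

class ReportStore(ABC):
    """Hypothetical storage interface; the application codes against this only."""

    @abstractmethod
    def put(self, doc_id: str, xml_text: str) -> None:
        """Store one XML document under the given id."""

    @abstractmethod
    def query(self, path: str) -> List[str]:
        """Return the text of every node matching the path expression."""

class InMemoryStore(ReportStore):
    """Prototype backend; a later backend could wrap whichever DB is chosen."""

    def __init__(self) -> None:
        self._docs: Dict[str, ET.Element] = {}

    def put(self, doc_id: str, xml_text: str) -> None:
        self._docs[doc_id] = ET.fromstring(xml_text)

    def query(self, path: str) -> List[str]:
        results: List[str] = []
        for root in self._docs.values():
            results.extend(node.text or "" for node in root.findall(path))
        return results

def titles(store: ReportStore) -> List[str]:
    """Application code: depends on the interface, not on a concrete DB."""
    return store.query("./title")

store = InMemoryStore()
store.put("r1", "<report><title>Uptime</title></report>")
store.put("r2", "<report><title>Backups</title></report>")
print(titles(store))
```

Swapping the database later then means writing one more `ReportStore` subclass and leaving `titles` and the rest of the application untouched.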

b) is only a matter of your flexibility ....

I would guess the application has more constraints that will limit or challenge you than the DB used behind it.

I once read an article in a German magazine where they had put a DOM writer and a DOM reader into an SQL database as stored procedures.

All the XML was stored in a few tables (element, attribute, and such) ... it was very fast ... and, well, you programmed your XML manipulation by directly manipulating "virtual" DOM trees inside the DB. In SQL and in a relational DB, of course.

So much for "relational" not fitting your needs :-)
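
The magazine article itself is not available here, so the following is only a guess at the general shape of such a scheme: one generic `element` table and one `attribute` table, plus a writer and a reader. The schema, column names, and functions are assumptions. The reader and writer run client-side in Python because SQLite has no stored procedures, and text that follows a child element (mixed content) is ignored to keep the sketch short.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE element (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES element(id),
    position INTEGER,
    tag TEXT,
    text TEXT
);
CREATE TABLE attribute (
    element_id INTEGER REFERENCES element(id),
    name TEXT,
    value TEXT
);
"""

def write_tree(conn, node, parent_id=None, position=0):
    """'DOM writer': store one element, its attributes, then its children."""
    cur = conn.execute(
        "INSERT INTO element (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
        (parent_id, position, node.tag, node.text),
    )
    element_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (element_id, name, value) VALUES (?, ?, ?)",
        [(element_id, name, value) for name, value in node.attrib.items()],
    )
    for i, child in enumerate(node):
        write_tree(conn, child, element_id, i)
    return element_id

def read_tree(conn, element_id):
    """'DOM reader': rebuild an element and its subtree from the rows."""
    tag, text = conn.execute(
        "SELECT tag, text FROM element WHERE id = ?", (element_id,)
    ).fetchone()
    attrs = dict(conn.execute(
        "SELECT name, value FROM attribute WHERE element_id = ?", (element_id,)
    ).fetchall())
    node = ET.Element(tag, attrs)
    node.text = text
    children = conn.execute(
        "SELECT id FROM element WHERE parent_id = ? ORDER BY position", (element_id,)
    ).fetchall()
    for (child_id,) in children:
        node.append(read_tree(conn, child_id))
    return node

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
source = '<report id="7"><title>Status</title><p>All good.</p></report>'
root_id = write_tree(conn, ET.fromstring(source))
print(ET.tostring(read_tree(conn, root_id), encoding="unicode"))
```

Because every document shares the same two tables, plain SQL over `element.tag` and `attribute.name` can search across all stored XML without a per-document schema.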

Regards,
angel'o'sphere

P.S. You didn't give many hints about why you need an XML database. An XML database only makes sense if your natural document format is ... XML. It makes no sense if you think you need to use XML because of hype or something ....

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Berkeley DB XML also an option by Anonymous Coward · 2003-03-12 08:02 · Score: 5, Informative

It does have a Java API! Did you check it out? It comes with C/C++, Java, Perl, Python, and Tcl support out of the box. It's just not *written* in Java, which makes it more flexible. Since it's still "prerelease" you have to sign up to get the software, but that's not a big deal.
Re:someone please explain... by Anonymous Coward · 2003-03-12 08:08 · Score: 5, Insightful

Well, if you're just sticking the entire XML document into the table like a blob of text, then yeah, there's no problem. But then you can't really do anything with it other than retrieve it (i.e., you can't run a query on parts of it).

But if you want the database to be aware of the *structure* of the data, you have to decompose the data into pieces, stick them in various tables, keep the integrity between the tables, and, oh yeah, write some code to convert the data back into XML when you want to get the whole document.

For instance, if you are storing an XML document that's made up of one or more Chapters, where each Chapter has one or more Paragraphs and each Paragraph can contain Sentences, you have a complex many-to-one structure that you have to store in multiple tables. How would you do it? Well, you'd make a document table, a chapter table, a paragraph table, and a sentence table, and link them all together with unique IDs, etc. You get the point, I hope: the XML doc's rich structure has to be decomposed into rows and columns.
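
To make the bookkeeping concrete, here is a rough sketch of that hand-built decomposition using Python's `sqlite3`. The table and column names, the `add_document` helper, and the sample data are all assumptions for illustration. The example query looks for documents containing a sentence that starts with a given prefix.

```python
import sqlite3

SCHEMA = """
CREATE TABLE document  (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE chapter   (id INTEGER PRIMARY KEY,
                        document_id INTEGER REFERENCES document(id), seq INTEGER);
CREATE TABLE paragraph (id INTEGER PRIMARY KEY,
                        chapter_id INTEGER REFERENCES chapter(id), seq INTEGER);
CREATE TABLE sentence  (id INTEGER PRIMARY KEY,
                        paragraph_id INTEGER REFERENCES paragraph(id), seq INTEGER, text TEXT);
"""

def add_document(conn, title, chapters):
    """chapters: list of chapters, each a list of paragraphs,
    each paragraph a list of sentence strings."""
    doc_id = conn.execute(
        "INSERT INTO document (title) VALUES (?)", (title,)
    ).lastrowid
    for c_seq, paragraphs in enumerate(chapters):
        ch_id = conn.execute(
            "INSERT INTO chapter (document_id, seq) VALUES (?, ?)", (doc_id, c_seq)
        ).lastrowid
        for p_seq, sentences in enumerate(paragraphs):
            p_id = conn.execute(
                "INSERT INTO paragraph (chapter_id, seq) VALUES (?, ?)", (ch_id, p_seq)
            ).lastrowid
            conn.executemany(
                "INSERT INTO sentence (paragraph_id, seq, text) VALUES (?, ?, ?)",
                [(p_id, s_seq, s) for s_seq, s in enumerate(sentences)],
            )
    return doc_id

def documents_with_sentence_prefix(conn, prefix):
    """Walk sentence -> paragraph -> chapter -> document with three joins."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.title
        FROM sentence s
        JOIN paragraph p ON p.id = s.paragraph_id
        JOIN chapter c ON c.id = p.chapter_id
        JOIN document d ON d.id = c.document_id
        WHERE substr(s.text, 1, ?) = ?
        ORDER BY d.title
        """,
        (len(prefix), prefix),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_document(conn, "Greeting", [[["Hello there.", "How are you?"]]])
add_document(conn, "Farewell", [[["Goodbye."], ["Say Hello for me."]]])
print(documents_with_sentence_prefix(conn, "Hello"))
```

Four tables, three joins, and a loader function are needed for a three-level document, and turning the rows back into XML would take yet more code.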

XML databases take care of this automatically and can also *index* the various parts of the document so that queries (XPath or otherwise) run faster (e.g., "give me the documents that contain sentences beginning with 'Hello'").