Choosing the Right XML Database?
Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"
Berkeley DB XML is a new product. I have not tested it, though, so this is not a recommendation.
To pick the right database, you need to analyze the structure of your data and the operations you intend to perform on it. XML is a useful general format for interchange of serialized data, but just because you have some data represented in XML doesn't mean you should store it in XML. What is the structure of the data? What will you do with it? Why is a relational database or an object database a bad choice for your application?
There are three things I would question first:
....
... it was very fast ... and well, you programmed your XML manipulation by directly manipulating "virtual" DOM trees inside of the DB. In SQL and in a relational DB, of course.
:-)
... XML. It makes no sense when you think you need to use XML because of hype or something ....
a) Does it use XQuery/XPath or another standard way to access the DB, or is it proprietary?
b) Does it support your programming language of choice?
c) Where do you get a running prototype fastest?
c) is the most important point IMHO. You only know whether you have chosen the right DB AFTER you have implemented your application. (Well, you can try to find test cases and try to predict whether the DB is the right one by scaling the tests up.) Note: I used the word "try" several times, because such an approach is only trial and error.
OK, if you can just start coding (that was point c) and a standard as in a) is supported, then you should easily be able to hide the actual DB behind a suitable interface.
b) is only a matter of your flexibility
I would guess the application has more constraints that will limit you, or challenge you to overcome them, than the DB used behind it.
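To make the "hide the DB behind an interface" advice concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `ReportStore` interface, its method names, and the in-memory backend are hypothetical and not the API of any product mentioned in this thread. The point is only that the application codes against the interface, so a prototype backend can later be swapped for a real XML or relational database.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class ReportStore(ABC):
    """Hypothetical storage interface; the application only talks to this."""

    @abstractmethod
    def put(self, name: str, xml_text: str) -> None:
        """Store one report under a name."""

    @abstractmethod
    def query(self, name: str, path: str) -> list[str]:
        """Return the text of every element matching an XPath-style path."""


class InMemoryReportStore(ReportStore):
    """Prototype backend (an assumption): parsed documents kept in a dict."""

    def __init__(self) -> None:
        self._docs: dict[str, ET.Element] = {}

    def put(self, name: str, xml_text: str) -> None:
        self._docs[name] = ET.fromstring(xml_text)

    def query(self, name: str, path: str) -> list[str]:
        # ElementTree implements only a limited subset of XPath.
        return [el.text or "" for el in self._docs[name].findall(path)]


store: ReportStore = InMemoryReportStore()
store.put(
    "q1",
    "<report><section><title>Sales</title></section>"
    "<section><title>Costs</title></section></report>",
)
titles = store.query("q1", "./section/title")
print(titles)  # ['Sales', 'Costs']
```

A second class implementing the same two methods against a real database could replace `InMemoryReportStore` without touching the calling code.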
I once read an article in a German magazine in which they put a DOM writer and a DOM reader into an SQL database as stored procedures.
And all XML was stored in a few tables: element, attribute, and such.
So much for "relational won't fit your needs".
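A rough sketch of that idea, using Python's built-in sqlite3 module rather than stored procedures. The table and column names below are assumptions; the schema from the article is not known. One function writes an element tree into the two tables and another reads it back.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE element (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES element(id),
        tag       TEXT NOT NULL,
        text      TEXT
    );
    CREATE TABLE attribute (
        element_id INTEGER NOT NULL REFERENCES element(id),
        name       TEXT NOT NULL,
        value      TEXT NOT NULL
    );
""")


def store_node(node, parent_id=None):
    """Write one element, its attributes, and its children (the "writer")."""
    cur = conn.execute(
        "INSERT INTO element (parent_id, tag, text) VALUES (?, ?, ?)",
        (parent_id, node.tag, (node.text or "").strip()),
    )
    node_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO attribute (element_id, name, value) VALUES (?, ?, ?)",
        [(node_id, k, v) for k, v in node.attrib.items()],
    )
    for child in node:
        store_node(child, node_id)
    return node_id


def load_node(node_id):
    """Rebuild an element tree from the tables (the "reader")."""
    tag, text = conn.execute(
        "SELECT tag, text FROM element WHERE id = ?", (node_id,)
    ).fetchone()
    node = ET.Element(tag)
    node.text = text or None
    for name, value in conn.execute(
        "SELECT name, value FROM attribute WHERE element_id = ?", (node_id,)
    ):
        node.set(name, value)
    # Sibling ids increase in document order, so ORDER BY id restores it.
    child_ids = conn.execute(
        "SELECT id FROM element WHERE parent_id = ? ORDER BY id", (node_id,)
    ).fetchall()
    for (child_id,) in child_ids:
        node.append(load_node(child_id))
    return node


root_id = store_node(ET.fromstring('<report id="7"><title>Q1</title></report>'))
rebuilt = ET.tostring(load_node(root_id), encoding="unicode")
print(rebuilt)  # <report id="7"><title>Q1</title></report>
```

This simplified version drops text that follows a child element (mixed content), so it is only a starting point.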
Regards,
angel'o'sphere
P.S. You did not give many hints as to why you need an XML database. An XML database only makes sense if your natural document format is
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Frankly, I don't think I understand why relational is considered a poor choice for this. Would someone please explain this? (This is not a troll; I really don't know.) Is it just the work involved in storing an object in a set of tables?
http://builder.com.com/article.jhtml?id=u00320030306gcn01.htm
http://www.devx.com/xml/article/9796
Does anyone know if any of the above can maintain XSLT transformations of the data as views, much like you can create SQL views? That would be a useful feature.
It's interesting that you bring this up.
.NET. You might want to look into Matisse. It's got bindings for all the popular languages, it's an object database, and it's got SQL interfaces. Nice.
I just finished writing an article for an online magazine on object databases and
And I'll point everybody to my article when it's published.
If you have a lot of money, try Tamino.
It seems that nowadays most people have a great problem distinguishing between the logical and the physical representation/storage of data. (Personally, I think that XML sucks from a logical point of view, because its semantics are rather weak and limited.) What we lack is tools for mapping logical representations to physical representations. I think that the main reason why we do not have such tools is that from a marketing perspective they would be very undesirable. (No serious commercial company likes to adhere to an open standard, as that would make it very easy for a customer to switch.)
From the ground up, Object Store was built as a purely XML database.
Take the simple instance of a BOM relationship.
For those not sure what a BOM is, it stands for bill of materials. In those relationships, you have a part. It is made up of other parts. Each of those parts is made up of parts, etc. etc. The end result for large complex parts is an SQL join of indeterminate depth. Say you need to find how many screws you need for a car. It's a nasty issue for relationals. XML systems, OTOH, handle it beautifully. XPath would do that query simply, pulling out a single part wherever it's used. Bingo, you have the part count.
That type of indeterminately nested relationship brings relational DBs to their knees -- all of them.
I teach this subject to MSc and PhD students at a university, and I tell them not to buy into the hype. Use the best DB type for your data. Relationals are one type, not the only type.
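As a small illustration of the XPath side of this argument, here is a sketch using Python's standard ElementTree module. The BOM document is made up, and it assumes every part instance is written out explicitly as its own element.

```python
import xml.etree.ElementTree as ET

# Hypothetical BOM: each part instance appears as its own element.
car = ET.fromstring("""
<part name="car">
  <part name="door">
    <part name="hinge">
      <part name="screw"/>
      <part name="screw"/>
    </part>
    <part name="screw"/>
  </part>
  <part name="wheel">
    <part name="screw"/>
  </part>
</part>
""")

# The descendant axis (.//) matches at any nesting depth.
screws = car.findall(".//part[@name='screw']")
print(len(screws))  # 4
```

If the document recorded quantities as attributes instead (say, 3 hinges with 2 screws each), a plain path match would no longer give the total, because the quantities have to be multiplied down each branch.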
...tizzyd
The relational model is a logical model and I challenge you to find any example of data that cannot be represented quite easily in the relational model. In your example, you have traded any notion of data integrity for what you assume will be faster data access. In fact, since the relational model makes no recommendations on how data is physically stored, this is not necessarily the case.
How would XPath enforce your rules on how parts can relate to other parts? Why don't you just try flat files and grep?
XML is a perfectly acceptable means of data representation. It does not however by any means provide a formal, coherent theory of data management.
I really hope you were kidding about teaching anyone anything. You have a lot of learning to do.
IMS is a hierarchical database from IBM. The structure of the DB matches up with XML nicely and it is super fast. Of course it is also one of the oldest software products in existence...
Lasers Controlled Games!
Take the example of a BOM, given above, with subassemblies (components which are not raw materials). This type of layout is common in manufacturing situations.
I can build something called WIDGET1, for example.
WIDGET1 uses 10 screws and 3 of WIDGET2.
WIDGET2 uses another 20 screws and 4 more of WIDGET3.
Write an SQL query which can express this information. The only requirement placed upon your tables is that they must be at least 2NF.
You're going to end up rejoining your BOM file at least a couple times for that one. The problem is, the subassemblies can recurse an arbitrary number of times. That kind of thing cannot be expressed in a single query, since SQL can't recurse.
That's the kind of situation where you need something like an XMLDB.
OTOH. You can express this problem with a series of queries, and if that is the only problem in your app that an RDB can't solve, but there are many other places where it's a better fit, you make it work. Thus is the way of the real world.
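Whether a single query can do this depends on the SQL dialect. The SQL:1999 standard added recursive common table expressions, and engines that implement them can walk a BOM of arbitrary depth in one statement. Below is a minimal sketch using Python's sqlite3 module. The `bom(parent, child, qty)` table is a hypothetical layout, and it assumes WIDGET3 contains no screws, since the example above does not say.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER);
    INSERT INTO bom VALUES
        ('WIDGET1', 'screw',   10),
        ('WIDGET1', 'WIDGET2',  3),
        ('WIDGET2', 'screw',   20),
        ('WIDGET2', 'WIDGET3',  4);
""")

# Explode the BOM level by level, multiplying quantities on the way down.
total_screws = conn.execute("""
    WITH RECURSIVE exploded(part, qty) AS (
        SELECT child, qty FROM bom WHERE parent = 'WIDGET1'
        UNION ALL
        SELECT bom.child, exploded.qty * bom.qty
        FROM exploded JOIN bom ON bom.parent = exploded.part
    )
    SELECT SUM(qty) FROM exploded WHERE part = 'screw'
""").fetchone()[0]
print(total_screws)  # 70 = 10 + 3 * 20
```

On an engine without recursive queries, the series-of-queries approach described above is still the fallback.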
I believe we've found our solution (hope I'm not speaking too soon). We happened upon eXist for an XML database solution. While SourceForge lists it as alpha, the current version number is 0.9 and it seems rather mature, and FAR faster than Xindice. It looks to be a really good solution, and is easy to administer. It also boasts Cocoon interoperability. Since you're going to be using Java anyway, it shouldn't be a problem that it's based on Java 1.3/1.4, thus being cross-platform to boot.
Also, you said you were worrying about DBs having a Java API. Quoted from the eXist homepage:
Cheers.
--I hate big sigs.
I don't see exactly where I would need that kind of XML database... My applications usually have a big load of model objects which represent the structure of my data at "work-time". This is a very beautiful and elegant way of building applications.
The real problem (in terms of flexibility and time) is the massive work needed for fetching data from a relational DB (everything is working in Java, using JDBC2-compatible connection pools), getting it into the data model, and all the way back...
So there are two choices: using an object-oriented DB or using some XML-DB features. The latter would be great, having some nifty utilities to produce XML out of an object's data and then send this XML directly to the DB.
But I definitely don't see why an XML-DB should automatically do everything (it's like saving Word-XML to a database). I think the entire concept of fast queries and stability would be destroyed at once. Those XML documents are application-specific data written in standard XML (XML is only a structured language; it's like writing a document in English and Hungarian; of course English will be more "useful" for the world), so it's the business of the application to care about it.
There is virtually no product out there being able to do that magic trick. The only cool software I know that does this (not with xml) is Lotus Notes and even there a load of additional information is required...
Try the generic IBM XML page also.
"Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech."--Benjamin Franklin
It's not relational, it's been described as 'document oriented' which is perfect for storing and retrieving XML docs. It's also extremely flexible, extremely secure (NSA, CIA, FAA, and 80+ million other users), and fast to program with (RAD), and supports tons of open standards. For you fans of "View Model Controller" - Domino has been using this architecture for over 15 years now...
The XML classes are built in (or easily extend your own classes using LotusScript, Java, C++, COM, anything really!!!). There is an intro on the dev site that describes the classes. Check out the demo code in the sandbox, or surf from the main product page. By the way, it runs on almost any OS/platform (AIX, OS/400, Linux, Solaris, Windows).
Personally, I would use Domino if I was going to create a repository for reports in XML. The model fits like a glove and it's a pleasure to program/maintain.
Here's a few random Domino related URLs for you...
Gary Devendorf is the Product Manager for the AppDev portion of the Domino product, and he has a section on one of the dev sites with XML references.
Off topic, but you can run your blog on the side (graphically challenged site warning) check out the links to Domino people, especially Libby !
And there's even an Open Source group of Domino developers.
First, you use SQL and relational interchangeably. That is incorrect.
Second, you fail to provide a coherent logical model of your data - something that is necessary regardless of your preference for an RDBMS or an "XMLDB".
For example, you refer to WIDGET1 as an entity when really it is a type. In your database, you will need to track the instances of WIDGET1. Something is a WIDGET1 because it needs to relate to 10 screws and 3 instances of WIDGET2.
So far we are talking about
1 WIDGET1
3 WIDGET2s
12 WIDGET3s (4 for each of the 3 WIDGET2s)
70 screws (assuming the screws used in a WIDGET1 and a WIDGET2 are the same)
Perhaps some clarification would sort this out. Do you mean to say that a WIDGET is a WIDGET and that it becomes a "WIDGET1" only when it is put together with 10 screws and 3 other WIDGETs? In that case, why are the 3 other WIDGETs not miraculously transformed into "WIDGET1"s? After all, they too are related to 3 other WIDGETs and 10 screws.
You must create a logical model of this - what are the components of your data, how do they relate, what are the rules that govern data manipulation. This is where relational technology comes in. Relational technology allows you to use predicate logic to describe the specifics of your data to an RDBMS.
When I worked on the Ananova project, we started off using Tamino by Software AG, which was great while we were in development, but we had trouble scaling from tens of stories per day to thousands of stories per day when we went live. Backing up, moving data between versions, and restoring onto higher-spec boxes proved to be a nightmare, and we soon moved to Oracle instead. This was 3 years ago, however, and the product may have matured since then. It would certainly meet your requirements as stated, and would be worth checking out. There are also NetBeans modules to aid development in Java.
re-invent wheels
Is an XML database the right tool for the job?
I'm not a relational-zealot like the sorts found at dbdebunk.com; I don't worship the table and the join, but neither do I worship the DTD and the entity. If you're just starting a project, think long and hard about your options. Maybe an XML database will be the best tool for the job, or maybe a relational database will, or maybe an OODBMS will work better, or maybe you'd be better off with an object-persistence system such as Prevayler.
I can't know the answer, and neither can anyone else here, since you haven't supplied (and probably wouldn't be allowed to supply) enough data on which to base a decision. So just think about it. You have an opportunity to do it right the first time, so make sure you know what that is before doing anything.