Slashdot Mirror


Choosing the Right XML Database?

Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"

6 of 65 comments

  1. someone please explain... by tongue · · Score: 3, Interesting

    Frankly, I don't think I understand why relational is considered a poor choice for this. Would someone please explain? (This is not a troll; I really don't know.) Is it just the work involved in storing an object in a set of tables?

    1. Re:someone please explain... by alienmole · · Score: 2, Interesting
      Actually, I've implemented relational databases with schemas exactly like the one in your example. Of course, you'd have Customer, Order, and OrderItem tables. The Product table would be generic and primarily contain a unique ID for each product, whether it's a car, apple, orange, whatever. This table might also have some other generic fields like Description, Price, etc.

      To handle the specific attributes of each product, one way to do it is to have a separate table for each product type that has unique attributes, and use a type selector field in the Product table. However, this is somewhat non-relational and may not scale well to large numbers of products. Another way to do it, which is more flexible, is to have a generic ProductAttribute table with fields like (ProductID, AttributeID, Value) - details would vary depending on what you're trying to achieve, e.g. whether you want a distinct ProductID for each unique set of attributes, or want to select attributes only per order (if you're custom-building based on orders).
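
      The generic-attribute layout described above can be sketched roughly as follows. This is only an illustration: the table names come from the comment, but the column types, the keys, the Name column, and the use of "Orders" instead of "Order" (a reserved word in SQL) are assumptions, and SQLite is used purely because it needs no server.

```python
import sqlite3

# Hypothetical schema: table names follow the comment; column types, keys,
# and the "Orders" spelling (ORDER is reserved in SQL) are assumptions.
SCHEMA = """
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customer(CustomerID)
);
CREATE TABLE Product (
    ProductID INTEGER PRIMARY KEY,
    Description TEXT,
    Price REAL
);
CREATE TABLE OrderItem (
    OrderID INTEGER REFERENCES Orders(OrderID),
    ProductID INTEGER REFERENCES Product(ProductID)
);
-- One row per (product, attribute) pair instead of one table per product type.
CREATE TABLE ProductAttribute (
    ProductID INTEGER REFERENCES Product(ProductID),
    AttributeID TEXT,
    Value TEXT,
    PRIMARY KEY (ProductID, AttributeID)
);
"""


def build_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def attributes_of(conn, product_id):
    """Return the attributes of one product as a dict of AttributeID -> Value."""
    rows = conn.execute(
        "SELECT AttributeID, Value FROM ProductAttribute WHERE ProductID = ?",
        (product_id,),
    )
    return dict(rows.fetchall())
```

      With this layout, a new kind of product only needs new rows in ProductAttribute, not a new table, which is the flexibility the comment is pointing at.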

      All the queries you mention are perfectly doable. Orders with 2 items? "select OrderID from OrderItem group by OrderID having count(*)=2". Orders with yellow items? "select ProductID from ProductAttribute where AttributeID=COLOR_ATTRIBUTE and Value='yellow'" would give a list of all yellow products. You could extend this query with joins into the Order table, or whatever else you need. "The items that a customer who bought a chair and a yellow apple in possibly different trips has bought": pretty simple. Determine the product IDs as above, join to the Order table, and filter on the customer you're interested in, using "where CustomerID=$custid".
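
      For anyone who wants to try these queries, here is a small runnable sketch against SQLite. The sample rows, the 'color' attribute ID, the "Orders" table name, and the helper function names are all made up for illustration, and it assumes one OrderItem row per item in an order.

```python
import sqlite3


def make_sample_db():
    """Build a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
        CREATE TABLE OrderItem (OrderID INTEGER, ProductID INTEGER);
        CREATE TABLE ProductAttribute (ProductID INTEGER, AttributeID TEXT, Value TEXT);
        INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO OrderItem VALUES (10, 1), (10, 2), (11, 3), (12, 4), (12, 3), (12, 1);
        INSERT INTO ProductAttribute VALUES
            (2, 'color', 'yellow'), (3, 'color', 'orange'), (4, 'color', 'yellow');
    """)
    return conn


def orders_with_item_count(conn, n):
    """Orders that contain exactly n item rows."""
    rows = conn.execute(
        "SELECT OrderID FROM OrderItem GROUP BY OrderID "
        "HAVING COUNT(*) = ? ORDER BY OrderID",
        (n,),
    )
    return [r[0] for r in rows]


def products_with_attribute(conn, attribute_id, value):
    """Products carrying the given attribute value, e.g. color = yellow."""
    rows = conn.execute(
        "SELECT ProductID FROM ProductAttribute "
        "WHERE AttributeID = ? AND Value = ? ORDER BY ProductID",
        (attribute_id, value),
    )
    return [r[0] for r in rows]


def customer_orders_with_attribute(conn, customer_id, attribute_id, value):
    """Join into Orders: one customer's orders containing a matching product."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.OrderID
        FROM Orders o
        JOIN OrderItem oi ON oi.OrderID = o.OrderID
        JOIN ProductAttribute pa ON pa.ProductID = oi.ProductID
        WHERE o.CustomerID = ? AND pa.AttributeID = ? AND pa.Value = ?
        ORDER BY o.OrderID
        """,
        (customer_id, attribute_id, value),
    )
    return [r[0] for r in rows]
```

      Calling customer_orders_with_attribute(make_sample_db(), 1, 'color', 'yellow') walks the same path as the prose: find the yellow products, join to the orders, then filter on the customer.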

      "Sometimes, it is best to define the schema later in the game, after you figure out what you are doing."

      More likely, this is the road to disaster. I've seen companies that have painted themselves into some seriously small corners by doing this, and then spent millions on maintaining a system that just doesn't do what they need. Careful and detailed upfront analysis can save a huge amount of time and money. What you're really saying is that XML can be a substitute for upfront design. Maybe in small systems, but otherwise, that's just irresponsible.

      I use XML plenty - as a transmission format for data in web apps, as a metadata representation format, for small domain-specific languages, and for document-oriented applications. But thinking of XML as a way to avoid having to actually figure out what you're doing - I guess it'll lead to job security for someone in the future, when all that has to be thrown out and replaced. Probably won't be fun jobs though.

  2. two articles on the subject by DevilM · · Score: 2, Interesting

    http://builder.com.com/article.jhtml?id=u00320030306gcn01.htm

    http://www.devx.com/xml/article/9796

  3. logical versus physical by Frans Faase · · Score: 2, Interesting

    It seems that nowadays most people have a great problem distinguishing between the logical and the physical representation/storage of data. (Personally, I think that XML sucks from a logical point of view, because its semantics are rather weak and limited.) What we lack is tools for mapping logical representations to physical representations. I think that the main reason why we do not have such tools is that from a marketing perspective they would be very undesirable. (No serious commercial company likes to adhere to an open standard, as that would make it very easy for a customer to switch.)

  4. I don't see the point... by megajini · · Score: 2, Interesting

    I don't see exactly where I would need that kind of XML database... My applications usually have a big load of model objects which represent the structure of my data at "work-time". This is a very beautiful and elegant way of building applications.

    The real problem (in terms of flexibility and time) is the massive work needed to fetch data from a relational DB (everything is working in Java, using JDBC2-compatible connection pools), get it into the data model, and write it back again...

    So there are two choices: using an object-oriented DB or using some XML-DB features. The latter would be great: having some nifty utilities to produce XML out of an object's data and then send this XML directly to the DB.
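
    As a rough idea of what such a utility could look like, here is a minimal sketch that turns a model object into XML using only a standard library. The Report class, its fields, and the to_xml helper are hypothetical, and it is written in Python rather than the Java setup described above; actually sending the result to an XML DB is left out because that depends on the product.

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET


@dataclass
class Report:
    """Hypothetical model object; the fields are made up for illustration."""
    report_id: int
    title: str
    body: str


def to_xml(obj) -> str:
    """Serialize a flat dataclass instance: one child element per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        child = ET.SubElement(root, f.name)
        # ElementTree escapes characters such as '<' and '&' on output.
        child.text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")
```

    This only handles flat objects with simple field values; nested objects, collections, and mapping the XML back into objects are exactly where the real work (and the real tools) would come in.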

    But I definitely don't see why an XML-DB should automatically do everything (it's like saving Word-XML to a database). I think the entire concept of fast queries and stability would be destroyed at once. Those XML documents are application-specific data written in standard XML (XML is only a structured language; it's like writing a document in English and Hungarian: of course English will be more "useful" for the world), so it's the business of the application to care about it.
    There is virtually no product out there being able to do that magic trick. The only cool software I know that does this (not with xml) is Lotus Notes and even there a load of additional information is required...

  5. Tamino by Software AG by munkinut · · Score: 3, Interesting

    When I worked on the Ananova project, we started off using Tamino by Software AG, which was great while we were in development, but we had trouble scaling from tens of stories per day to dealing with thousands of stories per day when we went live. Backing up, moving data between versions, and restoring onto higher spec boxes proved to be a nightmare, and we soon moved to Oracle instead. This was 3 years ago, however, and the product may have matured since then. It would certainly meet your requirements as stated, and would be worth checking out. There are also NetBeans modules to aid development in Java.

    --
    re-invent wheels ... you never know