Slashdot Mirror


Choosing the Right XML Database?

Saqib Ali asks: "Later this year, I will be starting a project that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, eXist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application will involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"

9 of 65 comments

  1. Berkeley DB XML also an option by kzeddy · · Score: 4, Informative

    Berkeley DB XML is a new product. I have not tested it, though, so this is not a recommendation.

    1. Re:Berkeley DB XML also an option by Anonymous Coward · · Score: 5, Informative

      Yup, I was going to mention that one. I've tested it and it works great. It's basically regular Berkeley DB, which rocks the house already, with an XML-aware layer on top.

      If you have lots of small XML documents this is definitely the best choice. Dunno about big reports. Berkeley scales to any size, but maybe he should split his big documents into "metadata.xml" and "report.xml", then store and index metadata.xml in the database and put report.xml on disk. I believe there is a standard for XML Inclusions (XInclude) now, so he could have the metadata.xml actually point to the report.
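To make the metadata/report split concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementInclude`. The element names (`metadata`, `title`, `author`, `report`, `section`) and the in-memory loader are assumptions made up for illustration; a real setup would let the loader read report.xml from disk instead.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Small, indexable document; the element names are hypothetical.
METADATA_XML = """\
<metadata xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Q3 sales report</title>
  <author>jdoe</author>
  <xi:include href="report.xml" parse="xml"/>
</metadata>
"""

# Stand-in for the big report that would normally live on disk.
REPORT_XML = "<report><section>Large report body goes here.</section></report>"


def in_memory_loader(href, parse, encoding=None):
    """Resolve xi:include targets from memory so the sketch is self-contained."""
    if href == "report.xml" and parse == "xml":
        return ET.fromstring(REPORT_XML)
    raise FileNotFoundError(href)


def resolve(metadata_text):
    """Parse the metadata and replace each xi:include with the referenced document."""
    root = ET.fromstring(metadata_text)
    ElementInclude.include(root, loader=in_memory_loader)
    return root


root = resolve(METADATA_XML)
print(root.findtext("title"))
print(root.find("report/section").text)
```

Only the small metadata document would need to go into the database and its indexes; the include is expanded when someone actually asks for the full report.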

      Lots of ideas. Check out Berkeley DB though; it beats Xindice, especially since it's not written in Java (being written in Java pretty much ruled Xindice out for my purposes).

    2. Re:Berkeley DB XML also an option by stonebeat.org · · Score: 2, Informative

      Looks good, but it doesn't have a Java API. My app is going to use Apache Cocoon, which runs on Tomcat, so I would prefer a DB that has a Java API.

    3. Re:Berkeley DB XML also an option by Anonymous Coward · · Score: 5, Informative

      It does have a Java API! Did you check it out? It comes with C/C++, Java, Perl, Python, and Tcl support out of the box. It's just not *written* in Java, which makes it more flexible. Since it's still "prerelease" you have to sign up to get the software, but that's not a big deal.

    4. Re:Berkeley DB XML also an option by beowulf_26 · · Score: 2, Informative

      For what it's worth, at my workplace at the moment, we're doing the exact same thing, but already have a ton of data that we need to get ingested. The pointy haired boss hired These Guys who know their stuff pretty well, and prefer to use Xindice. The only problem is that it's, well... quite slow.

      Other commercial alternatives are Ipedo or Tamino if your development house has the cash. Education discounts of 99% are available, I believe, from Tamino, but the Ipedo people aren't as forthcoming with what they're willing to deal on.

      Sadly, there just isn't a hands-down winner in this market, but if you're looking for something to go with Cocoon, Xindice looks to be the best OSS solution for the moment.

      --

      --I hate big sigs.
  2. Re:why an xml database? by Anonymous Coward · · Score: 5, Informative

    Even if the data you're storing is XML formatted, it might be better to map certain tags to relational columns and just store the XML doc itself as part of a normal relational table. The searches are guaranteed to be more efficient, especially with decent indexing. This won't work if you really need to do searches involving parent/child/sibling relationships between nodes.
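A rough sketch of that hybrid layout, using Python's built-in `sqlite3` purely as a stand-in relational database. The table name, the column names, and the choice of `author` and `year` as the tags worth promoting to columns are all assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: two tags promoted to indexed columns, plus the raw document.
SCHEMA = """
CREATE TABLE reports (
    id     INTEGER PRIMARY KEY,
    author TEXT NOT NULL,
    year   INTEGER NOT NULL,
    body   TEXT NOT NULL
);
CREATE INDEX idx_reports_author_year ON reports (author, year);
"""


def open_store():
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_report(conn, xml_text):
    """Copy the searchable tags into columns and keep the whole document in body."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO reports (author, year, body) VALUES (?, ?, ?)",
        (root.findtext("author"), int(root.findtext("year")), xml_text),
    )


def find_reports(conn, author, year):
    """Search on the relational columns, then parse only the matching documents."""
    rows = conn.execute(
        "SELECT body FROM reports WHERE author = ? AND year = ?", (author, year)
    )
    return [ET.fromstring(body) for (body,) in rows]


conn = open_store()
store_report(conn, "<report><author>jdoe</author><year>2003</year><title>Q3 sales</title></report>")
store_report(conn, "<report><author>asmith</author><year>2003</year><title>Q3 costs</title></report>")
titles = [doc.findtext("title") for doc in find_reports(conn, "jdoe", 2003)]
print(titles)
```

The search never touches the XML itself, which is the point of the approach; anything that depends on the tree structure still has to be done after the documents are fetched and parsed.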

    At the minimum make sure there's good XQuery support. XPath just won't cut it if you need to scale.

    DB2 has decent XML support currently, and great XML support coming down the pipe at some point, AFAIK. My experiences with it have been very positive.

  3. Re:why an xml database? by rycamor · · Score: 4, Informative

    For more opinions to make you think:

    http://www.dbazine.com/pascal9.html
    http://www.dbazine.com/pascal8.html

    And here, C.J. Date argues that a truly relational DBMS should be able to support an XML data type:

    http://www.dbdebunk.com/lauri1.htm

    (PostgreSQL is an example of a DBMS with extensible types)

  4. Don't count out object databases by mattc58 · · Score: 3, Informative

    It's interesting that you bring this up.

    I just finished writing an article for an online magazine on object databases and .NET. You might want to look into Matisse. It's got bindings for all the popular languages, it's an object database, and it's got SQL interfaces. Nice.

    And I'll point everybody to my article when it's published.

  5. Re:why an xml database? by rycamor · · Score: 2, Informative

    >wouldn't that just be storing a string?

    Oh no. I mean, yes, you could just store XML as a string in a BLOB column, but that's no better than just storing it as a file.

    A custom XML datatype would not treat the XML as a blob, but actually parse the XML upon input into the table, storing an internal representation (probably as an associative array) which would allow custom operators to traverse the tree, visit nodes, etc...

    But it would also allow you to perform relational queries and place integrity constraints on your XML documents.

    To explain further, I will use a specific example: PostgreSQL allows you to create custom datatypes, even importing C functions to handle the input and output of these types. Thus, theoretically, you should be able to create a custom datatype called "xmldoc", and use code from a standard C library which handles XML, such as libxml, or expat, which will parse the XML string into an internal data structure, and vice-versa upon output.

    (I must stress, this is theoretical. I haven't had the time or need for such a thing, but according to the documentation, it should be possible)
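As a loose analogue only, and not the PostgreSQL C-level mechanism described above, the same parse-on-input, serialize-on-output idea can be sketched with the adapter and converter hooks in Python's `sqlite3` module. The `XmlDoc` wrapper class and the `xmldoc` declared column type are hypothetical names chosen for this sketch.

```python
import sqlite3
import xml.etree.ElementTree as ET


class XmlDoc:
    """Hypothetical wrapper type: parsing in the constructor rejects malformed XML."""

    def __init__(self, text):
        self.root = ET.fromstring(text)  # raises ET.ParseError on malformed input

    def to_text(self):
        return ET.tostring(self.root, encoding="unicode")


# Input side: serialize an XmlDoc when it is bound as a query parameter.
sqlite3.register_adapter(XmlDoc, lambda doc: doc.to_text())
# Output side: rebuild an XmlDoc from the stored bytes of any column declared "xmldoc".
sqlite3.register_converter("xmldoc", lambda raw: XmlDoc(raw.decode("utf-8")))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, doc xmldoc)")
conn.execute(
    "INSERT INTO reports (doc) VALUES (?)",
    (XmlDoc("<report><title>Q3</title></report>"),),
)

(doc,) = conn.execute("SELECT doc FROM reports").fetchone()
print(doc.root.findtext("title"))  # the value comes back as a parsed tree, not a string
```

Unlike a real custom datatype, the parsed form here lives in the client, so the database still cannot traverse or index the tree by itself; the sketch only shows where the input and output functions would sit.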