Slashdot Mirror


Choosing the Right XML Database?

Saqib Ali asks: "Later this year, I will be starting a project, that will involve storing XML data in a database. I understand why a Relational DB is not a good choice. I also understand why a pure OODB like Objectivity is not a good option either. So I started doing some research into various XML DBs like Apache Xindice, exist-db, Oracle 9i, and others, but I am unable to decide which XML DB to use. What criteria should one use when evaluating whether an XML DB will be a good option for a particular application? I would prefer using an Open Source solution. Initially my application wil involve storing reports in an XML repository, for retrieval via XPath, but the reports will get larger with time. Any suggestions on how to decide which database to use?"

24 of 65 comments (clear)

  1. your xml by Anonymous Coward · · Score: 5, Funny

    <post>
    first
    </post>
    <!-- take that beyotches -->

  2. Berkley DB XML also an option by kzeddy · · Score: 4, Informative

    Berkley DB XML is a new product. i have not tested it though... so this is not a reccomendation.

    1. Re:Berkley DB XML also an option by Anonymous Coward · · Score: 5, Informative

      Yup I was going to mention that one. I've tested it and it works great. Basically regular Berkeley DB which rocks the house already, with an XML-aware layer on top.

      If you have lots of small XML documents this is definitely the best choice. Dunno about big reports. Berkeley scales to any size, but maybe he should split his big documents into "metadata.xml" and "report.xml".. then store and index metadata.xml in the database and put report.xml on disk. I believe there is a standard for XML Includes now, so he could have the metadata.xml actually point to the report.

      Lots of ideas. Check out Berkeley DB though, it beats Xindice (especially since it's not written Java, which pretty much ruled it out for my purposes.)

    2. Re:Berkley DB XML also an option by stonebeat.org · · Score: 2, Informative

      looks good. but doesnt have a Java API. My app is going to use Apache Cocoon which runs on Tomcat. So I would prefer a DB that has Java API

    3. Re:Berkley DB XML also an option by Anonymous Coward · · Score: 5, Informative

      It does have a java API! Did you check it out? Comes with C/C++, Java, Perl, Python, and TCL support out of the box. It's just not *written* in Java which makes it more flexible. since it's still "prerelease" you have to sign up to get the software but that's not a big deal.

    4. Re:Berkley DB XML also an option by beowulf_26 · · Score: 2, Informative

      For what it's worth, at my workplace at the moment, we're doing the exact same thing, but already have a ton of data that we need to get ingested. The pointy haired boss hired These Guys who know there stuff pretty well, and prefer to use Xindice. The only problem is that it's well.. quite slow.

      Other commercial alternatives are Ipedo or Tamino if your development house has the cash. Education discounts of 99% are availible I believe from Tamino, but the Ipedo people aren't as forthcoming with what they're willing to deal on.

      Sadly, there just isn't a hands down winner in this market, but if you're looking for something to go with Cocoon, Xindice looks to be the best OSS solution for the moment.

      --

      --I hate big sigs.
  3. why an xml database? by jeffdill · · Score: 5, Insightful

    To pick the right database, you need to analyze the structure of your data and the operations you intend to perform on it. XML is a useful general format for interchange of serialized data, but just because you have some data represented in XML doesn't mean you should store it in XML. What is the structure of the data? What will you do with it? Why is a relational database or a object database a bad choice for your application?

    1. Re:why an xml database? by Anonymous Coward · · Score: 5, Informative

      Even if the data you're storing is XML formatted, it might be better to map certain tags to relational columns and just store the XML doc itself as part of a normal relational table. The searches are guaranteed to be more efficient, especially with decent indexing. This won't work if you really need to do searches involving parent/child/sibling relationships between nodes.

      At the minimum make sure there's good XQuery support. XPath just won't cut it if you need to scale.

      DB2 has decent XML support currently, and great XML support coming along the pipe at some point afaik. My experiences with it have been very positive.

    2. Re:why an xml database? by rycamor · · Score: 4, Informative

      For more opinions to make you think:

      http://www.dbazine.com/pascal9.html
      http://www.dbazine.com/pascal8.html

      And here, C.J. Date argues that a truly relational DBMS should be able to support an XML data type:

      http://www.dbdebunk.com/lauri1.htm

      (PostgreSQL is an example of a DBMS with extensible types)

    3. Re:why an xml database? by rycamor · · Score: 2, Informative

      >wouldn't that just be storing a string?

      Oh no. I mean, yes, you could just store XML as a string in a BLOB column, but that's no better than just storing as a file.

      A custom XML datatype would not treat the XML as a blob, but actually parse the XML upon input into the table, storing an internal representation (probably as an associative array) which would allow custom operators to traverse the tree, visit nodes, etc...

      But, it would also allow you to perform relational queries and place integrity constraints on your XML documents.

      To explain further, I will use a specific example: PostgreSQL allows you to create custom datatypes, even importing C functions to handle the input and output of these types. Thus, theoretically, you should be able to create a custom datatype called "xmldoc", and use code from a standard C library which handles XML, such as libxml, or expat, which will parse the XML string into an internal data structure, and vice-versa upon output.

      (I must stress, this is theoretical. I haven't had the time or need for such a thing, but according to the documentation, it should be possible)

  4. Take the easyst way by angel'o'sphere · · Score: 5, Insightful

    Thre are the things I would question first:

    a) does it use XQuerry/XPath to access the DB or an other standard way or is it proprietary?
    b) does it support your programming language of choice?
    c) Where do you get fastest a running prototype?

    C) is the most important point IMHO. If you have chosen the right DB you only know AFTER you have implemented your application. (( well, you can try to find test cases and try to predict if the DB is the right one by trying to scale tests up)) Note: I used the word try several times, because such an approach is only trial and error.

    Ok, if you can just start coding, that was point c), and a standard like a) is supported, then you should be easyly able to hide the actual DB behind an suiting interface.

    b) is only a matter of your flexibility ....

    I would guess the appliacation has more constraints which will likely limit you or challange you to overcome than the DB used behind it.

    I once read an article in a german magazine, they have put a DOM writer and a DOM reader as stored procedures into an SQL data base.

    And all XML was stored in a few tables, element, attribute and such ... it was very fast ... and well, you programmed your XML manipulation by directly manipulation "virtual" DOM trees inside of the DB. In SQL and in a relational DB, of course.

    So much to "relational" wont fit your needs :-)

    Regards,
    angel'o'sphere

    P.S. You gave not many hints why you need an XML database. A XML database makes only sense if your natural document format is ... XML. It makes no sense when you think you need to use XML because of hype or something ....

    --
    Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
  5. someone please explain... by tongue · · Score: 3, Interesting

    Frankly, I don't think i understand why relational is considered a poor choice for this. would someone please explain this? (this is not a troll, i really don't know) is it just the work involved in storing an object in a set of tables?

    1. Re:someone please explain... by Anonymous Coward · · Score: 5, Insightful

      Well, if you're just sticking the entire XML document into the table like a blob of text, then yeah, there's no problem. But then you can't really do anything with it other than retrieve it (i.e., you can't run a query on parts of it).

      But if you want the database to be aware of the *structure* of the data, you have to decompose the data into pieces, stick them in various tables, keep the integrity between the tables, and, oh yeah, write some code to convert the data back into XML when you want to get the whole document.

      For instance if you are storing an XML document that's made with one-or-more Chapters, Paragraphs, or Sentences and each Chapter has one-or-more Paragraphs, and each Paragraph can contain Sentences .. etc.. you have a complex many-to-one structure you have to store in multiple tables .. how would you do it? Well, you'd make a document table, a chapter table, a sentence table, and link them all together with unique id's .. etc.. you get the point I hope, that the XML doc's rich structure has to be decomposed into rows and columns.

      XML databases take care of this automatically and also can *index* the various parts of the document so that queries (XPath or otherwise) run faster (i.e., give me the documents that contain sentences beginning with "Hello").

    2. Re:someone please explain... by Tablizer · · Score: 3, Insightful

      XML databases take care of this automatically

      Take care of what? Parsing? That is a parser, not a database. How about a specific example.

      Relational is pretty flexible if you just know how to use it. (I agree that existing commercial relational systems could use some adjustments, but lets not throw the Cray out with the bathwater.)

      Too much of this XML database stuff sounds like a return to the "navigational" databases of the 1960's. Do we really want that? Dr. Codd rescued us from those. Now you want to be un-rescued?

    3. Re:someone please explain... by alienmole · · Score: 2, Interesting
      Actually, I've implemented relational databases with schemas exactly like the one in your example. Of course, you'd have Customer, Order, and OrderItem tables. The Product table would be generic and primarily contain a unique ID for each product, whether it's a car, apple, orange, whatever. This table might also have some other generic fields like Description, Price, etc.

      To handle the specific attributes of each product, one way to do it is to have a separate table for each product type that has unique attributes, and use a type selector field in the Product table. However, this is somewhat non-relational and may not scale well to large numbers of products. Another way to do it, which is more flexible, is to have a generic ProductAttribute table with fields like (ProductID, AttributeID, Value) - details would vary depending on what you're trying to achieve, e.g. whether you want a distinct ProductID for each unique set of attributes, or want to select attributes only per order (if you're custom-building based on orders).

      All the queries you mention are perfectly doable. Orders with 2 items? "select * from OrderItems group by OrderID having count(OrderID)=2". Orders with yellow items? "select ProductID from ProductAttributes where AttributeID=COLOR_ATTRIBUTE and Value='yellow'" would give a list of all yellow products. You could extend this request with joins into the Order table, or whatever it is you need. "The items that a customer who bought a chair and a yellow apple in possibly different trips has bought" : pretty simple, determine the product ids as above, join to the Orders table, and filter on the customer you're interested using "where CustomerID=$custid".

      Sometimes, it is best to to define the schema later in the game, after you figure out what you are doing.

      More likely, this is the road to disaster. I've seen companies that have painted themselves into some seriously small corners by doing this, and then spent millions on maintaining a system that just doesn't do what they need. Careful and detailed upfront analysis can save a huge amount of time and money. What you're really saying is that XML can be a substitute for upfront design. Maybe in small systems, but otherwise, that's just irresponsible.

      I use XML plenty - as a transmission format for data in web apps, as a metadata representation format, for small domain-specific languages, and for document-oriented applications. But thinking of XML as a way to avoid having to actually figure out what you're doing - I guess it'll lead to job security for someone in the future, when all that has to be thrown out and replaced. Probably won't be fun jobs though.

    4. Re:someone please explain... by battjt · · Score: 2, Insightful

      You can store anything in a SQL database, but you do have to take the time to design it and migrate the data as the schema changes.

      Spending lots of time and money designing a system that the customer can not imagine is a waste of money, because you will have to change the design as the business units focus on what they want, normally after they see your initial results.

      Sometimes you have to use duct tape.

      I have one app in production that uses XML files as data stores. There are about 24 users. I also have apps in production with 1000s of users that use 20GB+ SQL databases. I use apps that utilize lisp dumps for fast read only datastores (in addition to emacs). There is a place for everything.

      Joe

      --
      Joe Batt Solid Design
    5. Re:someone please explain... by battjt · · Score: 2, Insightful

      In my observation, if you don't have enough info to create a starting schema, then you need to do some more analysis.

      This is exactly the problem. How do you get any analysis if the customer doesn't know what to ask for. Applictions evolve. The flexibility offered by an unstructed data store like XML lets you eveolve the data model like the rest of the application.

      You gloss over the hard part with "etc..." Attributes or even structured child tags can not be anticipated and built into the schema or else by the time you do then you've just built an XML database.

      --
      Joe Batt Solid Design
    6. Re:someone please explain... by dpt · · Score: 2, Insightful

      If you look at the needs of most "complex hierarchical structures", it often turns out that trees are the wrong "structure" to begin with

      What about cases where entities can contain instances of *themselves*? Or where the depth and width of the nesting is not necessarily known up front?

      You end up creating these artificial "id" fields, and in so doing build a "tree" on top of the relational database, which is a very silly thing to do.

      And what about cases where ordering of contained elements is important?

      We would probably have to explore a specific case study to settle this.

      I don't think it can be "settled". Apples and oranges, and all that. Sure, you can *force* a relational database to represent anything, it's just often like putting a round peg in a square hole, particularly for engineering, scientific, and mathematical problem domains.

      If you wish to take this elsewhere as to not bloat up slashdot, I would be happy to

      Take it where? I suggest your journal.

      The worst case I have had was one in which ordering was important. Relational databases just can't handle that very well, being based on set theory IIRC. Sure, you can add a field containing an ordinal to represent the "order", but that breaks badly if you need to insert or remove items.

  6. two articles on the subject by DevilM · · Score: 2, Interesting

    http://builder.com.com/article.jhtml?id=u003200303 06gcn01.htm

    http://www.devx.com/xml/article/9796

  7. Don't count out object databases by mattc58 · · Score: 3, Informative

    It's interesting that you bring this up.

    I just finished writing an article for an online magazine on object databases and .NET. You might want to look into Matisse. It's got bindings for all the popular languages, it's an object database, and it's got SQL interfaces. Nice.

    And I'll point everybody to my article when it's published.

  8. logical versus physical by Frans+Faase · · Score: 2, Interesting

    It seems that nowadays most people have a great problem distinguishing between the logical and the physical representation/storage of data. (Personally, I think that XML sucks from a logical point of view, because its semantics are rather weak and limited.) What we lack is tools for mapping logical representations to physical representations. I think that the main reason why we do not have such tools is that from a marketing perspective they would be very undesirable. (No serious commercial company likes to adhere to an open standard, as that would make it very easy for a customer to switch.)

  9. Re:why an xml database? -- There are many reasons by HunkyBrewster · · Score: 2, Insightful
    How is this at all a problem for the relational model?

    The relational model is a logical model and I challenge you to find any example of data that cannot be represented quite easily in the relational model. In your example, you have traded any notion of data integrity for what you assume will be faster data access. In fact, since the relational model makes no recommendations on how data is physically stored, this is not necessarily the case.

    How would XPath enforce your rules on how parts can relate to other parts? Why don't you just try flat files and grep?

    XML is a perfectly acceptable means of data representation. It does not however by any means provide a formal, coherent theory of data management.

    I really hope you were kidding about teaching anyone anything. You have a lot of learning to do.

  10. I don't see the point... by megajini · · Score: 2, Interesting

    I don't see exactly where I would need that kind of XML-Database... My applications usually have a big load of model-objects witch represent the structure of my data at "work-time". This is a very beautiful and elegant way of building applications.

    The real Problem (in terms of flexibility and time) is the massive work needed for fetching data from relational db (Everything is working in Java, using JDBC2 compatible Connection-Pools) and getting it into the data-model and the way back...

    So there are two choices: Using an object-oriented db or using some xml-db features. The latter would be great, having some nifty utilities to produce xml out of object's data and then send this xml directly to the db.

    But I definitely don't see why a XML-DB should automatically do everything (It's like saving Word-XML to a Database). I think the entire concept of fast querys and stability would be destroyed at once. Those XML-Documents are application-specific-data written in standard XML (XML is only a structured language, it's like writing a document in english and hungarian; of course english will be more "useful" for the world), so it's the business of the application to care about it.
    There is virtually no product out there being able to do that magic trick. The only cool software I know that does this (not with xml) is Lotus Notes and even there a load of additional information is required...

  11. Tamino by Software AG by munkinut · · Score: 3, Interesting

    When I worked on the Ananova project, we started off using Tamino by Software AG, which was great while we were in development, but we had trouble scaling from tens of stories per day to dealing with thousands of stories per day when we went live. Backing up, moving data between versions, and restoring onto higher spec boxes proved to be a nightmare, and we soon moved to Oracle instead. This was 3 years ago however, and the product may have matured since then. It would meet your requirements as stated certainly, and would be worth checking out. There are also Netbeans modules to aid development in Java.

    --
    re-invent wheels ... you never know