Slashdot Mirror


Do XML-based Databases Live Up to the Hype?

douthitb asks: "I have recently started work as a contractor with a company developing/improving an application for exchanging large amounts of data. The current solution exchanges data via XML, but the data itself is stored in a SQL Server database. There is a concern about the overhead involved with wrapping and unwrapping the XML to get the data in and out of a relational database. The proposed solution is to use Tamino, an XML-based database. Neither I nor any of the other developers have any experience with Tamino, but the desired result is to remove the bottleneck of converting the XML back and forth. Does anyone have experience using Tamino (or any other XML-based database)? What benefits and/or difficulties did you have in using an XML database, as opposed to its relational counterpart? How large of a learning curve should be expected with a product like this? Do XML databases really live up to the hype? A similar topic was discussed on Slashdot way back when, so I was hoping to get some more up-to-date feedback on the subject." "Sales reps from Software AG, the makers of Tamino, were brought in to discuss the benefits of their product with us. They, of course, presented Tamino as the end all, cure all database system (it will even clear your acne and make you popular with the girls!). The management of the company I'm contracting with were basically eating out of the sales reps' hands, without asking any of the "tough" questions about what the product can do; I was less convinced. Doing some initial searching on the Internet, I have had trouble finding much information about Tamino outside of the Software AG website."

13 of 105 comments (clear)

  1. I've worked with the Tamino kit... by (H)elix1 · · Score: 5, Insightful

    The thing the XML databases are nice for is if folks can't really lock down the schema. Often you have the case where you are mapping attributes to columns, which works fine in a relational database. Then things change over time.... Usually turning a nice relational design into a mess. Being able to use Xpath is great when you are searching for nodes too, once you get your arms around the syntax and assuming the stuff you are storing is XML. Some of the other bits in their toolkit were interesting.

    If things are fixed, there are a lot of other options out there for faster manipulation. XMLBeans (now an Apache project, formally BEA) is good stuff. Hibernate is lovely kit for mapping objects to a relational DB.

  2. Thumbs Down on XML Databases. by rossifer · · Score: 5, Insightful

    XML databases are possibly useful if you think about them as: an elaborate bucket for storing non-normalized data via an XML interface.

    If your current relational database schema is either 1) small flat files or 2) a few big tables with most/all of the data stored in "blob" columns: i.e. blobs, clobs, byte arrays, or big varchars. You might be a candidate for an XML database. I'd get two experienced DBA's to agree there was no realistic way to normalize the data, first, but that's me.

    If you actually need a database (as opposed to a few files, XML or flat) and your data can be normalized (it almost always can), then a relational database will tend to provide important advantages in three areas: unforseen query handling (OLAP, data mining, etc.), scalable performance, and availability of people with the skills to maintain it.

    As for the tradeoff of converting to XML, a number of the commercial RDBMS's allow you to obtain query results as XML. Though I don't know for certain how they handle inserts and updates, I suspect that there are XML equivalents for those as well. However, even if you have to completely roll your own conversion from SQL to XML, that cost is minimal against the cost of accessing the disk to fulfill the query, which both RDBMS and XMLDBMS will have to do.

    In general, after working with a commercial XML database and attempting to work with another XML database written in house, I'm categorically unimpressed. I think that a lot of engineers have discounted the relational programming model without first understanding it. In my opinion, people familiar with functional and object programming models would do well to learn about relational programming with an eye to determining the appropriate model for different kinds of problems.

    Regards,
    Ross

  3. XML DB? In my expert technical opinion.... by AntsInMyPants · · Score: 2, Insightful

    Ick. I suppose you could do it that way if you want to. Maybe its just me, but I like to keep data in relational DBs and keep the XML stuff for when I need to provide a way of sending information to outside people who will not have direct access to the DB. Most of the time the DB is being accessed, it is for internal applications which can access the tables via accessor methods. Now I suppose you could just write accessor methods against the XML DB..... Relational DBs for storage, XML as a transmission format. But the types of things I tend to build are quite small, so YMMV.

  4. Obvious by Pan+T.+Hose · · Score: 5, Insightful

    What benefits and/or difficulties did you have in using an XML database, as opposed to its relational counterpart?

    Benefits: XML is new and trendy.

    Difficulties: Ignorance of the decades of scientific research and engineering experience in the field of relational database management systems, relational algebra, set theory and predicate calculus; lack of real atomicity of transactions, lack of guaranteed consistency of data, lack of isolated operations, lack of real durability in the ACID sense, and in short, the lack of relational model; scalability, portability, SQL standard, access to your data after two years and after twenty years; to name just a few.

    How large of a learning curve should be expected with a product like this?

    Certainly smaller than a real, relational database.

    Do XML databases really live up to the hype?

    No.

    I believe that you are confusing an RDBMS with an object store. You should read this excellent comment posted almost three years ago by Frater 219. I understand that you may be inexperienced but you should not be ignorant. Literally decades of scientific research has been put into relational database management systems. Of course you are perfectly free to forget about computer science, jump on the bandwagon and choose whatever buzzword is trendy these days (yesterday it was OOP, today it is XML, tomorrow it will be .NET) but then you have to realise that you are gambling with your data that may be rendered inaccessible in few years (and that is if you are lucky and don't lose its consistency before) and those unfortunate enough to inherit the responsibility of maintenance of your system will curse you to no end wishing you were dead, and not without a reason. You can be fancy with your applications and front-ends, but RDBMSs are probably the most mature computer systems known to man. Ignoring it is foolish, to say the very least. You may say: but my application will always be the only front-end to that data and it will always be an optimal way to work with it! To which I say: Kids these days!

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
    1. Re:Obvious by swillden · · Score: 4, Insightful

      Excellent post, as is the Frater 219 post that you referenced.

      I think that both of you stopped short of pushing your arguments to their conclusions, though, so I'd like to add a bit.

      Frater 219 is exactly right that objects and tuples are fundamentally different, but he focused on both from a purely data-oriented point of view, which caused him to understate the issue a bit. A better understanding of the real goals of objects and tuples helps, IMO, to clarify why they're so different -- and the arguments can be extended to consider XML as well.

      Consider the goals behind relational database normalization. It's obvious that the primary goal is one of flexibility, ensuring that the data can be sliced and diced in any way imaginable, easily (which is not always the same as efficiently). A good relational design provides total "transparency", so that no matter what future demands are made, if the information is in the database it can be retrieved, just by asking the right, simple, question.

      Obviously, relational database technology was created because in the past there were systems that structured data in ways that limited the ways in which it could be retrieved and analyzed. RDBMSs solve that problem admirably well.

      So, if data transparency is such a wonderful thing, why does another computing tool, Object-Oriented Software structure, place so much emphasis on data abstraction and even data "hiding"? The answer is: because OO is about behavior, not data.

      The tenets of good OO design are all about partitioning the problem into compact components that interact in flexible ways. Objects have data, but only, really, to provide these fundamentally behavioral entities with the data elements they need in order to function "independently". This doesn't mean that object architectures can be defined without consideration of data, or that none of the ideas about data relationships which would be at home in a relational design have a place in object design, because they do, but the core ideas of object-oriented design are about entities that act in response to stimuli, allowing internal details (like what the supporting data looks consists of) to be hidden, and allowing subtitution of other entities that accomplish the same abstract goals, but may do it in different ways, using different data.

      This is the real fundamental "impedance mismatch" between OO design and relational design, IMO. Relational design focuses almost purely on data, with little attention paid to how the data will be used (well, in practice, that gets a lot of attention when it becomes clear that the nicely normalized model is simply too slow, but that's separate), and object design focuses mostly on behavior, paying attention to data only as needed to point out obviously bad factorings. This means that if you design a very nice object-oriented application and then try to simply persist those objects in relational tables, the result will be a very poor relational database. On the other hand, if you create a nice relational design and then try to create a class for each table, the result will be a painfully sub-optimal OO design.

      So, as Frater 219 pointed out, if you want a database, use an RDBMS, if you want a persistent object store, use an OODBMS. If you want both (as is common), well, you have to deal with the impedance mismatch, and it'll nevery be pretty, or very efficient. IMO, the best approach is to do the OO and relational designs more or less separately, then work out a solution to translate between them.

      So what about XML? Well, let's look at the goals behind XML.

      One problem with doing that is that there are at least two uses of XML. The first is as markup, in the sense that the document content is really not intended to be understood or processed by machines so much as people. The tags are only used to make machines ablee to grab hold and manipulate bits of it, without any understanding of the rest of the stuff. HTML is like this. An HTML document is ulti

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    2. Re:Obvious by flockofseagulls · · Score: 2, Insightful
      swillden wrote:
      Consider the goals behind relational database normalization. It's obvious that the primary goal is one of flexibility, ensuring that the data can be sliced and diced in any way imaginable, easily (which is not always the same as efficiently).
      No. Normalization eliminates duplicate information, and insures that non-key attributes are dependent on (correctly grouped with and referenced by) key fields. Normalization is not primarily about flexibility, it's primarily about data integrity. Data can be "sliced and diced in any way imaginable" in both normalized and unnormalized databases, but data integrity can only be guaranteed in normalized databases.
  5. don't waste your time with XML by Anonymous Coward · · Score: 4, Insightful

    XML is a file format. Repeat after me. A *text file format*.

    It is not a database, nor a data model, nor should it have anything to do with data storage and manipulation. You can store XML documents *in* a database (just like you can store dates, IP addresses, or JPG data). You can index and join on XPath components of an XML file. And you get XML documents *from* a database. But the database itself has little to do with XML. A well-designed XML database is just a well-designed relational database, and XML is just another data type.

    People are now reverse-engineering a hierarchic data model from XML text files. But the hierarchic data model is less general than the relational model, and in fact was used and rejected *40 years ago* as not being general or powerful enough. Funny how history repeats itself.

    Example: for simplicity, the relational model specifies that ALL data must be stored explicitly in the database. For instance if you have three rows of data, you can't assume any particular order unless the order can be calculated from the contents of each row. But XML nodes have implicit order, which means even the simplest XML document mixes data with metadata. Even a simple query requires dealing with both.

    I recommend anyone who has ever uttered the term "XML database" with straight face to go back and learn some basic relational principles. I think you will agree that all data models are either 1) flawed and incomplete; or 2) reduce to the relational model.

    In CS we don't have a lot of formal models to guide us, as in engineering or other science. Much of CS is entirely ad-hoc. However we do have a sound and complete model for data storage (relational model) and hardly anyone uses it. It boggles my mind. Do people not *want* their programs to work predictably?

  6. You're pre-optimizing... by Anonymous Coward · · Score: 1, Insightful

    The current solution exchanges data via XML, but the data itself is stored in a SQL Server database. There is a concern about the overhead involved with wrapping and unwrapping the XML to get the data in and out of a relational database.

    Premature optimization is the root of all evil.

    You say "you're concerned". That means you don't know.

    Why don't you find out?

    If you have a schema and some of your major transactions speced out, then do some performance testing and see where your bottleneck may be. For Gods sake, don't guess. Know. Find out.

    I will bet that you will find that one solution isn't overly performant than the other, and the benefits of a good RDBMS will make up for any delta in performance from a specialized DB.

  7. Don't bother with an XML "database" by Randolpho · · Score: 3, Insightful

    There are two possible reasons you're using XML to transport your data from one database to the other.

    The first is that you just heard XML is a great way to transport data, and decided to use it.

    The second is that you're using the XML for more than just transporting data from one database to another; you're using it at some point with your application.

    In either case, the bottom line is that XML is not good for you. If your data fits in a relational database, you should USE RELATIONAL MEANS TO ACCESS YOUR DATA. Don't use that nifty new XML reader to access your data. It's not nearly as fast or flexible as basic SQL; it's actually much more trouble than it's worth.

    If you're just transporting data from one relational database server to another, use a flat file, or better yet raw SQL dumps. If you're accessing the data with an application, use SQL or the underlying API.

    The only reason you *ever* need to use an XML database is when your data doesn't fit into a standard relational schema. In fact, if you try to fit standard data into an XML database, you're much more likely to end up with a ton of overhead, both in storage and speed.

    Fortunately, non-relational data is extremely rare. So rare, in fact, that I've yet to see a non-contrived-proof-of-concept "real life" example.

    --
    "Times have not become more violent. They have just become more televised."
    -Marilyn Manson
  8. Probably a stupid question, but... by ArtStone · · Score: 3, Insightful

    When dealing with XML, you need a DTD that defines the data contained in the XML expression in order to parse the string into meaningful data structures (right?)

    When an "XML database" is changed, is the data prior to the change left in its old XML format pointing to the original DTD, or does it require conversion of all existing data? How can the data be accessed while that conversion is going on?

    How would the method of implementing a schema change be communicated to other places which have already archived copies of an old XML data entity? DTD only defines current state information - it doesn't communicate "If XYZ = 1 in DTD.v1 then set XYZ2 to "A" and set new field ABC to "foo" for DTD.v2". Each iteration of change would become increasingly more complex unless the data is converted.

    This is not to say that the same issues don't exist with SQL or relational databases - but just abstracting the organization of the data doesn't mean that your problems are solved.

    Lately, I've been using mySQL - and the developers have some curious ideas about the "real world". Even the most trivial changes to the database schema require mySQL to copy and rebuild the entire table... like adding a new index or adding a new field at the end of the table. When tables start having millions of rows, that means this becomes a much less attractive product.

    The rationale for doing things this way had two reasons - first, it was the easiest way to implement schema changes. Second, "People should never be changing data schemas in a production environment".

    Oh, really? When did we regress to the idea that databases can go down overnight in order to back them up and to implement schema changes?

    --
    Final 2006 "Proof of Global Warming" US Hurricane Count -> 0
  9. half of one / six a dozen of another by Doc+Ruby · · Score: 2, Insightful

    The SQL DB doesn't store its data internally as "SQL". It's stored internally in some proprietary binary format. Which is optimized for the peculiar performance profile of that RDBMS. Relational DBs use different algorithms for working with their data, and the data is stored with either redundancy or precomputed values, depending on the unique algorithms. From which they derive their higher performance. SQL is just a high-level (more "human") language interface between programmers and the DB engine. Which was specified in such a way that it's not interchangeable across different DBs, partially because it does not specify a schema description which can be packaged with the data to be decoded with the context of that schema.

    XML is designed to package schema info with the data exchanged between DB instances. It's higher level, more verbose, and not optimized for data processing (except for the import/export). So you'd better be absolutely certain that your overall system performance is bottlenecked by your interchange processing performance, more than it will be bottlenecked by the "XML-native" DB processing XML data, which isn't optimized for performance.

    --

    --
    make install -not war

  10. Where's your bottleneck? by PizzaFace · · Score: 2, Insightful

    You say you are concerned about the overhead of wrapping and unwrapping XML, so you are considering using a database that keeps everything in XML all the time. I think you are trying to solve the wrong problem.

    Have you timed the job of wrapping/unwrapping XML? My guess is that on modern hardware, that task is trivial. Bandwidth is a more common bottleneck for XML data transfers, and that problem is usually mitigated by compressing the XML before transfer. But I never heard anyone complain about a CPU taking too much time to extract the data from XML.

    If your application queries the data selectively, you will probably find that the difference in query-processing time, between a traditional SQL database and a native XML database, more then makes up for any difference in format-conversion time.

    Let your database use its own, efficient, optimized internal data formats. XML is much more suitable for data transfer than for data manipulation.

  11. Re:Adding MORE XML Won't Fix It by Anonymous Coward · · Score: 2, Insightful

    Because XML and relational data essentially represent two different data models. If a DB was designed to support XML from scratch, it should be able to perform much better than an existing relational solution. Rewriting XML to relations is a slow process. Rewriting XQuery to SQL is a slow process (mind you, anything you could possibly gain by optimizing XQuery is lost once you hit the SQL layer). Additionally, with a hybrid solution, now you need people that are well-versed in both SQL and XML. Sure, that may not sound like much, but it's hard enough to find someone that really knows SQL, let alone XMLSchema, XPath, XQuery, etc.