Do XML-based Databases Live Up to the Hype?

Posted by Cliff on Saturday March 12, 2005 @09:21AM from the successful-implementations-in-practice dept.

douthitb asks: "I have recently started work as a contractor with a company developing/improving an application for exchanging large amounts of data. The current solution exchanges data via XML, but the data itself is stored in a SQL Server database. There is a concern about the overhead involved with wrapping and unwrapping the XML to get the data in and out of a relational database. The proposed solution is to use Tamino, an XML-based database. Neither I nor any of the other developers have any experience with Tamino, but the desired result is to remove the bottleneck of converting the XML back and forth. Does anyone have experience using Tamino (or any other XML-based database)? What benefits and/or difficulties did you have in using an XML database, as opposed to its relational counterpart? How large of a learning curve should be expected with a product like this? Do XML databases really live up to the hype? A similar topic was discussed on Slashdot way back when, so I was hoping to get some more up-to-date feedback on the subject." "Sales reps from Software AG, the makers of Tamino, were brought in to discuss the benefits of their product with us. They, of course, presented Tamino as the end all, cure all database system (it will even clear your acne and make you popular with the girls!). The management of the company I'm contracting with were basically eating out of the sales reps' hands, without asking any of the "tough" questions about what the product can do; I was less convinced. Doing some initial searching on the Internet, I have had trouble finding much information about Tamino outside of the Software AG website."

15 of 105 comments

I've worked with the Tamino kit... by (H)elix1 · 2005-03-12 09:34 · Score: 5, Insightful

The thing the XML databases are nice for is if folks can't really lock down the schema. Often you have the case where you are mapping attributes to columns, which works fine in a relational database. Then things change over time.... Usually turning a nice relational design into a mess. Being able to use Xpath is great when you are searching for nodes too, once you get your arms around the syntax and assuming the stuff you are storing is XML. Some of the other bits in their toolkit were interesting.
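For readers who have not used XPath, here is a minimal sketch of the kind of node search described above. It uses Python's standard-library ElementTree (which supports only a subset of XPath) rather than Tamino, and the document, element names, and attribute names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; none of these names come from the original post.
DOC = """
<orders>
  <order id="1" status="open"><customer>Acme</customer></order>
  <order id="2" status="shipped"><customer>Globex</customer><giftwrap/></order>
  <order id="3" status="open"><customer>Initech</customer><rush>yes</rush></order>
</orders>
"""

root = ET.fromstring(DOC)

# Attribute predicate: every <order> whose status attribute is "open".
open_ids = [o.get("id") for o in root.findall(".//order[@status='open']")]

# Child predicate: orders that carry an optional <rush> element.
# Adding that element to some orders required no schema change.
rush_ids = [o.get("id") for o in root.findall(".//order[rush]")]

print(open_ids)
print(rush_ids)
```

The optional `<rush>` and `<giftwrap/>` elements stand in for the "things change over time" case: in a relational design each would typically mean a new column or table.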

If things are fixed, there are a lot of other options out there for faster manipulation. XMLBeans (now an Apache project, formerly BEA) is good stuff. Hibernate is lovely kit for mapping objects to a relational DB.

--
+++ UGUCAUCGUAUUUCU
yeah, i support a tamino server at work. by Anonymous Coward · 2005-03-12 09:37 · Score: 4, Informative

it runs in tomcat or similar. it's really crashy. we can't wait to get rid of it.
Berkeley DB XML by SchnauzerGuy · 2005-03-12 09:41 · Score: 4, Informative

I haven't tried it, but the regular Berkeley DB is highly regarded, and both are open source and (depending on your situation) free, so it is definitely worth a look.

Berkeley DB XML 2.0
Oracle and XSQL by Rich · 2005-03-12 09:51 · Score: 4, Interesting

Oracle and XSQL/XSLT works fine for the database we use at work. The overhead of wrapping and unwrapping the data doesn't seem to be any problem.
The devil's in the details by LeninZhiv · 2005-03-12 10:04 · Score: 5, Informative

The first question to answer is, why is this data in a relational database to begin with? More to the point, is this application the only one that accesses the data, or are there other, non-XML centric databases that make use of the same data? The relational model gives you flexibility that XML does not for dealing with the data in arbitrary and unforeseen ways (XML can be quite flexible with XSLT, but a programmer must still intervene for each and every new way you want to use the data, with a much bigger performance hit). The normalised relational database stores your data in a mathematically sound way that puts the priority on integrity of data independently from its past, present or future structure; XML preserves data structure based on its present use while leaving the door open to moving from that to any arbitrary future use... which of the two ideals is more attractive depends on the nature of the data and how many applications need to use it.

Relational databases with good XML support (my background is DB2 but most major databases should be able to do this) reach a good compromise by giving you access to normalised relational data as XML (which you can complement with XSLT if that's what needs to be done), while preserving it internally reduced to its bare essence as data (according to relational calculus' idea of what constitutes the bare essence of data, anyway).

On the other hand, for single-app applications, or data that is more file oriented than datum-oriented (databases of XML documents where the document rarely or never needs to be abstracted from the data it contains), XML databases offer simplicity and efficiency by removing the need to work out a relational data model. Why break up your structured documents into a DBA's hand-tuned data model when 99.9% of your queries will just build these data sets back into XML documents (even when DB2, Oracle, and I assume SQL Server can automate this last task)? An XML database can give you more flexibility in querying than an all-XSLT solution, while saving a lot of unnecessary work over an SQL-to-XML solution for what is really an XML-to-XML application.

As I see it, that's the big picture. The actual decision has to come down to your applications. An XML database will be less efficient for non-XML applications, plain and simple. Querying XML cannot be made as fast as querying relational tables, meaning extra overhead for non-XML apps. But *your* application incurs overhead in turning relational tables into XML (probably via the RDBMS's internal facility), and in transforming it if necessary. The question is therefore: who makes more queries on the database, this application or other non-XML ones? Who will make more queries in 5 years?

If you answer 'others' to either question, use a relational database--their XML support is decent now and will only get better, and they're far more popular in business which is an important CYA factor. If you answer 'your app' or 'other XML-based apps' for both questions, it's time to check out what XML databases have to offer right now. I expect other posts to comment on the current state of the art right now, but you can expect things to only get better as industry support for XQuery et al. improves--but don't expect them to *ever* pass up the relational databases in terms of raw performance, it's impossible. But as the evolution from Assembler to C to Java has shown in programming languages, the day may come when raw performance takes a back seat to other concerns.
Thumbs Down on XML Databases. by rossifer · 2005-03-12 10:46 · Score: 5, Insightful

XML databases are possibly useful if you think about them as: an elaborate bucket for storing non-normalized data via an XML interface.

If your current relational database schema is either 1) small flat files or 2) a few big tables with most/all of the data stored in "blob" columns (i.e. blobs, clobs, byte arrays, or big varchars), you might be a candidate for an XML database. I'd get two experienced DBAs to agree there was no realistic way to normalize the data first, but that's me.

If you actually need a database (as opposed to a few files, XML or flat) and your data can be normalized (it almost always can), then a relational database will tend to provide important advantages in three areas: unforeseen query handling (OLAP, data mining, etc.), scalable performance, and availability of people with the skills to maintain it.

As for the tradeoff of converting to XML, a number of the commercial RDBMS's allow you to obtain query results as XML. Though I don't know for certain how they handle inserts and updates, I suspect that there are XML equivalents for those as well. However, even if you have to completely roll your own conversion from SQL to XML, that cost is minimal against the cost of accessing the disk to fulfill the query, which both RDBMS and XMLDBMS will have to do.
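To give a sense of how small a roll-your-own conversion can be, here is a rough sketch using Python's standard-library `sqlite3` and ElementTree. The `rows_to_xml` helper, the `part` table, and its contents are hypothetical, and the sketch assumes every column name is also a valid XML element name. It is not how any particular RDBMS produces XML output.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor, root_tag="rows", row_tag="row"):
    """Wrap the rows of an already-executed query in a simple XML envelope.

    Assumes each column name is usable as an XML element name.
    """
    columns = [d[0] for d in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for name, value in zip(columns, record):
            ET.SubElement(row, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical table and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO part VALUES (?, ?)", [(1976, "gasket"), (2001, "valve")])

cursor = conn.execute("SELECT id, name FROM part ORDER BY id")
xml_text = rows_to_xml(cursor, root_tag="parts", row_tag="part")
print(xml_text)
```

Whether this in-memory step is really negligible next to disk access, as the post argues, depends on result size and document shape; measuring both on real data is the only way to know.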

In general, after working with a commercial XML database and attempting to work with another XML database written in house, I'm categorically unimpressed. I think that a lot of engineers have discounted the relational programming model without first understanding it. In my opinion, people familiar with functional and object programming models would do well to learn about relational programming with an eye to determining the appropriate model for different kinds of problems.

Regards,
Ross
I agree by Tangurena · 2005-03-12 10:54 · Score: 4, Interesting

Having worked with a business partner who claimed total XMLosity in their database, I had to rework the parser almost every time we got a data feed from them. Their idea of the data model changed from day to day, even when we sent nailed-down, will-never-change specs for the structure. They really didn't like the idea that I tossed the raw XML into a memo field every time my components received a message, so when there were nasty fingerpointing meetings, I could drum up a simple SELECT statement and show everyone what was changing each and every week.
XML is kinda nice for some things, and really rotten for some things. Please do yourself a favor and sit down and try to decide what problem you are trying to solve. XML really stinks when it comes to sets: something that SQL-based databases excel at.
I think that with the XML fetish we have these days, that we are reverting to the preSQL days of CODASYL or IMS (pre 1980s for those of you young'uns).
Obvious by Pan+T.+Hose · 2005-03-12 11:03 · Score: 5, Insightful

What benefits and/or difficulties did you have in using an XML database, as opposed to its relational counterpart?

Benefits: XML is new and trendy.

Difficulties: Ignorance of the decades of scientific research and engineering experience in the field of relational database management systems, relational algebra, set theory and predicate calculus; lack of real atomicity of transactions, lack of guaranteed consistency of data, lack of isolated operations, lack of real durability in the ACID sense, and in short, the lack of relational model; scalability, portability, SQL standard, access to your data after two years and after twenty years; to name just a few.

How large of a learning curve should be expected with a product like this?

Certainly smaller than a real, relational database.

Do XML databases really live up to the hype?

No.

I believe that you are confusing an RDBMS with an object store. You should read this excellent comment posted almost three years ago by Frater 219. I understand that you may be inexperienced but you should not be ignorant. Literally decades of scientific research have been put into relational database management systems. Of course you are perfectly free to forget about computer science, jump on the bandwagon and choose whatever buzzword is trendy these days (yesterday it was OOP, today it is XML, tomorrow it will be .NET) but then you have to realise that you are gambling with your data that may be rendered inaccessible in a few years (and that is if you are lucky and don't lose its consistency before) and those unfortunate enough to inherit the responsibility of maintenance of your system will curse you to no end wishing you were dead, and not without a reason. You can be fancy with your applications and front-ends, but RDBMSs are probably the most mature computer systems known to man. Ignoring it is foolish, to say the very least. You may say: but my application will always be the only front-end to that data and it will always be an optimal way to work with it! To which I say: Kids these days!

--
Sincerely,
Pan Tarhei Hosé, PhD.
"Homo sum et cogito ergo odi profanum vulgus et libido."
1. Re:Obvious by swillden · 2005-03-12 18:56 · Score: 4, Insightful
  
  Excellent post, as is the Frater 219 post that you referenced.
  I think that both of you stopped short of pushing your arguments to their conclusions, though, so I'd like to add a bit.
  Frater 219 is exactly right that objects and tuples are fundamentally different, but he focused on both from a purely data-oriented point of view, which caused him to understate the issue a bit. A better understanding of the real goals of objects and tuples helps, IMO, to clarify why they're so different -- and the arguments can be extended to consider XML as well.
  Consider the goals behind relational database normalization. It's obvious that the primary goal is one of flexibility, ensuring that the data can be sliced and diced in any way imaginable, easily (which is not always the same as efficiently). A good relational design provides total "transparency", so that no matter what future demands are made, if the information is in the database it can be retrieved, just by asking the right, simple, question.
  Obviously, relational database technology was created because in the past there were systems that structured data in ways that limited the ways in which it could be retrieved and analyzed. RDBMSs solve that problem admirably well.
  So, if data transparency is such a wonderful thing, why does another computing tool, Object-Oriented Software structure, place so much emphasis on data abstraction and even data "hiding"? The answer is: because OO is about behavior, not data.
  The tenets of good OO design are all about partitioning the problem into compact components that interact in flexible ways. Objects have data, but only, really, to provide these fundamentally behavioral entities with the data elements they need in order to function "independently". This doesn't mean that object architectures can be defined without consideration of data, or that none of the ideas about data relationships which would be at home in a relational design have a place in object design, because they do, but the core ideas of object-oriented design are about entities that act in response to stimuli, allowing internal details (like what the supporting data consists of) to be hidden, and allowing substitution of other entities that accomplish the same abstract goals, but may do it in different ways, using different data.
  This is the real fundamental "impedance mismatch" between OO design and relational design, IMO. Relational design focuses almost purely on data, with little attention paid to how the data will be used (well, in practice, that gets a lot of attention when it becomes clear that the nicely normalized model is simply too slow, but that's separate), and object design focuses mostly on behavior, paying attention to data only as needed to point out obviously bad factorings. This means that if you design a very nice object-oriented application and then try to simply persist those objects in relational tables, the result will be a very poor relational database. On the other hand, if you create a nice relational design and then try to create a class for each table, the result will be a painfully sub-optimal OO design.
  So, as Frater 219 pointed out, if you want a database, use an RDBMS, if you want a persistent object store, use an OODBMS. If you want both (as is common), well, you have to deal with the impedance mismatch, and it'll never be pretty, or very efficient. IMO, the best approach is to do the OO and relational designs more or less separately, then work out a solution to translate between them.
  So what about XML? Well, let's look at the goals behind XML.
  One problem with doing that is that there are at least two uses of XML. The first is as markup, in the sense that the document content is really not intended to be understood or processed by machines so much as people. The tags are only used to make machines able to grab hold and manipulate bits of it, without any understanding of the rest of the stuff. HTML is like this. An HTML document is ulti
  
  --
  Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
I've used Tamino and here's my story by snowtigger · 2005-03-12 11:35 · Score: 5, Interesting

A few years back, I was brought in to a small company to build their new software on top of the Tamino DB. XML was "the way of the future" and we were asked to use it as much as we could. Software AG promised that everything would be easy to program and that their software functioned perfectly. Software AG's sales rep used the fact that Tamino was used in production by (insert major national company here) as a major selling argument. I later found out from a friend working there that they had only evaluated Tamino, found it useless, and never used it in production.

Well, we did finish the software on time, but it was a complete nightmare. Software AG hardly gave us any straight answers (even though they charged big $ for customer support).

Tamino itself was missing a lot of features and seemed designed as a system for storing documents, totally lacking traditional database qualities (uniqueness, reliability, scalability, ...) We couldn't even get a reliable unique key from the database. The id we did get "could change" if we were to backup and restore the database. Tamino also scaled very badly with simple queries taking up to a minute on the fastest PC we could buy.

Needless to say, the software was thrown away and rebuilt with a reliable SQL database.

I would strongly discourage anyone from building an application on top of an XML database, especially Tamino. If you really want to build your application on top of an XML database, seriously ask yourself why and what difference it would make. Also, if you really need an XML interface, choose an ordinary SQL db that has an XML plugin.
don't waste your time with XML by Anonymous Coward · 2005-03-12 11:37 · Score: 4, Insightful

XML is a file format. Repeat after me. A *text file format*.

It is not a database, nor a data model, nor should it have anything to do with data storage and manipulation. You can store XML documents *in* a database (just like you can store dates, IP addresses, or JPG data). You can index and join on XPath components of an XML file. And you get XML documents *from* a database. But the database itself has little to do with XML. A well-designed XML database is just a well-designed relational database, and XML is just another data type.

People are now reverse-engineering a hierarchic data model from XML text files. But the hierarchic data model is less general than the relational model, and in fact was used and rejected *40 years ago* as not being general or powerful enough. Funny how history repeats itself.

Example: for simplicity, the relational model specifies that ALL data must be stored explicitly in the database. For instance if you have three rows of data, you can't assume any particular order unless the order can be calculated from the contents of each row. But XML nodes have implicit order, which means even the simplest XML document mixes data with metadata. Even a simple query requires dealing with both.
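A small sketch of that ordering point, using Python's standard-library ElementTree and `sqlite3`. The `procedure` document and the `step` table are invented for illustration. In the XML the sequence lives in document order; in the table it has to be stored as an explicit `position` column and requested with ORDER BY.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: the step order is implied by where each node sits.
DOC = "<procedure><step>drain</step><step>replace seal</step><step>refill</step></procedure>"
xml_steps = [s.text for s in ET.fromstring(DOC).findall("step")]

# Relational version: the order is ordinary data in its own column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE step (position INTEGER PRIMARY KEY, action TEXT)")

# Rows are inserted out of order on purpose; insertion order carries no meaning.
conn.executemany(
    "INSERT INTO step VALUES (?, ?)",
    [(3, "refill"), (1, "drain"), (2, "replace seal")],
)

# Without ORDER BY the result order would not be guaranteed.
sql_steps = [row[0] for row in conn.execute("SELECT action FROM step ORDER BY position")]

print(xml_steps)
print(sql_steps)
```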

I recommend anyone who has ever uttered the term "XML database" with straight face to go back and learn some basic relational principles. I think you will agree that all data models are either 1) flawed and incomplete; or 2) reduce to the relational model.

In CS we don't have a lot of formal models to guide us, as in engineering or other science. Much of CS is entirely ad-hoc. However we do have a sound and complete model for data storage (relational model) and hardly anyone uses it. It boggles my mind. Do people not *want* their programs to work predictably?
The Problem by SmurfButcher+Bob · 2005-03-12 11:50 · Score: 4, Funny

... is that XML is only half of the solution.

For an XML database to really shine, it needs to be integrated with a TCP/IP filesystem. Once the physical data is stored using TCP/IP (as opposed to FAT or NTFS), the XML database really begins to take off because the data is already in a network format.

I swear to god there was a Dilbert on this...

--

help me i've cloned myself and can't remember which one I am
1. Re:The Problem by SmurfButcher+Bob · 2005-03-12 14:44 · Score: 5, Funny
  
  Well, you really need to have a TCP/IP based File I/O for any performance with an XML database. Although technically, you would probably get better gains by switching to an HTML database. The HTML database would be better, anyway, because it'll run in any web browser, and it doesn't exactly care what filesystem is in use. That, and all these "data integrity" whiners can then use any CSS validator to check the validity of the data. That way, your HTML Programmers can write on whatever platform they wish, enabling a new paradigm for a pan-dimensional database structure to coexist and re-leverage new legacies before they are implemented, in a cost-efficient and transcendentally transparent manner.
  
  I found that Dilbert, btw! It was an E-Mail based database! Now if you'll please excuse me, I'll be over here, ducking under a table.
  
  --
  
  help me i've cloned myself and can't remember which one I am
Proverb by Anonymous Coward · 2005-03-12 15:25 · Score: 5, Funny

I once had a problem. I thought: "Oh, I know: I'll just use XML!" Now I had two problems.
XML,SQL,XML Query, Databases by Ankh · 2005-03-12 16:22 · Score: 4, Informative

There seem to be a lot of confused comments on this, but hey, it's slashdot :-)

If you mostly deal with the sort of data for which relational databases are generally optimised, you'll probably not be very interested in XML solutions, as they are solving problems you don't have.

If you routinely get questions like "how often is part 1976 mentioned in the same repair procedure as part 2001?" or "which of our 150,000 documents have chapters containing five or more subsections any of which does not yet have a summary?" then the XML approach becomes more interesting.
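As a rough illustration of the second question, here is a sketch in Python's standard-library ElementTree rather than XML Query. The element names (`library`, `document`, `chapter`, `section`, `summary`) and the `make_doc` and `needs_summaries` helpers are hypothetical. It also assumes one reading of the question: a chapter qualifies if it has at least five sections and at least one of them lacks a summary.

```python
import xml.etree.ElementTree as ET


def make_doc(doc_id, summaries):
    """Build a one-chapter test document; summaries has one boolean per section."""
    doc = ET.Element("document", id=doc_id)
    chapter = ET.SubElement(doc, "chapter")
    for has_summary in summaries:
        section = ET.SubElement(chapter, "section")
        if has_summary:
            ET.SubElement(section, "summary").text = "..."
    return doc


def needs_summaries(library, min_sections=5):
    """Return ids of documents having a chapter with at least min_sections
    sections, one or more of which has no <summary> child."""
    hits = []
    for doc in library.findall("document"):
        for chapter in doc.findall("chapter"):
            sections = chapter.findall("section")
            missing = [s for s in sections if s.find("summary") is None]
            if len(sections) >= min_sections and missing:
                hits.append(doc.get("id"))
                break  # one qualifying chapter is enough for this document
    return hits


library = ET.Element("library")
library.append(make_doc("a", [True] * 5))            # five sections, all summarized
library.append(make_doc("b", [True] * 4 + [False]))  # five sections, one missing
library.append(make_doc("c", [False] * 3))           # too few sections

result = needs_summaries(library)
print(result)
```

A loop like this walks every document in memory; the appeal of an XML database or an XML Query engine is answering the same question against an index rather than by scanning 150,000 files.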

In my book on XML databases (1999 so I don't recommend going out and getting a copy today) I talked about using a hybrid system, with metadata picked out of XML whenever a changed version is stored (e.g. you might use a CVS commit script) and stored in a relational database.
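A minimal sketch of that hybrid idea, assuming Python's standard-library `sqlite3` and ElementTree in place of whatever the book used. The `index_document` function, the `doc` and `doc_part` tables, and the `part no="..."` markup are all hypothetical. In practice the function would be called from something like a commit hook each time a changed version is stored.

```python
import sqlite3
import xml.etree.ElementTree as ET


def index_document(conn, doc_id, xml_text):
    """Store the raw XML and refresh the relational metadata extracted from it."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    parts = sorted({p.get("no") for p in root.iter("part") if p.get("no")})
    with conn:  # one transaction per stored version
        conn.execute(
            "INSERT OR REPLACE INTO doc (id, title, body) VALUES (?, ?, ?)",
            (doc_id, title, xml_text),
        )
        conn.execute("DELETE FROM doc_part WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO doc_part (doc_id, part_no) VALUES (?, ?)",
            [(doc_id, p) for p in parts],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id TEXT PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("CREATE TABLE doc_part (doc_id TEXT, part_no TEXT)")

index_document(conn, "r1", "<procedure><title>Pump repair</title>"
                           "<part no='1976'/><part no='2001'/></procedure>")
index_document(conn, "r2", "<procedure><title>Valve swap</title>"
                           "<part no='2001'/></procedure>")

# How many procedures mention part 1976 together with part 2001?
together = conn.execute(
    "SELECT COUNT(*) FROM doc_part a JOIN doc_part b ON a.doc_id = b.doc_id "
    "WHERE a.part_no = '1976' AND b.part_no = '2001'"
).fetchone()[0]
print(together)
```

The full document stays available in `body` for XML tooling, while set-style questions run as ordinary SQL joins over the extracted rows.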

With a relational database you have a lot of flexibility to change your queries but the data representation has to be static. Even changing the type of a column can be difficult in an RDBMS.

Queries may be a little harder with the XML system, but the data storage is more flexible and you have native knowledge of sequence and hierarchy that are traditionally absent using SQL.

More recent versions of SQL have added some XML support, understanding the different sorts of queries that people typically run against such very different sorts of data. There has been a lot of research over the past 30 or 40 years (hierarchical databases predate the relational model) on hierarchy, sequence and the sort of irregularity that RDBMS people call semistructured data and the rest of us call XML :-)

XML Query is a query language designed to run over both relational and XML-native data sources (and others, for that matter) and to be optimized very efficiently, so that people like IBM (makers of DB2), Oracle, BEA, Software AG and others can have efficient implementations. There's also standards work on how to embed XML Query expressions in SQL.

The public XML Query Web page is at www.w3.org/XML/Query and lists quite a large number of implementations. Software AG have participated in the XML Query development.

You might like to look at the XML Query use case document and see how close the examples map to your own situation.

Disclaimer: I work for the W3C, participate in the XML Query Working Group, and maintain the XML Query Web page. But it sounded like it's the sort of information you were looking for.

I can't comment on the quality of Tamino, as I have not used it, but I will also note that if you stick to openly-defined standard query languages wherever you can, there's a good chance you could move to a different implementation if you needed to with relatively little cost. This is similar to SQL, of course.

There was lots of hype around XML, but that doesn't mean it's all false, nor that it was all true. XML is a good way to interchange structured, hierarchical information, but it probably won't cure acne :-)

Liam

[slashdot::Ankh -- Liam Quin, W3C XML Activity Lead]

--
Live barefoot!
free engravings/woodcuts