With XML, is the Time Right for Hierarchical DBs?

← Back to Stories (view on slashdot.org)

With XML, is the Time Right for Hierarchical DBs?

Posted by Cliff on Sunday November 18, 2001 @07:40AM from the digital-evolution-of-data-storage dept.

DullTrev asks: "The hierarchical database model existed before the far more familiar relational model. Hierarchical databases were blown away by relational versions because it was difficult to model a many-to-many relationship - the very basis of the hierarchical model is that each child element has only one parent element. However, we now live in a web world that demands quick access to a variety of data on a variety of platforms. XML is being used to facilitate this, and XML has, of course, a hierarchical structure." Do you think a hierarchical database would really be a better answer for storing XML data over the existing relational counterparts?

"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relation databases.

Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think there database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"

4 of 276 comments (clear)

Min score:

Reason:

Sort:

I'm currently working on a paper about this... by Carnage4Life · 2001-11-18 08:16 · Score: 3, Interesting

Hi,
I wrote a paper on native XML databases and SQL databases that support XML that appeared on Slashdot a little while ago. While doing research for that paper I asked myself the same question, whether instead of coming up with hybrid methods to store relational and hierarchical data we should store XML in already existing hierarchical databases. Unfortunately things are not so clear cut.

First of all, a lot of data out there is relational and people aren't ready or willing to transition all that data to XML based storage so mixing of relational and XML data will probably be with us for a while. The biggest problem with object oriented databases is that they didn't understand this fundamental issue but it seems that with XMKL databases the vendors understand that hybrid data will be with us for quite a while which is why Tamino supports importing data from relational sources and even ships with a SQL engine.

Secondly, XML documents have a lot of metadata beyond the hierarchical parent-child relationships such as processing instructions, comments and entities which are require more intelligence in the support from the database than just storing parent-child relationships.

Finally all the major [commercial] relational database vendors have included some sort of native suppport for XML including XML types and there is a an ANSI standard in the works for combining XML and SQL. From what I've seen, none of the hierarchical databases plan to support XML as much as the relational databases have or plan to.

Now if you were simply asking whether a native XML database can be built on top of a hierarchical database then I believe the answer is yes. Then again native XML databases can and have been built on object oriented databases and relational databses so it makes sense that they can be implemented in a database system that is more suited to handling hierarchical data.
XML not meant as a replacement for RDBMSs by bwt · 2001-11-18 08:24 · Score: 4, Interesting

Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"

I believe that RDBMS's should add functionality to read/write XML, especially as the XML Schema recommendations is basically done.

The idea that XML should be the permanent storage format is a bad one. There is a lot of power in a normalized data model -- it enforces data integrity , while eliminating data fragmentation automatically and it minimizes transaction resources.

Consider XML representations for different entities that all share some kind of child entity. For example: people, businesses, and schools all share addresses. In XML, you want the addresses to appear in the description of the individual object. Does that mean you want to store the addresses separately that way? Absolutely not, because then when you enforce constraints or ask questions about addresses, your data is fragmented in three places. For that matter, how do you know all the entities that might use addresses? In an RDBMS, you can inspect all the foreign keys to the address entitity. What's the XML analog?
Some thoughts... by Coventry · 2001-11-18 09:10 · Score: 5, Interesting
I have been struggling with these issues for awhile now, for various reasons. Why? Because I like Zope, but am, like most developers, more comfortable with relational data structures.

Zope uses an object database known as the ZODB. Some forms of many-to-many relationsships and such can be handled via the use of selection and multi-selection properties, which are designed to distinguish between a selected element and the list of available elements. The list of elements can be derived from a property on the current object, a property on a parent object, or be created via a method call - allowing for non-traditional (for OODBMS) cross-linking of objects. Of course, since this sort of thing is a workaround, no true relational links are created... 'Soft Relations' may be ok for MySQL, but in big application development, relationships must be enforced! Thus, the big-boys in RDBMS all enforce foreign keys (mysql does not)...

Of course, I've found that by careful creation of object heirarcies, very complex applications can be created on top of a OODBMS that are in fact more robust, in some ways, then the relational couterparts. The Bigest hurdle (Short-term) I see to OODBMS (including ones based upon XML [the ZODB can export objects as XML but they are stored differently internally]) is the lack of a true query and data manipulation language - like SQL. Sure, OQL exists, and is even technically a standard, but it A) sucks and B) is geared towards large java applications with huge amounts of active objects, not general purpose OODB queries. Thus, without such language, OODBMS are all disimilar in how one queries and creates/updates data, and in many cases, the only interface is a truely procedural one! Thus OODBMS are forced to use proprietary tools, and are locked into one system - not to mention speed of development (something normally associated with OO development and OODBMS in general) is hindered by the excessive amount of procedural calls one needs to simply query thier data...

Recently, an add-on to Zope addressed some of these issues. Called 'ZOQL' - it uses a SQL like syntax and allows for very discrete querying of the ZODB (something one had to do programatically using the 'ZCatalog' before) with all of the familar aggregate and comparison operators SQL users love... Of course, this _still_ doesn't address the issue of soft-relationships:

I think the bigest hurdle to OODBMS in the long term (tools like ZOQL are interfaces to existing systems, thus can be mplemented easily) is the lack of handling relationships. It seems that most RDBMS force a developer to think in Relational terms about the data, and most OODBMS force you to think in terms of objects... Most problems can be mapped to either of these domains, but you are forcing the data-model-type onto the problem. What is needed is a hybrid system, an 'Object-Relational' DBMS. This is to say that OODBMS system makers desist with the traditional OO idea that relations are of the following types:
- Object A is a Object B
- Object A Has a/many Object B(s)
What RDBMS systems excelled in (and thus fell into pupular use for) was ease of management and allowing common data to be moved and grouped. A 'Look-up Table' - for instance, which simply holds a list of common data (an enumerated list) and can be centrally maintained is a Boon in the RDBMS world. For example, you have a lookup table of car manufactureres, and one of them changes its name... Instead of updating all N Cars that are made by the manufacturer, you simply update the single record in lookup table. Since each car would have somehting akin to a 'Manuafactuer_ID' column linking it to the lookup table, the Cars belonging to the manufacturer are all taken care of.

How does one do this in a hierarchal system? Well, the easy answer would be that each manufacturer object contains all the cars that manufacturer makes. Simple, right? WRONG. Why?

Because each car also has a body-type (compact, sedan, SUV, truck, van, etc...) - which in a relational database would simple by another lookup table, but in an OODBMS poses data management issues. Do we put body-type higher then manufacturer? If so, then we have to maintain the list of manufacturers for each body type, causing headaches. Or do we put body-type below manufacturer, causing us to need to maintain a seperate list of body types for each manufacturer - these lists of course need to match exactly if we ever plan on being able to search or do reports based upon all cars of a specific body type.
Sadly enough, this sort of seperate-enumeration-relationship isn't implemented (well) in any OODBMS I've found.
Take the ZOBD for example, its selection and multiselection lists Try to handle this situation, but fail because relational integrety is not maintained! That is to say, behind the scenes it's not a true reference to a value in the enumerated list, but just a text entry representing a value in the list. If the value in the list changes, the selection-property does not update, leaving you with the equivilent of MySQL's bastard-children, the orphaned records.
This sort of soft-relationship handling is Ugly and BAD for maintainaility, but OODBMS users are faced with two ugly choices each time they map such a relationship: Do I store this as a plain-text property and just update N records each time this changes, or do I map it into the hierarchy and deal with the headaches incurred by doing so...?

I don't think I've answered the question, but hopefully I've at least shed some light on the subject for members of both the OODBMS camps and RDBMS camps... Now if only a useful ORDBMS were to come along...

(Note that PostgreSQL and some other RDBMS actualy can be used in a semi-OO manner, but this is usually reserved for inheritable structures of data to be used for specific extensions to the data model - thus the SUV table inherits from the Cars table and adds some columns - but all other relationships SUV has will still be relational)
--
man is machine
Not so simple by drodver · 2001-11-18 15:21 · Score: 3, Interesting

The problem with your scheme is that as the list of information grows the amount of time to do the access grows linearly. With systems where there are thousands of concurrent users all doing large numbers of data accessing of the information your solution would grind the system to a halt. While technically possible to do this with a relational database it is impractical. Taking into account that 90% of the databases use is doing reads and the access of such a system would be even worse. Also take into account that the system has to be able to handle some data which does not change and some data that has to be grouped based apon a visit date. Your system would have no efficient way to group certain elements by date.