With XML, is the Time Right for Hierarchical DBs?
"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relation databases.
Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think there database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"
Rather than discard the advantages of relational and object databases, should we instead ask how XML can be used to represent those kinds of relationships?
What do you mean they cut the power? How can they cut the power, man? They're animals!
1337ness for sale.
Just because XML is a hierarchical markup language does not mean that it can only be used for hierarchical things. Perhaps you should look at RDF which can use many to many mappings through resources and groupings (sequences, bags, and alternates). (A resource in one grouping can refer to another grouping i.e. many to many.)
marotti.com
I don't agree - look at LDAP. The benefits for LDAP'fying services is clear. With a hierarchial database, specific queries can target a subset of the entire database, without the over head of having seperate tables and/or database for varying information. For keeping track of 'real world' objects: People, Printers, IPS, etc.. the advantage is that the system used to organize them is similar to the actual grouping going on. Managers have employees 'underneath' them. Its basically taking the organizational concepts used for filesystems and applying them to database design. I havent done any performance testing LDAP vs. SQL for similar schema setup, but from what I understand one of the other benefits is fast lookups. Sounds like a good project! To implement databases in both LDAP and SQL and measure the performance of similar queries!! :)
-- NeTMoNGeR
There is lots said on this over at Database Debunkings
Special Relativity: The person in the other queue thinks yours is moving faster.
An afterthought, databases are about storage and speed of insertion/extraction. I honestly don't believe that fitting the database to the data structure is worth the cost or the trouble, just yet.
I think these discussions come up part of the time because people want something new and sexy. In this case OO DB's, which 'XML DB's' are a variant of, may have benefits in specific and limited cases. But I have not been impressed.
Take your classic orders table. Part NO, Custoemr NO, etc. etc. The number of apps with only one parent is tiny, the flexibilty limited, and the whole metadata scanning business awkaward.
For anyone doing and serious larger scale database work some of this stuff is a joke. The idea these vendors have is that we'll be storing XML data in these DB's, ignoring that even for a simple phone directory, the XML data probably takes up a significantly greater amount of space than a simple relational DB would require
And this ignores the significant amount of time and energy invested in toolsets and models for the existing setup. Sure, someone might come out with a chip that runs 2x as fast as an intel at the same price, but unless it is intel compatible how many people would buy it or care?
Maybe its just me, but the goal today is integration and having a special database for XML and special database for this and that just because its faster for this particular problem creates such a level of complexity, which prevents accomplishing even of the most trivial tasks.
Still, XML is only a way how to describe data, that might be often in their structure relational. Why do not store data in their native form and create XML documents out of database on fly by filters?
This question of hierarchical databases is just plain trolling in my eyes.
If programs would be read like poetry, most programmers would be Vogons.
Relational databases didn't come to dominate the database market because they pushed aside equally valid alternatives, they dominate the market because relational databases implement relational calculus. Indeed, that's the very touchstone that distinguishes relational databases from something like DBM and its many descendants.
And *that* is important because it assures the desiger and user that every possible operation is well-defined and (hopefully) correctly implemented. The exact syntax for a "join" may differ, and a specific implementation may be flawed, but everyone agrees to a common baseline.
For hierarchial databases to really take off, they need to have an equally strong mathematical underpinning. For now, AFAIK, there is none other than that you get when you map a hierarchial database into relational tables and use exactly those relational properties. That's a good start, but if you're only using the properties in relational databases, why not stick with them?
As for XML, that's completely irrelevant. It's a good format for transferring data, but that's about it. You can store hierarchial data in an XML file, but you can also use it to store purely relational data or completely unstructured data (in some CDATA block).
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Hi,
I wrote a paper on native XML databases and SQL databases that support XML that appeared on Slashdot a little while ago. While doing research for that paper I asked myself the same question, whether instead of coming up with hybrid methods to store relational and hierarchical data we should store XML in already existing hierarchical databases. Unfortunately things are not so clear cut.
First of all, a lot of data out there is relational and people aren't ready or willing to transition all that data to XML based storage so mixing of relational and XML data will probably be with us for a while. The biggest problem with object oriented databases is that they didn't understand this fundamental issue but it seems that with XMKL databases the vendors understand that hybrid data will be with us for quite a while which is why Tamino supports importing data from relational sources and even ships with a SQL engine.
Secondly, XML documents have a lot of metadata beyond the hierarchical parent-child relationships such as processing instructions, comments and entities which are require more intelligence in the support from the database than just storing parent-child relationships.
Finally all the major [commercial] relational database vendors have included some sort of native suppport for XML including XML types and there is a an ANSI standard in the works for combining XML and SQL. From what I've seen, none of the hierarchical databases plan to support XML as much as the relational databases have or plan to.
Now if you were simply asking whether a native XML database can be built on top of a hierarchical database then I believe the answer is yes. Then again native XML databases can and have been built on object oriented databases and relational databses so it makes sense that they can be implemented in a database system that is more suited to handling hierarchical data.
Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"
I believe that RDBMS's should add functionality to read/write XML, especially as the XML Schema recommendations is basically done.
The idea that XML should be the permanent storage format is a bad one. There is a lot of power in a normalized data model -- it enforces data integrity , while eliminating data fragmentation automatically and it minimizes transaction resources.
Consider XML representations for different entities that all share some kind of child entity. For example: people, businesses, and schools all share addresses. In XML, you want the addresses to appear in the description of the individual object. Does that mean you want to store the addresses separately that way? Absolutely not, because then when you enforce constraints or ask questions about addresses, your data is fragmented in three places. For that matter, how do you know all the entities that might use addresses? In an RDBMS, you can inspect all the foreign keys to the address entitity. What's the XML analog?
XML is not a magic bullet. Relational database won out over the Hierarchical model for a lot of reasons. For instance, there exists a number of integrity constraints with the Hierarchical model such as
1) No record occurrences except root records can exist without being related to a parent record occurrence. This means that
a) a child record cannot be inserted unless it is linked to a parent record.
b) a child record may be deleted independently of its parent however, deletion of the parent record automatically results in the deletion of all its child and descendent records.
c) the above rules do not apply to virtual child records and virtual parent records.
2) If a child record has 2 or more parent records from the SAME record type, the child record must be duplicated once under each parent record.
3) A child record having 2 or more parent records of DIFFERENT record types can do so only by having at most 1 real parent, with all the others represented as virtual parents. IMS limites the number of virtual parents to 1.
In addition to these flaws, relational databases have had over a decade to become mature, optimized, and enterprise scalable. Harddrive partitioning for such databases as oracle work out perfectly with the cylinder, sector, and tracks of a hard drive to allow for the fastest read/write times as can be possible.
Too often people see that XML "can" do so many things and decides that it should be the way things are done but XML is NOT a magic bullet and just because it has the potential to do something does not make it the best methodology for doing so.
Zope uses an object database known as the ZODB. Some forms of many-to-many relationsships and such can be handled via the use of selection and multi-selection properties, which are designed to distinguish between a selected element and the list of available elements. The list of elements can be derived from a property on the current object, a property on a parent object, or be created via a method call - allowing for non-traditional (for OODBMS) cross-linking of objects. Of course, since this sort of thing is a workaround, no true relational links are created... 'Soft Relations' may be ok for MySQL, but in big application development, relationships must be enforced! Thus, the big-boys in RDBMS all enforce foreign keys (mysql does not)...
Of course, I've found that by careful creation of object heirarcies, very complex applications can be created on top of a OODBMS that are in fact more robust, in some ways, then the relational couterparts. The Bigest hurdle (Short-term) I see to OODBMS (including ones based upon XML [the ZODB can export objects as XML but they are stored differently internally]) is the lack of a true query and data manipulation language - like SQL. Sure, OQL exists, and is even technically a standard, but it A) sucks and B) is geared towards large java applications with huge amounts of active objects, not general purpose OODB queries. Thus, without such language, OODBMS are all disimilar in how one queries and creates/updates data, and in many cases, the only interface is a truely procedural one! Thus OODBMS are forced to use proprietary tools, and are locked into one system - not to mention speed of development (something normally associated with OO development and OODBMS in general) is hindered by the excessive amount of procedural calls one needs to simply query thier data...
Recently, an add-on to Zope addressed some of these issues. Called 'ZOQL' - it uses a SQL like syntax and allows for very discrete querying of the ZODB (something one had to do programatically using the 'ZCatalog' before) with all of the familar aggregate and comparison operators SQL users love... Of course, this _still_ doesn't address the issue of soft-relationships:
I think the bigest hurdle to OODBMS in the long term (tools like ZOQL are interfaces to existing systems, thus can be mplemented easily) is the lack of handling relationships. It seems that most RDBMS force a developer to think in Relational terms about the data, and most OODBMS force you to think in terms of objects... Most problems can be mapped to either of these domains, but you are forcing the data-model-type onto the problem. What is needed is a hybrid system, an 'Object-Relational' DBMS. This is to say that OODBMS system makers desist with the traditional OO idea that relations are of the following types:
- Object A is a Object B
- Object A Has a/many Object B(s)
What RDBMS systems excelled in (and thus fell into pupular use for) was ease of management and allowing common data to be moved and grouped. A 'Look-up Table' - for instance, which simply holds a list of common data (an enumerated list) and can be centrally maintained is a Boon in the RDBMS world. For example, you have a lookup table of car manufactureres, and one of them changes its name... Instead of updating all N Cars that are made by the manufacturer, you simply update the single record in lookup table. Since each car would have somehting akin to a 'Manuafactuer_ID' column linking it to the lookup table, the Cars belonging to the manufacturer are all taken care of.How does one do this in a hierarchal system? Well, the easy answer would be that each manufacturer object contains all the cars that manufacturer makes. Simple, right? WRONG. Why?
Because each car also has a body-type (compact, sedan, SUV, truck, van, etc...) - which in a relational database would simple by another lookup table, but in an OODBMS poses data management issues. Do we put body-type higher then manufacturer? If so, then we have to maintain the list of manufacturers for each body type, causing headaches. Or do we put body-type below manufacturer, causing us to need to maintain a seperate list of body types for each manufacturer - these lists of course need to match exactly if we ever plan on being able to search or do reports based upon all cars of a specific body type.
Sadly enough, this sort of seperate-enumeration-relationship isn't implemented (well) in any OODBMS I've found.
Take the ZOBD for example, its selection and multiselection lists Try to handle this situation, but fail because relational integrety is not maintained! That is to say, behind the scenes it's not a true reference to a value in the enumerated list, but just a text entry representing a value in the list. If the value in the list changes, the selection-property does not update, leaving you with the equivilent of MySQL's bastard-children, the orphaned records.
This sort of soft-relationship handling is Ugly and BAD for maintainaility, but OODBMS users are faced with two ugly choices each time they map such a relationship: Do I store this as a plain-text property and just update N records each time this changes, or do I map it into the hierarchy and deal with the headaches incurred by doing so...?
I don't think I've answered the question, but hopefully I've at least shed some light on the subject for members of both the OODBMS camps and RDBMS camps... Now if only a useful ORDBMS were to come along...
(Note that PostgreSQL and some other RDBMS actualy can be used in a semi-OO manner, but this is usually reserved for inheritable structures of data to be used for specific extensions to the data model - thus the SUV table inherits from the Cars table and adds some columns - but all other relationships SUV has will still be relational)
man is machine
This is a terrible example. You are trying to describe a scenario that requires a many to many relationship. The intermediary "joiner" or cross-reference table is only necessary if you have a need to keep both joined tables normalized, i.e. you want each distinct telephone number, as well as each person object, to be stored in the database only once.
You've already given up the possibility of normalizing your phone numbers in the heirarchical model (my roomates home phone is the same as mine and it shows up in LDAP twice, once for me and once for him), so a simple many to one join to the telephone number table will allow you to list a home phone twice, once for each of us.
Now, if the data you are modeling truely requires a many to many relationship (your model needs to handle the real world, you can't change the world to fit the limitations of your tools), you have no way of representing that information in a normalized fashion in a heirarchical model. The so called "kludge" of an x-ref table from the relational world is not even an option.
The heirarchical model is so limited and simplistic that it can be implemented in a single, self-referential table in a relational database, and can even be queried in a recursive manner (oracle has had 'connect by prior' for dealing with these models since I started with the product 10 years ago).
From my view as a mathematician, and not a computer programmer, the relational model is so much more robust and powerful than a heirarchical model it hardly warrants discussion.
09F911029D74E35BD84156C5635688C0
Jesus loves you, I think you suck
Human readable?
I suppose you don't mind it when someone send you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML easier and more quickly than humans. Sure it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.
now we need to go OSS in diesel cars
Why do you assume relational databases are more developed than hierarchical?? The company I work for has been using our own hierarchical database for 25 years. They had the potential to become what Oracle is today but decided to stay focused on the medical industry. The serious problem with relational databases is they have traditionally not handled sparse data well at all. In the case of a patient every time they come for a visit there are tens of thousands of possible data points that can be entered, but most usually are empty. For tasks such as these relational databases have been completely impractical. With the use of indexing a heirarchical database can do everything a relational database can do.
Anyone can explain to me what is suddenly so wrong about relational database with hierarchical indexing?
/.) there is a correct answer.
Maybe its just me, but the goal today is integration and having a special database for XML and special database for this and that just because its faster for this particular problem creates such a level of complexity, which prevents accomplishing even of the most trivial tasks.
Forgive me for tooting my own horn on this one, but I believe that (for once on
I summarize the answer in a paper written for VLDB 2001 (www.vldb.org). The paper presents joint work between Stanford, Berkeley, and RightOrder, Inc. It can be found online here (in PDF).
What we found is that relational systems, with appropriate indexes for XML data, give the advantages of both worlds. XML is a hierarchical representation in only the loosest sense. It's written linearly in a flat text document, just as a child learns to write things down on a piece of paper. However, you wouldn't convince anyone but that same child that something written on paper can only represent two-dimensional objects just because the paper itself is flat. XML in many variants is plainly richer in concept than its simple hierarchical representation and thus quite suited to ER. I believe a previous poster mention RDF... a perfect example.
Punchline: XML is neat, XML is tasty, but XML is not inherently more or less expressive than ER; it just requires a little critical thinking (and index tweaking) to tune ER engines to deal with it. (Once tuned, the ER engines dominate all others in performance.)
I suppose you don't mind it when someone send you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML easier and more quickly than humans. Sure it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.
The XML isn't human readable, but browsers and other applications can make pretty good guesses at a nice human readable representation.
Further, you can define style sheets to produce different views, with data that would be unimportant to a particular human (or application) elided.
It may be oversold, but the point is that the data definition is well defined such that writers and readers (often human readers, also applications) can interact more easily. It's about portability of data, which readability is a subset.
XML may be hierarchical but the data it is used to markup is not necessarily hierarchical. For instance, XML can be used to markup conventional fielded (flat file) data to serve as an interchange format.
More importantly, XML is used to impose some structure on inherently unstructured text. The structure it provides is based on some assumptions of how the data will be used or how it will be presented. If the data is used in some otherway, the markup can be useless.
An example is a book. For XML purposes, it can be described as structured by chapter, section, subsection, and paragraph. For information purposes, tags are assigned to represent the ideas, terminology, names and other index-like content. There is virtually no structure in these index type of tags but they convey the most important information in the book.
Or not. These tags are assigned based on assumptions about what readers are interested in. A different set of assumptions would produce a different set of tags even thought the structure of the document would stay the same. If the sentences and paragraphs are shuffled and exerpted for some other publication, even the structure becomes irrelevant.
How this inherently unstructured information is stored is relevant to how it is managed, that is, how it is backed up, how access is controled, how changes are tracked. However, when it comes to putting the information to some useful purpose, it is the retrieval mechanisms that are important. The issues here are how easily the user can specify the type of information he wants and how accurately the mechanism can find it. This process is usually independent of the underlying structure and uses some higher level concepts of relevance and context.
The question of whether to use a hierarchical, relational or object-oriented data structures misses the point for textual data, for which XML is commonly used, because none of these structures capture meaning.
Topic maps make a heroic stab at capturing meaning in XML markup but still only within a set of assumption. I suspect a true meaning markup language is theoretically impossible, or at least theoretically very far in the future.
Indeed, that's the very touchstone that distinguishes relational databases from something like DBM and its many descendants.
The alternative to relational databases is not "DBM", it is object oriented, tree structured, logical, and other kinds of database models. Those are just as well defined as relational databases.
And *that* is important because it assures the desiger and user that every possible operation is well-defined and (hopefully) correctly implemented. The exact syntax for a "join" may differ, and a specific implementation may be flawed, but everyone agrees to a common baseline.
Relational databases provide a common baseline for a primitive set of relational operations. Real-world implementations of those models have been augmented by zillions of operations that weren't part of the original relational model and that often don't even fit into the relational model. And without those extra operations, relational databases would not be useful in practice.
For now, AFAIK, there is none other than that you get when you map a hierarchial database into relational tables and use exactly those relational properties.
Are you kidding? It is a major pain trying to express hierarchical data in a relational database model: the relations that describe hierarchical data and the operations that you might want to execute often require complex, multiple, inefficient queries and updates, and the relational model provides few tools to ensure that the corresponding relations remain consistent.
The semantics of tree structures are trivial to define. People do it in programming language classes all the time. And it is trivial to formulate a database model corresponding to it. In fact, if you have an object-oriented database that respects language semantics, you get hierarchical databases automatically when you define an abstract tree datatype.
Still, so-called "relational" databases will continue to dominate the market for a long time to come. That's not because the relational model is particularly well-suited to a lot of applications. In part, that's because "relational databases" are not purely relational anymore: they generally include numerous facilities for object-oriented and hierarchical databases, under a "relational veneer". They even include the old "navigational" database systems, combined with the widespread use of stored procedures that do whatever they want whenever they want it on the database server.
In different words, traditionally relational databases will provide increasingly better support for hierarchical and object-oriented data, but they will continue to also support the relational model, as well as relational access to these other data types. And newly developed databases with other kinds of data models will provide an SQL or other relational frontend to their content. And marketing will continue to include "something-relational" in all the advertising because otherwise the old database hands won't buy it.
99+% of all corporate data that isn't in a flat-file or (possibly three-dimensional) spreadsheat is in relational tables. The typical task that XML has been designed for is to standardize data exchanges between differently-structured relational systems, by providing sets of tags specific to the standards of specific industries. The whole point of XML is to enable companies to continue to use their current investment in relational databases, without the drag of having to do custom data conversions when dealing with suppliers or distant divisions in the company.
... just about everything industry consists of. So if there's an impedence mismatch between relational and XML that's enough to make trouble, it's XML that should be replaced by another model.
If you're going to throw out the installed investment in relational databases, you might as well just design a common database standard per industry (rather than an XML data exchange standard) and let them all exchange native data rather than translating in and out of any exchange format. Obviously that won't happen.
Now, if you're a new firm, you might decide it's easier to go OO or heirarchical or keep your data in slips of paper in a shoe box. But most of the available tools and solutions will continue to respect that relational works real, real well for inventory, manufacturing, accounts
What design changes would be required to produce XML's relational equivalent?
"with their freedom lost all virtue lose" - Milton
It's easy to dismiss a new database technology as irrelevant because of the dominance of the RDBMS, but you should really learn more about it and when it is appropriate and when it's not. It's not going to replace relational, and isn't intended to. Here's a few links where you can learn more beyond what's available on Ronald Bourret's site mentioned in the original post.
The XML:DB Initiative
The dbXML Project (open source native XML database) Soon to become an Apache XML project named Xindice
eXist (another open source native XML database)
My blog on the subject.
Kimbro Staken
A bit surprised to hear that 'Hierarchical databases were blown away by relational versions' - since I'm pretty sure they've been paying my pay check for the last three years... :-)
There are a large number of heirarchical databases out there. The big fellas are the X500 directories (X509 certs came out of this work). More common are X500's demented kid sisters, the LDAP directories ( rfc2251). The DNS system also fits the description 'heirarchical database'.
As far as XML goes, there are people storing XML in directories - although they're still fussing about exactly how to do it. There are a bunch of people trying to come up with standards - check the directory services markup language people www.dsml.org.
There are people trying to sell XML enable directories - Novell sells an XML directory, but most directories can be used to store XML (including our 'eTrust Directory').
As a final quicky - when do you use a directory over an RDBMS? Directories are good for naturally heirarchical data with few cross connections. They are usually optimised for slow writes/fast reads. They are *very* good for distributed data (e.g. DNS, international organisations etc.). The X500 spec defines a very fine grained security model, which can also be useful. However, if your data is closely cross-linked with lots of relationships... well, use an RDBMS!
Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird.
The problem with your scheme is that as the list of information grows the amount of time to do the access grows linearly. With systems where there are thousands of concurrent users all doing large numbers of data accessing of the information your solution would grind the system to a halt. While technically possible to do this with a relational database it is impractical. Taking into account that 90% of the databases use is doing reads and the access of such a system would be even worse. Also take into account that the system has to be able to handle some data which does not change and some data that has to be grouped based apon a visit date. Your system would have no efficient way to group certain elements by date.