With XML, is the Time Right for Hierarchical DBs?
"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relation databases.
Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think there database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"
Rather than discard the advantages of relational and object databases, should we instead ask how XML can be used to represent those kinds of relationships?
What do you mean they cut the power? How can they cut the power, man? They're animals!
1337ness for sale.
Just because XML is a hierarchical markup language does not mean that it can only be used for hierarchical things. Perhaps you should look at RDF which can use many to many mappings through resources and groupings (sequences, bags, and alternates). (A resource in one grouping can refer to another grouping i.e. many to many.)
marotti.com
I don't agree - look at LDAP. The benefits for LDAP'fying services is clear. With a hierarchial database, specific queries can target a subset of the entire database, without the over head of having seperate tables and/or database for varying information. For keeping track of 'real world' objects: People, Printers, IPS, etc.. the advantage is that the system used to organize them is similar to the actual grouping going on. Managers have employees 'underneath' them. Its basically taking the organizational concepts used for filesystems and applying them to database design. I havent done any performance testing LDAP vs. SQL for similar schema setup, but from what I understand one of the other benefits is fast lookups. Sounds like a good project! To implement databases in both LDAP and SQL and measure the performance of similar queries!! :)
-- NeTMoNGeR
An afterthought, databases are about storage and speed of insertion/extraction. I honestly don't believe that fitting the database to the data structure is worth the cost or the trouble, just yet.
I think these discussions come up part of the time because people want something new and sexy. In this case OO DB's, which 'XML DB's' are a variant of, may have benefits in specific and limited cases. But I have not been impressed.
Take your classic orders table. Part NO, Custoemr NO, etc. etc. The number of apps with only one parent is tiny, the flexibilty limited, and the whole metadata scanning business awkaward.
For anyone doing and serious larger scale database work some of this stuff is a joke. The idea these vendors have is that we'll be storing XML data in these DB's, ignoring that even for a simple phone directory, the XML data probably takes up a significantly greater amount of space than a simple relational DB would require
And this ignores the significant amount of time and energy invested in toolsets and models for the existing setup. Sure, someone might come out with a chip that runs 2x as fast as an intel at the same price, but unless it is intel compatible how many people would buy it or care?
Maybe its just me, but the goal today is integration and having a special database for XML and special database for this and that just because its faster for this particular problem creates such a level of complexity, which prevents accomplishing even of the most trivial tasks.
Still, XML is only a way how to describe data, that might be often in their structure relational. Why do not store data in their native form and create XML documents out of database on fly by filters?
This question of hierarchical databases is just plain trolling in my eyes.
If programs would be read like poetry, most programmers would be Vogons.
Relational databases didn't come to dominate the database market because they pushed aside equally valid alternatives, they dominate the market because relational databases implement relational calculus. Indeed, that's the very touchstone that distinguishes relational databases from something like DBM and its many descendants.
And *that* is important because it assures the desiger and user that every possible operation is well-defined and (hopefully) correctly implemented. The exact syntax for a "join" may differ, and a specific implementation may be flawed, but everyone agrees to a common baseline.
For hierarchial databases to really take off, they need to have an equally strong mathematical underpinning. For now, AFAIK, there is none other than that you get when you map a hierarchial database into relational tables and use exactly those relational properties. That's a good start, but if you're only using the properties in relational databases, why not stick with them?
As for XML, that's completely irrelevant. It's a good format for transferring data, but that's about it. You can store hierarchial data in an XML file, but you can also use it to store purely relational data or completely unstructured data (in some CDATA block).
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
Human readable?
I suppose you don't mind it when someone send you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML easier and more quickly than humans. Sure it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.
now we need to go OSS in diesel cars
Why do you assume relational databases are more developed than hierarchical?? The company I work for has been using our own hierarchical database for 25 years. They had the potential to become what Oracle is today but decided to stay focused on the medical industry. The serious problem with relational databases is they have traditionally not handled sparse data well at all. In the case of a patient every time they come for a visit there are tens of thousands of possible data points that can be entered, but most usually are empty. For tasks such as these relational databases have been completely impractical. With the use of indexing a heirarchical database can do everything a relational database can do.
XML may be hierarchical but the data it is used to markup is not necessarily hierarchical. For instance, XML can be used to markup conventional fielded (flat file) data to serve as an interchange format.
More importantly, XML is used to impose some structure on inherently unstructured text. The structure it provides is based on some assumptions of how the data will be used or how it will be presented. If the data is used in some otherway, the markup can be useless.
An example is a book. For XML purposes, it can be described as structured by chapter, section, subsection, and paragraph. For information purposes, tags are assigned to represent the ideas, terminology, names and other index-like content. There is virtually no structure in these index type of tags but they convey the most important information in the book.
Or not. These tags are assigned based on assumptions about what readers are interested in. A different set of assumptions would produce a different set of tags even thought the structure of the document would stay the same. If the sentences and paragraphs are shuffled and exerpted for some other publication, even the structure becomes irrelevant.
How this inherently unstructured information is stored is relevant to how it is managed, that is, how it is backed up, how access is controled, how changes are tracked. However, when it comes to putting the information to some useful purpose, it is the retrieval mechanisms that are important. The issues here are how easily the user can specify the type of information he wants and how accurately the mechanism can find it. This process is usually independent of the underlying structure and uses some higher level concepts of relevance and context.
The question of whether to use a hierarchical, relational or object-oriented data structures misses the point for textual data, for which XML is commonly used, because none of these structures capture meaning.
Topic maps make a heroic stab at capturing meaning in XML markup but still only within a set of assumption. I suspect a true meaning markup language is theoretically impossible, or at least theoretically very far in the future.
Indeed, that's the very touchstone that distinguishes relational databases from something like DBM and its many descendants.
The alternative to relational databases is not "DBM", it is object oriented, tree structured, logical, and other kinds of database models. Those are just as well defined as relational databases.
And *that* is important because it assures the desiger and user that every possible operation is well-defined and (hopefully) correctly implemented. The exact syntax for a "join" may differ, and a specific implementation may be flawed, but everyone agrees to a common baseline.
Relational databases provide a common baseline for a primitive set of relational operations. Real-world implementations of those models have been augmented by zillions of operations that weren't part of the original relational model and that often don't even fit into the relational model. And without those extra operations, relational databases would not be useful in practice.
For now, AFAIK, there is none other than that you get when you map a hierarchial database into relational tables and use exactly those relational properties.
Are you kidding? It is a major pain trying to express hierarchical data in a relational database model: the relations that describe hierarchical data and the operations that you might want to execute often require complex, multiple, inefficient queries and updates, and the relational model provides few tools to ensure that the corresponding relations remain consistent.
The semantics of tree structures are trivial to define. People do it in programming language classes all the time. And it is trivial to formulate a database model corresponding to it. In fact, if you have an object-oriented database that respects language semantics, you get hierarchical databases automatically when you define an abstract tree datatype.
Still, so-called "relational" databases will continue to dominate the market for a long time to come. That's not because the relational model is particularly well-suited to a lot of applications. In part, that's because "relational databases" are not purely relational anymore: they generally include numerous facilities for object-oriented and hierarchical databases, under a "relational veneer". They even include the old "navigational" database systems, combined with the widespread use of stored procedures that do whatever they want whenever they want it on the database server.
In different words, traditionally relational databases will provide increasingly better support for hierarchical and object-oriented data, but they will continue to also support the relational model, as well as relational access to these other data types. And newly developed databases with other kinds of data models will provide an SQL or other relational frontend to their content. And marketing will continue to include "something-relational" in all the advertising because otherwise the old database hands won't buy it.