With XML, is the Time Right for Hierarchical DBs?
"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that, because they support XML natively, they can offer many advantages over relational databases.
These claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think their database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"
There is a lot said on this over at Database Debunkings.
Special Relativity: The person in the other queue thinks yours is moving faster.
XML is not a magic bullet. Relational databases won out over the hierarchical model for a lot of reasons. For instance, the hierarchical model imposes a number of integrity constraints, such as:
1) No record occurrences except root records can exist without being related to a parent record occurrence. This means that
a) a child record cannot be inserted unless it is linked to a parent record.
b) a child record may be deleted independently of its parent; however, deletion of the parent record automatically results in the deletion of all its child and descendant records.
c) the above rules do not apply to virtual child records and virtual parent records.
2) If a child record has 2 or more parent records from the SAME record type, the child record must be duplicated once under each parent record.
3) A child record can have 2 or more parent records of DIFFERENT record types only by having at most 1 real parent, with all the others represented as virtual parents. IMS limits the number of virtual parents to 1.
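To make rules 1 and 2 above concrete, here is a minimal sketch of a toy hierarchical store. It is not IMS; the `Record` class, `add_under_each`, and the sample names are all made up for illustration, and virtual parents (rules 1c and 3) are left out.

```python
class Record:
    """One record occurrence in a toy hierarchical store (hypothetical, not IMS)."""

    def __init__(self, name, parent=None, is_root=False):
        if parent is None and not is_root:
            # Rule 1a: a non-root record must be linked to a parent on insert.
            raise ValueError(f"{name!r} needs a parent record")
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def delete(self):
        """Rule 1b: deleting a record also deletes all of its descendants."""
        removed = [self.name]
        for child in list(self.children):
            removed.extend(child.delete())
        if self.parent is not None:
            self.parent.children.remove(self)
        return removed


def add_under_each(name, parents):
    """Rule 2: a child with several same-type parents is duplicated under each."""
    return [Record(name, parent=p) for p in parents]


root = Record("company", is_root=True)
alice = Record("alice", parent=root)
bob = Record("bob", parent=root)
copies = add_under_each("555-0100", [alice, bob])  # two copies of one number

try:
    Record("orphan")  # no parent and not a root
    orphan_rejected = False
except ValueError:
    orphan_rejected = True

removed = alice.delete()  # alice's copy of the number goes with her
```

Note that the shared phone number exists twice, and deleting one parent silently removes one of the copies. That duplication is the normalization problem the relational model avoids.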
In addition to these flaws, relational databases have had over a decade to become mature, optimized, and enterprise-scalable. Hard drive partitioning for databases such as Oracle lines up with the cylinders, sectors, and tracks of the drive to allow the fastest possible read/write times.
Too often people see that XML "can" do so many things and decide that it should be the way things are done. But XML is NOT a magic bullet, and just because it has the potential to do something does not make it the best way of doing it.
This is a terrible example. You are trying to describe a scenario that requires a many-to-many relationship. The intermediary "joiner" or cross-reference table is only necessary if you need to keep both joined tables normalized, i.e. you want each distinct telephone number, as well as each person object, to be stored in the database only once.
You've already given up the possibility of normalizing your phone numbers in the hierarchical model (my roommate's home phone is the same as mine and it shows up in LDAP twice, once for me and once for him), so a simple many-to-one join to the telephone number table will allow you to list a home phone twice, once for each of us.
Now, if the data you are modeling truly requires a many-to-many relationship (your model needs to handle the real world; you can't change the world to fit the limitations of your tools), you have no way of representing that information in a normalized fashion in a hierarchical model. The so-called "kludge" of an x-ref table from the relational world is not even an option.
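For anyone who hasn't seen the x-ref pattern, here is a rough sketch using Python's built-in `sqlite3` module. The table names (`person`, `phone`, `person_phone`) and the sample data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone  (id INTEGER PRIMARY KEY, number TEXT NOT NULL UNIQUE);
    -- The cross-reference ("joiner") table: one row per person/phone pairing.
    CREATE TABLE person_phone (
        person_id INTEGER NOT NULL REFERENCES person(id),
        phone_id  INTEGER NOT NULL REFERENCES phone(id),
        PRIMARY KEY (person_id, phone_id)
    );
    INSERT INTO person VALUES (1, 'me'), (2, 'roommate');
    INSERT INTO phone  VALUES (1, '555-0100'), (2, '555-0199');
    INSERT INTO person_phone VALUES (1, 1), (2, 1), (1, 2);
""")

# The shared home number is stored once, yet it is listed for both people.
shared = conn.execute("""
    SELECT p.name
    FROM person p
    JOIN person_phone pp ON pp.person_id = p.id
    JOIN phone ph ON ph.id = pp.phone_id
    WHERE ph.number = '555-0100'
    ORDER BY p.name
""").fetchall()

phone_rows = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
```

Each person and each number appears exactly once in its own table; only the pairings are repeated.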
The hierarchical model is so limited and simplistic that it can be implemented in a single, self-referential table in a relational database, and can even be queried in a recursive manner (Oracle has had 'connect by prior' for dealing with these models since I started with the product 10 years ago).
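A quick sketch of the self-referential table idea. This does not use Oracle's 'connect by prior'; it swaps in a `WITH RECURSIVE` query on SQLite so it can run from Python's standard library. The `node` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One self-referential table holds the whole hierarchy.
    CREATE TABLE node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES node(id),  -- NULL marks the root
        name      TEXT NOT NULL
    );
    INSERT INTO node VALUES
        (1, NULL, 'company'),
        (2, 1,    'engineering'),
        (3, 1,    'sales'),
        (4, 2,    'databases');
""")

# Walk the tree from the root downward, tracking depth along the way.
rows = conn.execute("""
    WITH RECURSIVE tree(id, name, depth) AS (
        SELECT id, name, 0 FROM node WHERE parent_id IS NULL
        UNION ALL
        SELECT n.id, n.name, t.depth + 1
        FROM node n JOIN tree t ON n.parent_id = t.id
    )
    SELECT name, depth FROM tree ORDER BY depth, name
""").fetchall()
```

The recursive part joins each row to the rows already found, so the whole tree comes back from one query against one table.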
From my view as a mathematician, and not a computer programmer, the relational model is so much more robust and powerful than a hierarchical model that it hardly warrants discussion.
09F911029D74E35BD84156C5635688C0
Jesus loves you, I think you suck
Can anyone explain to me what is suddenly so wrong with a relational database with hierarchical indexing?
Maybe it's just me, but the goal today is integration, and having a special database for XML and another special database for this and that, just because it's faster for one particular problem, creates such a level of complexity that it prevents accomplishing even the most trivial tasks.
Forgive me for tooting my own horn on this one, but I believe that (for once on /.) there is a correct answer.
I summarize the answer in a paper written for VLDB 2001 (www.vldb.org). The paper presents joint work between Stanford, Berkeley, and RightOrder, Inc. It can be found online here (in PDF).
What we found is that relational systems, with appropriate indexes for XML data, give the advantages of both worlds. XML is a hierarchical representation in only the loosest sense. It's written linearly in a flat text document, just as a child learns to write things down on a piece of paper. However, you wouldn't convince anyone but that same child that something written on paper can only represent two-dimensional objects just because the paper itself is flat. XML in many variants is plainly richer in concept than its simple hierarchical representation and thus quite suited to ER. I believe a previous poster mentioned RDF... a perfect example.
Punchline: XML is neat, XML is tasty, but XML is not inherently more or less expressive than ER; it just requires a little critical thinking (and index tweaking) to tune ER engines to deal with it. (Once tuned, the ER engines dominate all others in performance.)
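To give a flavor of what "relational with an index for XML" can look like, here is a generic sketch that shreds a document into one table and indexes the element path. This is NOT the method from the paper; the `element` table, the `shred` helper, and the sample document are assumptions made purely for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book><title>Relational Theory</title><year>1970</year></book>"
    "<book><title>Markup Basics</title><year>1998</year></book>"
    "</library>"
)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE element (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES element(id),
        path      TEXT NOT NULL,   -- e.g. /library/book/title
        text      TEXT
    );
    -- An ordinary index on the path column supports lookups by path.
    CREATE INDEX element_path ON element(path);
""")


def shred(elem, parent_id=None, prefix=""):
    """Store one element as a row, then recurse into its children."""
    path = f"{prefix}/{elem.tag}"
    text = (elem.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO element (parent_id, path, text) VALUES (?, ?, ?)",
        (parent_id, path, text),
    )
    row_id = cur.lastrowid
    for child in elem:
        shred(child, row_id, path)


shred(doc)

titles = [row[0] for row in conn.execute(
    "SELECT text FROM element WHERE path = ? ORDER BY id",
    ("/library/book/title",),
)]
total = conn.execute("SELECT COUNT(*) FROM element").fetchone()[0]
```

The parent links keep the tree structure, while the path column lets a plain SQL query pick out elements without walking the document.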
I suppose you don't mind it when someone sends you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML more easily and more quickly than humans can. Sure, it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.
The XML isn't human readable, but browsers and other applications can make pretty good guesses at a nice human readable representation.
Further, you can define style sheets to produce different views, with data that would be unimportant to a particular human (or application) elided.
It may be oversold, but the point is that the data definition is well defined, so that writers and readers (often human readers, but also applications) can interact more easily. It's about portability of data, of which readability is a subset.
99+% of all corporate data that isn't in a flat file or (possibly three-dimensional) spreadsheet is in relational tables. The typical task that XML has been designed for is to standardize data exchanges between differently-structured relational systems, by providing sets of tags specific to the standards of specific industries. The whole point of XML is to enable companies to continue to use their current investment in relational databases, without the drag of having to do custom data conversions when dealing with suppliers or distant divisions in the company.
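A small sketch of that exchange scenario: two differently structured relational tables trading rows through an agreed set of tags. The tag names (`inventory`, `part`, `description`, `quantity`), the table layouts, and the helper functions are all invented for illustration and do not come from any real industry standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Supplier side: its own table and column names.
supplier = sqlite3.connect(":memory:")
supplier.execute("CREATE TABLE items (sku TEXT, descr TEXT, qty INTEGER)")
supplier.executemany("INSERT INTO items VALUES (?, ?, ?)",
                     [("A-1", "widget", 40), ("B-2", "gadget", 7)])


def export_inventory(conn):
    """Map the supplier's columns onto the agreed (made-up) exchange tags."""
    root = ET.Element("inventory")
    query = "SELECT sku, descr, qty FROM items ORDER BY sku"
    for sku, descr, qty in conn.execute(query):
        part = ET.SubElement(root, "part", number=sku)
        ET.SubElement(part, "description").text = descr
        ET.SubElement(part, "quantity").text = str(qty)
    return ET.tostring(root, encoding="unicode")


# Buyer side: a differently structured table, loaded from the same tags.
buyer = sqlite3.connect(":memory:")
buyer.execute(
    "CREATE TABLE stock (part_no TEXT PRIMARY KEY, on_hand INTEGER, label TEXT)"
)


def import_inventory(conn, xml_text):
    """Read the exchange document into the buyer's own schema."""
    for part in ET.fromstring(xml_text).iter("part"):
        conn.execute("INSERT INTO stock VALUES (?, ?, ?)",
                     (part.get("number"),
                      int(part.findtext("quantity")),
                      part.findtext("description")))


message = export_inventory(supplier)
import_inventory(buyer, message)
stock = buyer.execute(
    "SELECT part_no, on_hand, label FROM stock ORDER BY part_no"
).fetchall()
```

Neither side needs to know the other's schema; each only maps its own columns to and from the shared tags.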
If you're going to throw out the installed investment in relational databases, you might as well just design a common database standard per industry (rather than an XML data exchange standard) and let them all exchange native data rather than translating in and out of any exchange format. Obviously that won't happen.
Now, if you're a new firm, you might decide it's easier to go OO or hierarchical or keep your data on slips of paper in a shoe box. But most of the available tools and solutions will continue to respect that relational works real, real well for inventory, manufacturing, accounts ... just about everything industry consists of. So if there's an impedance mismatch between relational and XML that's enough to make trouble, it's XML that should be replaced by another model.
What design changes would be required to produce XML's relational equivalent?
"with their freedom lost all virtue lose" - Milton
That is a crock. XML was developed explicitly to fix the problems in SGML. LDAP was developed to fix the problems in X.500. In both cases it was the poor design of the predecessor that was being fixed.
Henrik F-N was working on SOAP-like ideas long before he joined Microsoft. Again, all SOAP does is fix known incompetence in CORBA. Gates devised .NET to solve two problems: first, how to get a foothold in the enterprise space; second, how to improve on C++ without the proprietary lock that Sun had imposed on Java.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
It's easy to dismiss a new database technology as irrelevant because of the dominance of the RDBMS, but you should really learn more about it and about when it is and isn't appropriate. It's not going to replace relational, and isn't intended to. Here are a few links where you can learn more beyond what's available on Ronald Bourret's site mentioned in the original post.
The XML:DB Initiative
The dbXML Project (open source native XML database), soon to become an Apache XML project named Xindice
eXist (another open source native XML database)
My blog on the subject.
Kimbro Staken
A bit surprised to hear that 'Hierarchical databases were blown away by relational versions' - since I'm pretty sure they've been paying my pay check for the last three years... :-)
There are a large number of hierarchical databases out there. The big fellas are the X.500 directories (X.509 certs came out of this work). More common are X.500's demented kid sisters, the LDAP directories (RFC 2251). The DNS system also fits the description 'hierarchical database'.
As far as XML goes, there are people storing XML in directories - although they're still fussing about exactly how to do it. There are a bunch of people trying to come up with standards - check out the Directory Services Markup Language people at www.dsml.org.
There are people trying to sell XML-enabled directories - Novell sells an XML directory, but most directories can be used to store XML (including our 'eTrust Directory').
As a final quickie - when do you use a directory over an RDBMS? Directories are good for naturally hierarchical data with few cross connections. They are usually optimised for slow writes/fast reads. They are *very* good for distributed data (e.g. DNS, international organisations etc.). The X.500 spec defines a very fine-grained security model, which can also be useful. However, if your data is closely cross-linked with lots of relationships... well, use an RDBMS!
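For those who haven't used a directory, here is a toy sketch of the lookup style: entries are addressed by a distinguished name and found by walking down from the root. The `Directory` class and `parse_dn` are hypothetical, and the DN parsing is deliberately naive (real LDAP DNs allow escaped commas and more).

```python
def parse_dn(dn):
    """Split 'cn=alice,ou=people,dc=example' into root-first (attr, value) pairs."""
    parts = [tuple(p.strip().split("=", 1)) for p in dn.split(",")]
    return list(reversed(parts))


class Directory:
    """Toy directory tree (hypothetical); every entry hangs under one parent."""

    def __init__(self):
        self.root = {"attrs": {}, "children": {}}

    def add(self, dn, **attrs):
        node = self.root
        path = parse_dn(dn)
        for rdn in path[:-1]:
            node = node["children"][rdn]  # KeyError if the parent is missing
        node["children"][path[-1]] = {"attrs": attrs, "children": {}}

    def lookup(self, dn):
        node = self.root
        for rdn in parse_dn(dn):
            node = node["children"][rdn]
        return node["attrs"]


d = Directory()
d.add("dc=example")
d.add("ou=people,dc=example")
d.add("cn=alice,ou=people,dc=example", mail="alice@example.invalid")

entry = d.lookup("cn=alice,ou=people,dc=example")

try:
    d.add("cn=bob,ou=missing,dc=example")
    parent_required = False
except KeyError:
    parent_required = True
```

A read is one walk down a single path, which is cheap; anything that needs to relate entries across branches has no natural home here, which is where the RDBMS wins.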
He who fights with monsters should see to it that he does not become a monster in the process.