SQL, XML, and the Relational Database Model

"NULLS are bad." quote by MattRog · 2004-06-28 08:08 · Score: 5, Informative

Celko is misquoting Darwin in saying that "The idea that you will always know everything is arrogant".

Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
http://www.hughdarwen.freeola.com/TheThirdManifest o.web/Missing-info-without-nulls.pdf

And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products. Anyone who has ever used a hierarchical DBMS will tell you that there are some pretty non-trivial problems that you cannot work around due to their hierarchical data model, yet XML DBMS proponents propose we go back to that old, inflexible system!

--

Thanks,
--
Matt

Re:"NULLS are bad." quote by Yosi · 2004-06-28 08:20 · Score: 2, Informative

At least some XML people recognize the same, thus the existence of RDF.
RDF is equivilent to a relational database. There is a working group right now looking into a query language for RDF.

That said, some RDF people here at the w3c don't care that much for serializing RDF as XML, prefering the much more readable n3
Re:"NULLS are bad." quote by RatRagout · 2004-06-28 08:42 · Score: 3, Informative

The S in SQL points to the fact that the Query Language is structured...not the data...

Link to history of SQL.. by MisanthropicProgram · 2004-06-28 08:13 · Score: 5, Informative

Here is a history of SQL. I wanted to check the article's facts. Also, I was curious... History of SQL

Re:Isn't XML semi-object oriented? by Bitsy+Boffin · 2004-06-28 08:14 · Score: 4, Informative

No.

XML is a file format, it has nothing to do with objects, no more than HTML does, which is not at all.

However to counter your claim that XML is hard to represent in a relational database. Uhm. No, it's not.

XML consists (simplifying) of elements and attributes, elements may be nested.

A generic mapping to a relational database is that elements correspond to the entity tables, attributes correspond to columns in those tables, and the nesting of elements is modelled as a foreign key in the child entity records.

Whats so hard?

--
NZ Electronics Enthusiasts: Check out my Trade Me Listings

More detailed articles by deepchasm · 2004-06-28 08:17 · Score: 4, Informative

Readers interested in what Pascal and Date have to say may be interested in visiting Database Debunkings. It has lots of articles written by one or both of them.

Personally, I recommend the articles written by Date because they are clearly, concisely, accurately, and calmly written. Pascal's tend to turn into a rant, which I wouldn't mind but he always seems to refer to his books rather than give a detailed justification of his arguments.

Re:'scuse my ignorance but... by MattRog · 2004-06-28 08:21 · Score: 5, Informative

There are a couple of "problems" that they have identified:
1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
2) Little/No support for relational domains (e.g. complex data types)
3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
4) Weak support for complex integrity constraints (e.g. business rules)
5) No support for entity sub/supertype relationships
6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)

Try searching www.dbdebunk.com for SQL. Or pick up the great book "Practical Issues in Database Management" by Fabian Pascal.

--

Thanks,
--
Matt

Re:Isn't XML semi-object oriented? by AKAImBatman · 2004-06-28 08:21 · Score: 3, Informative

Whats so hard?

The "arbitrary XML" part. You must have existing mappings set up to process the XML. New forms of XML thus require a great deal of work on the part of the DB developer.

XML databases such as Xindice OTOH, allow you to create a table and insert XML in whatever format you chose. XPATH queries take a bit of getting used to, but you can query on tags, attributes, CDATA, or whatever else you chose at whatever level in the XML hierarchy you choose. Thus I can query for the list of addresses for all records that have a firstname attribute that is LIKE "Bob". Or I can dive down to the individual address level and query for all records that have an address of "Drury Lane" (important for tracking down the Muffin Man when you need a giant gingerbread cookie).

It's not like you can't do this stuff with SQL databases, it's simply a different method of accomplishing the task. Depending on the data you're working with, an XML database may very well be a more efficient method of storage and queries.

--
Javascript + Nintendo DSi = DSiCade

Re:It would seems SQL is better for RDB than XDB by TheSunborn · 2004-06-28 08:23 · Score: 3, Informative

And the problem happens as soon as you want to know which producs uses a specific screw. Doing that query against the xml would be difficult, and/or extreamly slow.

Martin

Re:Isn't XML semi-object oriented? by Black+Perl · 2004-06-28 08:23 · Score: 2, Informative

Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.

This is completely false. If you had RTFA, it is mentioned that the relational model can represent hierarchies (and thus XML) just fine. It is SQL that is deficient for this purpose.

Also, it makes no sense to call XML "object-oriented," which is a programming language term[semi-OO? LOL]. XML is a syntactic hierarchy that can be used to represent "objects", just as it can be used to represent other types of data.

Last, just about every major database now supports XML as a native datatype, meaning you don't have to pull out XML documents and parse them.

--
bp

Re:It would seems SQL is better for RDB than XDB by kpharmer · 2004-06-28 08:30 · Score: 2, Informative

> An iterative SQL call would be needed unless the overall part depth was known at query time.

Both oracle & db2 have very good support for recursion. DB2 in particular can easily handle hierarchies and networks of unknown depth in a single query.

Re:Isn't XML semi-object oriented? by Tassach · 2004-06-28 08:38 · Score: 2, Informative

but I WILL point out, in regards to objects, us in the modern day have a field type cald BLOB (Binary large object)/blockquote And I will point out that BLOBs are, for all practical purposes, unusable from within the database. While you can store and retrieve BLOBs from SQL, that's about all you can do with them -- anything more elaborate has to be done outside of the database.

--
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?

Re:'scuse my ignorance but... by MattRog · 2004-06-28 09:01 · Score: 5, Informative

"Just because two queries return the same results today do not mean that they will continue to do so in the future."

Total misunderstanding of what I wrote. To put it another way:
SQL allows you to write queries which are mathematically equivalent but result in vastly different query plans and performance. Again, not a particularly stinging-indictment of SQL as such but had it been designed differently it could have avoided such ambiguity in the language.

"Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back."

So you're saying a tree has no "structure"? That a domain has no structure? If it had no structure, it would be a little difficult for computers to process.

"View stuff"
Pascal (or Date, can't remember) provides an iron-clad (mathematical definition) method of creating views which will always be updatable. There are structural deficiencies in SQL which prohibit this. I will not waste time/typing here illustrating them, they are all identified at their web site.

"SQL triggers" etc.
It is precisely the reason that applications were enforcing business rules that DBMS were invented all those years ago! There are plenty of reasons that application-enforcement of business rules is a bad thing. Again these are illustrated on their web site. Also, your quote about "SQL triggers" is basically re-stating what I mentioned: that SQL is poor at implementing business rules!

"Plenty of support, just not intrinsically."
Which is exactly the same as saying "no support for entity sub/supertypes". Plus, one-to-many tables are not the same, you're thinking of something else. Chapter 6 of Fabian Pascal's book "Practical Issues in Database Management" covers this in some depth.

"That's like trying to do math without a concept of zero."
Not quite the same. Remember that the relational model is based upon predicate logic and set theory. Set theory has the empty set, which is not the same as NULL. SQL products currently handle null in a ridiculous manner (some sort NULL oddly, comparison is difficult, summation is odd). Pascal/Date do not suggest that the concept of "unknown" is bad, just that the SQL representation as NULL is.

--

Thanks,
--
Matt

Nitpick (OT) by wtrmute · 2004-06-28 09:12 · Score: 2, Informative

This is off-topic, but it bothers me when someone blasts another for ignorance and then links into an otherwise perfectly good article ("The Myth of Self-Describing XML", by Eric Browne)which contains the following pearl:

The business logic cannot be constructed post priori!

Doesn't it bother anyone that someone who doesn't know his Latin (but uses it, anyway, where a perfectly good "after the fact" would suffice) is used as an authoritative source by the author? If you're trying to prove someone's incompetence, you won't want to counteract it with further incompetence. Mr. Browne may be a genius in relational databases, but makes stuff up just the same.

And in the unlikely case Mr. Browne should read this post, the correct form is a posteriori, as any law student probably knows.

The Problem, stated more accessibly by IBitOBear · 2004-06-28 09:46 · Score: 4, Informative

The problem is that most "Relational Database Management Systems" only represent one type of "relationship", that being "the table".

This, in turn, means that every operator (programmer, statement, etc) on the database must _individually_ "already understand" all the relationships that lie outside "the table" before they can act on the data at any significant scope.

That is, you, the programmer or operator must know, from some source besides the RDBMS itself, how the different tables work with eachother.

In simpler SQL-biased terms, you have to know, before you start, what is "good" to put in your WHERE CLAUSE to make a join. And then the RDBMS query optimizer needs to guess how to satisfy your needs in something other than glacial time.

Consider a new verb "EXPOSE thing, thing, thing, thing, thing..." that would fish out of a database the one-or-more relationships between the things, and produce a table-looking vector of "tuples" that consist sets of actual values for those things. This is what the theoretical "perfect" RDBMS would do.

Given (somewhat denormal 8-):

Customer ID -> Customer Name
Customer ID -> Street Address
Customer ID -> Zip Code
Customer ID -> Purchase Order ID
Purchase Order ID -> Part Number, Quantity

One should be able to "EXPOSE Part Number, Zip Code" and have the database "know the relationship" and produce the correct vector of tuples.

But that doesnt happen.

Now all the people bleating about the Higherarchial databases and bad things from the past are doing this harping because they remember the bad-old-days when a database would maintian one tree-structured set of relationships like this. In the higherarchical model, you could basically do this EXPOSE operation, but only if you had, by dint of pre-knowledge, asked for things lying on one linear path through the tree. (* simplified for brevity).

In essence, SQL requires you, the programmer, to be in the business of making up relationships that should be in the data or schema structure but isn't.

A magically complete RDBMS would take a series of vectors of the form "Independent Datum (key)->Dependent Datum (value)" (where either side of the arrow could be a list of atomic values). The RDBMS would then assemble and maintian tables or linked lists or whatever with no exposure of SQL-esque "tables" and the accessors would be storage method agnostic. (That is, there would be no such thing as a FROM CLAUSE.)

For instance, in the above list of relationships, Customer ID, Customer Name, Street Address, and Zip Code *could* all live in a "table", or not, but you would never know that. But a better vector of
Customer ID -> Street Name, Building Number, Suite to replace "Customer ID -> Street Address" has a table-feel, but would not bias against "EXPOSE Street Name, Part Number".

The power of this comes from being able to do:

EXPOSE Building, Part Number, Quantity
Where Building == "Word Trade Center 2";

And have the RDBMS already know the sequence of relationships to get from Building to (address elements) to Customer ID, to Part Number without the programmer writing the three stage join across the "uninteresting middle tables".

(The above presumes you have a building relation that has Street Name, Street Number -> Building or some such.)

All the XML nonsense is nonsense *_BECAUSE_* the strict-nesting enclosure requirements of XML make it "naturally" become hierarchically organized. But data exists outside the single-inheretance strict parantage trees that the hierarchical model dictates.

The problem is that SQL got real popular and so the idea of structural inferrence got pared down to tables and Query-Like constraints on tables before anybody had a chance to formalize the idea of living, encoded relationships between arbitrarily stored datum. So we never really got a language or system that could "EXPOSE".

--
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press

Re:Never heard so much nonsense.. by FnH · 2004-06-28 09:54 · Score: 2, Informative

Question - can you express the recursive ancestor relation in SQL? You can express a single relation such as grand parent, but not the full relation.

I beg to differ:

Given ParentOf(parent,child)

The ancestors of 'Mary' are:

WITH RECURSIVE Ancestor(anc,desc) AS ( (SELECT parent as anc, child as desc FROM ParentOf) UNION (SELECT Ancestor.anc, ParentOf.child as desc FROM Ancestor, ParentOf WHERE Ancestor.desc = ParentOf.parent) ) SELECT anc FROM Ancestor WHERE desc = "Mary"

SQL might still not be true relational calculus (but then again, it might as well be, I don't know, and the parent doesn't know either), but at least it't possible to express the ancestor relationship.

Now, if only the PostgreSQL guys would implement this ...

Re:SQL sucks? by Anonymous Coward · 2004-06-28 11:04 · Score: 1, Informative

However, he hasn't delivered an exaple of one, he hasn't clearly articulated the differences between his vision and the commercial options, and he apparently refuses to acknowledge that some problems in life fail to fit well into the relational model (hierarchies, networks, inheritance, etc).

FP, Date, and Darwen may or may not be extremists, but the above characterization is just plain wrong.

Date and Darwen have extensively documented the differences their vision and the commerial options in The Third Manifesto Chapter 7 (IIRC) of Pascal's Practical Issues in Database Management is devoted to tree processing in a relational language. There's extensive discussions of tree processing and handling transitive closure in The Third Manifesto as well. The cover hierarchies and networks, and in addition have several chapters devoted to inheritance.

You may not agree with their critique, but to say they haven't articulated the specifics is just wrong; they've spent 20 years doing exactly that.

Re:The Criterion here is no longer machine efficie by warrax_666 · 2004-06-28 11:37 · Score: 2, Informative

The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not. The beauty of XML is "agreed-upon" is optional now. We can understand the data without a formal meeting to come to an agreement.

No. You. Can't.

You simply cannot know what the data means without specified semantics (which you have to agree upon somehow; they aren't magically apparent from the XML itself -- that's why such abominations as DTD and XML-Schema exist).

--
HAND.

NULLs and Normalization by PatientZero · 2004-06-28 11:55 · Score: 2, Informative

You can't get rid of NULLs and maintain Normal Form. The two are simply at odds with one another.

No, they are not, but the way to rectify them is a bit extreme. Keep in mind that -- as Date says again and again -- there is a difference between the logical model and the physical model. I'll summarize the example he used: a EMP_SALARY table.

Let's start simple:

EMP SALARY --------------- Alice 100,000 Bob NULL Chris NULL

[Sorry, ecode doesn't seem to want to do vertical alignment.]

Now, what do those two NULLs mean; do they carry value? If Bob is unemployed, we could write 0, but then it could be confused with unemployed and employed for no pay. Perhaps we don't know how much money Chris makes, but we do know that he makes some money.

You could fix this by adding a TYPE enumeration column that would take on values like EMPLOYED, UNEMPLOYED and UNKNOWN, but you'll still need the NULL value for the UNEMPLOYED and UNKNOWN cases.

This last part can be solved by logically segregating the table above (sans TYPE column) into three tables (one per type). Both the UNEMPLOYED and UNKNOWN_EMPLOYMENT tables would lack the SALARY column -- they would have only the EMP_ID as there's no more information to add. All rows in the EMPLOYED table would have a known salary.

NULLs have been removed and the design is further normalized -- some would say to the extreme.

Now, how you would model that physically without using NULL and still managing some level of performance I do not know. But that at least explains the reasoning behind NULL not being necessary.

As for myself, after nearly fifteen years of database design and implementation, I'm quite satisfied with using NULL where appropriate. I've never been befuddled by it nor sidelined by its behavior with respect to logical operators. Learn the rules and move on.

--
Freedom to fear. Freedom from thought. Freedom to kill.
I guess the War on Terror really is about freedom!

Re:'scuse my ignorance but... by Anonymous Coward · 2004-06-28 12:05 · Score: 1, Informative

multi user read consistency.
online backup.
roll-forward recovery.
stored procedures, views, triggers, analytic functions, partitioning, function-based indexes ...
Enterprise class support from the vendor.

mysql just does not cut it yet.

I don't care what the RDBMS is provided that we don't have to recode every function provided in database kernel-space in user application space (huge waste of time).

the license costs are not what they used to be.

Re:'scuse my ignorance but... by TheBracket · 2004-06-28 12:18 · Score: 2, Informative

Ugh, the lameness filter kept complaining about this!

That's why sequences (as implemented by PostgreSQL and Oracle) are handy. Simply create a sequence, and call NEXTVAL(sequence_name) to get the identity of the next record. It isn't an autoincrementer (but can be used as one with default values), and you get the advantage of knowing what ID you will use before your insert - very handy for inserting a lot of related data at once. You can also do tricks like having sequences with different increment numbers, different base values, or even concatenate them with a string to get multi-master friendly safe replication.

For a sequence/nextval to be useful, the increment must be absolutely atomic - that is, it must return the next value and increment without any chance of the same number being given to another caller. Oracle and PostgreSQL do this for you.

You can simulate these in SQL Server with the following stored procedure (original source here: Sequence table: CREATE TABLE sequences ( -- sequence is a reserved word seq varchar(100) primary key, sequence_id int ); MS SQL Server stored procedure: CREATE PROCEDURE nextval @sequence varchar(100), @sequence_id INT OUTPUT AS -- return an error if sequence does not exist -- so we will know if someone truncates the table set @sequence_id = -1 UPDATE sequences SET @sequence_id = sequence_id = sequence_id + 1 WHERE seq = @sequence RETURN @sequence_id I've used this to great effect in SQL server. I've had to emulate it in Access once (yuck! it worked, though!). I've never tried it in MySQL, but it can probably be done.

This page (see LAST_INSERT_ID(expr) talks about how to do it in MySQL. I don't use MySQL when I can avoid it; it's a great, fast database for high-read environments without triggers and similar, but I've never seemed to end up working in one of those!

--
Lead developer, http://wisptools.net

Re:Isn't XML semi-object oriented? by Doctor+Faustus · 2004-06-28 12:53 · Score: 4, Informative

Would you please enlighten me on how an XSLT job operating on a structure like above and showing all the employees who have a salary over 100k, having more than 20 directs and have travelled in the past 12 months look like? Also please include their manages names and phone numbers int he result. And what performance it would have?

It sounds like you're kidding (or being sarcastic), and I'm not going to debug it (I'd need data, anyway.), but probably something like this:

<xsl:template match="/root/Employees/Employee[@Salary>100000] " >
<xsl:variable name="EmployeeID" select="@EmployeeID" />
<xsl:variable name="NumReports" select="count(/root/Employees/Employee[@ManagerID = $EmployeeID])" />


<xsl:variable name="RecentTravel" select="boolean(/root/Trips/Trip[@EmployeeID = $EmployeeID and @Date>=20030628 ])" />

<xsl:if test="$RecentTravel and ($NumReports>20)" >
<xsl:variable name="ManagerID" select="@ManagerID" />
<xsl:text>
Name: <xsl:value-of select="@Name" />, Phone #<xsl:value-of select="@PhoneNumber" />,
Manager Name: <xsl:value-of select="/root/Employees/Employee[@EmployeeID = $ManagerID]/@Name" />
</xsl:text>
</xsl:if>

</xsl:template>

And this approach seems to perform just fine in MSXSL, which I believe is DOM-based. It might give a SAX-based engine problems, because it jumps around so much.

Lest I give a false impression, I'm not suggesting that this sort of XML replace relational databases. The point is that a roughly relational layout is still a good approach, even when you need to be working in XML.

Re:Isn't XML semi-object oriented? by LoztInSpace · 2004-06-28 17:46 · Score: 2, Informative

I think that document refers to new (connect by features) rather than (new connect by) features.
I've been happily using it on version 8. The connect_by_root & sys_connect or whatever it is may well be introduced in 10g. Certainly I could have killed for the ORDER SIBLINGS BY clause in 8 but it's only available in 9i+.

Re:FUD FUD FUD by ianezz · 2004-06-28 18:23 · Score: 2, Informative

As for "stored procedures, views, triggers, ..." these fall under the category of --USELESS FEATURES--.

...useless until you realize you have to connect to the DB from several applications written in several different languages for which you have either to reimplement your way to manipulate the data everywhere, or you have to put in a middle layer of some sort which is able to talk multiple languages (via CORBA, SOAP, plain XML RPC, custom protocol, whatever) and ensure that everyone is accessing data exclusively through it.

Reimplementing logic everywhere across different languages is usually a bad approach because it doubles the development and testing effort.

Middle layers are usually hard to get right the first time, much harder than using stored procedures and triggers, since the typical procedural language is not so at ease at manipulating relational data.

On MySQL: it has a somewhat bad reputation in the field because of the people abusing it: it's fast, it's free, it's easy to set up, but living with these gotchas is definitively too painful for developers more concerned about correctness than speed.

In the end: firing up Firebird/Oracle/PostgreSQL/SAPDB for simple data is plainly stupid, but often it is done anyways since they do a decent job even in that cases; firing up MySQL for your 30+ GB DB containing your whole network topology which is used by everyone for billing, service assurance, troubleshooting, network planning and whatever is stupid as well. MySQL AB knows that, and in fact it now proposes MaxDB (was: SAPDB) as well.

Identity and the standard by tarvin · 2004-06-28 19:54 · Score: 2, Informative

You are wrong. The SQL:2003 standard specifies IDENTITY. See http://troels.arvin.dk/db/rdbms/#mix-identity

Oh dear by Ankh · 2004-06-29 04:18 · Score: 3, Informative

I'm not going to try and reply in detail, but since I participate in the W3C XML Query Working Group and am also the w3C XML Activity Lead, a few comments may be useful.

The article seems to says "I don't like SQL and I don't like XML and I think XML Query is about mergin them although I don't understand it very well, so the people working on XML Query must be stupid, and in any case it's easier to attack people than understand a specification".

Perhaps that's unfair, but it's clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years.

In both cases the story is about interoperability.

If you look at the XML Query Home Page you'll see approximately two dozen implementations of the XML Query draft, including a number of open source ones. If you look at the public mailing list for comments, you'll see we received over 1100 detailed technical comments at the last public review. So there's a lot of interest in this work.

Why is that? One reason is that, like Web services and SOAP, XML Query is able to replace a lot of proprietary and hard-to-maitain middleware. Another reason is that for the first time we'll have a standard way to search over multiple kinds of data source.

Don is the primary editor of the XQuery language, but the technical decisions reflected in the specification are a result of collaboration, and are agreed on by aconsensus process by a much larger number of particpants. The goal is to make a language that people agree to implement and to use. With support announced by Microsoft, Oracle, IBM, BEA and others (see Web page mentioned above) and judging by the public interest, I think it's fair to say that's going to happen.

It's pretty rare to see a large complex system that everyone is happy with. It's actually pretty rare to see a small system that everyone is happy with. There are people who are unhappy with some features in the Unix cat program, but it's better to have cat in every Unix system than to have millions of shell scripts break on systems where it's missing! The trick, then, is often to include features that will lead to massively wider adoption, even if some people would rather be without them.

Then we have (as part of W3C Process) a public call for implementations so that we can test to see how confident we are that all the major features can be implemented compatibly (i.e. interoperably) in multiple independent implementations.

Features that were not implemented get removed before the specifications are final.

Is XML Query a waste of time? Is XML evil? Is SQL evil? A lot of people think otherwise, and some of them are pretty smart, so if you are concerned, take the time to read the specs and decide for yourself. :-)

--
Live barefoot!
free engravings/woodcuts

Ok SQL is wrong...and?? by cjb110 · 2004-07-02 19:37 · Score: 2, Informative

I've read the article (maybe i've misread it but its 8:30am and a saturday:)) and although I can see lots of bashing of SQL, XML and nulls going on I can't see any 'alternative' or solutions suggested.

--
----- I refuse to have an argument with an unarmed person

Slashdot Mirror

SQL, XML, and the Relational Database Model

27 of 453 comments (clear)