SQL, XML, and the Relational Database Model

"NULLS are bad." quote by MattRog · 2004-06-28 08:08 · Score: 5, Informative

Celko is misquoting Darwin in saying that "The idea that you will always know everything is arrogant".

Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
http://www.hughdarwen.freeola.com/TheThirdManifest o.web/Missing-info-without-nulls.pdf

And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products. Anyone who has ever used a hierarchical DBMS will tell you that there are some pretty non-trivial problems that you cannot work around due to their hierarchical data model, yet XML DBMS proponents propose we go back to that old, inflexible system!

--

Thanks,
--
Matt

Re:"NULLS are bad." quote by MattRog · 2004-06-28 08:15 · Score: 5, Interesting

To those not "in the know" here's some further clarification:

"The use of the terms "flat tables" or "2D tables" to describe data stored in a relational database is wrong, he added."

Basically what I take from this is that the table (e.g. SELECT * FROM foo) is simply a convenient logical representation of a stored relation. That is to say, foo can be implemented by the DBMS as a linked list, a tree, any data structure. The problem is that current SQL DBMS products do NOT do this and so we have the associated performance problems with normalized schemas. If the DBMS was truly a RDBMS then it could optimize the physical storage to improve performance.

When asked if the relational model was implemented soundly in today's systems, Craig Mullins' instant reply was "no," but he doesn't think the situation is as bad as Date says it is.
"We're doing production work and delivering value," Mullins said. "Isn't that what it is all about?"

The question is not "Are current SQL systems providing value" because certainly they are. They overthrew the hierarchic DBMS products for good reason - they were better. The real question is "Are the current SQL systems providing all the value they can". One can simply look at the wide array of DBMS offshoot products like XML DBMS, so-called "Multivalued DBMS" etc. to know that there are significant limitations of SQL products - ones which Date/Pascal/Darwin stress are not limitations of the Relational Model but merely these SQL products. To put words in their mouth, but I don't think they'd disagree at my summation, they'd suggest that if someone were to implement a Truly Relational Database Management System that these other products would quickly become obsolete.

--

Thanks,
--
Matt

Link to history of SQL.. by MisanthropicProgram · 2004-06-28 08:13 · Score: 5, Informative

Here is a history of SQL. I wanted to check the article's facts. Also, I was curious... History of SQL

Re:'scuse my ignorance but... by XMyth · 2004-06-28 08:14 · Score: 5, Funny

Well, all the apps which are backed by SQL databases are crashing all over the place. After its several years in the field now SQL has been proven to be unstable, unreliable, and completely incapable of doing the job.

Evidence of this is in the hundreds of companies who are completely unable to maintain a database of any significant size despite vendor claims to the contrary. Also, note the thousands of websites which routinely fail due to random database problems. It appears that all SQL products are sad implementations of a horrible standard which simply does not cut it .

(the above is intended entire as sarcasm)

SQL sucks? by localman · 2004-06-28 08:15 · Score: 5, Insightful

From the article:

It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular

Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.

Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap. Then I tried actually building systems with it and realized I was better off with ugly ol' SQL.

I've been part of too many projects where people pulled out the UML books in favor of a decent First Normal Form DB design and led the team down the tubes.

I'm not saying these other methods don't have their place -- they do. But they aren't going to displace SQL because it has it's place also. And it's place isn't theoretical, it's been practically demonstrated a million times.

Cheers.

Re:SQL sucks? by kpharmer · 2004-06-28 08:28 · Score: 5, Insightful

Keep in mind that Fabian Pascal is generally considered a crackpot purist. He's been insisting for years that there is no such thing as a relational database product - since none implement a purely relational model.

However, he hasn't delivered an exaple of one, he hasn't clearly articulated the differences between his vision and the commercial options, and he apparently refuses to acknowledge that some problems in life fail to fit well into the relational model (hierarchies, networks, inheritance, etc).

Much of what he, Celko, and Date complained about were actually responses by vendors to adapt to the real world. They were somewhat successful - and now SQL can be used successfully to solve a far greater set of problems than Pascal has ever admitted exist.

A perfect example of this nonsense is there insistence that good indexing in a 3NF model outperforms denormalized data in a star schema. Sounds great, absolutely doesn't work. Across eighteen years working with relational databases I've never seen their suggestions work. Of course they have a response to this - the vendors should just "make the databases faster" - like it's fucking magic or something.

Of course, this isn't to say that he's wrong about xquery - trying to work with unbalanced networks or hierarchies in which the rules change change throughout the schema causes a few problems.

We already have extensive support for recursion & networks in the more powerful RDMBS (db2 & oracle for sure). But combining that with data structures supporting optional branching, complete lack of declarative constraints, optional rules, etc - sounds like something that will never work well.

And going back to the days in which you have to spend a day writing code against a hierarchical database in order to answer a simple question sucks.

Hmmm, haven't people gotten tired of the xml hype yet?

Re:'scuse my ignorance but... by afidel · 2004-06-28 08:18 · Score: 5, Insightful

The problem with standards is their's so many to choose from. Or in the case of SQL every vendor seems to think that the standardized language is inadequate and yet they make no roads towards improving the standard. This leads to every vendor having their own superset of the language which makes maintainability in cross database projects exceedingly difficult and migration in applications that aren't designed for it incredibly difficult. As to fundamental flaws in the concepts around SQL I have yet to hear a concrete argument against it, mostly vague rantings from people who's ideas weren't chosen by the marketplace to serve real world needs.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

What the?!... by __aagmrb7289 · 2004-06-28 08:19 · Score: 5, Insightful

As a rant, this article does a great job. But here's what is missing - what the heck is he talking about? Everything he says is liberally sprinkled with statements telling us these things are self-evident, when they are anything but. He is constantly is referring to how this will clearly show that, or pointing out that this proves this or that later on, but never gets there.

Can anyone summarize what is being said here in some sort of logical way? Because I'm confused. I see the title, I see no information supporting the title (unless, perhaps, I was to do the research myself).

Re:'scuse my ignorance but... by tanguyr · 2004-06-28 08:20 · Score: 5, Insightful

What exactly is the problem with SQL?

Wouldn't call it a problem, but there just seems to be something about it that drives all the Oo fanboys up the wall - maybe it's the fact that they can't make nice UML diagrams of a query or something.

There nothing wrong with SQL and RDBMS - they've been around for years and they'll be around for years to come. I have this argument each and every day at work with people who seem to think that the solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code and invest a king's ransom in application server licenses to run it on. Don't get me wrong: i've seen some real horrorshow coding with SQL mixed into code, but a bad coder will produce bad code in any language. Until then, SQL works. What more can you say?

--
#!/usr/bin/english

Re:'scuse my ignorance but... by MattRog · 2004-06-28 08:21 · Score: 5, Informative

There are a couple of "problems" that they have identified:
1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
2) Little/No support for relational domains (e.g. complex data types)
3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
4) Weak support for complex integrity constraints (e.g. business rules)
5) No support for entity sub/supertype relationships
6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)

Try searching www.dbdebunk.com for SQL. Or pick up the great book "Practical Issues in Database Management" by Fabian Pascal.

--

Thanks,
--
Matt

Error: User doesn't know how to use program. by LostCluster · 2004-06-28 08:24 · Score: 5, Interesting

Most of the problems that I've seen with SQL commands that are more complex than they really should be are a result of mistaken assumptions made during the design phase of the database. As a result, extra tables get added late, and therefore create new "features" that code then has to be revised to take advantage of...

XML's going to be no better in this area. Mistakes made during the design phase will always come back to haunt while implementing and using the system. If a single query can't return the results desired, then that should have been thought of while designing the tables of the DB. Trying to get a query to specify "All things that are red" like Pascal suggests is only going to work if all objects implement the "color" property the same way. If somebody uses CMYK, somebody else uses RGB, and a third uses Play-Doh color names, it's still gonna be a mess that requires code to figure out who really matches whom.

I don't see how this "new model" fixes the real problems with working with SQL between databases that weren't designed to work with each other.

Peddling a better way? by BillsPetMonkey · 2004-06-28 08:27 · Score: 5, Insightful

A legacy application is one that works. And the same can be said of SQL. Actually XML works too.

It's important to understand what XML is replacing - binary or proprietary interfaces. This means an acceptable tradeoff between human readability (a hugely underrated requirement of a structured data format) and efficiency.

An example is EDI vs XML. EDI "efficiency" accrues only to the intermediary that invented the means to setup trading relationships across their proprietary network. XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely. This single fact saves literally millions in kilocharacter and storage charges.

SQL is what my old compsci prof would call opportunity-cost efficient. It's quick and can be implemented a number of ways - it's a data query framework, not a requirement for databases. There may be better ways to do it but you'll have a tradeoff somewhere.

And if someone offers something better, then when they've finished telling you how good the new way is, wait and listen for the inevitable sales pitch.

--
"It's not your information. It's information about you" - John Ford, Vice President, Equifax

Re:'scuse my ignorance but... by sql*kitten · 2004-06-28 08:37 · Score: 5, Insightful

there just seems to be something about it that drives all the Oo fanboys up the wall

Yeah, I know what you mean. These kids can't wrap their tiny minds around the following concepts:

A table is not a class
A row is not an object
A column is not a property

Whenever I see a project gone horribly wrong, and the language is C++ or Java, the problem usually is the system architect didn't grok the above statements. They should be tattooed onto the forehead of every OO programmer, so when they're "pair programming" they can read it off each other.

solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code

I get that too - then I show 'em the logs that show the database processor is mostly idle as it waits for their application to either request more data or finish working on what it's got!

Re:'scuse my ignorance but... by MattRog · 2004-06-28 09:01 · Score: 5, Informative

"Just because two queries return the same results today do not mean that they will continue to do so in the future."

Total misunderstanding of what I wrote. To put it another way:
SQL allows you to write queries which are mathematically equivalent but result in vastly different query plans and performance. Again, not a particularly stinging-indictment of SQL as such but had it been designed differently it could have avoided such ambiguity in the language.

"Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back."

So you're saying a tree has no "structure"? That a domain has no structure? If it had no structure, it would be a little difficult for computers to process.

"View stuff"
Pascal (or Date, can't remember) provides an iron-clad (mathematical definition) method of creating views which will always be updatable. There are structural deficiencies in SQL which prohibit this. I will not waste time/typing here illustrating them, they are all identified at their web site.

"SQL triggers" etc.
It is precisely the reason that applications were enforcing business rules that DBMS were invented all those years ago! There are plenty of reasons that application-enforcement of business rules is a bad thing. Again these are illustrated on their web site. Also, your quote about "SQL triggers" is basically re-stating what I mentioned: that SQL is poor at implementing business rules!

"Plenty of support, just not intrinsically."
Which is exactly the same as saying "no support for entity sub/supertypes". Plus, one-to-many tables are not the same, you're thinking of something else. Chapter 6 of Fabian Pascal's book "Practical Issues in Database Management" covers this in some depth.

"That's like trying to do math without a concept of zero."
Not quite the same. Remember that the relational model is based upon predicate logic and set theory. Set theory has the empty set, which is not the same as NULL. SQL products currently handle null in a ridiculous manner (some sort NULL oddly, comparison is difficult, summation is odd). Pascal/Date do not suggest that the concept of "unknown" is bad, just that the SQL representation as NULL is.

--

Thanks,
--
Matt

Re:Isn't XML semi-object oriented? by yintercept · 2004-06-28 09:06 · Score: 5, Insightful

One does not know whether to laugh or cry. It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular

This quote needs to be placed toward the beginning of the Grand Encyclopedia of Intellectual Arrogance. Let's see, you have flat tables with a defined primary key and you form relations between these flat tables.

I do agree that SQL is not the best possible query language, but it succeeds where the other languages fail, it is easy for people to grasp and manipulate. Likewise, HTML has many faults. Plain HTML is still the preferred choice of most web designers because it is easy to learn and write.

Personally, I think the primary intellectual impulse is to add convolution to simple processes. There will never be an end to the stream of blither about how nulls cannot exist, and anyone who simply uses an sequence counter as a primary key is the devil incarnate. HTML and SQL have two things that almost all the stuff coming from arrogant snits like this author lack. They were designed by people who were actually doing stuff.

This quote needs a position in the library of intellectual arrogance as well:

Indeed, data/information management requires "some organizing principle"; that is, structure; anything "unstructured" -- and many in the industry promote XML for that purpose -- is not data, but meaningless random noise that carries no information.

A snit crassly dismisses several millenia of literature because it is unstructured.

Quite frankly, meaning and structure are independent of each other. It is possible to find meaning in things with radically different structures. It is true that there is a correlation between structure and the ability to communicate meaning, but a healthy mind can find meanings in things that have not been normalized.

Likewise, you can have meaningless garbage in relational databases. A case in point is the large number of fake web sites that do things like join the FIPS database to product names so that they can have millions of pages that show up in search engines. Likewise, we see academician filling volume after volume of publications with meaningless tripe.

You Are Missing the point of NULLs entirly by IBitOBear · 2004-06-28 11:43 · Score: 5, Insightful

table: order ID, part number, quantity shipped

select part number, sum(quantity shiped), avg(quantity shipped)
Group By Part Number;

This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.

If you wan't to throw an exception you can throw the execption or not in your program. In that case you fetch the individual values and do the math yourself and the "that's not a number" that is caused by the null gives you the chance to throw your exception.

But since, in aggregate operations, your program isn't even interractin with the data yet, where would such an exception go?

What would the SQL syntax be communicating a list of results PLUS a list of exceptions to your program? Which order would things be processed in?

Your boolean analogy is also flawed. "You have stopped beating your wife?" is not a yes-or-no question because it carries a predicate around with it that you may not fulfill for serveral reasions (not married; you are hetrosexual female, so you don't have a "wife", you have never beaten your wife so you can't "stop" doing it; etc). There are a surprisingly large number of "real data" that nature. For those of you who have trouble abstracting this, the "real comparason matrix" is "True, False, and Not Applicable". NILL buys you "Not Applicable" so very cheaply.

In poin of fact, people who don't like NULL, usually because they don't understand its purpose and use, make a hell of a lot of work for themselves.

My current employer has a large database of test values that grows by huge numbers of elements each day. The programmer "didn't understand" NULLs (ro RDBMS' for that matter) and has "-" in fields that should be NULL.

Consequently we cannot aggregate. All of our client applications end up haveing to bulk-fetch whole table ranges and run through elaborate statistical routines full of conditionals; or do separate fetches with "field != '-'" in the where clause and run a concordance operation in ram after the repeated bulk fetches.

This costs bocup time and degrades the quality of the product.

You call "academic bullshit", I suspect you have never had to work the really large or significant data sets. I suspect that you don't ever ask the server-side to aggregate for you. And I suspect you have never worked time-critical transactions across a "slow" link.

You can't have. You think of "NULL" in terms of equality.

I will give you the "syntatic" point that "Where X = NULL" ought to be unversal. But, for instance, the cartesian nightmare of having "NULL == NULL" in a join is beyond idiotic.

--
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press

Re:Rediculous by (negative+video) · 2004-06-28 14:07 · Score: 5, Insightful

...the SQL language retrieves information from SQL databases, not relational ones (the former, Chamberlin's own contribution) due to failure to understand the latter. ...
Makes me wish I understood a bit more, for it's all a bit confusing.

Fabian Pascal is smart and well-informed, but a zealot. Like all zealots he is willing to sacrifice anything and everything for his vision of technical purity.

One of his specific complaints is about SQL NULL values being "unrelational". As an example, a real-world designer might use the following table in a genealogy DB:

CREATE TABLE people ( person_id INTEGER NOT NULL, name VARCHAR(100) NOT NULL, birth DATE NOT NULL, death DATE )

If a person hasn't died yet, then people.death would be NULL. Well that just isn't relational enough for our friend Pascal. Since relations can be used to express optional values, then by God they have to be:

CREATE TABLE people ( person_id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL, birth DATE NOT NULL )
CREATE TABLE deaths ( person_id UNIQUE INTEGER REFERENCES people, death_date DATE NOT NULL )

It's pure, correctly formed, and worse than useless. It causes a profusion of tables: one for every optional value. It turns every simple query ("tell me useful stuff about this person") into a join ("find matching rows from several tables about this person"). The database server has to waste time enforcing deaths.person_id's pointless UNIQUE constraint. Cascading deletion has to be used to clean up deaths when a row is deleted in people.

The simple fact is that the world is full of optional, single-valued data. NULL-allowed columns express that data efficiently, without confusion, and without breakage. Community college database designers have no trouble using the convention productively. It may be a little inelegant, but it is pragmatic and balanced engineering. The only whining you hear is from zealots like Pascal who heap fire and fury on others, but never seem to deliver the mythic PerfectoRDBMS.

Slashdot Mirror

SQL, XML, and the Relational Database Model

17 of 453 comments (clear)