SQL, XML, and the Relational Database Model
Kardamon writes "In an article on DBAzine, Fabian Pascal writes that SQL is not a good representation of the relational model, and is afraid the situation will get worse with XML and XQUERY. An overview of some of the reactions on the positions Pascal and also C.J. Date take on this issue is given in this article over at SearchDatabase.com by Sara Cushman."
From the article:
It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular
Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.
Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap. Then I tried actually building systems with it and realized I was better off with ugly ol' SQL.
I've been part of too many projects where people pulled out the UML books in favor of a decent First Normal Form DB design and led the team down the tubes.
I'm not saying these other methods don't have their place -- they do. But they aren't going to displace SQL because it has it's place also. And it's place isn't theoretical, it's been practically demonstrated a million times.
Cheers.
The problem with standards is their's so many to choose from. Or in the case of SQL every vendor seems to think that the standardized language is inadequate and yet they make no roads towards improving the standard. This leads to every vendor having their own superset of the language which makes maintainability in cross database projects exceedingly difficult and migration in applications that aren't designed for it incredibly difficult. As to fundamental flaws in the concepts around SQL I have yet to hear a concrete argument against it, mostly vague rantings from people who's ideas weren't chosen by the marketplace to serve real world needs.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
One problem is that as a language it lacks elegance and is awkward to build large queries in. More deeply smug relational weenies insist that it does not properly model the relational algebra model pioneered by Ted Codd.
I'm not sufficiently versed in database theory to understand the technical side, but SQL certainly does feel to me like a non-optimal solution.
Google confirms: Ruby is the world's most beloved programm
As a rant, this article does a great job. But here's what is missing - what the heck is he talking about? Everything he says is liberally sprinkled with statements telling us these things are self-evident, when they are anything but. He is constantly is referring to how this will clearly show that, or pointing out that this proves this or that later on, but never gets there.
Can anyone summarize what is being said here in some sort of logical way? Because I'm confused. I see the title, I see no information supporting the title (unless, perhaps, I was to do the research myself).
What exactly is the problem with SQL?
Wouldn't call it a problem, but there just seems to be something about it that drives all the Oo fanboys up the wall - maybe it's the fact that they can't make nice UML diagrams of a query or something.
There nothing wrong with SQL and RDBMS - they've been around for years and they'll be around for years to come. I have this argument each and every day at work with people who seem to think that the solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code and invest a king's ransom in application server licenses to run it on. Don't get me wrong: i've seen some real horrorshow coding with SQL mixed into code, but a bad coder will produce bad code in any language. Until then, SQL works. What more can you say?
#!/usr/bin/english
SQL is meant for a relational database. XML is a hiearchial database... have you never worked on a project where your manager HAD to convert a projects database in XML because it was the new shiny buzzword? It NEVER works out very well because they are logically different. The same will go for SQL->XQuery.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Try to model anything moderately complex with XML, where things have many to many relationships with each-other. Nesting becomes impossible.
A legacy application is one that works. And the same can be said of SQL. Actually XML works too.
It's important to understand what XML is replacing - binary or proprietary interfaces. This means an acceptable tradeoff between human readability (a hugely underrated requirement of a structured data format) and efficiency.
An example is EDI vs XML. EDI "efficiency" accrues only to the intermediary that invented the means to setup trading relationships across their proprietary network. XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely. This single fact saves literally millions in kilocharacter and storage charges.
SQL is what my old compsci prof would call opportunity-cost efficient. It's quick and can be implemented a number of ways - it's a data query framework, not a requirement for databases. There may be better ways to do it but you'll have a tradeoff somewhere.
And if someone offers something better, then when they've finished telling you how good the new way is, wait and listen for the inevitable sales pitch.
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
No kidding. This sounds like Andy griping that Linus' "school project" is an inferior kernel.
Sounds like a semantic argument to me; where the Rubber Meets The Road, Linux is the kernel of a variety of widely used, production-quality OS's, while Minix is an academic *model* (on purpose, to be sure, but a *model* rather than a useable-on-a-daily-basis kernel, nevertheless). Similary, claiming SQL is crappy for various academic/theoretical reasons doesn't change the fact that it's in wide use today, as a concrete solution to any number of million- and billion-dollar abstract problems.
So, if SQL is so bad, maybe they should stop cursing the darkness, and show us the light. In the meantime, people will use (and incrementally) improve the tools at hand to solve the problems at hand.
Right now, SQL-based database solutions are generally the best solutions for *real* data problems that we have to solve, from mySQL-driven personal webpages, to enterprise-grade databases powering major websites, business-to-business e-commerce, and everything in between.
Invent a better mousetrap, and the world will beat a path to your door. Criticizing the mousetrap as an inferior pest control device doesn't do much to keep the mice out...
Xentax
You shouldn't verb words.
I read this and pretty much gave up getting anything of value out of this article -- I hadn't understood much that went before it, though my distrust of all things XML had led me to believe this guy might know what he's talking about.
If you removed NULLs from relational database design, people would reinvent them (poorly) -- probably by using IDs of -1 or 0, or IDs to a special magic "null" row, which I suspect is what he's talking about by "it can be handled relationally." To suggest that missing or inapplicable values are not part of "the real world" is so wrong it's... well... wrong. Anyone who's actually done database work (or programming work, for that matter) knows this.
Yeah, I know what you mean. These kids can't wrap their tiny minds around the following concepts:
Whenever I see a project gone horribly wrong, and the language is C++ or Java, the problem usually is the system architect didn't grok the above statements. They should be tattooed onto the forehead of every OO programmer, so when they're "pair programming" they can read it off each other.
solution to (hypothetical) "database bottlenecks" is to bury everything in a quarter of a million lines of EJB code
I get that too - then I show 'em the logs that show the database processor is mostly idle as it waits for their application to either request more data or finish working on what it's got!
1) You can write a given query and number of different ways. This is not necessarily a SQL problem but due to this the query optimizers have to be enormously complex to handle complicated queries and by association you can have queries which describes two identical sets but have vastly different runtimes/costs.
Just because two queries return the same results today do not mean that they will continue to do so in the future. If a value that used to be bounded from 1 to 10 suddenly is declared to be allowed to be cranked to 11, then suddenly "equal or greater than 9" and "equal to 9 or equal to 10" will have gone from always returning the same results to now specifying different sets. Clearly, the more specific code will execute faster, but if an assumed boundry no longer holds in the future, the program will become obsoleted and require revision to the less specific version. This isn't a language-specific issue, it's just a problem that crops up whenever a computer program encounters a situation its designer wasn't expecting.
2) Little/No support for relational domains (e.g. complex data types)
Not a bug, it's a feature. The S in SQL is for "structure"... go hammer out your data into a structured format rather than a complex one and then come back.
3) Non-updateable views (partially due to duplicate handling and/or allowing relations with no primary key)
Totals will always be a non-updatable view. You can't change the number of objects you have without creating some new objects or chosing to get rid of some existing objects. Fields in a one-to-many relationship cannot be changed because to do so would be ambigious... do you want to create a new entry in the other table, or do you want to rename an existing entry in the other table. Go do what you meant to do, then refresh your view.
4) Weak support for complex integrity constraints (e.g. business rules)
That's more an issue for applications rather than databases. The program or user that's creating the query should know what's allowed by business rules, because if the database is going to refuse a query due to business rule violations, that query shouldn't have been offered to the database in the first place. Those errors should be trapped upstream before they get that far. SQL triggers for business rules should be a last line of defense, not something that should be regularly asked to function.
5) No support for entity sub/supertype relationships
Plenty of support, just not intrinsically. Just use a one-to-many relationship in your DB structure and go along your way.
6) Supports NULLs (Date/Pascal/Darwin do not like NULLs)
That's like trying to do math without a concept of zero. Sometimes, things just don't apply and we put "N/A" on the form and "NULL" in the database.
This quote needs to be placed toward the beginning of the Grand Encyclopedia of Intellectual Arrogance. Let's see, you have flat tables with a defined primary key and you form relations between these flat tables.
I do agree that SQL is not the best possible query language, but it succeeds where the other languages fail, it is easy for people to grasp and manipulate. Likewise, HTML has many faults. Plain HTML is still the preferred choice of most web designers because it is easy to learn and write.
Personally, I think the primary intellectual impulse is to add convolution to simple processes. There will never be an end to the stream of blither about how nulls cannot exist, and anyone who simply uses an sequence counter as a primary key is the devil incarnate. HTML and SQL have two things that almost all the stuff coming from arrogant snits like this author lack. They were designed by people who were actually doing stuff.
This quote needs a position in the library of intellectual arrogance as well:
A snit crassly dismisses several millenia of literature because it is unstructured.
Quite frankly, meaning and structure are independent of each other. It is possible to find meaning in things with radically different structures. It is true that there is a correlation between structure and the ability to communicate meaning, but a healthy mind can find meanings in things that have not been normalized.
Likewise, you can have meaningless garbage in relational databases. A case in point is the large number of fake web sites that do things like join the FIPS database to product names so that they can have millions of pages that show up in search engines. Likewise, we see academician filling volume after volume of publications with meaningless tripe.
I know some people who do in some cases, but I wouldn't exactly call that a standard procedure. Or call those people DB designers for that matter. 'Cause that is NOT database design.
You design a DB best with a pen and a large sheet of paper. Or some drawing tool your extremely good at.
SQL is the language you feed you results into the box so it builds a more or less representative imprint of the abstract reality you've designed. Which can be as relational as you want it to - as long as it meets the physical constraints of non-abstract reality. As soon as you put it onto a computer, you'll have to cut corners. That's the difference between a database _model_ and a database _implementation_. That takes stuff into account like DB load, DB Server Features and data types.
Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.
I'm suprised a supposedly db expert guy get's all worked up about this and doesn't seem to be able to keep apples and pears apart.
Anyone initially designing a non-trivial DB with SQL and - on top of that - bitchering about this DB language not being rational deserves a clobbering.
My 2 cents.
We suffer more in our imagination than in reality. - Seneca
table: order ID, part number, quantity shipped
select part number, sum(quantity shiped), avg(quantity shipped)
Group By Part Number;
This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.
If you wan't to throw an exception you can throw the execption or not in your program. In that case you fetch the individual values and do the math yourself and the "that's not a number" that is caused by the null gives you the chance to throw your exception.
But since, in aggregate operations, your program isn't even interractin with the data yet, where would such an exception go?
What would the SQL syntax be communicating a list of results PLUS a list of exceptions to your program? Which order would things be processed in?
Your boolean analogy is also flawed. "You have stopped beating your wife?" is not a yes-or-no question because it carries a predicate around with it that you may not fulfill for serveral reasions (not married; you are hetrosexual female, so you don't have a "wife", you have never beaten your wife so you can't "stop" doing it; etc). There are a surprisingly large number of "real data" that nature. For those of you who have trouble abstracting this, the "real comparason matrix" is "True, False, and Not Applicable". NILL buys you "Not Applicable" so very cheaply.
In poin of fact, people who don't like NULL, usually because they don't understand its purpose and use, make a hell of a lot of work for themselves.
My current employer has a large database of test values that grows by huge numbers of elements each day. The programmer "didn't understand" NULLs (ro RDBMS' for that matter) and has "-" in fields that should be NULL.
Consequently we cannot aggregate. All of our client applications end up haveing to bulk-fetch whole table ranges and run through elaborate statistical routines full of conditionals; or do separate fetches with "field != '-'" in the where clause and run a concordance operation in ram after the repeated bulk fetches.
This costs bocup time and degrades the quality of the product.
You call "academic bullshit", I suspect you have never had to work the really large or significant data sets. I suspect that you don't ever ask the server-side to aggregate for you. And I suspect you have never worked time-critical transactions across a "slow" link.
You can't have. You think of "NULL" in terms of equality.
I will give you the "syntatic" point that "Where X = NULL" ought to be unversal. But, for instance, the cartesian nightmare of having "NULL == NULL" in a join is beyond idiotic.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
One thing that everyone should understand: even though Pascal, Date, etc... argue that SQL is a bad implementation of the relational model, they *still* agree that it provides value, and that it is miles better than its hierarchical database predecessors. Since it is their job(s) to provide a reasoned critique of the field, it is only natural for them to rigorously compare SQL to the goals of the relational model. Many of their complaints fall in these categories (although there is much more):
1. Does too much -- too many ways of doing the same thing, and too many unecessary operations that could be better done another way.
2. Overly complex -- the SQL 1999 standard was something like 1200 pages.
3. Allows programmer to circumvent relational integrity. Things like "hidden identifiers", pointers, etc...
4. Too wrapped up in implementation -- users must spend a lot of time understanding the physical storage, rather than focusing on queries in abstraction.
5. Many small inconsistencies in SQL itself
But the problems with SQL are impossible to judge if you only know SQL. It's like the people who used to ask what was wrong with a perfectly good typewriter that made people want to use a word processor. To any who are curious, I suggest you do some reading. The absolute best simple introductions for these problems are in a two online documents by Hugh Darwen at www.thethirdmanifesto.com. Look for "The Askew Wall", and "The Importance of Column Names".