SQL, XML, and the Relational Database Model
Kardamon writes "In an article on DBAzine, Fabian Pascal writes that SQL is not a good representation of the relational model, and is afraid the situation will get worse with XML and XQUERY. An overview of some of the reactions on the positions Pascal and also C.J. Date take on this issue is given in this article over at SearchDatabase.com by Sara Cushman."
Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.
Celko is misquoting Darwin in saying that "The idea that you will always know everything is arrogant".
t o.web/Missing-info-without-nulls.pdf
Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
http://www.hughdarwen.freeola.com/TheThirdManifes
And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products. Anyone who has ever used a hierarchical DBMS will tell you that there are some pretty non-trivial problems that you cannot work around due to their hierarchical data model, yet XML DBMS proponents propose we go back to that old, inflexible system!
Thanks,
--
Matt
Drill baby drill - on Mars
I think relational SQL databases are just fine, easy enough to use, and serve their purpose very well. They DO take some serious thought when designing tables and queries - but we shouldn't be afraid to think. If your head hurts from SQL, keep on it for a couple more minutes and you'll probably have it! If your head hurts from SQL, you've tried thinking about it - and you still don't get it - you're probably in the wrong business. Complex information retrieval is complicated and sometimes difficult to understand.
On to the next part. XML serves its purpose very well. Although I wouldn't consider XML and SQL to serve the same problem sets equally well. There are certain situations where SQL is perfect. And there are other situations where XML is preferable. If you think of the two as two sides of the same coin, I think you're making a mistake. Likewise, you can't just flip between the two on a whim. Choose the format that's most suited to what you want to do and go forward.
It aint broke and don't need fixin'
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
Here is a history of SQL. I wanted to check the article's facts. Also, I was curious... History of SQL
Easy to use, easy to debug, easy to understand, powerful. Isn't this good enough?
got sig?
From the article:
It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular
Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.
Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap. Then I tried actually building systems with it and realized I was better off with ugly ol' SQL.
I've been part of too many projects where people pulled out the UML books in favor of a decent First Normal Form DB design and led the team down the tubes.
I'm not saying these other methods don't have their place -- they do. But they aren't going to displace SQL because it has it's place also. And it's place isn't theoretical, it's been practically demonstrated a million times.
Cheers.
Readers interested in what Pascal and Date have to say may be interested in visiting Database Debunkings. It has lots of articles written by one or both of them.
Personally, I recommend the articles written by Date because they are clearly, concisely, accurately, and calmly written. Pascal's tend to turn into a rant, which I wouldn't mind but he always seems to refer to his books rather than give a detailed justification of his arguments.
As a rant, this article does a great job. But here's what is missing - what the heck is he talking about? Everything he says is liberally sprinkled with statements telling us these things are self-evident, when they are anything but. He is constantly is referring to how this will clearly show that, or pointing out that this proves this or that later on, but never gets there.
Can anyone summarize what is being said here in some sort of logical way? Because I'm confused. I see the title, I see no information supporting the title (unless, perhaps, I was to do the research myself).
I'm a pretty good software developer, but if someone doesn't explain to me what the argument is in plain english without extreme haughtiness, I'm going to write off this whole issue as a pissing contest.
~D
This sig has been enciphered with a one-time pad. It could say almost anything.
I described a general vision for this sort of formalism in a prior slahsdot post. Suffice to say some progress has been made since then -- and work in other areas is starting to converge. There is much yet to be done.
Seastead this.
SQL is meant for a relational database. XML is a hiearchial database... have you never worked on a project where your manager HAD to convert a projects database in XML because it was the new shiny buzzword? It NEVER works out very well because they are logically different. The same will go for SQL->XQuery.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
And the problem happens as soon as you want to know which producs uses a specific screw. Doing that query against the xml would be difficult, and/or extreamly slow.
Martin
Most of the problems that I've seen with SQL commands that are more complex than they really should be are a result of mistaken assumptions made during the design phase of the database. As a result, extra tables get added late, and therefore create new "features" that code then has to be revised to take advantage of...
XML's going to be no better in this area. Mistakes made during the design phase will always come back to haunt while implementing and using the system. If a single query can't return the results desired, then that should have been thought of while designing the tables of the DB. Trying to get a query to specify "All things that are red" like Pascal suggests is only going to work if all objects implement the "color" property the same way. If somebody uses CMYK, somebody else uses RGB, and a third uses Play-Doh color names, it's still gonna be a mess that requires code to figure out who really matches whom.
I don't see how this "new model" fixes the real problems with working with SQL between databases that weren't designed to work with each other.
A legacy application is one that works. And the same can be said of SQL. Actually XML works too.
It's important to understand what XML is replacing - binary or proprietary interfaces. This means an acceptable tradeoff between human readability (a hugely underrated requirement of a structured data format) and efficiency.
An example is EDI vs XML. EDI "efficiency" accrues only to the intermediary that invented the means to setup trading relationships across their proprietary network. XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely. This single fact saves literally millions in kilocharacter and storage charges.
SQL is what my old compsci prof would call opportunity-cost efficient. It's quick and can be implemented a number of ways - it's a data query framework, not a requirement for databases. There may be better ways to do it but you'll have a tradeoff somewhere.
And if someone offers something better, then when they've finished telling you how good the new way is, wait and listen for the inevitable sales pitch.
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
> An iterative SQL call would be needed unless the overall part depth was known at query time.
Both oracle & db2 have very good support for recursion. DB2 in particular can easily handle hierarchies and networks of unknown depth in a single query.
The author seems to have many serious misunderstandings about XML.
The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged - semantics - and once they do that, there is no need to repeat the tags in each and every record/document being transmitted
A major point of XML is that the semantics should be explicit. If you don't repeat the tags, you reply purely on position to indicate meaning. This is a Bad Thing. For example, it does not allow sparse data in which non-default or null values can be excluded.
Inter-system data exchange requires an agreed-on efficient machine-readable delimited file format.
XML was designed to avoid the issue of 'yet another machine-readable format'. XML can be read reasonably efficiently, but always preserves meaning, ignoring the horrors of character sets and byte order. Compressed XML is a very efficient way to transmit data.
An important part of XML design was that it should allow information to be expressed in a way that is independent of the software that uses it. In this way, it has something in common with SQL. The point of 'human reading and writing' is that in the last resort, you still have you data! It also makes data transfer hugely easier to debug.
There is also a lot of confusion about the order of tags in XML. Its possible to specify in a schema or DTD that some tags are required and should be in a certain order, but its also possible to just not care about order. XML is neutral about this.
Use Open Office? unzip that .sxw or .sxc file, and what do you see? XML.
You probably use XML a lot without knowing it.
ok, aside from efficiencies in implementation and writing, the main concern should be with the expressability.
Question - can you express the recursive ancestor relation in SQL? You can express a single relation such as grand parent, but not the full relation.
Reason? Because SQL is not full relational calculus. It is basically propositional calculus (actually I maybe slightly wrong, and it falls in a higher calculus than propositional).
Prolog for example is closer, but still not true relational calculus (I forget whats missing).
There's a reason for SQL's limitations, and that's decidability - guaranteeing that the query will terminate... (admittedly in theory, and you can write some pretty horrendous statements).
Anyway, just wanted to get that off my chest. Doesn't say much about where XML falls in the scheme of thing - I guess strictlt speaking it doesn't - its just a layout. In fact, to compare XML and SQL is a complete misunderstanding. Its XSLT which should be compared.
This guy is really smart. He takes issue with the design of SQL, and with some of the commonly-used relational-database concepts (such as the NULL). His criticisms here are valid, but his position seems to be extreme. With that said, he is absolutely right about XQuery/XPath/XQueryX++/whatever.
XML is great for data exchange. Schemas are a wonderful way to describe data. But it is completely inappropriate for querying. If you need to query XML, you should import it into a database then query it that way. The very design trade-offs made when building XML were to make it extensible, and hard to query. XPath is nice to have for simple dinky import scripts. But trying to build a whole hierarchical query language on top of XML is silly.
If someone wants to build a standardized hierarchical query language, that's great. Very few people use hierarchical databases and need such a language, but I wish them luck in that endeavor. But don't pretend that it is appropriate to use it on XML, or that it has anything to do with XML. XML is an interchange format. Leave it where it works well.
To see the solution to almost all of the problems identified by Pascal, Date, Darwin and Codd please checkout Dataphor from Alphora. I am a user of their project and it is a marvel. It uses any vendor's DBMS for storage and heavy lifting, but adds untop of it true relational lanaguage and constructs. The system is so highly formalized that Dataphor provides *instant* derived user interfaces for all your tables in your existing database. After taking a few days to supply the missing information about your data that you currently cannot do in your DBMS the instant UI knows how to display all tables and forms in a logical workflow. It also handles all form validation and referencial integrity. If you change your database schema the UI is automatically updated because it is *derived*. It is not the result of some lame MS Access style wizard.
This is off-topic, but it bothers me when someone blasts another for ignorance and then links into an otherwise perfectly good article ("The Myth of Self-Describing XML", by Eric Browne)which contains the following pearl:
Doesn't it bother anyone that someone who doesn't know his Latin (but uses it, anyway, where a perfectly good "after the fact" would suffice) is used as an authoritative source by the author? If you're trying to prove someone's incompetence, you won't want to counteract it with further incompetence. Mr. Browne may be a genius in relational databases, but makes stuff up just the same.
And in the unlikely case Mr. Browne should read this post, the correct form is a posteriori, as any law student probably knows.
The consensus clearly is that the Relational Database is an unqualified success, and that is true. But don't confuse SQL and the Relational Database. SQL was an early hack intended for querying the RDB only.
Part of the problem is that the RDB was such as success that people do confuse the two. SQL, however, has shortcomings, that are addressed in the third manifesto.
This product http://www.alphora.com/ is a serious attempt ot address the shortcomings of SQL, with a new language for programming an RDB, and a new Query language.
SQL itself is not complete, and even the RDBs out there today break many of the rules as set out for the correct RDB.
Fabian Pascal is a bit of a nutter, you wouldn't ask him around to a dinner party, but he is correct.
XML is a badly thought out hack, but it works to an extent.
UML is hardly that, the U is really an act of hubris.
SQL can be improved, made more complete, and RDBs should be made to confrom more closely to the rules of what makes a real RDB.
I know some people who do in some cases, but I wouldn't exactly call that a standard procedure. Or call those people DB designers for that matter. 'Cause that is NOT database design.
You design a DB best with a pen and a large sheet of paper. Or some drawing tool your extremely good at.
SQL is the language you feed you results into the box so it builds a more or less representative imprint of the abstract reality you've designed. Which can be as relational as you want it to - as long as it meets the physical constraints of non-abstract reality. As soon as you put it onto a computer, you'll have to cut corners. That's the difference between a database _model_ and a database _implementation_. That takes stuff into account like DB load, DB Server Features and data types.
Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.
I'm suprised a supposedly db expert guy get's all worked up about this and doesn't seem to be able to keep apples and pears apart.
Anyone initially designing a non-trivial DB with SQL and - on top of that - bitchering about this DB language not being rational deserves a clobbering.
My 2 cents.
We suffer more in our imagination than in reality. - Seneca
That SQL is mostly a kludge?
Let me restructure that...
The experts who know what the heck the relational model is and is not argue that the language we use to query a specific type of relational-like database, that they call the SQL databases, the SQL language, has unsufficient representation power to represent the whole model, and hence can't be used to get the whole power of the model.
That's certainly interesting, and leaves us to ponder two things:
1) a) Just how much more power could we get? b) And at what cost?
2) What about alternatives, can we get that same power elsewhere, cheaper?
1)a) is beyond my mere predicate logic skills at this time.
1)b) The cost of a model for data storage, representation and management is directly linked to it's adaptability to the data you represent. The article mentions a lot of errors with NULLs(I remember thinking, while reading the article: a NULL was an attempt by the language developers to simulate an interrupt in a language that doesn't have any, this is of course, an oversimplification on my part, but considering stored procedures and triggers[SQL's own exceptions] weren't around yet, they sound like a good basis for further research.) There are a lot of other hidden "costs" for people who use a relational tool for not-quite-so-relational data, but that's not part of the cost of a relational language, per se.
2) Brings up a few notions: there are the types of databases relational databases replaced, like network databases, and there are attempted replacements, like object databases. There are also further possibilities that I will explore deeper later. Object databases can certainly be interesting, in the sense that by bundling data with code, you can have data that can handle itself, in the very basic sense that we humans apply it to ourselves. The problem is that we tend to have a very fuzzy, real-world view of such data, and can't work with it that easily(we are using computers to make data easier to work with, so if we had software that could handle real-world data complexity outside of our brains, we wouldn't be having this discussion). Object data is certainly very adept with data that has some broad commonalities, re-usable behaviours, and follows set-rules. We can call those business rules for now. Those business rules imply that a certain subset of "The Universe" interests us more than the rest, and follows predictable commonalities, making our mental models a lot less complex. On the other hand, object methodology is not always well understood, and the documentation and models it generates sometimes dwarf some production systems implemented to solve the same problems.
Now, at the beginning, Relational Systems were data-handling "toolkits" set to handle specific subsets of data, who also followed business rules.
That's interesting to my purpose, simply because I can envision, at this time(some vendors have similar concepts, but don't formalize them in any way), a new set of "toolkits" where the relational model is only one of many "toolsets" available.
Indeed, what is probably the most used sql-based server available(MySQL) has been lacking true relational functionality for most of its life, yet that doesn't make the tool less useful for most of its users. Future toolkits can inspire themselves by focusing on specific uses of technology to solve specific problems, and yet keep the SQL as a sort of security blanket, since that's where most of the training about databases(and indeed, usually most of the training about data, period, is in database classes and perhaps, some algorithmics classes)
After reading the linked articles about XML's weaknesses, though, I don't think it belongs into any toolkit of that nature. Simply because the tool that belongs in the toolkit is the "self-documenting data", and XML's weakness in that area is evidenced there. XML's early focus as a medium of e
The problem is that most "Relational Database Management Systems" only represent one type of "relationship", that being "the table".
This, in turn, means that every operator (programmer, statement, etc) on the database must _individually_ "already understand" all the relationships that lie outside "the table" before they can act on the data at any significant scope.
That is, you, the programmer or operator must know, from some source besides the RDBMS itself, how the different tables work with eachother.
In simpler SQL-biased terms, you have to know, before you start, what is "good" to put in your WHERE CLAUSE to make a join. And then the RDBMS query optimizer needs to guess how to satisfy your needs in something other than glacial time.
Consider a new verb "EXPOSE thing, thing, thing, thing, thing..." that would fish out of a database the one-or-more relationships between the things, and produce a table-looking vector of "tuples" that consist sets of actual values for those things. This is what the theoretical "perfect" RDBMS would do.
Given (somewhat denormal 8-):
Customer ID -> Customer Name
Customer ID -> Street Address
Customer ID -> Zip Code
Customer ID -> Purchase Order ID
Purchase Order ID -> Part Number, Quantity
One should be able to "EXPOSE Part Number, Zip Code" and have the database "know the relationship" and produce the correct vector of tuples.
But that doesnt happen.
Now all the people bleating about the Higherarchial databases and bad things from the past are doing this harping because they remember the bad-old-days when a database would maintian one tree-structured set of relationships like this. In the higherarchical model, you could basically do this EXPOSE operation, but only if you had, by dint of pre-knowledge, asked for things lying on one linear path through the tree. (* simplified for brevity).
In essence, SQL requires you, the programmer, to be in the business of making up relationships that should be in the data or schema structure but isn't.
A magically complete RDBMS would take a series of vectors of the form "Independent Datum (key)->Dependent Datum (value)" (where either side of the arrow could be a list of atomic values). The RDBMS would then assemble and maintian tables or linked lists or whatever with no exposure of SQL-esque "tables" and the accessors would be storage method agnostic. (That is, there would be no such thing as a FROM CLAUSE.)
For instance, in the above list of relationships, Customer ID, Customer Name, Street Address, and Zip Code *could* all live in a "table", or not, but you would never know that. But a better vector of
Customer ID -> Street Name, Building Number, Suite to replace "Customer ID -> Street Address" has a table-feel, but would not bias against "EXPOSE Street Name, Part Number".
The power of this comes from being able to do:
EXPOSE Building, Part Number, Quantity
Where Building == "Word Trade Center 2";
And have the RDBMS already know the sequence of relationships to get from Building to (address elements) to Customer ID, to Part Number without the programmer writing the three stage join across the "uninteresting middle tables".
(The above presumes you have a building relation that has Street Name, Street Number -> Building or some such.)
All the XML nonsense is nonsense *_BECAUSE_* the strict-nesting enclosure requirements of XML make it "naturally" become hierarchically organized. But data exists outside the single-inheretance strict parantage trees that the hierarchical model dictates.
The problem is that SQL got real popular and so the idea of structural inferrence got pared down to tables and Query-Like constraints on tables before anybody had a chance to formalize the idea of living, encoded relationships between arbitrarily stored datum. So we never really got a language or system that could "EXPOSE".
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
I've done several years of application programming using SQL
I've also implemented the (XQuery-derived) query processing module for a native XML database
In my former life as a application programmer, I really liked SQL. It allowed some pretty complicated computation to be done in the query, and very concisely in many cases compared to doing the same thing in, say, C++. For example, things like grouping are very nice for many application purposes.
In my current job, I'm hoping to create an XML query language that supports the same sort of capabilities as SQL. Our XML query language implementation has decent path/predicate, sorting, and output structuring capabilities, mostly derived from earlier drafts of XQuery.
My feeling about XQuery 1.0 is that it is extremely bloated. XML seems really simple; querying it shouldn't be all that complicated, should it? But the XQuery committee has created several hundred pages of specifications for the new language. This seems excessive, to say the least. We basically have implemented a subset of an earlier version (with paths, predicates, sorting, XML construction, a few dozen functions), and stopped tracking what they were doing. This is kind of unfortunate, but we really don't have the resources to support his behemoth in all its awesome grandeur.
We just want a language that lets programmers efficiently access our database. I think we're on the right track. I'm not at all sure that XQuery is going to wind up as a long-term success, partly because of its bloat factor.
My favorite illustration of the XQuery bloat is this: early versions (up to about April 2002) of the XQuery language description contained this sentence in the introduction:
It is designed to be a small, easily implementable language in which queries are concise and easily understood.
Starting in August 2002, this was changed to:
It is designed to be a language in which queries are concise and easily understood.
The "small, easily implementable" part got smothered up by the avalanche of features they were adding.
Have you read my blog lately?
"XML is not meant to be read by humans, it's a data interchange format and thus meant to be read by machines" - Good heavens!
Someone saying something like that has really got a BS in BS! Or perhaps even worse: a PhD in BS.
XML is all about programmers being able to understand the data! Yes, because we are not anywhere near that nirvana of fully semantic systems that can (semi)automatically understand each other. NO! Programmers have to do the work to make the systems fit together and XML gives them the advantage that they do not have to reverse engineer another proprietary data format or dig into a horsepile of documentation. XML makes it easy to understand how to process the information at hand - without any extra work!
Also it's great format for storing small amounts of constantly changing data (like user preferences) cause it's extensible and with only a tiny bit of effort backward compatible as well.
Anyone trying to use XML for processing large amount of data (like data warehousing) is either nuts or doesn't give a damn about the speed or costs.
However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain (anybody who has tried to decode data from some long gone accounting package from the '80-s knows what I am talking about).
So yes XML is self describing only to humans and that's the whole point of it. Formalizing data semantics is not the goal of XML, has never been and will never be, thats what we have RDF, RDFS, OWL and other nice initiatives from Semantic Web movement for.
I think this whole dispute is not necessary. It's just a simple case of conflicting goals. One side wants "efficiency" and another wants "extensibility". What both sides miss is that real systems can't afford to choose just one of these, you have to get both. So you have to have just enough extensibility to allow reasonable extensions, and still attain reasonable levels of efficiency. But trying to get all of either thing will totally lose the game. So the XML camp is wrong to think that extensibility is everything (ever tried parsing XML in a real-time system?). And the "binary transfer format" camp is wrong to think efficiency is everything (ever tried to make two versions of your protocols to interoperate?).
-- Esa Pulkkinen
No. You. Can't.
You simply cannot know what the data means without specified semantics (which you have to agree upon somehow; they aren't magically apparent from the XML itself -- that's why such abominations as DTD and XML-Schema exist).
HAND.
table: order ID, part number, quantity shipped
select part number, sum(quantity shiped), avg(quantity shipped)
Group By Part Number;
This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.
If you wan't to throw an exception you can throw the execption or not in your program. In that case you fetch the individual values and do the math yourself and the "that's not a number" that is caused by the null gives you the chance to throw your exception.
But since, in aggregate operations, your program isn't even interractin with the data yet, where would such an exception go?
What would the SQL syntax be communicating a list of results PLUS a list of exceptions to your program? Which order would things be processed in?
Your boolean analogy is also flawed. "You have stopped beating your wife?" is not a yes-or-no question because it carries a predicate around with it that you may not fulfill for serveral reasions (not married; you are hetrosexual female, so you don't have a "wife", you have never beaten your wife so you can't "stop" doing it; etc). There are a surprisingly large number of "real data" that nature. For those of you who have trouble abstracting this, the "real comparason matrix" is "True, False, and Not Applicable". NILL buys you "Not Applicable" so very cheaply.
In poin of fact, people who don't like NULL, usually because they don't understand its purpose and use, make a hell of a lot of work for themselves.
My current employer has a large database of test values that grows by huge numbers of elements each day. The programmer "didn't understand" NULLs (ro RDBMS' for that matter) and has "-" in fields that should be NULL.
Consequently we cannot aggregate. All of our client applications end up haveing to bulk-fetch whole table ranges and run through elaborate statistical routines full of conditionals; or do separate fetches with "field != '-'" in the where clause and run a concordance operation in ram after the repeated bulk fetches.
This costs bocup time and degrades the quality of the product.
You call "academic bullshit", I suspect you have never had to work the really large or significant data sets. I suspect that you don't ever ask the server-side to aggregate for you. And I suspect you have never worked time-critical transactions across a "slow" link.
You can't have. You think of "NULL" in terms of equality.
I will give you the "syntatic" point that "Where X = NULL" ought to be unversal. But, for instance, the cartesian nightmare of having "NULL == NULL" in a join is beyond idiotic.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
No, they are not, but the way to rectify them is a bit extreme. Keep in mind that -- as Date says again and again -- there is a difference between the logical model and the physical model. I'll summarize the example he used: a EMP_SALARY table.
Let's start simple:
[Sorry, ecode doesn't seem to want to do vertical alignment.]
Now, what do those two NULLs mean; do they carry value? If Bob is unemployed, we could write 0, but then it could be confused with unemployed and employed for no pay. Perhaps we don't know how much money Chris makes, but we do know that he makes some money.
You could fix this by adding a TYPE enumeration column that would take on values like EMPLOYED, UNEMPLOYED and UNKNOWN, but you'll still need the NULL value for the UNEMPLOYED and UNKNOWN cases.
This last part can be solved by logically segregating the table above (sans TYPE column) into three tables (one per type). Both the UNEMPLOYED and UNKNOWN_EMPLOYMENT tables would lack the SALARY column -- they would have only the EMP_ID as there's no more information to add. All rows in the EMPLOYED table would have a known salary.
NULLs have been removed and the design is further normalized -- some would say to the extreme.
Now, how you would model that physically without using NULL and still managing some level of performance I do not know. But that at least explains the reasoning behind NULL not being necessary.
As for myself, after nearly fifteen years of database design and implementation, I'm quite satisfied with using NULL where appropriate. I've never been befuddled by it nor sidelined by its behavior with respect to logical operators. Learn the rules and move on.
Freedom to fear. Freedom from thought. Freedom to kill.
I guess the War on Terror really is about freedom!
It's one thing to say that either SQL or XQuery have their problems, because they do. It's quite another to say that SQL is bad because it doesn't live up to some arbitrary never-achieved (and perhaps unachievable) standard of relational purity that even Codd himself found superfluous. When Pascal does nothing but the latter, and in addition takes a dozen thoroughly unprofessional swipes at Chamberlin for having been involved in both SQL and XQuery, his professional jealousy is becomes thick enough to choke on. I wish he would, so we would be spared the incessant ranting of someone whose whole career has been marked by a lot of words and not a single deed to back them up.
Slashdot - News for Herds. Stuff that Splatters.
"The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not"
yeah, you tak a file that has 1000 linse, each line has 1024 characters, and 300 fields. Transmitted from Brazilla at 28K.
Now convert the to an XML format, and then try sending it. suddenly it take 10 times longer to get the data. Literally going from a couple of hours to all day. Not exactly cost effective.
XML is easy to read, if you know the format and context in which your reading it. It still needs to be designed, you still need to go to meetings, and you still need to create paperwork on the structure.
It's got its place, but it is not the end all, or even cloce for that matter.
The Kruger Dunning explains most post on
(I just used my mod points in another thread, so I gotta hope other mods recognise the parent post.)
In the world of self taught dabblers, NULL is not well enough understood to be expected to do anything more than cause the kind of problems you alude to with the likes of '-' to (partially) imply what NULL should be used for.
SQL has to coexist with other components where an empty string and a numeric zero are assumed null and treated accordingly, the quantity shipped example you give being just as easy to understand and implement with zero meaning not shipped as with a separate null (just add "where quanity shipped > 0").
There are also several possible reasons for a data value to be left NULL or undef, not all of which are mutually exclusive. Is it "not yet", "not known" or "not applicable"? In the real world we sometimes need to pair a status enum and a (numeric or string) value column to properly represent a single logical datum which needs to sometimes take state values not sensibly representable by numbers or strings.
We used to use a string of 9s in a numeric key field to represent end of data and even today Perl's DBI interface uses the 0E0 kludge to represent a "true" zero.
-- Our systemic servants do not good masters make.
Pascal's entire line of argument fails to explain why I should care. I RTFA'd. I even read some of the links. This is a topic near and dear to me, as I fight for better design, better technologies at work against various technology and platform zealots.
I'd argue that if you have a thesis or hypothesis, you need to make a testable predication and test it against the real world. That's the gist of science. Pascal tests XML and SQL against the Gospel of Formal Data Model's according to Codd. Pascal's arguments are theological, not practical, not scientific.
I want to know how XQuery will or will not make my job of designing and implementing solutions that save my company money. How will the problems of XQuery, or the blasphemous SQL implementations cause me problems that cost time? How do these deviations for orthodoxy really make code harder to write, harder to maintain, more prone to error? These are the relevant questions. These are issues that can also be tested against production scenarios.
I don't give a hoot about how well XML falls into Codd's formal model. I care whether or not it saves me time. I'll happily accept that normalization, taken not too literally, does saves me time. XML also saves me lots of time: makes a great prototype to store data in (gasp!) before I implement the database layer. Even for a somewhat complex schema, I can blast out a small XML document in a few minutes in my text editor of choice, and test out how the components use the data that they store. Sometimes, for small data sets such as configuration information, I even leave it in XML for years before finding a compelling reason to deal with an RDBMS. Web services (the horror!) have made the middle tier much more accessible - I can drive access to a service based on policy now, rather than to whom I can distribute the damn DLL and it's dependencies. None of this is perfect, but XML works where it makes a difference to me.
In short, regardless of how well or poorly Pascal addresses his questions, they are the wrong questions addressed in the wrong way. He should go into theology or critique Chinese literature and leave us working folks alone.
My motto: "A cat is no trade for integrity."
One thing that everyone should understand: even though Pascal, Date, etc... argue that SQL is a bad implementation of the relational model, they *still* agree that it provides value, and that it is miles better than its hierarchical database predecessors. Since it is their job(s) to provide a reasoned critique of the field, it is only natural for them to rigorously compare SQL to the goals of the relational model. Many of their complaints fall in these categories (although there is much more):
1. Does too much -- too many ways of doing the same thing, and too many unecessary operations that could be better done another way.
2. Overly complex -- the SQL 1999 standard was something like 1200 pages.
3. Allows programmer to circumvent relational integrity. Things like "hidden identifiers", pointers, etc...
4. Too wrapped up in implementation -- users must spend a lot of time understanding the physical storage, rather than focusing on queries in abstraction.
5. Many small inconsistencies in SQL itself
But the problems with SQL are impossible to judge if you only know SQL. It's like the people who used to ask what was wrong with a perfectly good typewriter that made people want to use a word processor. To any who are curious, I suggest you do some reading. The absolute best simple introductions for these problems are in a two online documents by Hugh Darwen at www.thethirdmanifesto.com. Look for "The Askew Wall", and "The Importance of Column Names".
...useless until you realize you have to connect to the DB from several applications written in several different languages for which you have either to reimplement your way to manipulate the data everywhere, or you have to put in a middle layer of some sort which is able to talk multiple languages (via CORBA, SOAP, plain XML RPC, custom protocol, whatever) and ensure that everyone is accessing data exclusively through it.
Reimplementing logic everywhere across different languages is usually a bad approach because it doubles the development and testing effort.
Middle layers are usually hard to get right the first time, much harder than using stored procedures and triggers, since the typical procedural language is not so at ease at manipulating relational data.
On MySQL: it has a somewhat bad reputation in the field because of the people abusing it: it's fast, it's free, it's easy to set up, but living with these gotchas is definitively too painful for developers more concerned about correctness than speed.
In the end: firing up Firebird/Oracle/PostgreSQL/SAPDB for simple data is plainly stupid, but often it is done anyways since they do a decent job even in that cases; firing up MySQL for your 30+ GB DB containing your whole network topology which is used by everyone for billing, service assurance, troubleshooting, network planning and whatever is stupid as well. MySQL AB knows that, and in fact it now proposes MaxDB (was: SAPDB) as well.
You are wrong. The SQL:2003 standard specifies IDENTITY. See http://troels.arvin.dk/db/rdbms/#mix-identity
XML is all about programmers being able to understand the data!
No. XML is all about storing meta-data alongside the data. And you are implying that programmers will not understand data that is not easy readable, like XML; this implication is an oversimplification.
(...) reverse engineer another proprietary data format (...)
XML is certainly not the only well-documented interchange format.
(...) dig into a horsepile of documentation.
Is is fairly easy to create XML documents that require horsepiles of documentation, just like any other format. Just look at the XML output of Microsoft Office; XML is no magic bullet.
(...) (like user preferences) (...)
No, one should use an API for that and be independent of storage meganism.
However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain
I would not like to retrieve all information from abovementioned Office XML files right now, let alone in ten years time.
So yes XML is self describing only to humans and that's the whole point of it.
Ehhm, no. Human readability is a side effect that probably led to its widespread adoption. Work is underway to make certain XML far less 'readable' to humans, but smaller and faster to process.
Formalizing data semantics is not the goal of XML (...)
No, because that would require AI, but formalizing meta-data semantics most certainly was a target set for XML to achieve the goal of better data interchange.
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
One foundamental error in today's operating systems is that they are datatype-agnostic. They simply don't know what the data they handle is. This task is left completely to applications, and each application usually provides its own way of managing data. This causes incompatibilities between applications, and these incompatibilities are not solved either by SQL or XML (that are nothing more than human-readable representations).
/. discussion when half of the posters say that SQL sucks and the other half saying that XML sucks!)
In my opinion, an operating system must primarily provide a data management solution. It must provide the common ways to organize, store, retrieve and process data. This means that the application should only care about the logic behind the data, not how data management is implemented. There are only a few methods of data organization anyway, and it is a shame that these methods are not available when an O/S is installed.
The algorithms that concern the data types should also be available along with the data types. This means that an operating system not only should provide data management, but it should be object-oriented: each "data node" in the system should be available as a class in the chosen programming language (if it supports such a concept).
The availability of data and their types on the O/S level would also boost security and safety, as it would not be possible for a 'devious' application to approach the data in any other way other than the intended one.
Finally, the concept of 'application' is also wrong, and I am saying this in the context of data: in our day and age, data not only multiply fast, but the types of data are frequently modified. The liquid status of data (and their data types) makes the concept of an 'application' (thousands of source code lines, cast in the stone, with a huge degree of coupling between them) a huge obstacle in really making computers useful. Applications need to be replaced by a live system of persistent objects that do simple jobs and inform the world (through events) about changes in their state or the results of their computations; the O/S should be responsible of organizing how objects communicate with each other (either in the same memory space, in different memory spaces, or in different computers).
Since the current situation is not exactly orthogonal (as described above), there are many misunderstandings and problems in defining concepts clearly; many thousands of dollars are spent in re-inventing the wheel, and many work hours and brain power is consumed in creating what should already be there...(and thus we can have a nice
I was saying
1) a) Just how much more power could we get?
is beyond my mere predicate logic skills at this time.
Explaining just what power is missing from SQL to explain "the full power of relational models" in the context of the article, where it is said relational algebra can represent tree-like structures etc... is beyond me.
I am not interested in how hard relational is to use.
1)a) was referring to expressing, just what's missing in all the SQL servers, that could be reached if they were truly relational. In this case it has a lot more to do with how we define what you can AND on, a lot more than what operations you can use. I can understand AND just fine, I have a lot more problems with understanding how the experts say on one breath "it's not flat tables anymore" and on the next, no explanation on what operations you can use on a non-flat table... Say is the AND of a pair of arrays of floats the matrix product of the ands of the arrays? In what precision? Or are they just saying "the tables aren't flat, they can be linked" but you gotta restrict your operations on the virtual flat tables that overlay your more esoteric structures?
ADABAS D had arrays and such(it's now MySQL MaxDB I believe) inside sql-addressable tables, but that was a relatively poorly supported feature, once which wasn't well modeled( you can see from my examples why it wasn't well modeled, I think, it brings up questions about the domains of each operation)
We can certainly agree on the lousy marketer and packager, but you know what? The marketers won't help you write your complex queries either, Codd's theories just might(well again, they might not, but that's what the original post was about).
I'm sure I misunderstand more about predicate logic than you've understood in the time it took you to read these lines, but I was aware of those limitations, and kept my post in line with that.
You however chose to limit the relational model's power, to the logical operators it uses, without mentioning that the relational model also includes rules on how you can organize what objects you can apply those operators TO, which is a rather important distinction, and probably closer to weaknesses of SQL, as evidenced in the article anyways. Anyone can write relational algebra if you got scalar booleans, but if you got an array of complex numbers to multiply by a matrix of floats, and all that needs union(or any other type of join etc...) with userids, full names and addresses, it gets a little bit more fun.
Well "the rules of our system" also include just what we can NAND (or AND or XOR) ON, not just those operations. The definition domain of those operations is also important, that's why in SQL's fuzzy logic, ANDing with a NULL gives you a NULL.
Now if there's an impedence mismatch between sql and relational theory, it's probably on the definition domain of the operations(NULLs, non-scalars values, strings) and not on AND itself, which is pretty much a "simple" operator, at least when you limit yourself to scalar values like booleans, and logical predicates.
I'm not going to try and reply in detail, but since I participate in the W3C XML Query Working Group and am also the w3C XML Activity Lead, a few comments may be useful.
:-)
The article seems to says "I don't like SQL and I don't like XML and I think XML Query is about mergin them although I don't understand it very well, so the people working on XML Query must be stupid, and in any case it's easier to attack people than understand a specification".
Perhaps that's unfair, but it's clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years.
In both cases the story is about interoperability.
If you look at the XML Query Home Page you'll see approximately two dozen implementations of the XML Query draft, including a number of open source ones. If you look at the public mailing list for comments, you'll see we received over 1100 detailed technical comments at the last public review. So there's a lot of interest in this work.
Why is that? One reason is that, like Web services and SOAP, XML Query is able to replace a lot of proprietary and hard-to-maitain middleware. Another reason is that for the first time we'll have a standard way to search over multiple kinds of data source.
Don is the primary editor of the XQuery language, but the technical decisions reflected in the specification are a result of collaboration, and are agreed on by aconsensus process by a much larger number of particpants. The goal is to make a language that people agree to implement and to use. With support announced by Microsoft, Oracle, IBM, BEA and others (see Web page mentioned above) and judging by the public interest, I think it's fair to say that's going to happen.
It's pretty rare to see a large complex system that everyone is happy with. It's actually pretty rare to see a small system that everyone is happy with. There are people who are unhappy with some features in the Unix cat program, but it's better to have cat in every Unix system than to have millions of shell scripts break on systems where it's missing! The trick, then, is often to include features that will lead to massively wider adoption, even if some people would rather be without them.
Then we have (as part of W3C Process) a public call for implementations so that we can test to see how confident we are that all the major features can be implemented compatibly (i.e. interoperably) in multiple independent implementations.
Features that were not implemented get removed before the specifications are final.
Is XML Query a waste of time? Is XML evil? Is SQL evil? A lot of people think otherwise, and some of them are pretty smart, so if you are concerned, take the time to read the specs and decide for yourself.
Live barefoot!
free engravings/woodcuts
I've read the article (maybe i've misread it but its 8:30am and a saturday:)) and although I can see lots of bashing of SQL, XML and nulls going on I can't see any 'alternative' or solutions suggested.
----- I refuse to have an argument with an unarmed person