SQL, XML, and the Relational Database Model
Kardamon writes "In an article on DBAzine, Fabian Pascal writes that SQL is not a good representation of the relational model, and is afraid the situation will get worse with XML and XQUERY. An overview of some of the reactions on the positions Pascal and also C.J. Date take on this issue is given in this article over at SearchDatabase.com by Sara Cushman."
convert SQL to XQuery and others, though? If you could, it'd make porting your applications easier... Wonder how long it'll be until something out there does it.
Disconnect and self-destruct, one bullet at a time.
Of course XML is going to be hard to represent in a relational database. Unless your tables are ( id, object text) and you pull out your XML and parse it.
Celko is misquoting Darwin in saying that "The idea that you will always know everything is arrogant".
t o.web/Missing-info-without-nulls.pdf
Date/Darwin/Pascal propose that you codify what you don't know (so to speak). Read their proposed solution here:
http://www.hughdarwen.freeola.com/TheThirdManifes
And yes, XML DBMS are a throwback to IBM IMS and other hierarchical DBMS products. Anyone who has ever used a hierarchical DBMS will tell you that there are some pretty non-trivial problems that you cannot work around due to their hierarchical data model, yet XML DBMS proponents propose we go back to that old, inflexible system!
Thanks,
--
Matt
Drill baby drill - on Mars
..use a Relational DBMS like PostgreSQL?
I think relational SQL databases are just fine, easy enough to use, and serve their purpose very well. They DO take some serious thought when designing tables and queries - but we shouldn't be afraid to think. If your head hurts from SQL, keep on it for a couple more minutes and you'll probably have it! If your head hurts from SQL, you've tried thinking about it - and you still don't get it - you're probably in the wrong business. Complex information retrieval is complicated and sometimes difficult to understand.
On to the next part. XML serves its purpose very well. Although I wouldn't consider XML and SQL to serve the same problem sets equally well. There are certain situations where SQL is perfect. And there are other situations where XML is preferable. If you think of the two as two sides of the same coin, I think you're making a mistake. Likewise, you can't just flip between the two on a whim. Choose the format that's most suited to what you want to do and go forward.
It aint broke and don't need fixin'
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
Here is a history of SQL. I wanted to check the article's facts. Also, I was curious... History of SQL
Easy to use, easy to debug, easy to understand, powerful. Isn't this good enough?
got sig?
From the article:
It has been quite obvious that the designers of SQL had little understanding of data fundamentals in general, and the relational model in particular
Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.
Everybody goes through a phase where they bitch about SQL. So did I. And I built a clever OO DataModel module that abstracted it into pretty heirarchies and all sorts of clever crap. Then I tried actually building systems with it and realized I was better off with ugly ol' SQL.
I've been part of too many projects where people pulled out the UML books in favor of a decent First Normal Form DB design and led the team down the tubes.
I'm not saying these other methods don't have their place -- they do. But they aren't going to displace SQL because it has it's place also. And it's place isn't theoretical, it's been practically demonstrated a million times.
Cheers.
It is rather intensive to do a relational 'hiearchy' search, whereas an XML representation would allow fast 'object' modeling. One example: Bill of Material. An iterative SQL call would be needed unless the overall part depth was known at query time. I like the idea of hybrid DBMS that allow XML-stored data to be accessible like a relational database. Oracle does this somewhat in their newer releases.
Readers interested in what Pascal and Date have to say may be interested in visiting Database Debunkings. It has lots of articles written by one or both of them.
Personally, I recommend the articles written by Date because they are clearly, concisely, accurately, and calmly written. Pascal's tend to turn into a rant, which I wouldn't mind but he always seems to refer to his books rather than give a detailed justification of his arguments.
As a rant, this article does a great job. But here's what is missing - what the heck is he talking about? Everything he says is liberally sprinkled with statements telling us these things are self-evident, when they are anything but. He is constantly is referring to how this will clearly show that, or pointing out that this proves this or that later on, but never gets there.
Can anyone summarize what is being said here in some sort of logical way? Because I'm confused. I see the title, I see no information supporting the title (unless, perhaps, I was to do the research myself).
I'm a pretty good software developer, but if someone doesn't explain to me what the argument is in plain english without extreme haughtiness, I'm going to write off this whole issue as a pissing contest.
~D
This sig has been enciphered with a one-time pad. It could say almost anything.
I described a general vision for this sort of formalism in a prior slahsdot post. Suffice to say some progress has been made since then -- and work in other areas is starting to converge. There is much yet to be done.
Seastead this.
SQL is meant for a relational database. XML is a hiearchial database... have you never worked on a project where your manager HAD to convert a projects database in XML because it was the new shiny buzzword? It NEVER works out very well because they are logically different. The same will go for SQL->XQuery.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Most of the problems that I've seen with SQL commands that are more complex than they really should be are a result of mistaken assumptions made during the design phase of the database. As a result, extra tables get added late, and therefore create new "features" that code then has to be revised to take advantage of...
XML's going to be no better in this area. Mistakes made during the design phase will always come back to haunt while implementing and using the system. If a single query can't return the results desired, then that should have been thought of while designing the tables of the DB. Trying to get a query to specify "All things that are red" like Pascal suggests is only going to work if all objects implement the "color" property the same way. If somebody uses CMYK, somebody else uses RGB, and a third uses Play-Doh color names, it's still gonna be a mess that requires code to figure out who really matches whom.
I don't see how this "new model" fixes the real problems with working with SQL between databases that weren't designed to work with each other.
A legacy application is one that works. And the same can be said of SQL. Actually XML works too.
It's important to understand what XML is replacing - binary or proprietary interfaces. This means an acceptable tradeoff between human readability (a hugely underrated requirement of a structured data format) and efficiency.
An example is EDI vs XML. EDI "efficiency" accrues only to the intermediary that invented the means to setup trading relationships across their proprietary network. XML uses the end users' bandwidth but it simultaneously eliminates the intermediary completely. This single fact saves literally millions in kilocharacter and storage charges.
SQL is what my old compsci prof would call opportunity-cost efficient. It's quick and can be implemented a number of ways - it's a data query framework, not a requirement for databases. There may be better ways to do it but you'll have a tradeoff somewhere.
And if someone offers something better, then when they've finished telling you how good the new way is, wait and listen for the inevitable sales pitch.
"It's not your information. It's information about you" - John Ford, Vice President, Equifax
At the foot of the page, we have a "sponsored by Oracle" graphic.
I'm sure those two things are entirely unrelated.
My first professional coding job was to write financial reports in a mix of C and SQL. When I started learning SQL I was amazed how 10 lines of SQL could do what would take dozens of C code easily. I have since seen a number of highly successful projects use SQL-based databases. This article calls SQL a failure -- BS.
Just wondering if the author had any suggestions to wean me from the SQL/XML data exchange habit.
I've recently been trying to understand the significance of RDF (I picked up the o'reiley book, but haven't gotten around to reading much of it yet). It seems that it allows you to represent data in a graph structure instead of a tree structure, as in XML.
What do people out there think of RDF? Can it represent the same things that people use relational databases for? What is it good at? What is it not good at?
-jim
Shopkeeper: You've got a pet database?
Customer: Yes. I chose him out of thousands. I didn't like the others, they were all too flat.
Shopkeeper: You must be a looney.
-Adam
The author seems to have many serious misunderstandings about XML.
The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged - semantics - and once they do that, there is no need to repeat the tags in each and every record/document being transmitted
A major point of XML is that the semantics should be explicit. If you don't repeat the tags, you reply purely on position to indicate meaning. This is a Bad Thing. For example, it does not allow sparse data in which non-default or null values can be excluded.
Inter-system data exchange requires an agreed-on efficient machine-readable delimited file format.
XML was designed to avoid the issue of 'yet another machine-readable format'. XML can be read reasonably efficiently, but always preserves meaning, ignoring the horrors of character sets and byte order. Compressed XML is a very efficient way to transmit data.
An important part of XML design was that it should allow information to be expressed in a way that is independent of the software that uses it. In this way, it has something in common with SQL. The point of 'human reading and writing' is that in the last resort, you still have you data! It also makes data transfer hugely easier to debug.
There is also a lot of confusion about the order of tags in XML. Its possible to specify in a schema or DTD that some tags are required and should be in a certain order, but its also possible to just not care about order. XML is neutral about this.
The fact is that in order for any data interchange to work, the parties must first agree on what data will be exchanged -- semantics -- and once they do that, there is no need to repeat the tags in each and every record/document being transmitted. Any agreed-upon delimited format will do, and the criterion here is efficiency,
WRONG!!!
The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not. The beauty of XML is "agreed-upon" is optional now. We can understand the data without a formal meeting to come to an agreement.
As a guy who has spent a significant portion of his life in meetings about data mapping between systems, I love XML because of how easy it is to read, understand, communicate about and map to the structures I need the data for.
Use Open Office? unzip that .sxw or .sxc file, and what do you see? XML.
You probably use XML a lot without knowing it.
I don't think XML has been designed to be efficient. I don't think any business stores all their data in XML files (unless speed/datasize isn't a problem). XML has been designed to be universal, easily "morphable". I think XML/XSLT is an awesome tool to create reports out of SQL results, and I don't see what kind of format would fit the job better. Compare Wix with raw MSI tables. Which one do you think is the better design tool? XML, of course, because the foreign key links (Components, Files, Shortcut, UI stuff) are assumed by the tags. I'm sure that XML fits for a lot of other kind of jobs I'm not aware of. Thus, we can see XML as a pretty good jack-of-all-trade, not a efficiency champion.
perception is reality
ok, aside from efficiencies in implementation and writing, the main concern should be with the expressability.
Question - can you express the recursive ancestor relation in SQL? You can express a single relation such as grand parent, but not the full relation.
Reason? Because SQL is not full relational calculus. It is basically propositional calculus (actually I maybe slightly wrong, and it falls in a higher calculus than propositional).
Prolog for example is closer, but still not true relational calculus (I forget whats missing).
There's a reason for SQL's limitations, and that's decidability - guaranteeing that the query will terminate... (admittedly in theory, and you can write some pretty horrendous statements).
Anyway, just wanted to get that off my chest. Doesn't say much about where XML falls in the scheme of thing - I guess strictlt speaking it doesn't - its just a layout. In fact, to compare XML and SQL is a complete misunderstanding. Its XSLT which should be compared.
This guy is really smart. He takes issue with the design of SQL, and with some of the commonly-used relational-database concepts (such as the NULL). His criticisms here are valid, but his position seems to be extreme. With that said, he is absolutely right about XQuery/XPath/XQueryX++/whatever.
XML is great for data exchange. Schemas are a wonderful way to describe data. But it is completely inappropriate for querying. If you need to query XML, you should import it into a database then query it that way. The very design trade-offs made when building XML were to make it extensible, and hard to query. XPath is nice to have for simple dinky import scripts. But trying to build a whole hierarchical query language on top of XML is silly.
If someone wants to build a standardized hierarchical query language, that's great. Very few people use hierarchical databases and need such a language, but I wish them luck in that endeavor. But don't pretend that it is appropriate to use it on XML, or that it has anything to do with XML. XML is an interchange format. Leave it where it works well.
I'm on Slashdot so I haven't read the article, of course. But I have read other articles and books written by Mr. Pascal.
If I recall correctly, his usual rant is not that SQL isn't object-oriented; Quite the contrary. He feels (or used to feel) that SQL -- the actual RDBMS part of it -- is fine, as is (as per Cobb), but that it are the *vendors* who have missed the mark.
In fact he goes so far as to say that OO DBMSes are redundant. No one needs an OO DBMS if the real McCoy (RDBMS) were produced by a worthy vendor.
Perhaps, if MS can open up their code (little by little), I suppose Fabian could come around. But I don't see why he needs to -- as I think the whole OO-RDBMS thing is just another way to make a n extra buck, if you're a vendor.
To see the solution to almost all of the problems identified by Pascal, Date, Darwin and Codd please checkout Dataphor from Alphora. I am a user of their project and it is a marvel. It uses any vendor's DBMS for storage and heavy lifting, but adds untop of it true relational lanaguage and constructs. The system is so highly formalized that Dataphor provides *instant* derived user interfaces for all your tables in your existing database. After taking a few days to supply the missing information about your data that you currently cannot do in your DBMS the instant UI knows how to display all tables and forms in a logical workflow. It also handles all form validation and referencial integrity. If you change your database schema the UI is automatically updated because it is *derived*. It is not the result of some lame MS Access style wizard.
I did database work for a few years and understand why 3NF is good (or 4NF perhaps) and I found it really really great for certain things. However I was able to create databases that could easily represent some really complex things that I could not query without writing a mixture of code (VBA for me) and SQL - like traversing a (potentially recursive) data structure. It may be that I didn't know enough, but I think the problem was with SQL - or my expectations for it. I suspect that's along the lines of what he's trying to get at.
This is off-topic, but it bothers me when someone blasts another for ignorance and then links into an otherwise perfectly good article ("The Myth of Self-Describing XML", by Eric Browne)which contains the following pearl:
Doesn't it bother anyone that someone who doesn't know his Latin (but uses it, anyway, where a perfectly good "after the fact" would suffice) is used as an authoritative source by the author? If you're trying to prove someone's incompetence, you won't want to counteract it with further incompetence. Mr. Browne may be a genius in relational databases, but makes stuff up just the same.
And in the unlikely case Mr. Browne should read this post, the correct form is a posteriori, as any law student probably knows.
The consensus clearly is that the Relational Database is an unqualified success, and that is true. But don't confuse SQL and the Relational Database. SQL was an early hack intended for querying the RDB only.
Part of the problem is that the RDB was such as success that people do confuse the two. SQL, however, has shortcomings, that are addressed in the third manifesto.
This product http://www.alphora.com/ is a serious attempt ot address the shortcomings of SQL, with a new language for programming an RDB, and a new Query language.
SQL itself is not complete, and even the RDBs out there today break many of the rules as set out for the correct RDB.
Fabian Pascal is a bit of a nutter, you wouldn't ask him around to a dinner party, but he is correct.
XML is a badly thought out hack, but it works to an extent.
UML is hardly that, the U is really an act of hubris.
SQL can be improved, made more complete, and RDBs should be made to confrom more closely to the rules of what makes a real RDB.
They must be really old by now. But still producing good ideas. I am glad we still have such luminaries in our scientific community.
NO SIG
What ever happen to the Pick database?
I know some people who do in some cases, but I wouldn't exactly call that a standard procedure. Or call those people DB designers for that matter. 'Cause that is NOT database design.
You design a DB best with a pen and a large sheet of paper. Or some drawing tool your extremely good at.
SQL is the language you feed you results into the box so it builds a more or less representative imprint of the abstract reality you've designed. Which can be as relational as you want it to - as long as it meets the physical constraints of non-abstract reality. As soon as you put it onto a computer, you'll have to cut corners. That's the difference between a database _model_ and a database _implementation_. That takes stuff into account like DB load, DB Server Features and data types.
Types for instance - somewhat relevant when dealing with DB Servers and SQL - are a thing you don't want to touch with a ten-foot pole when designing a _model_.
I'm suprised a supposedly db expert guy get's all worked up about this and doesn't seem to be able to keep apples and pears apart.
Anyone initially designing a non-trivial DB with SQL and - on top of that - bitchering about this DB language not being rational deserves a clobbering.
My 2 cents.
We suffer more in our imagination than in reality. - Seneca
That SQL is mostly a kludge?
Let me restructure that...
The experts who know what the heck the relational model is and is not argue that the language we use to query a specific type of relational-like database, that they call the SQL databases, the SQL language, has unsufficient representation power to represent the whole model, and hence can't be used to get the whole power of the model.
That's certainly interesting, and leaves us to ponder two things:
1) a) Just how much more power could we get? b) And at what cost?
2) What about alternatives, can we get that same power elsewhere, cheaper?
1)a) is beyond my mere predicate logic skills at this time.
1)b) The cost of a model for data storage, representation and management is directly linked to it's adaptability to the data you represent. The article mentions a lot of errors with NULLs(I remember thinking, while reading the article: a NULL was an attempt by the language developers to simulate an interrupt in a language that doesn't have any, this is of course, an oversimplification on my part, but considering stored procedures and triggers[SQL's own exceptions] weren't around yet, they sound like a good basis for further research.) There are a lot of other hidden "costs" for people who use a relational tool for not-quite-so-relational data, but that's not part of the cost of a relational language, per se.
2) Brings up a few notions: there are the types of databases relational databases replaced, like network databases, and there are attempted replacements, like object databases. There are also further possibilities that I will explore deeper later. Object databases can certainly be interesting, in the sense that by bundling data with code, you can have data that can handle itself, in the very basic sense that we humans apply it to ourselves. The problem is that we tend to have a very fuzzy, real-world view of such data, and can't work with it that easily(we are using computers to make data easier to work with, so if we had software that could handle real-world data complexity outside of our brains, we wouldn't be having this discussion). Object data is certainly very adept with data that has some broad commonalities, re-usable behaviours, and follows set-rules. We can call those business rules for now. Those business rules imply that a certain subset of "The Universe" interests us more than the rest, and follows predictable commonalities, making our mental models a lot less complex. On the other hand, object methodology is not always well understood, and the documentation and models it generates sometimes dwarf some production systems implemented to solve the same problems.
Now, at the beginning, Relational Systems were data-handling "toolkits" set to handle specific subsets of data, who also followed business rules.
That's interesting to my purpose, simply because I can envision, at this time(some vendors have similar concepts, but don't formalize them in any way), a new set of "toolkits" where the relational model is only one of many "toolsets" available.
Indeed, what is probably the most used sql-based server available(MySQL) has been lacking true relational functionality for most of its life, yet that doesn't make the tool less useful for most of its users. Future toolkits can inspire themselves by focusing on specific uses of technology to solve specific problems, and yet keep the SQL as a sort of security blanket, since that's where most of the training about databases(and indeed, usually most of the training about data, period, is in database classes and perhaps, some algorithmics classes)
After reading the linked articles about XML's weaknesses, though, I don't think it belongs into any toolkit of that nature. Simply because the tool that belongs in the toolkit is the "self-documenting data", and XML's weakness in that area is evidenced there. XML's early focus as a medium of e
A Multivalue database model is better suited for XML, and better suited for real-world data, period.
_
*donning flame proof suit - because I know the 'Relational Zealots' are out to get me*
_________________________________________________
Free your mind and your database will follow -- MultiValue
Does anybody find people who end their writing with "Caveat Emptor" pretentious and snobbish? I mean, it's like this: "I am superior enough to recognize a problem with something, and I advise you to be aware, because after all, I know better". "Caveat Emptor" is fine, it's how people use it, and from my perspective, I think it's taking on a snobbish tone, even if used correctly.
The problem is that most "Relational Database Management Systems" only represent one type of "relationship", that being "the table".
This, in turn, means that every operator (programmer, statement, etc) on the database must _individually_ "already understand" all the relationships that lie outside "the table" before they can act on the data at any significant scope.
That is, you, the programmer or operator must know, from some source besides the RDBMS itself, how the different tables work with eachother.
In simpler SQL-biased terms, you have to know, before you start, what is "good" to put in your WHERE CLAUSE to make a join. And then the RDBMS query optimizer needs to guess how to satisfy your needs in something other than glacial time.
Consider a new verb "EXPOSE thing, thing, thing, thing, thing..." that would fish out of a database the one-or-more relationships between the things, and produce a table-looking vector of "tuples" that consist sets of actual values for those things. This is what the theoretical "perfect" RDBMS would do.
Given (somewhat denormal 8-):
Customer ID -> Customer Name
Customer ID -> Street Address
Customer ID -> Zip Code
Customer ID -> Purchase Order ID
Purchase Order ID -> Part Number, Quantity
One should be able to "EXPOSE Part Number, Zip Code" and have the database "know the relationship" and produce the correct vector of tuples.
But that doesnt happen.
Now all the people bleating about the Higherarchial databases and bad things from the past are doing this harping because they remember the bad-old-days when a database would maintian one tree-structured set of relationships like this. In the higherarchical model, you could basically do this EXPOSE operation, but only if you had, by dint of pre-knowledge, asked for things lying on one linear path through the tree. (* simplified for brevity).
In essence, SQL requires you, the programmer, to be in the business of making up relationships that should be in the data or schema structure but isn't.
A magically complete RDBMS would take a series of vectors of the form "Independent Datum (key)->Dependent Datum (value)" (where either side of the arrow could be a list of atomic values). The RDBMS would then assemble and maintian tables or linked lists or whatever with no exposure of SQL-esque "tables" and the accessors would be storage method agnostic. (That is, there would be no such thing as a FROM CLAUSE.)
For instance, in the above list of relationships, Customer ID, Customer Name, Street Address, and Zip Code *could* all live in a "table", or not, but you would never know that. But a better vector of
Customer ID -> Street Name, Building Number, Suite to replace "Customer ID -> Street Address" has a table-feel, but would not bias against "EXPOSE Street Name, Part Number".
The power of this comes from being able to do:
EXPOSE Building, Part Number, Quantity
Where Building == "Word Trade Center 2";
And have the RDBMS already know the sequence of relationships to get from Building to (address elements) to Customer ID, to Part Number without the programmer writing the three stage join across the "uninteresting middle tables".
(The above presumes you have a building relation that has Street Name, Street Number -> Building or some such.)
All the XML nonsense is nonsense *_BECAUSE_* the strict-nesting enclosure requirements of XML make it "naturally" become hierarchically organized. But data exists outside the single-inheretance strict parantage trees that the hierarchical model dictates.
The problem is that SQL got real popular and so the idea of structural inferrence got pared down to tables and Query-Like constraints on tables before anybody had a chance to formalize the idea of living, encoded relationships between arbitrarily stored datum. So we never really got a language or system that could "EXPOSE".
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
The very first premise in that doesn't make sense.
If they want freedom why would they join a terror organization?
I've done several years of application programming using SQL
I've also implemented the (XQuery-derived) query processing module for a native XML database
In my former life as a application programmer, I really liked SQL. It allowed some pretty complicated computation to be done in the query, and very concisely in many cases compared to doing the same thing in, say, C++. For example, things like grouping are very nice for many application purposes.
In my current job, I'm hoping to create an XML query language that supports the same sort of capabilities as SQL. Our XML query language implementation has decent path/predicate, sorting, and output structuring capabilities, mostly derived from earlier drafts of XQuery.
My feeling about XQuery 1.0 is that it is extremely bloated. XML seems really simple; querying it shouldn't be all that complicated, should it? But the XQuery committee has created several hundred pages of specifications for the new language. This seems excessive, to say the least. We basically have implemented a subset of an earlier version (with paths, predicates, sorting, XML construction, a few dozen functions), and stopped tracking what they were doing. This is kind of unfortunate, but we really don't have the resources to support his behemoth in all its awesome grandeur.
We just want a language that lets programmers efficiently access our database. I think we're on the right track. I'm not at all sure that XQuery is going to wind up as a long-term success, partly because of its bloat factor.
My favorite illustration of the XQuery bloat is this: early versions (up to about April 2002) of the XQuery language description contained this sentence in the introduction:
It is designed to be a small, easily implementable language in which queries are concise and easily understood.
Starting in August 2002, this was changed to:
It is designed to be a language in which queries are concise and easily understood.
The "small, easily implementable" part got smothered up by the avalanche of features they were adding.
Have you read my blog lately?
My bad :) I'd never come across the 1999 SQL spec.
:= parent(x,y) := ancestor(x,y), parent(y,z)
Interesting, I must say.
Not quite as concise as say (now its been a while, so no critiquing my Prolog).
ancestor(x,y)
ancestor(x,z)
I dont know if the WITH RECURSIVE spec has any limits that arent in say Prolog (Prolog isn't full First Order Predicate Calculus either).
Winton
It would have been a more interesting article if he'd compared SQL and XQuery to some more positive examples. eg RPG.
"XML is not meant to be read by humans, it's a data interchange format and thus meant to be read by machines" - Good heavens!
Someone saying something like that has really got a BS in BS! Or perhaps even worse: a PhD in BS.
XML is all about programmers being able to understand the data! Yes, because we are not anywhere near that nirvana of fully semantic systems that can (semi)automatically understand each other. NO! Programmers have to do the work to make the systems fit together and XML gives them the advantage that they do not have to reverse engineer another proprietary data format or dig into a horsepile of documentation. XML makes it easy to understand how to process the information at hand - without any extra work!
Also it's great format for storing small amounts of constantly changing data (like user preferences) cause it's extensible and with only a tiny bit of effort backward compatible as well.
Anyone trying to use XML for processing large amount of data (like data warehousing) is either nuts or doesn't give a damn about the speed or costs.
However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain (anybody who has tried to decode data from some long gone accounting package from the '80-s knows what I am talking about).
So yes XML is self describing only to humans and that's the whole point of it. Formalizing data semantics is not the goal of XML, has never been and will never be, thats what we have RDF, RDFS, OWL and other nice initiatives from Semantic Web movement for.
Flat files are the wave of the future. If I can't grep it...I don't want it.
Plain ole ASCII flat files. Best database ever!
No, the "zero" or relational databases is the empty set.
HAND.
Hmmm... what if we made an XML database based on EMail? Then, we could create a whole new schema paradigm using IPSec!
Wow, suddenly my scalp begins to tingle. I think I'll call a meeting...
help me i've cloned myself and can't remember which one I am
I think this whole dispute is not necessary. It's just a simple case of conflicting goals. One side wants "efficiency" and another wants "extensibility". What both sides miss is that real systems can't afford to choose just one of these, you have to get both. So you have to have just enough extensibility to allow reasonable extensions, and still attain reasonable levels of efficiency. But trying to get all of either thing will totally lose the game. So the XML camp is wrong to think that extensibility is everything (ever tried parsing XML in a real-time system?). And the "binary transfer format" camp is wrong to think efficiency is everything (ever tried to make two versions of your protocols to interoperate?).
-- Esa Pulkkinen
Because they think they are going to fight for freedom, also realize most of these people join young. Anyways its basically like a cult.
No. You. Can't.
You simply cannot know what the data means without specified semantics (which you have to agree upon somehow; they aren't magically apparent from the XML itself -- that's why such abominations as DTD and XML-Schema exist).
HAND.
table: order ID, part number, quantity shipped
select part number, sum(quantity shiped), avg(quantity shipped)
Group By Part Number;
This works with NULLs in the column for quantity shipped on parts which have not yet been shipped. If you just use zero for "no shipment" then your average number will have no real value for answering questions like "how much do we spend shipping these parts, on average?" etc.
If you wan't to throw an exception you can throw the execption or not in your program. In that case you fetch the individual values and do the math yourself and the "that's not a number" that is caused by the null gives you the chance to throw your exception.
But since, in aggregate operations, your program isn't even interractin with the data yet, where would such an exception go?
What would the SQL syntax be communicating a list of results PLUS a list of exceptions to your program? Which order would things be processed in?
Your boolean analogy is also flawed. "You have stopped beating your wife?" is not a yes-or-no question because it carries a predicate around with it that you may not fulfill for serveral reasions (not married; you are hetrosexual female, so you don't have a "wife", you have never beaten your wife so you can't "stop" doing it; etc). There are a surprisingly large number of "real data" that nature. For those of you who have trouble abstracting this, the "real comparason matrix" is "True, False, and Not Applicable". NILL buys you "Not Applicable" so very cheaply.
In poin of fact, people who don't like NULL, usually because they don't understand its purpose and use, make a hell of a lot of work for themselves.
My current employer has a large database of test values that grows by huge numbers of elements each day. The programmer "didn't understand" NULLs (ro RDBMS' for that matter) and has "-" in fields that should be NULL.
Consequently we cannot aggregate. All of our client applications end up haveing to bulk-fetch whole table ranges and run through elaborate statistical routines full of conditionals; or do separate fetches with "field != '-'" in the where clause and run a concordance operation in ram after the repeated bulk fetches.
This costs bocup time and degrades the quality of the product.
You call "academic bullshit", I suspect you have never had to work the really large or significant data sets. I suspect that you don't ever ask the server-side to aggregate for you. And I suspect you have never worked time-critical transactions across a "slow" link.
You can't have. You think of "NULL" in terms of equality.
I will give you the "syntatic" point that "Where X = NULL" ought to be unversal. But, for instance, the cartesian nightmare of having "NULL == NULL" in a join is beyond idiotic.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
No, they are not, but the way to rectify them is a bit extreme. Keep in mind that -- as Date says again and again -- there is a difference between the logical model and the physical model. I'll summarize the example he used: a EMP_SALARY table.
Let's start simple:
[Sorry, ecode doesn't seem to want to do vertical alignment.]
Now, what do those two NULLs mean; do they carry value? If Bob is unemployed, we could write 0, but then it could be confused with unemployed and employed for no pay. Perhaps we don't know how much money Chris makes, but we do know that he makes some money.
You could fix this by adding a TYPE enumeration column that would take on values like EMPLOYED, UNEMPLOYED and UNKNOWN, but you'll still need the NULL value for the UNEMPLOYED and UNKNOWN cases.
This last part can be solved by logically segregating the table above (sans TYPE column) into three tables (one per type). Both the UNEMPLOYED and UNKNOWN_EMPLOYMENT tables would lack the SALARY column -- they would have only the EMP_ID as there's no more information to add. All rows in the EMPLOYED table would have a known salary.
NULLs have been removed and the design is further normalized -- some would say to the extreme.
Now, how you would model that physically without using NULL and still managing some level of performance I do not know. But that at least explains the reasoning behind NULL not being necessary.
As for myself, after nearly fifteen years of database design and implementation, I'm quite satisfied with using NULL where appropriate. I've never been befuddled by it nor sidelined by its behavior with respect to logical operators. Learn the rules and move on.
Freedom to fear. Freedom from thought. Freedom to kill.
I guess the War on Terror really is about freedom!
It's one thing to say that either SQL or XQuery have their problems, because they do. It's quite another to say that SQL is bad because it doesn't live up to some arbitrary never-achieved (and perhaps unachievable) standard of relational purity that even Codd himself found superfluous. When Pascal does nothing but the latter, and in addition takes a dozen thoroughly unprofessional swipes at Chamberlin for having been involved in both SQL and XQuery, his professional jealousy is becomes thick enough to choke on. I wish he would, so we would be spared the incessant ranting of someone whose whole career has been marked by a lot of words and not a single deed to back them up.
Slashdot - News for Herds. Stuff that Splatters.
It is true that in many cases a customer will never change their database, so designing an application for portability seems pointless. However in my direct experience I have had to port an application from Sybase to Oracle, and it was a nightmare due to the number and complexity of the stored procedures.
The real problem with stored procedures is that PL/SQL is an awful programming language. The syntax is primitive. I didn't recognize it until I saw some examples of discredited languages of a bygone era. the parser is crappy; is returns crummy compilation errors and reformats your source. The DB eats your source, adding a level of indirection between your source and the implementation. And the runtime environment is very finicky.
And if I spent a couple of years learning about PL/SQL I would still know nothing about how to solve more general computing problems. Instead if I learn Java I can use that knowledge to program almost anything.
Case in point! FUD FUD FUD! Or is it just ignorance...
..." these fall under the category of --USELESS FEATURES--. The only exception there might be views, but even that is debatable.
The first three things you listed have all been MySQL features ever since they added the InnoDB engine (version 4.0 I think, two years ago). Maybe you should RTFM!
As for "stored procedures, views, triggers,
As for enterprise class, you can certainly pay somebody for support if you want to. No it's not the vendor, but they have the source code (along with anybody else who wants it.)
I seem to recall reading such things from Date et. al. for years. My copy of the second edition of the third manifesto (there's a mouthful!) is dated May 2000. This is a half decade old holy war. Not news, not really stuff that matters. For nerds maybe.
Cheers.
JE
"The point of al those repeated tags is that machine time and bandwidth is very cheap now, but human time is not"
yeah, you tak a file that has 1000 linse, each line has 1024 characters, and 300 fields. Transmitted from Brazilla at 28K.
Now convert the to an XML format, and then try sending it. suddenly it take 10 times longer to get the data. Literally going from a couple of hours to all day. Not exactly cost effective.
XML is easy to read, if you know the format and context in which your reading it. It still needs to be designed, you still need to go to meetings, and you still need to create paperwork on the structure.
It's got its place, but it is not the end all, or even cloce for that matter.
The Kruger Dunning explains most post on
I work with a 1960's era throwback of a database that uses Unidata, (or U2), a derivitive of a Pick database. It uses multivalue attributes and sub-multivalue attributes in fields. The technology predates SQL and is dying (unlike BSD).
This is relevent because this dying technology inherently supports non-flat data. If you want to store a purchase order as a single record you can with as many line items and as many deliveries per line item as you like. Field size is limited only by available disk space.
After working with this non-flat data for a couple of years, I have concluded that anyone who wants to go back to it is nuts. It does work, you can run a business on it, but it is cumbersome.
Yes it's true that building up an object is easy because you can get a whole object in one read, but you end up with a bigger problem on the other side. When you want to look at your data differently you end up twisting and bending to get it.
For example if you wanted all the purchase orders due on a certain day, you could easily end up reading every record to find them. Yes, carefull indexing solves the problem and you can index the value of a multivalue list to get the po number, but flat tables and relations will solve the same problem more simply.
Some years ago, Xplain was a very good solution to the SQL problem (in a project i worked on for the Guadalajara government) Someone uses it today?
RDF
Kowari
I agree with you that 99% of the time you would never want NULL to equal NULL. "BatMan" shouldn't ever equal "555-5555" the same way an unknown Name (NULL) is not the same as an unknown PhoneNumber (NULL). And my phone number, though you don't know what it is (NULL), is not the same as your phone number thought I don't know what it is (NULL).
However, since you asked, I can give you an example of when it would be nice to have NULL = NULL. In data warehousing, you have two types of tables - dimension tables and fact tables. There is always a 1-to-many relationship from dimensions to a fact table. Dimensions ALWAYS have a surrogate key (if your data modeler knows anything at all about dimensional modeling) and should only receive an insert if that record in its entirety (except for the surrogate key) does not already exist in the table. In order to check if a new dimensional record needs inserted or not, you have to do something like this:
insert into dimGeography
(
-- ALL COLUMNS FROM dimGeography EXCEPT FOR THE SURROGATE KEY
-- GeographyId (autonumber surrogate key)
Country,
State,
City
)
select distinct
(
Country,
State,
City
)
from
dimGeography_Stage a left join
dimGeography b on
a.Country = b.Country and
a.State = b.State and
a.City = b.City
where
b.GeographyId is null
Now where you run into problems is when a field in your join allows NULLs. For example, Mexico does not have states, so when adding a record for Mexico, you may have NULL as your State. So... no matter how many times ('Mexico', NULL, 'Mexico City') shows up in dimGeography_Stage, it will always be inserted into dimGeography as a new record. This is where NULL = NULL would be handy. However, there are some easy workarounds such as saying COALESCE(a.State, 'SOME DISTINCT VALUE') = COALESCE(b.State, 'SOME DISTINCT VALUE') in your join. Or, you could always disallow NULLs in the State column and use the Country as the State when inserting values from Mexico.
I wonder at the emotional energy tied up in railing against less efficient and elegant solutions. Yes, there are always more powerful and efficient ways of thinking about things. However, the majority of people may not be at a level where they can conceptualise in the way required by these higher technique.
An example related to the article is where people mistake the representation of relational databases in 2D tables of rows and columns as their underlying logical structure. It may turn out that conceptualising an N-dimensional logical structure is beyond these people.
However, these people should not then be browbeaten for their less efficient methods. People at whatever level need to work with tools and conceptual models that they can relate to. Thus if people want to use flat databases, let them do so and live by the limitations of that tool.
It is possible to get too stuck on the importance of efficiency and elegance. It is perhaps like some supposed transcendent being who can teleport instantaneously from one place to another pouring scorn on us lower mortals who still rely on physical movement to move about in space. That is our level of accomplishment and it is proper that we work at that level, whatever it's inefficiency.
So in summary, work at whatever level of efficiency and elegance is right for you, but be careful of criticising people who are working at a lower level than you. By being critical, you jeopardise the opportunity of appearing as a model for higher levels of ability when those people are ready to look around for more advanced techniques, and instead your criticism can instead help construct emotional blockages in those people against future progress.
(I just used my mod points in another thread, so I gotta hope other mods recognise the parent post.)
In the world of self taught dabblers, NULL is not well enough understood to be expected to do anything more than cause the kind of problems you alude to with the likes of '-' to (partially) imply what NULL should be used for.
SQL has to coexist with other components where an empty string and a numeric zero are assumed null and treated accordingly, the quantity shipped example you give being just as easy to understand and implement with zero meaning not shipped as with a separate null (just add "where quanity shipped > 0").
There are also several possible reasons for a data value to be left NULL or undef, not all of which are mutually exclusive. Is it "not yet", "not known" or "not applicable"? In the real world we sometimes need to pair a status enum and a (numeric or string) value column to properly represent a single logical datum which needs to sometimes take state values not sensibly representable by numbers or strings.
We used to use a string of 9s in a numeric key field to represent end of data and even today Perl's DBI interface uses the 0E0 kludge to represent a "true" zero.
-- Our systemic servants do not good masters make.
The real world issue is that SQL has built in support for sorting and sorting algorithms in most big databases are highly optimized
This is why I do sorting in the db. Aesthetically, I would rather do it in object-land so that the logic is kept separate from the data, but obviously the db vendor has more hours to throw at optimisation than I do. Also, I can't assume that my code is the only client hitting the db -- the db may be able to cache common queries / sorts between multiple clients.
Vino, gyno, and techno -Bruce Sterling
Pascal's entire line of argument fails to explain why I should care. I RTFA'd. I even read some of the links. This is a topic near and dear to me, as I fight for better design, better technologies at work against various technology and platform zealots.
I'd argue that if you have a thesis or hypothesis, you need to make a testable predication and test it against the real world. That's the gist of science. Pascal tests XML and SQL against the Gospel of Formal Data Model's according to Codd. Pascal's arguments are theological, not practical, not scientific.
I want to know how XQuery will or will not make my job of designing and implementing solutions that save my company money. How will the problems of XQuery, or the blasphemous SQL implementations cause me problems that cost time? How do these deviations for orthodoxy really make code harder to write, harder to maintain, more prone to error? These are the relevant questions. These are issues that can also be tested against production scenarios.
I don't give a hoot about how well XML falls into Codd's formal model. I care whether or not it saves me time. I'll happily accept that normalization, taken not too literally, does saves me time. XML also saves me lots of time: makes a great prototype to store data in (gasp!) before I implement the database layer. Even for a somewhat complex schema, I can blast out a small XML document in a few minutes in my text editor of choice, and test out how the components use the data that they store. Sometimes, for small data sets such as configuration information, I even leave it in XML for years before finding a compelling reason to deal with an RDBMS. Web services (the horror!) have made the middle tier much more accessible - I can drive access to a service based on policy now, rather than to whom I can distribute the damn DLL and it's dependencies. None of this is perfect, but XML works where it makes a difference to me.
In short, regardless of how well or poorly Pascal addresses his questions, they are the wrong questions addressed in the wrong way. He should go into theology or critique Chinese literature and leave us working folks alone.
My motto: "A cat is no trade for integrity."
Yes, IN SQL, it is ambigous.
That is *EXACTLY* the point.
If SQL were adaquate to represent "everything the relational model represents" then the ambiguity would be resolved by that NON-SQL language that doesn't exist.
SQL is inadaquate to the task of fully utilizing the RELATIONAL MODEL because it doesn't have any way to EXPRESS RELATIONSHIPS other than "table".
Hence the reason (which I was trying to explain in clear terms) the author of the article referenced in the news item was expressing lack-of-happiness for the way SQL works, and the way it is being further debased (his opinion, I haven't read XQUERY) by XQuery.
This also why I didn't use SQL legal identifiers in my examples, I instead used semantic place holders like "Customer ID". *I* havent invented an adaquate language to express the full set of relational operations *either*. 8-)
The place were we totally lost out is the COMPLETE ABSENCE of ANY standard language or RDBMS which actually has a means to "store relationships".
So we create tables, and indexes, and then ask the programmers to go look in documents to figure out how the tables interract. And they each come up with their join-of-choice across the SQL-forced pairs-of-tables.
And the people who really understand the model of what an "magically complete RDBMS" could really do, are forced to sigh and shake their heads.
As the market exists today, if you did make the non-SQL RDBMS that would answer this deficiency, you couldn't sell it because it *wouldn't* *be* *SQL*.
And yes, you could make a meta-system over an existing RDBMS that really understood the RELATIONAL MODEL. But you would pay a hefty entropy cost because you would take your higher understanding and have to decompose it into the SQL_Database operations which are not atomic enough to factor out of the transaction.
So your four-step over-wrapper (which I have toyed with as an OOS project idea) would suffer terrifically (eat memory or be really slow in round-trip costs) exactly because it cannot adaquately communicate with the underlying Query Optimizer (et al), or influence the real storage/communication/duplication cost.
It's like putting a future-car "coversion kit" on your sedan. It will change the look but it won't make it fly.
The shape of an N-way chain of inference, natively executed, would be far different than the shape of a series of 2-way executions that are recombined to look N-way-ish.
The closest you can come is to, in special cases, use sub-selects to create transient tables in the from clause to pre-filter and anti-join you data deep in the heart of your query engine. You can get some real wins, but again, this is you doing this and not the RDBMS.
If you truely doubt this, check out the Oracle extension that makes self referental queries work. I don't remember the syntax off the top of my head, but it allows you to do depth-first queries on tables that have a structure like:
table: Object_ID, Parent_ID, etc...
It it purely and obviously NOT SQL since it uses (if I recall) "PRIOR" (or something) as a verb and SQL returns "Sets" not "ordered sets".
But this prior/next kind of relationship (which builds an N-Tree within a table) is an example of one of the many kinds of relationships that the Real(tm) RDBMS over-layer would have to create.
Then the enforcement of relationships would become an endless string of cascading constraints expressed as (slow) trigger procedures.
So you might as well then just read the data sets into your own store and do your own work, elsewise the cost is multiplicative.
Hell, the number of systems that wont let you do the (legal) task of taking the database totally out of state mid-transaction is surprising. By that I mean you should be able to construct
Table1: Object_ID, feature, thing, whatever
Table2: Object_ID, property, attribute, whatever_else
and then constrain the two tables *mutually* such that any Ob
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
The "Pascal should go into Theology" post has a point.
But my problem with Pascal and his followers is that they beat everyone over the head by claiming that a true relational system would be the software equivalent of a perpetual motion machine. Every flaw of SQL or RDBMSs are critiqued by saying, they suck because if they had implemented a _real_ relational database (and query language) it could magically use lots of different kinds of underlying physical storage schemes that would optimize performance in situation X and use a different underlying scheme to optimize performance in situation Y.
That is like saying - "See how beautifully and elegantly this calculus function computes limits? If you implemented a perfect software solution that matched the theory of limits, it could do limits instantly and accurate to 6 billion digits.
One can use that type of argument to bash every software solution to every mathematically articulated system.
Second, for all their railing against NULLs, I believe Codd admited they were a sort of necessary evil.
As one poster noted, maybe simple predicate binary "Have you stopped beating your wife [Yes or No]?",
logic doesn't map well to the real world - OMFG! maybe the fact that the relational model is internally consistent doesn't mean that it is perfect when mapped to real world data and real world needs. Godel had something to say about perfectly internally consistent systems.
SQL is inadaquate to the task of fully utilizing the RELATIONAL MODEL because it doesn't have any way to EXPRESS RELATIONSHIPS other than "table".
Isn't this a limitation of the relational datamodel itself? I think only the entity-relationship model can distinguish between relationships and other data. Once you convert you're model to the relational datamodel, you've lost that distinction. This is not an SQL-specific problem.
At least this is what we learned at our university on a database course.
is beyond my mere predicate logic skills at this time.
Relational "math" as tought is only the underlying principles, not the final end-form. New operations that "wrap" the mathy stuff can be defined from the mathy operations such that they are easier to relate to.
It is similar to how we use AND, OR, and NOT as our Boolean logic primatives. However, in theory we only need NAND. But NAND is "ugly" to most humans. Thus, we extended the "language" of Boolean to closer match human language. As long as our extensions are defined using the primatives and don't break the rules of our system, we can have our cake and eat it to. True Boolean geeks can still use NAND if they want.
In other words, Dr. Codd was a brilliant theoretician, but a lousy marketer and packager.
We just have to agree on and/or find relational operators and syntax that we find more intuitive than those in the original papers. Sometimes I feel that "look-up" would be more intuitive than "Join", for example.
Relational as a practice is still young.
Table-ized A.I.
No system, SQL or otherwise, can automatically resolve a question that is conceptually ambiguous.
Again, I point to the example of "give me the ZIP code of sales order #12345". Even if you have a system that efficiently specifies and prioritizes relationships between tables, you've still got to explicitly tell it whether you mean the billing address or the shipping address. And the only ways to do that are (1) tell it to use BillingAddressID rather than ShippingAddressID when linking to the address table, or (2) give up normalization and store BillToZIPCode and ShipToZIPCode directly in the sales order table.
Yes, there are plenty of things that could be added to SQL, or added by tossing out some parts of SQL and revising them. I'm just pointing out that there are also plenty of things that absolutely require the designer to provide more information.
Gimme a break. Love it or hate it, SQL is an amazingly powerful way to work with arbitrarily complex data sets. Need proof? It is the backbone of nearly every non-toy scale data storage project. No amount of psuedo-academic argument can make irrellavent the fact that it works.
SQL is a lot like COBOL. It works and it is practical and runs the world. Few dispute that. However, most relational affectionados (me included) feel that we can do better. Rather than live with the COBOL of query languages (SQL) forever and ever, let's experiment a bit. We have thousands of procedural and OOP languages, but only a handful of relational query languages to learn and play with. It is time for a Cambrian Explosion of relational languages. The hardware is now up to it.
I tend to kick around a "dynamic relational" view of relational theory, as apposed to the "static" version offered by others in the field. Just as imperative languages tend to fit a dichotomy of compiled (static typed) or interpreted (dynamically typed), I feel that there is room for dynamic relational also. It may serve better in rapidly-changing ad-hoc and RAD environments.
Table-ized A.I.
One thing that everyone should understand: even though Pascal, Date, etc... argue that SQL is a bad implementation of the relational model, they *still* agree that it provides value, and that it is miles better than its hierarchical database predecessors. Since it is their job(s) to provide a reasoned critique of the field, it is only natural for them to rigorously compare SQL to the goals of the relational model. Many of their complaints fall in these categories (although there is much more):
1. Does too much -- too many ways of doing the same thing, and too many unecessary operations that could be better done another way.
2. Overly complex -- the SQL 1999 standard was something like 1200 pages.
3. Allows programmer to circumvent relational integrity. Things like "hidden identifiers", pointers, etc...
4. Too wrapped up in implementation -- users must spend a lot of time understanding the physical storage, rather than focusing on queries in abstraction.
5. Many small inconsistencies in SQL itself
But the problems with SQL are impossible to judge if you only know SQL. It's like the people who used to ask what was wrong with a perfectly good typewriter that made people want to use a word processor. To any who are curious, I suggest you do some reading. The absolute best simple introductions for these problems are in a two online documents by Hugh Darwen at www.thethirdmanifesto.com. Look for "The Askew Wall", and "The Importance of Column Names".
It's a not defined value (comparable with a division by zero). One of the consequences (for example) is that you can't join on NULL columns. Theoretically you can't even sort (ORDER BY) on null values.
It can be bad for performance. Sybase for example (they still exist and partially in some very performant databases, i.e. in the financial and telecom sector) internally represents NULL columns as VARCHAR, which can be a bad idea since it might invalidate fairly cheap in-place updates.
That said however, you are right that it doesn't make sense to implement an anal-retentive model in most physical database schemas in order to be compliant with clarity and purity as defined in a number of mathematical textbooks.
If you have to design a database able to handle 30 million bookings over a 24 hour period, then it might make a lot of sense to avoid NULLs. In your example to replace the NULL by an impossible death date (i.e. 31-Dec-9999) and let the application deal with it.
(And now, I don't want to argue about the Y10K problem...)
ich bin der musikant
mit taschenrechner in der hand
kraftwerk
NULLs make a lot of sense if you understand fields as sets; so a field with value 11 would really represent the set {11}. Now NULL makes sense because NULL = {} the empty set. However, sometimes NULL is supposed to mean "unknown" in the meaning of "This value could be anything". What is missing from SQL is therefore the opposite of NULL, the value ALL, which would indicate e.g. for an integer field the set containing all numbers. This is where the logic of SQL is indeed often broken, because people use NULL to indicate "unknown" and then questions like "Who COULD have murdered Jane ?" can't be answered by "SELECT name, motive, place FROM motives, whereabouts WHERE motives.name=whereabouts.name AND whereabouts.place='house'" But maybe you can correct my SQL :-)
I'm still trying to figure out what people mean by 'social skills' here.
NULLs make a lot of sense if you understand fields as sets; so a field with value 11 would really represent the set {11}.
:-)
.. should have used preview)
Now NULL makes sense because NULL = {} the empty set.
However, sometimes NULL is supposed to mean "unknown" in the meaning of "This value could be anything".
What is missing from SQL is therefore the opposite of NULL, the value ALL, which would indicate e.g. for an integer field the set containing all numbers.
This is where the logic of SQL is indeed often broken, because people use NULL to indicate "unknown" and then questions like "Who COULD have murdered Jane ?" can't be answered by
"SELECT name, motive, place FROM motives, whereabouts WHERE motives.name=whereabouts.name AND whereabouts.place='house'"
But maybe you can correct my SQL
(Sorry I had the wrong format selected, so I resubmitted
I'm still trying to figure out what people mean by 'social skills' here.
You are wrong. The SQL:2003 standard specifies IDENTITY. See http://troels.arvin.dk/db/rdbms/#mix-identity
Wrong. DB2 is rather close to implementing SQL:1999's recursiveness. There is an article comparing Oracle and DB2's recursive features.
-And a patch exists for adding Oracle-style recursion to PostgreSQL.
Whether WITH RECURSIVE or CONNECT BY work effectively is another question. I haven't seen any experimental articles considering this.
Without using recursive SQL, different encoding schemes exist for the purpose:
Joe Celko has recently published a book dedicated to the subject.
See my RDBMS links page for more on this.
XML is all about programmers being able to understand the data!
No. XML is all about storing meta-data alongside the data. And you are implying that programmers will not understand data that is not easy readable, like XML; this implication is an oversimplification.
(...) reverse engineer another proprietary data format (...)
XML is certainly not the only well-documented interchange format.
(...) dig into a horsepile of documentation.
Is is fairly easy to create XML documents that require horsepiles of documentation, just like any other format. Just look at the XML output of Microsoft Office; XML is no magic bullet.
(...) (like user preferences) (...)
No, one should use an API for that and be independent of storage meganism.
However anybody using XML for long term data storage is a genius since other "more efficent" formats will be obsolete ten years from now and the software that can read it can be extreamly difficult to obtain
I would not like to retrieve all information from abovementioned Office XML files right now, let alone in ten years time.
So yes XML is self describing only to humans and that's the whole point of it.
Ehhm, no. Human readability is a side effect that probably led to its widespread adoption. Work is underway to make certain XML far less 'readable' to humans, but smaller and faster to process.
Formalizing data semantics is not the goal of XML (...)
No, because that would require AI, but formalizing meta-data semantics most certainly was a target set for XML to achieve the goal of better data interchange.
Wenn ist das Nunstueck git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput.
One foundamental error in today's operating systems is that they are datatype-agnostic. They simply don't know what the data they handle is. This task is left completely to applications, and each application usually provides its own way of managing data. This causes incompatibilities between applications, and these incompatibilities are not solved either by SQL or XML (that are nothing more than human-readable representations).
/. discussion when half of the posters say that SQL sucks and the other half saying that XML sucks!)
In my opinion, an operating system must primarily provide a data management solution. It must provide the common ways to organize, store, retrieve and process data. This means that the application should only care about the logic behind the data, not how data management is implemented. There are only a few methods of data organization anyway, and it is a shame that these methods are not available when an O/S is installed.
The algorithms that concern the data types should also be available along with the data types. This means that an operating system not only should provide data management, but it should be object-oriented: each "data node" in the system should be available as a class in the chosen programming language (if it supports such a concept).
The availability of data and their types on the O/S level would also boost security and safety, as it would not be possible for a 'devious' application to approach the data in any other way other than the intended one.
Finally, the concept of 'application' is also wrong, and I am saying this in the context of data: in our day and age, data not only multiply fast, but the types of data are frequently modified. The liquid status of data (and their data types) makes the concept of an 'application' (thousands of source code lines, cast in the stone, with a huge degree of coupling between them) a huge obstacle in really making computers useful. Applications need to be replaced by a live system of persistent objects that do simple jobs and inform the world (through events) about changes in their state or the results of their computations; the O/S should be responsible of organizing how objects communicate with each other (either in the same memory space, in different memory spaces, or in different computers).
Since the current situation is not exactly orthogonal (as described above), there are many misunderstandings and problems in defining concepts clearly; many thousands of dollars are spent in re-inventing the wheel, and many work hours and brain power is consumed in creating what should already be there...(and thus we can have a nice
Maps perfectly to the real world:
;-) If that doesn't give you the information you needed, you asked the wrong question.
1) You actually did beat your wife and have stopped doing so... answer the question with a YES.
2) You actually did beat your wife and still doe, so you didn't stop beating your wife... the answer is NO.
3) Any other case (like you don't have a wife, or never STARTED beating her, so you can't possibly stop), the answer must be NO.
In my case, I'd have to answer with a NO
Greetings,
Sebster
One foundamental error in today's operating systems is that they are datatype-agnostic.
Sorry but this (mis)use of the word "agnostic" really irks me. Surely you don't mean that today's operating systems think that the existence of datatypes is "unknowable"? Why can't you just say datatype-independent?
...is that you *needed* to append that final line for people to realise that you weren't deadly serious. Only on Slashdot...
I'm not going to try and reply in detail, but since I participate in the W3C XML Query Working Group and am also the w3C XML Activity Lead, a few comments may be useful.
:-)
The article seems to says "I don't like SQL and I don't like XML and I think XML Query is about mergin them although I don't understand it very well, so the people working on XML Query must be stupid, and in any case it's easier to attack people than understand a specification".
Perhaps that's unfair, but it's clear to me that the writer is a little fuzzy on the design goals of XML and also on the focus of SQL development over the past 10 or 15 years.
In both cases the story is about interoperability.
If you look at the XML Query Home Page you'll see approximately two dozen implementations of the XML Query draft, including a number of open source ones. If you look at the public mailing list for comments, you'll see we received over 1100 detailed technical comments at the last public review. So there's a lot of interest in this work.
Why is that? One reason is that, like Web services and SOAP, XML Query is able to replace a lot of proprietary and hard-to-maitain middleware. Another reason is that for the first time we'll have a standard way to search over multiple kinds of data source.
Don is the primary editor of the XQuery language, but the technical decisions reflected in the specification are a result of collaboration, and are agreed on by aconsensus process by a much larger number of particpants. The goal is to make a language that people agree to implement and to use. With support announced by Microsoft, Oracle, IBM, BEA and others (see Web page mentioned above) and judging by the public interest, I think it's fair to say that's going to happen.
It's pretty rare to see a large complex system that everyone is happy with. It's actually pretty rare to see a small system that everyone is happy with. There are people who are unhappy with some features in the Unix cat program, but it's better to have cat in every Unix system than to have millions of shell scripts break on systems where it's missing! The trick, then, is often to include features that will lead to massively wider adoption, even if some people would rather be without them.
Then we have (as part of W3C Process) a public call for implementations so that we can test to see how confident we are that all the major features can be implemented compatibly (i.e. interoperably) in multiple independent implementations.
Features that were not implemented get removed before the specifications are final.
Is XML Query a waste of time? Is XML evil? Is SQL evil? A lot of people think otherwise, and some of them are pretty smart, so if you are concerned, take the time to read the specs and decide for yourself.
Live barefoot!
free engravings/woodcuts
Yes. Consider the mathmatics of this question. Its far easier to convert SQL to XML, than it is from XML to SQL.
There's some history behind this subject. XML is the love child of the CODASYL standards created back in the 1970's. In the business shoot out for data bases was IBM's SQL(relational) winning over UNIVAC's ADML(hiearchial); only because IBM did its home work better by using math, and marketing. Computers back then were VERY expensive, and SQL proved to be cheaper than ADML. Hardware cost has changed since then.
But our universe is more Hiearchial, than Relational. In the fullness of time, we will see the imergence of XML over SQL, if only because of the cost of developement, and maintanence.
You are still missing the point. You are mistaking the simple conceptual example I originally provided for a complete and potent solution. Try to be a little more abstract.
There would have to be a way to disambiguate the full model. The SQL modle is disambiguated by the table name. A properly relational model would have "Some Other Mechanisim(tm)". I made no attempt to suggest what that mechanisim-of-syntax might be.
Now, as a BONUS, in your original discussion of ambiguity you *WRONGLY* assume that the ambiguous result would be incorrect. This is an example of the SQL polution the article writer was talking aobut. There are completely valid reasons to want to fetch, in one EXPOSE, both branches of the abmiguous tree. In SQL this compositing is done by the UNION operator.
So you state:
Address -> ZIP Code
Customer -> Billing Address
Sales Order -> Sales Order Number
Sales Order -> Customer
Sales Order -> Shipping Address
EXPOSE Sales Order Number, ZIP Code
WHERE Sales Order Number == "12345"
Are you asking for the ZIP code where the physical goods should be sent, or the ZIP code where the bill should be sent? The system has no way of knowing.
And yes, if I am a clod and ask this general question when I want a spesific answer, I will be disapointed.
On the other hand, if I were doing a marketing distribution graph, I would *EXACTLY* want the general answer that one might get from
"EXPOSE Part Nubmer, Zip Code" which would "naturally" find all the Zip Codes that were related to the part number. That way my map-of-the-US-grpah would have a push-pin in it for every address it could find, giving me the billing, shipping, contact, and corporate address zip codes.
Since the discussion at the conceptual level is beyond you, the "real language" implementation of expose would need limiters. So something like:
EXPOSE Sales Order Number, ZIP Code THROUGH Billing Address
WHERE Sales Order Number == "12345"
Or some such.
I can imagine several better syntax and a good number of operators that this "Real Language" would need. Things like "DISTANCE BETWEEN thing AND thing" to filter on how many Degrees Of Kevin Bacon are interesting to me, or concordancing where one of the "things" in an EXPOSE were an expression like street name(Billing Address) or even (street number, street name, zip code)(Billing Address) or some-such.
THE POINT was not for me to offer up a complete language, ready an whole, to be nominated as a replacement for SQL. It was just to *EXPLAIN* about the fact that SQL, and the HABBITS OF THOUGHT THAT SQL ENGENDERS IN ITS USERS (habbits like those you have demonstrated in your objection), discard whole-scale many of the things that the "relational model" says "ought to be(tm)" stored in the database along with the shema and the data.
Those "things" being the relationships themselves.
As long as the language of access doesn't allow you to think about storing and exploring "relationships" then the RDBMS you are using is just a bunch of lists with Sadly Basic Set Theory Operations on top of it.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
You are right that the problem isn't SQL spesific. The original article pointed to SQL as a sort-of poison-pill that prevented the development of proper relational modeling.
As a simple example, it is "impossible" to model "Six Degrees of Keven Bacon" entirely within an SQL query, or any "SQL Database". Oracle has an extension that lets you to walk a parentage tree (c.f. that "PRIOR"(?) stuff).
There is nothing, however, that limits us in the Relational Model itself that prevents the simple entity relationship "Movie ->> Actor" from providing us with all the information we need to find do the six-degrees operation.
We just can't do that operation if we think of the entity relationship as a table (directed vector) the way all of our tools have taught us to think.
You have, like the rest of us, learned to think about the subset of the model available in your tools as if it were the model itself. You think of the entity relationship as existing in a vertical table. In so doing you have deprived yourself of horizontal associations and N-dimensional bindings.
Six degrees of Kevin Bacon is a rooted tree; a mesh floating in space "anchored by" Kevin; once you really understand that both "Movie" and "Actor" are fully valid Candidate Keys. If you do a depth-first traversal of that tree following Movie to Actor AND Actor to Movie as a fully fleshed out "magically complete Relational Database" would allow, the problem becomes trivial. Further, there is no reason that the mesh needs to be anchored with Kevin; any node would do.
The problem is that everyone on the planet has been conditioned to see those table thingies as the core relationship even though the table isn't actually central to the model. The model does offer up the table as a natural core representation, in that we are predisposed to understand lists and any valid group of entities can be listed. But mistaking the representation of the data as a list for the constraint of having to access the data only in list-order is a mistake.
Since the tools constrain the thoughts, you tend to become blind to larger possible associations. You know that you *COULD* write a program to build the Six-Degrees mapping, but it would be a pain so you look for other ways to solve the problem.
So we live with the core abstractions, and every now and again someone tosses in an extension like "PRIOR" or "soundex indices" and we all update statistics for our query optimizer from time to time. But the relationships still exsit in our heads instead of our data base engines and the subtle data is hidden from all but the most subtle operators.
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
The simple reason for NULL being an integral part of relational databases comes from the foundations of the concept: relations are sets, and the whole idea is based on set theory.
A table is a set of rows, and a row is a set of attributes. Every set contains the null set as an element. Therefore, without changing the relational model to be based on some other premise, null must exist.
Granted, there are some problems, but it is what it is...
ascii art
If you need more help, please ask whoever made your sig. (If you made your sig, then change "Please learn how to make links" to "Stop being such a lazy bastard.")
Finally, your article at that link contains numerous grammatical and spelling errors (e.g., "Here is a How to Topic Maps", "Bare with me", etc.). Based on your URL, English is not your first language, so you should ask someone whose first language is English to proofread your articles (although, based on what I've seen on here and on other web sites, even people whose first language is English seem somewhat challenged in the grammar and spelling departments).
Mexico does not have states
Actually, it does. In fact, I have seen it referred to as "The United States of Mexico" in places.
- They're sort of like zero, except that there are already perfectly good notions of zero there to use already.
- Alternatively, NULL indicates that something is "unknown."
The fact that these two treatments are not used consistently either within or across products means that the use of NULLs tends to make it difficult to predict application behaviour.For instance, MySQL generally takes the approach is that NULL is "zero," and typically can't tell the difference between the two.
It would be reasonable to interpret, from that, that any time a NULL enters into a calculation, it should be treated as a "contagion" that makes the result unknown.
Thus, if field "quantity" contains any nulls, then select sum(quantity) from some_table should return NULL .
The more you use NULLs, the more trouble they tend to cause.
If you're not part of the solution, you're part of the precipitate.
Really what does it do that you can not do in SQL? New is not always better. As to more efficient that is a big maybe. The only real way to tell would be try it in a real world application.
I really worry that XML is becoming the univeral hammer and the world is starting to look like one big nail.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Every set has the empty set as a subset, not as an element.
Declaring SQL's concept of NULL to be an element of every set is problematic, because equality won't be reflexive on any set. (NULL = NULL) is always false.
SQL NULL is an unspecified value. I don't know of any analogue to it in set theory.
I've read the article (maybe i've misread it but its 8:30am and a saturday:)) and although I can see lots of bashing of SQL, XML and nulls going on I can't see any 'alternative' or solutions suggested.
----- I refuse to have an argument with an unarmed person
"Maybe a directory would be better"?! This is throwing the baby out with the bath water. Never mind that the industry supposedly figured out that hierarchical data stores where a bad idea 30+ years ago (yet they still persist... excuse the pun), changing data models is not going to solve the age-old problems concerning missing data. These problems have to do with fundamental logic and predate computers.
The more rigid relational design (with dead people in a separate table from all people) has several advantages including presenting clearer meaning... not just to the user but to the system. Note that the base tables involved can always be re-combined in a view to hide the "complexity" of having multiple tables from the logical perspective. Introducing nullable columns shifts what starts as a design problem, back to an implementation problem. In the case of the death date in the person table, the person table can no longer be "trusted" to state a single thing and must always be tested for what it means (is this person dead... if so).
As for performance... these are entirely logical issues so how can we even address performance without talking about a specific system. This said, lest someone assert that *no* DBMS will perform as well in the join case, let me note that: a) the assumption is that in general, users want to see the death information for a person, if the converse is true (and it probably is for all but the most morbid of applications) then retrieving only the person information will perform better on most systems; b) I have seen at least one DBMS with an architecture that allows it to performs just as well when joining tables as when selecting from base tables.