An Alternative to SQL?
Golygydd Max writes "Dave Voorhis from the University of Derbyshire has developed a program incorporating Tutorial D, a language designed to overcome of the shortcomings of SQL, and developed some years ago by Hugh Darwen and Chris Date. Until now, no-one had done anything with it but Voorhis is hoping for wider adoption; although we think it would be like pushing water uphill though." Update: 10/13 12:43 GMT by T : An anonymous reader writes "It's being picky I know, but the university in question is in fact called The University Of Derby, not Derbyshire."
Who remembers "Knowledgeman", that database language of 20 years ago which got eclipsed by dBase???
I really, really hate SQL. I have never found a way to generate SQL queries programmatically when there are many tables (and joins) involved. It quickly becomes quite hairy. But writing the queries explicitly causes massive code duplication.
I'd love some replacement for SQL that is suitable for dynamic query generation.
The only other solution I have is to query all the data and use only what I need, or to do a lot of little queries and join the data in my application.
Ever heard of TSQL? Neither would have I, if I hadn't been forced to read about it in college. It would seem that there has been a huge number of variants of SQL over the years that have tried to make it "better." The benefits just never seem to outweigh the cost of learning a new language.
For those of you that haven't been assimliated into the borg, microsoft's new version of SQL server accomodates for a new query language called XQuery which takes a lot of the best parts of XPath and XSLT and combines them and obviously the underlying framework is XML. This will cover a lot of the shortcomings over Transact SQL for those that are willing to adopt it, and honestly, it's really not that bad.
Have you ever seen a 25 way join or a 30 way UNION? I've seen queries that go past a given RDBMS's 32k query size limitation. Even worse, I've seen the code that GENERATES these horrendous queries. It's like seeing your parents having sex; it changes your life forever.
Please, please, there must be a sane way to query data from a highly normalized database.
Comment removed based on user account deletion
I agree, it seems more experiment based than anything. Its a good idea as well. We all know SQL has shortcomings (hey nothing is perfect) and the OO database concept seems to have died a death. There are a lot of ORM tools and persistence frameworks coming into mainstream adoption now to avoid SQL in general day to day development. Several of these also have their own 'query language', so why not have another look at the root language.
Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
What truth?
There is no dupe
It might be logically 'perfect' in terminal of relational math, but as a language, it's obtuse. The only book I could find on Bi Temporal database design ended up with all the tutorials written in this Utopian/Acadamian SQL language... holy crap was that annoying.
The reaons for wanting a change from SQL I agree with, but Tutorial D you'll never catch me using.
We need something to combine the power and speed of SQL query syntax with the nested filtering ability of XPath, yet doesn't require the entire DOM to be in memory to work.
-Malakai
A Dragon Lives in my Garage
I am not sure what's incosistent about the syntax you mentioned, but maybe that's just me. Though I'd be interested to see in what ways it is "very limited" (especially if those aren't the limitations of a particular databas engine, or relational databases in general).
sic transit gloria mundi
Comment removed based on user account deletion
Prior to PostgreSQL, the Postgres database
was based on another query language other than
SQL.
One of the main reasons why
PostgreSQL took off after this was that
it changed over to SQL in response to
community/industry request and requirements
(and openness to community contributions),
bringing it in line with the industry
standard.
I suspect that the next evolutionary step will
be to allow multiple query languages to be used
on the same DB engine.
Usually when you need to write queries for big databases, speed is a concern, so the lower the level, the better. I've never seen a GUI which could write SQL queries as well as I do.
Additionally, a high level interface is unable to undestand where a query can or cannot be optimized, but I can. For example: there are cases where queries have to be run on a regular schedule to update special optimization tables. These optimization tables are then used when user generated query (e.g.: from web input) comes, so that the user doesn't have to wait for the database to complete that subquery which could have ben run sooner. Only low level can give you such a control.
Small databases can well use high level interfaces, but those aren't the ones driving the standards anyway since the work is already easy for them. User-friendly interfaces such as Access, Query Builder, and crap like that already provide the required high level for the non-techies.
If anything comes to replace SQL, I think it won't stand standard long enough as vendors will start adding more and more odd extensions, so the story will probably repeat. Personally I don't care much about the language databases use as long as I feel in control and the general concepts of relational databases remain the same.
I was about to say SQL is like COBOL, but SQL seems to be even more persistent so perhaps it's not as flawed.
This sounds more like a job for an XML database than a relational one - tree structures rather than fixed relations would allow you to do that.
Servlet v2.4 container in a single 161KB jar file ? Try Winstone
Compare the symbolic forms:
Example, theta join
And the implementation in SQL:
SQL join example
Specifically in Tutorial D (and hence Rel) you would do this:And subsequently do shit with T1. That's it.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
I looked at the articles, which are too broad by the way, and I didn't see enough of an explaination to make out real detail.
However, it seems as if Tutorial D is nothing more that a purely mathematical improvement of SQL. In other words, it's like comparing Lisp to Prolog or ZF Set Theory to HBG. (Okay, I'm not a logician..., so you math wizards can hold off on the flames.) I'm all for improving SQL. It just seems like Tutorial D addresses the usual complaints about the hazy concept of a NULL in a database.
SQL or any derivative thereof will be inherently complex. This is because SQL is merely an implementation of Relational Algebra. That's the key. Real RDBMS's are inherently mathematical in nature. Complex SQL queries are tough. Too often the average database programmer couldn't even tell you what the definition of a set, function (i.e. the mathematcial definition), or a cross-product is. Yet, they'll write code (or not) that incorporates complex joins. Incidentally, they'll do the same thing with regexs too. Programming still has it's deeply mathematical roots. Not as much as I'd like, but still...
In short, if you're looking for some cool new English Query tool to save the day, you won't find it here. Still, if you're a SQL wonk looking for a new toy or an improved SQL, this may be of interest to you.
Oh, by the way, I used to have a copy of Date's database book. I sold it along with my copy of Foley and Van Dam, Sipser's Theory of Computation, and Aho and Ullman's compiler design book [the "dragon" book] because I needed to pay the rent. Interestingly enough, the books retained a pretty high resale value even after all these years.
What do you mean my sig is repetitive? What do you mean my sig is repetitive? What do you mean....
Even worse, I've seen the code that GENERATES these horrendous queries. It's like seeing your parents having sex; it changes your life forever.
Amen, I HEAR you, brother!
I've had to write "dynamically generated" queries - and they are a TOTAL MESS, with lots of crappy if statements and string appends...
300 lines of code to properly parse a search page response, to generate a 20-line query that executes against 5 tables.
Ugh.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
[...]
Null was meant to take the place of all the hack jobs that were used in older databases to signify non-value, NaN and so on.
Yes, and this is a big part of the point of D. The authors assert that the way that current DB structures are designed leads to bad design- that there shouldn't be nulls. They have a lot of good reasons for this belief, and it is true that SQL leads people to rely on nulls in cases is a sort of synergystic nightmare of lazyness.
(1.08X10^8 Clueless points to the first person who says they don't use nulls, and instead set everything that is be null to ''.)
The language goes hand in hand with designing databases differently, without the influence of SQL.
I forget what 8 was for.
More to the point, RTFM, i.e. Date's book, which explains what the relational calculus is, and thus what relational databases are all about.
At the very least go over an introductory text on set theory and prepositional logic so that you might at least have a basic understanding of what a database query even is.
Before that about all that can be said to answer OP's question is that SQL is mathmatically incomplete and incorrect, i.e., it gives the wrong answers.
It amazes me that people got all worked up over the Pentium rounding error when it wouldn't even effect most people, but businesses are willing to put up with provably far greater flaws in software that affects everyone's data, and with nary a peep of complaint about it.
KFG
I have been working on an SQL alternative myself for a while. My approach is more functional (not procedural) in nature. If the language is designed based on this, then new operations can be added without having to add to the syntax of the language. This would help vendor compatibility because if vendor B does not offfer an operation that vendor A does, then a DBA can simply add a clone of the "function" without tweaking the language parser.
Tutorial D uses infix notation, which tends hard-wire operations to a syntax parser. Prefix (functional-style) is more flexible, consistent, and easier to parse. For example, new parameters can be added to prefix without changing existing calls. It is just an extra, perhaps optional, parameter. It is hard to do with with infix.
My relational replacement would also make it syntactically easier to perform relational operations on things such as column name lists. The column list is simply a table in its own right (perhaps with syntactical shortcuts); thus it can have table operations (relational algebra) done on it just like tables. It is "conceptual reuse" you can say.
Table-ized A.I.
We (BitMover, the people who produce BitKeeper) are building a database on top of our SCM system (yes, that's a little weird but it works out well).
/xyzzy/ {print $ID}' /rob|lm|wscott/ && $STATE =~ /open|assigned/ {print $ID, $SUMMARY}'
We played around with straight SQL syntax, flirted with ruby as the syntax, and ended up with awk (actually nawk). It was the best balance between simple enough for normal people to understand and powerful enough to get the job done.
So we can do stuff like
bk db -s'select from bugs
bk db -s'select from bugs where $OWNER == "rob" {print $ID}'
bk db -s'select from bugs where $OWNER =~
etc. The where clause and the print clause (print is implied) are full on awk scripts, do whatever you want with them.
We're big fans of not reinventing the wheel. Awk is a little weird but it's 95% of the way there and a few tweaks made it perfect for us. So why invent a whole new language? Especially when all the awk code is about 100K of text on x86 and includes a full on regular expression library.
It would be perfectly possible in a relationally complete language that includes a relational MINUS operator, without NULLs entering into it at all. And Tutorial D, on which the object of this article is based, does of course include such an operator. Whether NULLs are desirable or not is a matter of ongoing raging debate. I've found them easy to avoid, and queries of all sorts easy to understand without them.
Lisps allow usage of SQL-like syntax inside programs without stuff like "SELECT X FROM Y WHERE Z = ? ORDER BY A" or "SELECT X FROM Y WHERE Z = '" + something + "' ORDER BY A". Look here and here.
Maybe try an object-oriented database....Relational databases aren't the only game in town
Object DB's don't have enough solid theory behind them. There is too much inconsistency and lack of consensus in the attempts to formalize them.
It may be because one man, Dr. Codd, created relational theory, and thus there is a narrow set of principles to follow. OO has yet to find a Great Consolidator of similar caliber. (I am skeptical it will happen, but won't rule it out.)
OO-DB's tend to resemble the type of systems that Dr. Codd was actually trying to "fix" when he came up with relational theory. Thus, OO-DB's tend to seem "primitive" to relational fans.
Part of the problem is that database themselves run counter to "pure" OO encapsulation. Encapsulation dictates and/or allows each entity to define and handle its own operations. This creates at least two problems:
1. There is no guarentee that the operations will be compatible or consistent across classes. One class might have a Find operation, but another call it Search, for example.
2. Encapsulation requires that a single entity (class) handle its own implimentation. However, automated optimization requires the implementation be consistent and known to something outside the entity. In other words, implementation efficiency coordination breaks encapsulation. Objects can't be selfish about their implementation.
Attempts to solve these tend to make it turn into relational in the end anyhow, making it seem futile to keep OO-ness.
Table-ized A.I.
Prolog! Logic languages are well suited to relational data, where a table maps to a predicate. The logic programming community spent a couple decades trying to convert everybody from SQL and nobody listened.
Funny you should mention that. I'm now working on a program that takes user queries (from a Perl program), converts them to Prolog, and finds the solutions against a SQL database.
It seems to work great, so far.
> Usually, IMO, performance tuning, when it requires altering the storage schema, is done using one of two methods: > 1) Denormalization > 2) Higher normal forms > 3) Summary tables I agree with you if you mean that these are the tricks the (optimal) RDBMS could use "under the cover" to optimize the access plan (making them "transparent" to the application). Let's take the "Denormalization" as an example: we have two entities A and B that are loaded once a day with a batch. We know that they're mostly joined together in the queries. My idea of an "optimal" RDBMS is that I can define the two entities and the application can query them individually or joined together. Then I decide to start the "tuning" phase and I give the RDBMS an "hint" to change the physical layout of the data and keep them just in a single "data store". I keep inserting tuple in A,B entities and the RDBMS would keep the "abstraction" of two indipendent entities. The joined query (that is the more frequent) can get the data from the 2 entities from the same blocks. If the application evolves in a way that the individual queries become more frequent I'd have to "revoke" the hint and separate the datastores WITHOUT changing the application. The SQL RDBMS are beginning to have these features (what I describe here is more or less the Oracle CLUSTERED TABLE) but on other databases the solution is to change the logical schema when they just want to change the phisical layout of the data.
This message doesn't need a sig
Am I missing a requirement? This seems like it shouldn't be an issue.
::jafomatic
A couple valid points here and in the other few posts.
;)
I'm not an academic, and I do write SQL a lot. Actually, I end up trying to write as little SQL as possible, by insulating it into small functions, but I'm sure many do the same for portability and plain old good form.
As for NULLs, the example of an empty street address is best represented by ''. Because the upper layer (the client used to fill in the values) won't make a distinction between '' and NULL.
In most cases, I simply ignore NULLs, and force propper default values - I'm sure most do the same.
The question we should be placing is "is there a better way"? Is there room for improvement? And please don't bring XQuery to the equation, that's something driven by need, not by carefully designing a solution.
At least fostering discussion and looking for alternatives, we might be able to achieve something.
"I don't mind God, it's his fan club I can't stand!" E8
My point is, basically, this: C was designed by programmers who wanted to create a versatile, powerful language that would get the job done, while Pascal was designed by a theoretician who had a fanatically pure vision of what a programming language should be ... and as you say, people like C, and it shows, which is why the bulk of programming today is done in C or one of its descendants, while Pascal is essentially a footnote in computing history at this point.
SQL vs. Tutorial D (or any of the other "more correct," i.e. closer to the relational calculus, DB languages that have come and gone) strikes me as much the same sort of thing. FWIW, I do like SQL, and in much the same way as I like C -- I may grumble at its limitations and weirdnesses, but by and large it gives me an intuitive, effective way to get the job done.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.