How Would You Improve SQL?
theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"
Right now there is no standard way of maintaining indexes. Most databases have some sort of CREATE INDEX query, but it is by no means standardized.
Wonder what the public key field is for?
Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.
Bogtha Bogtha Bogtha
You can't write poetry in SQL. So it remains an inferior language compared to Perl.
(yes, well, I ran out of ideas)
May contain traces of nut.
Made from the freshest electrons.
If you want to get an idea of some cool SQL improvements, check out theq /
http://msdn.microsoft.com/netframework/future/lin
LINQ (Language Integrated Query) project for c# 3.0. Some cool stuff tht i never really thought about.
For example, their select statements go backwords ie from table, select column1, n2, n3 etc... Seems kinda wacky at first, but it makes sense since you really should know what table your'e selecting from before you specify the columns.
ex.
public void Linq3() {
List products = GetProductList();
var expensiveInStockProducts =
from p in products
where p.UnitsInStock > 0 && p.UnitPrice > 3.00M
select p;
Console.WriteLine("In-stock products that cost more than 3.00:");
foreach (var product in expensiveInStockProducts) {
Console.WriteLine("{0} is in stock and costs more than 3.00.", product.ProductName);
}
}
Top 10 Reasons To Procrastinate
10.
If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database.
Ok, I might not be a database buff. Actually, my experience with SQL is purely academical (although I've worked with object-oriented databases). But if I were to improve SQL, my attempts would be in the direction of making it into a more pure mapping of a relational database, not in adding yet more syntactic sugar.
Did you know there have been people working on a general algebra for data management for, what 40 years now? Did you know, this is basically a SOLVED PROBLEM? Ever heard of "D"? Or Tutorial D? The Third Manifesto?
.. it tells the computer what you want, it doesn't tell it how to do it and in what order, and 2) algebraic notation is *completely general*. You can nest arbitrarily with parentheses, and you can clearly see what's a variable and what's a value and what's an operator. Easy to create, understand, and *optimize*.
You know, I totally understand why Fabian Pascal is always pissed off.
Here's something for you to chew on:
Why do programmers write this:
A + 3
when they want to add 3 to A? Why do we not write some lovely crap like:
OPERATE ON A WITH 3 USING ADDITION
why do we write:
(A + 3) * 2
and not:
OPERATE ON (OPERATE ON A WITH 3 USING ADDITION) WITH 2 USING MULTIPLICATION
Why do we do that?? Because algebraic notation is 1) declarative
Do you agree with me that the verbose syntax clouds your thinking? Keeps you from seeing the underlying operations? Makes it difficult to apply the basic algebraic skills you learned in high school? Makes it difficult for the compiler writer to do the same?
Now I ask you, why do we write:
SELECT * FROM Order
and not
Order
Why do we write:
SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id
and not
Order JOIN OrderItem
And here you are, dwelling on some little detail about projecting columns.. this is an easy one: use an "ALL BUT" operator for example:
RelvarWith100Attributes ALL BUT (Attribute100)
Once you see that relational algebra is just values, variables, and operators nested in any arbitrary way, just like arithmetic, you have opened the door a little more to understanding the fundamental theory of data management and how backwards and primitive "modern" data management is.
And let's not even get into all the crap that SQL gives us like duplicate rows, NULLs, brain-dead table-oriented storage, lack of 100% updateable views, lack of arbitrary constraints, (often) lack of composite types (why the hell do we splat objects into MULTIPLE COLUMNS?? They should be stored in ONE column). SQL also confuses logical and physical layers (keys vs. indexes), and has basically kept the database industry in the dark ages for decades now.
So the answer to your question is pretty simple: I would ditch SQL and use something that looks like relational algebra, which has been understood and documented for a probably longer than you've been *alive*. No offense.
I'd *love* it if "group by *" (or some such syntax) would just group by all the items in the select in order... It's so bloody tedious to have to re-write each item being selected for the group clause.
"Ignorance more frequently begets confidence than does knowledge"
- Charles Darwin
I think a solid grasp of English syntax would be helpful before moving on to SQL.
To respond to his example, though, I doubt there are many situations where excluding a handful of columns would be particularly useful. Any query including that many fields is almost certainly taking place within a program of some sort, which should be able to ignore extra columns pretty trivially. I'm sure you could come up with some tortuously Goldberg-esque situations where such syntax might be nominally handy (I've thought of a couple myself), but I suspect on the whole it wouldn't be worth the extra complication to the parser.
Of course, that's not to say SQL is flawless. I just don't have any brilliant ideas off the top of my head to improve it that haven't already been implemented.
In what way isn't it relational?
"It's too bad that stupidity isn't painful." - Anton LaVey
It would be great if there was a common language for store procedures. The RDBMS may support many languages, but that scripting language would be available on most platforms since it was part of the SQL standard.
Another feature missing is optimization hints. Oracle uses special SQL comments to hint to the database how a query should be optimized. Other databases have their own syntax and methods. There should be a way to set those hints in standard SQL.
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
I'd replace it by a special-purpose Lisp, and compose it like s-expressions. Mix and match query elements in a flexible manner, yet never run the risk of injections, because it all happens in a structured way. I've done things like this on a small scale (contact information database), and it works really nicely.
Please correct me if I got my facts wrong.
In case you are using an object-oriented programming language, you may ask yourself if SQL really is a good choice. SQL is not object-oriented, it's use is usually neither typesafe nor compile-time checked in OO languages, it can't be refactored from an object-oriented IDE and it does not follow OO principles.
e papers/
q /
Food for thought:
Native Queries
http://www.db4o.com/about/productinformation/whit
DLINQ
http://msdn.microsoft.com/netframework/future/lin
db4o - open source object database for Java and
No standard way to extract/load data. No standard way to get all tables in a database. Basically DDL is entirely separate and each type of database has its own way of doing things. Let's not talk about embedded SQL and optimizing queries (like Oracle hints.. *ugh*).
In what way isn't it relational?
.. I think this is related to lack of table equality.
1. SQL syntax doesn't look like relational algebra (see my long rant above). This clouds thinking and hides the simplicity of the underlying model.
2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).
3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!
4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.
5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.
6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.
7. The relational model does not specify a type system, it just requires one. Yet SQL specifies it's own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?
8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.
9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.
10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.
11. SQL confuses KEYs (logical) with INDEXes (physical implementation).
12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?)
13. SQL implementations don't have ANY brains whatsoever. They don't know that book_id from column A and book_id from column B are equal in a join, and that you don't need both of them in the query result. They don't "look inside" your CHECKs and foreign keys to deduce information about your database and use that information to optimize queries.
I'm sure if you picked up a basic theory book you'd find plenty of other nitpicks for the syntax, the semantics, and the basic underlying model of SQL.
And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.
If you are interested in learning more, get Date's O'Reilly book "Database in Depth". It's very short, roughly 200 pages, and tells you all you need to know about data management theory.
I agree. One of the great (potential) advantages of relational data bases (unlike previous horrors like IMS -- shudder) is that they had a theoretical basis in relational algebra. SQL loses some of that advantage.
Chris Date has written a lot about the deficiencies of SQL (e.g., how joins work). Check out his book Database in Depth, published by O'Reilly.
1. Standard date functions and handling.
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.
Bad User. No biscuit!
It always bugs me that you write UPDATE and INSERT totally differently. I'd much rather see them essentially the same (update obviously would need a WHERE clause). Not a big deal, I admit, but it makes my life harder.
Fundamentally, though, what we need is much better interfaces to our applications. Having to convert C++ data into some format appropriate for SQL and back again is a pain in the behind. Every existing interface between the two that I've seen is crap. Heck, most of the C++ interfaces don't even let you use the std::string type. What are we, back in 1995? Forget vectors and maps, or the fancy boost multiindexed templates. Anyway, these really aren't problems for SQL to solve. And yes, there are object-oriented databases and the like. But relational databases are still pretty much universal. I just mention it because it is a pain in the behind.
Oceania has always been at war with Eastasia.
Perhaps I'm ignorant to some important detail of databases, but I feel that it would be much easier to work with them if the server could represent the relationships between the tables in some meaningful way.
I used up all my sick days, so I'm calling in dead.
Out of all the annoying issues, I've pulled out the most hair over unique id's, and INSERT vs UPDATE.
Most SQL implementations give you some way of assigning unique id's to newly INSERT'ed rows. It would be nice if there were a standardized way, but that's a side issue. Once rows have unique id's, you can identify rows to be updated by id. This is very fast and simple.
Except .. in order to find out what id the DBMS has assigned a row, I usually have to follow my INSERT with a SELECT, to read the id column. Slow and annoying. Sometimes the DBMS I am working with takes a few seconds to perform the INSERT, and ten minutes to perform the SELECT. That takes it beyond an optimization issue, and into a workability issue.
Also, if I do not yet know if a data record has been INSERT'ed, and I need to either UPDATE the existing record or INSERT a new one (say, with just a new timestamp), then I need to either attempt an UPDATE and then fall back on INSERT if the UPDATE fails (ew!) or attempt a SELECT and either INSERT or UPDATE depending on whether it returned any rows (ew!).
If SQL came up with a standardized way to associate unique id's with newly INSERT'ed rows, it would be very, very nice if the id column(s) assigned were returned to the client in the same packet as the message confirming that the INSERT succeeded. Nearly zero additional overhead, neat, fast, and easy.
To solve the UPDATE/INSERT issue, I'm less sure. Say, for instance, that I have a daemon which periodically scans the filesystems in a cluster of machines, and it wants to UPDATE the "exists" column of a given row identified by a ( hostname, mountpoint, path, filename) tuple with the current time, if that row already exists, or INSERT a whole new row for that file if it does not exist. Perhaps there could be a "WRITE" command which is just like INSERT but overwrites a row if it already exists? That seems like the wrong solution, too. In the meantime, I play with caches of hashes to unique id's and lose more hair.
-- TTK
My complaint about it when I first started learning sql (ms flavor, only way offered at school at the time) and got to wildcards, where someone writing the spec chose % instead of * and so on. To top it off, I use sql so infrequently (months between writing it, then only a few statements) that this difference bites me in the butt just about every time I try to use a wildcard.
Don't blame me, I voted for Kodos
Actually, it'd be faster if you listed out every column name. If you're talking about faster to write out the code for, you're obviously not writing a query for a program that's intended to be used much. There's absolutely no reason you should be deploying code containing a query that does "select *" or anything like it. You're making the database do the work of looking up the list of columns names every time that query runs. There are much more useful things to spend your caching space on (if you have any).
If you really can't stand to write queries containing the actual column names, you should be using some type of abstraction layer in whatever language you're writing your code in.
If you're not writing code and just making queries by hand to test the results, then you're even further off your rocker. (this also applies in general to the statement you made) Why would you ever NOT want to select that last value out of 100? is it going to keep your output from wrapping? (lol)
Also, those of you saying that you should never have 100 columns in your table, you're certifiable lunatics as well. If you have 100 columns that are used in every record and have very little or no duplication per row, there is no reason you should break this up into multiple tables!!! Then the database has to do joins, which again require more processing power and disk usage. It's also hard to maintain multiple tables when you really have one table after you normalize it.
For those of you that say this isn't normalized... I'm not even really sure how to answer that.... If you have several tables all with a strict 1:1 relationship, they should be in ONE table. Anything else is considered denormalized, not yet normalized. (aside from being just plain BAD!)
For those of you that say you'd never need that many columns in one table or split across multiple tables, however you'd like to think the world should work. I have an example of just that. My wife does genetic research, primarily statistical analysis of sequence data (in various forms, but that's the easiest way to sum it up). We've had discussions on this particular topic, where she had been told by someone else that she would get better performance in Oracle if she split her one table into several tables containing a smaller number of columns, each.
This is just simply not true. It also is a perfect example of a situation where you would actually need a large number of columns. There were specific bits of data that needed to be looked up quickly (like, 45'ish). You can't store it all in one column (or even just a few) and use regexes to find the bits you're looking for. You also don't want to be doing a lot of joins unless you really need to, you know.. when you actually have data that would fit into some form of normalization. Technically, you CAN do this stuff, but not if you want decent performance. If you didn't want decent performance, you could just leave the data in a text file and shell out a grep command. *sigh*
Anyway, enough ranting, but seriously people... Get a clue. Get some experience with these issues. Don't just pipe up because "hey, I've worked with databases and while I probably don't understand them very well, I don't know anybody else that understands them at all, so I'm kind of an expert!"
I think the basic problem with that is what if you got two tables, let's say, logins and projects.
logins had an id and a name
projects has, among other things, submit by and coded by, both of which reference the logins table.
So if you do default joining, which field should it join by?
If you limit the default join to when there's one and only one reference to a given table, then existing code may break once you add a new reference.
And lastly, what if you really want a cartesian join? Granted I've needed this about twice out of a very large number of queries I've written, but it my usage doesn't necessarily reflect everyone else's.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
Lets look at something a little more realist:
c e.InvoiceNum,
SELECT
Lease.LeaseNum,
Lease.LesseeNum,
Invoice.InvoiceNum,
Invoice.AmountBilled
FROM
Lease INNER JOIN
Invoice ON
Lease.LeaseNum = Invoice.InvoiceNum
WHERE
Lease.LeaseNum = "1234"
ORDER BY
LeaseNum, InvoiceNum
Okay, that's pretty big to get some basic lease and invoice info. Now how you you write that?
Lease.LeaseNum,
Lease.LesseeNum,
Invoi
Invoice.AmountBilled
Lease JOIN
Invoice ON
Lease.LeaseNum AND Invoice.LeaseNum
Lease.LeaseNum = "1234"
Lease.LeaseNum
Invoice.InvoiceNum
??? All that's been accomplished is the removal of key words. I'm not seeing any benefit, and I'm seeing the pitfall of it being hard as hell to read.
-Rick
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
In the world of J2EE you can use container-managed persistence on entity beans (a flavor of EJB). Such object will handle the interaction with the database for you so you don't have to worry about SQL. And it will also address transaction issues so no need to fiddle with manual rollbacks.
With CMP, instead of fooling around with SQL queries you deal with business objects. Why bother with "update inventory" or "insert into cashRegister" when you can call methods on objects, like inventoryItem.substract() or cashRegister.feed() ? This provides you with a good layer between the application and the database, so whenever one has to change you don't have to mess with both. All you got to do to have all this magic is to keep some XML configuration up to date, and if you have the big bucks you can even get JBuilder to do it for you!
SQL is sooo yesterday. Stop wasting time on technical issues: add business value to your applications instead. And with JBoss you don't even have to pay for a good EJB container. (Of course you'll need a good CPU and a little more RAM, but hey, magic has a price!).
lucm, indeed.
You're missing the point. You really have to study the theory, and you'd get something like this (the exact syntax is unimportant of course):
((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever
Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.
Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.
Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.
Here's what's happening:
I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.
I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.
I then PROJECTED that relation to get a fifth relation with just the desired columns.
Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.
Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.
Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?
And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!
I have yet to encounter a DBMS that didn't have an efficient, straightforward way to get the ID after an insert, but YMMV. However, the INSERT/UPDATE issue is a fundamental syntactical problem and it really should be fixed. INSERT and UPDATE do almost the same thing, yet have completely different syntax.
INSERT INTO someTable (fld1,fld2) VALUES ("foo","bar")
UPDATE someTable SET fld1=foo, fld2=bar
It is REALLY annoying when you have to write some code that generates a SQL statement because you must code for two completely different syntaxes. Someone replied about the Oracle MERGE which seems like a nice way to go.
Fortunately though, there are lots of good frameworks around SQL that make it so that writing SQL is becoming a thing of the past. I would like to see SQL treated like HTTP - nobody writes HTTP. It's a protocol. Let it be. Just use the tools. I guess it will never go away though...
I feel lucky to have not been burned by this, but you can bet a lot of people have. If you really want to delete everything, you should have to type something like "delete from table where true" or "truncate table".
The SELECT syntax is usually good enough for me, but the capabilities of most implementations leave much to be desired. They only support a small subset of what the syntax suggests.
And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.
This is what I thought of before even reading your second sentence:
SELECT Customer.* FROM Customer C
WHERE (SELECT COUNT(DISTINCT O.ProductId)
FROM Orders O
INNER JOIN Products P1 ON O.ProductID = P.ProductID
WHERE O.CustomerID=C.CustomerID
AND P1.ProductLine = 'X')
= (SELECT COUNT(*) FROM Products P2 WHERE P2.ProductLine= 'X') )
And I have never studied relational theory formally; only SQL (and, I admit, Joe Celko's posts).
SELECT * FROM whatever the hell my customers have in mind
Go hug some trees.
Not for those who haven't learnt data fundamentals.
By defining and manipulating relations.
It is not. Lisp is, Scheme, Haskell, not SQL. Never was, nor intended to be.
SQL tables are not relations. The very words relation and relational have been dropped from the ISO SQL standards since 1999 at least.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
SQL tables aren't relations, SQL data types are way too limited, NULLs are broken, classes are equated to tables, there are pointers... the ISO SQL standards themselves have long given up on even using relational terms.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.
Result, C naming syndrome.
LeaseNumber and Lease.Number are functionally equivalent, they're both unique, however the latter lends itself to convenient simplification in contexts where it wouldn't create ambiguity. Otherwise you end up repeating yourself a lot. LongTableNamePrefixFoo, LongTableNamePrefixBar, LongTableNamePrefixBaz etc.
Perhaps, instead of replacing SQL, let's just add a clause to the update statement (or insert statement):
insert into table tabname (field1, field2) values (1, 2) [ WITH UPDATE [ALL,FIRST,ADD,NONE] ON FIELD1=1 ];
Thus, if the row exists with field1=1, it will update that row to with value field2=2.
If there are multiple rows where field1=1, define either update all of them (uses 'ALL'), update the first one found (uses 'FIRST'), insert one more (uses 'ADD'), or updates none of them (uses 'NONE').
Alternately, you could add this syntax to the UPDATE verb:
UPDATE TABLE tabname set field=1, field2=2 where field3=3 [ WITH [ NO ] INSERT ]
This would do the select for field3=3. If one or more records were found, update per request. If no records were found, insert a record with field1=1, field2=2 (field3 would be NULL since it's not specified in the 'set' subclause, only in the 'where' clause).
This solves your problem, provides two interesting methods.
Unitarian Church: Freethinkers Congregate!
I've never been a fan of the join syntax. 'Inner' and 'outer' and 'left' and 'right' and 'full' joins? Let's for gosh sakes use the simple set-theory math terms we all understand.
Instead of:
select A.f1, b.f1, b.f2 from A join B on A.f1 = b.f1
do:
Unitarian Church: Freethinkers Congregate!
USE Database;
SELECT * FROM table WHERE column !="99";
Did I get the job? Or for that matter DID YOU?
But I've been getting into MDX lately (data cube technology). I'll tell ya, if you want to do super-fast reporting of hierarchical data, cubes are the way to go. And MDX is a very feature rich cube querying language. It's such a breeze to do a query in MDX that would take all kinds of sub-queries, inner/outer joins and calculated fields to do with SQL. If you're finding yourself wracking your brain with complex SQL for reports you should really check out cubes and MDX. The learning curve will be well worth it!!
The biggest thing needed is the ability to return rows of differing column makeup for object oriented purposes. You need to be able to go:
SELECT FROM base-class
and get back all the fields from sub-classes so that you can populate entire objects from them. Real object oriented databases let you do that.
Most SQL dialects include some sort of exclusion operator.
SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;
or
SELECT * FROM A EXCEPT SELECT "B" FROM A;
Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.
While SQL is hard to use at times (remembering double outer joins), it's that way for a reason. You don't want to be as easy to use as VB, for instance. Being forced to think in terms of lists and cartesean products forces you to think about speed and abstraction.
SQL is as easy as it should be, IMO. Specializing the access modifiers will only add to the complexity and make query optimization an impossibility. If you don't care about speed, then your needs probably aren't serious enough for a full blown SQL RDBMS. Text, XML, or even MS Access could be better suited.
Complaining about SQL is like complaining about Linear Algebra. These systems exist for exceptionally good reason. They are constrained to reduce or eliminate unsolvable situations.
Actually Prolog can do anything an RDBMS can do, but transactions (along with the ACID properties: atomicity, consistency, isolation, durability) are missing. It would be simplest to add transaction processing to existing Prolog implementations and kiss the current RDBMS/SQL implementations goodbye.
So what I'm looking for is something like a "select into" where I can override the values of specific columns, perhaps with a lookup from another table. It should also be smart enough to handle auto_increment/identity type of columns so that you don't get conflicts.
If you don't want crime to pay, let the government run it.
The worst part about SQL is that it was designed at a time when there was this notion that you should be able to read computer languages like English.
As we all know, unfettered application of this principle leads to some terrible languages (see BASIC, COBOL). SQL is one such language: every extension, every new functionality that looks "sql-like" (eg, not a user function) has to be integrated into the grammar. Furthermore, although perhaps not oft considered by SQL veterans, there are key words that, in a very real sense, mean the same thing. WHERE and HAVING in a SELECT are such an example. They both operate on intermediate tables (in the conceptual sense) and accept or reject a tuple, but before or after grouping. There are also inconsistencies in the order with which one supplies operations: in the case of SELECT, WHERE, GROUP BY (a two part keyword, of which the second part is context sensitive! AGH!), the key word leads the parameters. But when you have to do a JOIN, UNION, or INTERSECT you are using infix notation.
My conclusion is that SQL-the-language is a veritable hellhole.
An earlier poster mentioned using S-expressions, and I have to say that I am in complete agreement. SQL queries are usually functional in nature, generally not generating side effects (besides triggers). Then the grammar would be more or less zero size and vendors extending functionality would not result in a huge grammar, just a huge library of functions that are called identically grammatically. It would also make extensions easier to write.
Now, why do I suggest syntax free? Simply because syntax is hard to get right. Syntax-free languages have the handy property of being easy to convert to since they so trivially represent abstract syntax trees, and that means that anyone with a good idea, luck, and some talent in language design could mess around in defining their own language.
At the C2 wiki some of us have been thinking about this question for years. Here are some key topics on it, including my pet SQL replacement suggestion:
http://www.c2.com/cgi/wiki?SqlFlaws
http://www.c2.com/cgi/wiki?TqlRoadmap
Table-ized A.I.
and start over. SQl is an abomination. Even well constructed SQL is an UGLY, syntacticly obscene, maintenance pain. Code to generate SQL is usually pretty nasty too bahhhh...
I have decided that prefix notion is superior to infix (as used by Tutorial-D). The problem with infix notation is that not all operators have only 2 parameters. If some have more than 2 params, then you have to either introduce goofy syntax adjustments, or mix infix with prefix. If you use all prefix then you don't have to worry about mixing. Plus, prefix can be implemented as regular sequential functions in most languages so that one can emulate the query language in a native programming language more easily.
Consider the case where you have an operation that has 2 required parameters and one optional parameter. That means that sometimes it is written like:
p1 op p2
and sometimes like:
op(p1, p2)
so that we can have:
op(p1, p2, p3)
when you decide to use the 3rd param.
Table-ized A.I.
It seems stupid to me that insert would have a different format from update.
e.g. (field1,field2,field3) values ('1','2','blah') vs field1='1', field2='2', field3='blah'. Kinda dumb.
Plus why wasn't there a command to "merge/replace/put" rows into a table much earlier?
SQL's badly designed, but looks like we'll have to live with it for decades to come.
Why do I need 99/100 columns? Perhaps a poor example, but the obvious answer is: Because some nitwit built this database and I just have to live with it.
Do you think most SQL queries are executed by people who created the database in the first place, or even have any control over how it is designed? Don't be silly. It's the usual problem with software. Computers are so flexible, and commercial software is so inflexible, that a huge amount of energy is expended trying to get data from point A to point B when there is no direct route. Hence Perl.
Assembly is the reverse of disassembly.
fwiw a few solutions.
in MySQL, the statement REPLACE INTO will perform an update or an insert, depending on whether the primary key value exists in the table. It performs exactly like your WRITE command would.
in MySQL, you can perform a SELECT LAST_INSERT_ID() to get the last inserted value.
in MSSQL, use a SELECT @@IDENTITY to get the same. (check in the docs whether you need @@IDENTITY OR @SCOPE_IDENTITY or the third version, I always forget).
Fricking lasers! That's what would improve SQL - frickin' lasers!
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
No doubt it has defects, but SQL has a strong theoretical underpinning in set theory. This has made it a very durable language and one that scales to sizes probably unimagined by Dr Codd when he outlined its roots in 1970 in his article "A relational model of data for large shared data banks".
Computings needs for well structured access and manipulation of large data sets has been well served by SQL.
A clear replacement has yet to emerge. There are pretenders to the throne, of which Tutorial D is certainly technically nice, XQuery is a mess and ODBMSs (and their query tools) really haven't caught on.
Its just that SQL passes a simple test - its good enough for the job and relatively ubiquitous. And standards do exist (that every major vendor breaks. sigh.).
Note that the layers exist only before the query optimizer, no extra performance cost beyond that.
Only that it forfeited it when IBM chose to populate System R with people who never understood Codd's work and refused to learn from him, in a move many interpreted as trying to create a failure in order to preserve the IMS/DB cash cow.
Yep, but this is not a technical issue, only a market one.
Only that no one defined yet an Industrial D. Rel may yet define it, perhaps, if it ever changes its current exclusive educational focus.
Thanks God, this would be a 35-years regression into graph-based, pre-relational systems.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
And how will using full Relational Algebra change our existing ORM systems? And will it impact the object model required of the programming languages? And are there examples of existing commercial software packages that offer a Relational Algebra interface? How much money could we make if we decided to build one?
And who could do it faster, MS, Oracle or Open Source?
And how do we justify buying it to management?
I18N == Intergalacticization
Programmers shouldn't be writing logic directly in SQL anyhow; that's what we have *programming* languages for. SQL is a data language, like XML or YAML, a language that computer programs should be using to transmit data amongst themselves. However, the programmer shouldn't be hardcoding the SQL directly into the program logic. That should be handled by a library, possibly a multi-layer stack of libraries (e.g., a subclass of Class::DBI running over DBI with a DBD connector for the database in question).
Then you write your program logic in a programming language (preferably a multiparadigmatic VHLL), which will always have much more expressiveness than trying to do the equivalent thing in a data language like SQL.
Cut that out, or I will ship you to Norilsk in a box.
All I want is for every database to have the same functions; a standardized way to do the basics.
Every vendor seems to have their own ways to define (or arbitrarily break standard) date functions (add/compare/convert/get current timestamp), string manipulation (like uppercasing, substrings and concatenation), getting the generated id from an inserted row (identity/serial/auto_increment), limiting the number of rows returned (TOP or FIRST?), getting a subset of rows (ie a standard way to get rows 100-150 that works with most DBs) or even getting a list of tables or viewing the schema.
Trying to make an app portable across DBs is next to impossible, and that's not even counting stored procedures or different behaviours for the same syntax (like NULL handling across the various functions). This is very irritating and should have been fixed long ago. Instead, we get this crap which makes the differences we see across different web browsers look like child's play.
You can accomplish anything you set your mind to. The impossible just takes a little longer.
Are you sure about that? Because when you spell them out, the database still has to verify you haven't asked for something it doesn't have. Obviously it needs the names in a SELECT *, but it doesn't need to error check them. Also, with databases that keep track of the version number of the table, I doubt they bother flushing the cache on "SELECT *" unless the schema changes. I mean, what would be the point since it knows if it's changed in advance?
I don't really know as I've never written a database or gone into optimization that far, so I'm asking you to elaborate for me, if you don't mind.
I agree with the 100 cols thing, although once you have that crap in there and you have users who are used to that model of thinking in writing their own reports, it's hard to get people to think otherwise. E.g., recently I had a bitch of a time convincing another employee that they should just use the stored proc I wrote directly rather than having us run the damn thing in a cron job every night to populate a column. But that's just one battle I won.
It's a crazy world out there and sometimes you just have to bite the bullet and add that 101st col while holding your nose!
-l
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
SQL would be easier for people to learn if the order of a statement's clauses was the same as the order in which they are processed. Specifically, move the SELECT (projection) clause to the end of the statement, instead of the beginning:
... ... ... ... ...
FROM
WHERE
GROUP BY
HAVING
SELECT
Declarative languages are fine, but it always helps a programmer to be able to imagine how a statement will be processed, because order of evaluation makes a difference.
A RDBMS query language needs to be structured around relational variables & relational operations. Such a language makes it easy to do things that SQL's limited syntax make much too difficult.
Now, if we could just have RDBMSs that actually did the M part for you. Or at least one that isolated the management so only the DBA saw it & presented a clean, normalized view to the outside world.
1. End all fixed width field requirements. (Not so much a SQL misfeature as a database misfeature, but still...)
2. Implement standard data types for dates, currency, names, and so on. Give them some smarts, so you can read a date and have it return the actual correct value, not some string that could be formatted in any random way and has a missing time zone.
3. Make the grammar less fussy. Half the problem I have with writing SQL queries is that when I'm done, I have to mess with the pieces of the query until I get them in the right order that the system will accept them. Or even better...
4. Get rid of that stupid English-like syntax and give us a proper notation based on Scheme or Ruby or something.
5. Standardize the bits that aren't standardized yet but every real database needs, like referential integrity constraints.
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
I write it as follows:
B illed -
[Modify dsname] JOIN ONE LEASE TO MANY INVOICE VIA LeaseNum to InvoiceNum
MODIFY dsname JOIN INNER
SET DS LEASE
FIND LeaseNum "1234"
PRINT -
Lease.LesseeNum,Invoice.InvoiceNum,Invoice.Amount
BY Lease.LeaseNum - !automatically preceeds print item list
BY Invoice.InvoiceNum Suppressed
Is that simpler?
It's not what you Warg, it's how you Snarf
That looks more like a sequential process as opposed to a single request. Interesting approach though.
-Rick
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
I'd just as soon give the DB a time interval to analyze and decide by itself what indexes to create based on the SELECT/INSERT/UPDATE/DELETE operations that happen in that interval.
Thanks for the pointer. It confirms what had previously been my sneaking suspicion. Referring to a database as "data persistence" shows ignorance of at least the reporting capabilities of a real database, and likely other things as well. These "object persistence" abstractions will make a 2-day project out of a report that would take 2 minutes to write in SQL. E.g., you wanted to see what magazines people in zip code 90210 subscribe to in order of popularity.
Try :
SELECT * FROM table LIMIT y OFFSET x
That would make it even usable with PostGreSQL and most DB, since the x,y is a mysql extention.
I never liked tables. Too much impedance mismatch. I would get rid of tables.
Just talking about this to a co-worker yesterday. I hate the fact that I can't do this:
select (first_name||' '||last_name) as name
from customers
where name like 'z%'
instead i have to do this:
select (first_name||' '||last_name) as name
from candidates
where (first_name||' '||last_name) like 'z%'
even thought I JUST defined the alias! granted, i've only ever used Oracle, I don't know how other DBs handle the above.
DELETE * ?
The Raven
I've been doing SQL since 1997, and every time I've used GROUP BY I've used exactly the non-aggregate expressions in the SELECT clause. Can you give an example where it would be something else? I too would like GROUP BY DEFAULT or somesuch.
Is there a standard way to create or alter a table? The databases I've worked with (MySQL, Postgres, Oracle) all seem to use similar syntax but have slight variations, besides the fact that they use different names for the various column types, and certain column types only exist in one or a few DBMS's. And that doesn't even take into account certain database abstractions that are DBMS-specific.
Surely your local university has some kind of basic databases course. If you are not a student yourself then visit the course info page (or email the lecturer) and find out what text books they recommend.
consider hibernate.....I tune sql by mapping, and its been ages since I wrote by hand a query....well that's of course if using java
Although there are very few times when it makes sense to have a table spread across multiple tables, there are times when it is better to have 1:1 relationships in tables.
If you have a large table that gets used for more than one purpose, and one of those purposes uses relatively few columns, and it occurs more frequently than the other purposes, it may be better to have its information in a seperate table. You could also use a multi-column index, but only if the table has a significant number of writes per read, it may be a problem.
Next, with our two purpose concept again -- if the purpose that is called more often uses few columns, and those columns are all fixed width, you may see advantages in moving them to a seperate table.
And, once again -- two purposes, and the security levels are different for who has access to the data -- I might split up the data, if the sensitive data is relatively small in relation to the entire table, so that I only have to protect the smaller amount of data. (I know -- you're thinking 'But I can use views, or set up rights in the database', but you're forgetting that you also have to protect the backups, and it's a whole lot easier to keep track of one tape, as opposed to a whole rack of tapes, depending on the amount of data you're dealing with.)
So ... are there times I'd use tables for a one to one relationship when it wasn't a sparesely populated table? Yes. Are there times when it's used and it probably shouldn't have been? Yes, but every situation is different, and if someone can back it up with benchmarking using real data, or has a damned good reason for doing so, it's better than blindly always doing one thing without understanding all of the implications.
Build it, and they will come^Hplain.
Ignore the negative reviews in Amazon, by the way. It was one of my undergraduate degree texts and had no problems with it. And of course, considering that the negative reviews seem to be pissed at it because (a) it covers the theory properly; (b) it's written by a bunch of Scots, so it's "British" and that's apparently bad - I don't know how we survive over on this side of the pond with all those terrible American books ;-); (c) it isn't like a Dietel & Dietel book, and that's somehow a bad thing; (d) they were looking for Databases and SQL for Dummies and discovered that this wasn't it. The only relevant criticism made is that it's a little wordy. Ah well.
I don't like trolls and mod against me if you like, but I'd prefer if you'd reply.
Create a view that has those tables joined in the way you want. Then execute your queries against that view.
Well, like I said, I'm no SQL expert. The only experience I have with SQL is some very simple web projects, and the in-house inventory/order tracking software which was written by someone who overused stored procedures and joins, and never used views or foreign keys. Since the front end for the order tracking software was written in MS Access (which is responsible for enforcing all security), it's not really a surprise. It might mean my view on SQL is a little warped.
I used up all my sick days, so I'm calling in dead.
I'd bin it and start again, aiming to create a more complete standardised language. If you look at the SQL standard, there's not even a common means for creating tables and databases. I can only assume that's by design, but it does prevent you from writing vendor agnostic SQL.
Fair enough
2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).
This is true
3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!
Is this true (the first part)? i.e. can you (in a "pure" relational language) have a query like SELECT (SELECT favouritecolumn FROM staff WHERE name='Smith') FROM sometable to select the favourite columns of people called Smith?
4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.
Yes, this is a shortcoming
5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.
Wrong, if the column is NULLable then NULL is part of the column's type—it should be treated as a value distinct from any other value in that type. What is true is that SQL's treatment of NULLs is broken (and inconsistent between implementations).
6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.
Yes, NULL-handling is broken.
7. The relational model does not specify a type system, it just requires one. Yet SQL specifies it's own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?
Then do so :-)
8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.
What's that you say? "almost every SQL product"? So you admit that this is not a flaw of SQL itself? And that it's not even something that makes SQL non-relational? The fact that some (even all) implementations are lacking in some respect does not make for a flaw in the language. (SQL has enough flaws in of itself, there's no need to make them up :-)
9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.
Your comparison just doesn't make sense. You know perfectly well why "base tables" and "views" are distinguished, and this problem exists in a pure-relational model too.
10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.
There is nothing about the relational model that stops one adding a weight to a height... as you say, this is more related to the choice of type system.
11. SQL confuses KEYs (logical) with INDEXes (physical implementation).
Fair enough.
12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?) .. I think this is related to lack of table equality.
This is disingenuous as well, since the EXISTS operator does not work as you suggest (it's "is this statement true for at least one tuple")—the corresponding operator would be FORALL which would involve checking a
Need to type accents and special characters in Windows? Use FrKeys