How Would You Improve SQL?

← Back to Stories (view on slashdot.org)

Posted by Cliff on Thursday November 3, 2005 @10:25AM from the select-*-except-column99-from-table dept.

theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"

4 of 271 comments (clear)

Min score:

Reason:

Sort:

Hierarchical queries by Bogtha · 2005-11-03 10:38 · Score: 4, Interesting

Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.

--
Bogtha Bogtha Bogtha
Re:Better NULL handling? by Johnno74 · 2005-11-03 11:10 · Score: 5, Interesting

I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized

I've got mod points but I just have to give them up to reply to this....

You're wrong. I agree totally with the grandparent post.
You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.

And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.

A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
Dates by omibus · 2005-11-03 11:30 · Score: 3, Interesting

1. Standard date functions and handling.
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.

--
Bad User. No biscuit!
Re:Winning the special olumpics and debating an AC by Anonymous Coward · 2005-11-03 12:38 · Score: 3, Interesting

You're missing the point. You really have to study the theory, and you'd get something like this (the exact syntax is unimportant of course):

((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever

Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.

Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.

Here's what's happening:

I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.

I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.

I then PROJECTED that relation to get a fifth relation with just the desired columns.

Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.

Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.

Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?

And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!