How Would You Improve SQL?

← Back to Stories (view on slashdot.org)

Posted by Cliff on Thursday November 3, 2005 @10:25AM from the select-*-except-column99-from-table dept.

theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"

12 of 271 comments (clear)

Min score:

Reason:

Sort:

Indexes by diamondmagic · 2005-11-03 10:30 · Score: 2, Interesting

Right now there is no standard way of maintaining indexes. Most databases have some sort of CREATE INDEX query, but it is by no means standardized.

--
Wonder what the public key field is for?
Hierarchical queries by Bogtha · 2005-11-03 10:38 · Score: 4, Interesting

Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.

--
Bogtha Bogtha Bogtha
Re:Better NULL handling? by Otter · 2005-11-03 11:05 · Score: 2, Interesting

I don't work with databases, just with enormous flat text and SAS files spit out by them, but -- isn't the proper way to do that with one table with user, question# (1-100), and response (0,1) and another with question# and question?
Your method may be technically correct (I could never get those normal things straight) but as the GP points out, it's an unwieldy, inflexible way to do it.

--
What I'm listening to now on Pandora...
Replace it by Lisp by RAMMS+EIN · 2005-11-03 11:06 · Score: 2, Interesting

I'd replace it by a special-purpose Lisp, and compose it like s-expressions. Mix and match query elements in a flexible manner, yet never run the risk of injections, because it all happens in a structured way. I've done things like this on a small scale (contact information database), and it works really nicely.

--
Please correct me if I got my facts wrong.
SQL is only 1/2 the story by klui · 2005-11-03 11:08 · Score: 2, Interesting

No standard way to extract/load data. No standard way to get all tables in a database. Basically DDL is entirely separate and each type of database has its own way of doing things. Let's not talk about embedded SQL and optimizing queries (like Oracle hints.. *ugh*).
Re:Better NULL handling? by hawkbug · 2005-11-03 11:09 · Score: 2, Interesting

You could easily do it the way you propose - the only thing is, you now have 100 rows where before you had 1 - which I'm not saying is the wrong way... but either way you're going to have 100 columns or 100 rows. Performance wise, I'm not sure which is better or if it matters.
Re:Better NULL handling? by Johnno74 · 2005-11-03 11:10 · Score: 5, Interesting

I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized

I've got mod points but I just have to give them up to reply to this....

You're wrong. I agree totally with the grandparent post.
You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.

And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.

A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
Dates by omibus · 2005-11-03 11:30 · Score: 3, Interesting

1. Standard date functions and handling.
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.

--
Bad User. No biscuit!
Re:Better NULL handling? by boxxa · 2005-11-03 11:42 · Score: 2, Interesting

I think its better performance wise to have more rows than columns. I develop a lot of PHP/MySQL applications, comerically and OSS and I find that using multiple small tables with more rows seems to load and run smoother than tables with a large number of columns. Maybe something with the indexing but I have no actual proof of what works better. On the original topic, I believe that MySQL has its downfalls compared to other database systems but for me personally, I find that I can work around the problems with it to build highly deployable PHP/MySQL web applications very cheaply. Hey, when its free, I can deal with it.

--
Bryan
Re:Better NULL handling? by Naikrovek · 2005-11-03 12:23 · Score: 2, Interesting

if you put 0's in unanswered fields, then you'll have an awful lot of zeros, and since you're using bits, that's a lot of 'no' answers, which isn't what's really there.

any DBA worth his weight in salt knows that you ever see a single table with 100 columns then you have a major design issue.

splitting that up into several tables in the same database will offer a significant performance increase.
Re:Winning the special olumpics and debating an AC by Anonymous Coward · 2005-11-03 12:38 · Score: 3, Interesting

You're missing the point. You really have to study the theory, and you'd get something like this (the exact syntax is unimportant of course):

((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever

Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.

Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.

Here's what's happening:

I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.

I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.

I then PROJECTED that relation to get a fifth relation with just the desired columns.

Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.

Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.

Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?

And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!
There are extensions by pestilence669 · 2005-11-03 15:32 · Score: 2, Interesting

Most SQL dialects include some sort of exclusion operator.

SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;

or

SELECT * FROM A EXCEPT SELECT "B" FROM A;

Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.

While SQL is hard to use at times (remembering double outer joins), it's that way for a reason. You don't want to be as easy to use as VB, for instance. Being forced to think in terms of lists and cartesean products forces you to think about speed and abstraction.

SQL is as easy as it should be, IMO. Specializing the access modifiers will only add to the complexity and make query optimization an impossibility. If you don't care about speed, then your needs probably aren't serious enough for a full blown SQL RDBMS. Text, XML, or even MS Access could be better suited.

Complaining about SQL is like complaining about Linear Algebra. These systems exist for exceptionally good reason. They are constrained to reduce or eliminate unsolvable situations.