Slashdot Mirror


How Would You Improve SQL?

theodp asks: "It was the best of languages, it was the worst of languages. SQL's handy, but it can also drive you nuts. For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names. Wouldn't it not make sense to provide a Google-like shorthand notation like 'SELECT * -ColumnName' (or DROP=syntax like SAS)? So how would you improve SQL?"

4 of 271 comments (clear)

  1. ugh... by Anonymous Coward · · Score: 5, Insightful

    Did you know there have been people working on a general algebra for data management for, what 40 years now? Did you know, this is basically a SOLVED PROBLEM? Ever heard of "D"? Or Tutorial D? The Third Manifesto?

    You know, I totally understand why Fabian Pascal is always pissed off.

    Here's something for you to chew on:

    Why do programmers write this:

    A + 3

    when they want to add 3 to A? Why do we not write some lovely crap like:

    OPERATE ON A WITH 3 USING ADDITION

    why do we write:

    (A + 3) * 2

    and not:

    OPERATE ON (OPERATE ON A WITH 3 USING ADDITION) WITH 2 USING MULTIPLICATION

    Why do we do that?? Because algebraic notation is 1) declarative .. it tells the computer what you want, it doesn't tell it how to do it and in what order, and 2) algebraic notation is *completely general*. You can nest arbitrarily with parentheses, and you can clearly see what's a variable and what's a value and what's an operator. Easy to create, understand, and *optimize*.

    Do you agree with me that the verbose syntax clouds your thinking? Keeps you from seeing the underlying operations? Makes it difficult to apply the basic algebraic skills you learned in high school? Makes it difficult for the compiler writer to do the same?

    Now I ask you, why do we write:

    SELECT * FROM Order

    and not

    Order

    Why do we write:

    SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id

    and not

    Order JOIN OrderItem

    And here you are, dwelling on some little detail about projecting columns.. this is an easy one: use an "ALL BUT" operator for example:

    RelvarWith100Attributes ALL BUT (Attribute100)

    Once you see that relational algebra is just values, variables, and operators nested in any arbitrary way, just like arithmetic, you have opened the door a little more to understanding the fundamental theory of data management and how backwards and primitive "modern" data management is.

    And let's not even get into all the crap that SQL gives us like duplicate rows, NULLs, brain-dead table-oriented storage, lack of 100% updateable views, lack of arbitrary constraints, (often) lack of composite types (why the hell do we splat objects into MULTIPLE COLUMNS?? They should be stored in ONE column). SQL also confuses logical and physical layers (keys vs. indexes), and has basically kept the database industry in the dark ages for decades now.

    So the answer to your question is pretty simple: I would ditch SQL and use something that looks like relational algebra, which has been understood and documented for a probably longer than you've been *alive*. No offense.

  2. Re:Better NULL handling? by Johnno74 · · Score: 5, Interesting
    I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized

    I've got mod points but I just have to give them up to reply to this....

    You're wrong. I agree totally with the grandparent post.
    You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.

    And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.

    A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
  3. Re:Drop it for something relational by Anonymous Coward · · Score: 5, Insightful

    In what way isn't it relational?

    1. SQL syntax doesn't look like relational algebra (see my long rant above). This clouds thinking and hides the simplicity of the underlying model.

    2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).

    3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!

    4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.

    5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.

    6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.

    7. The relational model does not specify a type system, it just requires one. Yet SQL specifies it's own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?

    8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.

    9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.

    10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.

    11. SQL confuses KEYs (logical) with INDEXes (physical implementation).

    12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?) .. I think this is related to lack of table equality.

    13. SQL implementations don't have ANY brains whatsoever. They don't know that book_id from column A and book_id from column B are equal in a join, and that you don't need both of them in the query result. They don't "look inside" your CHECKs and foreign keys to deduce information about your database and use that information to optimize queries.

    I'm sure if you picked up a basic theory book you'd find plenty of other nitpicks for the syntax, the semantics, and the basic underlying model of SQL.

    And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.

    If you are interested in learning more, get Date's O'Reilly book "Database in Depth". It's very short, roughly 200 pages, and tells you all you need to know about data management theory.

  4. Re:No poetry by plover · · Score: 5, Funny
    SELECT * FROM roses, violets WHERE roses.color = 'red' AND violets.color = 'blue'

    --- -- ----
    All my base
    are belong to you.

    2 row(s) returned.

    --
    John