How Would You Improve SQL?

Indexes by diamondmagic · 2005-11-03 10:30 · Score: 2, Interesting

Right now there is no standard way of maintaining indexes. Most databases have some sort of CREATE INDEX query, but it is by no means standardized.

--
Wonder what the public key field is for?

Re:Indexes by Anonymous Coward · 2005-11-03 17:16 · Score: 0

Far more than that isn't standardised.

For instance MySQL has the rather handy LIMIT n,m
Oracle on the other hand has a 'hidden' column called rowno so you need to do rowno m too.
Re:Indexes by Trevin · 2005-11-04 03:37 · Score: 1

I thought that you couldn't use 'rowno' in a WHERE clause without using a subquery?
Re:Indexes by Trevin · 2005-11-04 07:47 · Score: 1

I was slightly off. I found this from the O'Reilly book*:

ROWNUM returns a number indicating the order in which the row was retrieved from the table, but this is not always the order in which the row is displayed. For example, if a SQL statement includes an ORDER BY clause, rows will not be displayed in ROWNUM sequence, since ROWNUM is assigned before the sort operation.

MySQL's LIMIT, on the other hand, applies after sorting. This makes LIMIT much more useful for returning pages results (e.g. for a web-based application). To do the same thing in Oracle, you have to retreive the entire result set in a subquery, then use ROWNUM from the outer query to select the range of rows you want.

caveat: it's an old book, for Oracle 8
Re:Indexes by jmccay · 2005-11-04 13:29 · Score: 1

Or use a view if it is possible.

--
At the next eco-hypocrisy-meeting, count the private jets used to get to the meeting. Should be interesting to see that
Re:Indexes by hey · 2005-11-07 04:33 · Score: 1

Indexes should be hidden from the SQL programmer by now. Just like C compuilers basically ignore the "register" directive and make "should I put this in a register" decisions for the programmer.
Re:Indexes by danheskett · 2005-11-07 07:43 · Score: 1

Yeah, I have to go ahead and disagree with you on your assesment. Maybe auto-indexing would be generally good, but there are times when I as programmer can be 100% sure I want to avoid a full table scan that would not be optimized for by any automated process. I have one table in database I work with that contains 190 columns and and about 1.5B rows. I have a query I run once, maybe twice a year, that returns only a handful of rows but requires a WHERE on about 1/3 of the columns. When I do need to run the query I can't be waiting a few days for it: it's generally needed within 1 hr. An automated process that analyzes table usage by previous queries would not optimize appropriately for this unless you started adding all kinds of hints to the process. Maybe there is some value in an automated process but generally speaking any type of serious use is going to require some skilled tuning.

Drop it for something relational by leandrod · 2005-11-03 10:32 · Score: 0, Flamebait

SQL has too many defects -- it isn't relational, it is badly designed. Better dump it for Tutorial D or D4.

--
Leandro GuimarÃ£es Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin

Re:Drop it for something relational by Mr2cents · 2005-11-03 10:44 · Score: 1

In what way isn't it relational?

--
"It's too bad that stupidity isn't painful." - Anton LaVey
Re:Drop it for something relational by Anonymous Coward · 2005-11-03 11:10 · Score: 0

This guy comes out with this crap every time SQL is mentioed, constantly parroting his (apparent) hero, Fabian Pascal. Needless to say, he never has any proof of what he says, or anything remotely resembling an argument.

In fact, the only thing FP seems to be whinging about is that SQL doesn't deliberately prevent you from not creating a purely relational database.
Re:Drop it for something relational by Johnno74 · 2005-11-03 11:19 · Score: 0

SQL has too many defects -- it isn't relational

Umm, WHAT?

Back AWAY from the crack pipe, buddy.

Your comment makes no sense. How can a language be relational? SQL a functional language designed to query a relational structure.
Re:Drop it for something relational by Anonymous Coward · 2005-11-03 11:24 · Score: 5, Insightful

In what way isn't it relational?

1. SQL syntax doesn't look like relational algebra (see my long rant above). This clouds thinking and hides the simplicity of the underlying model.

2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).

3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!

4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.

5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.

6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.

7. The relational model does not specify a type system, it just requires one. Yet SQL specifies it's own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?

8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.

9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.

10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.

11. SQL confuses KEYs (logical) with INDEXes (physical implementation).

12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?) .. I think this is related to lack of table equality.

13. SQL implementations don't have ANY brains whatsoever. They don't know that book_id from column A and book_id from column B are equal in a join, and that you don't need both of them in the query result. They don't "look inside" your CHECKs and foreign keys to deduce information about your database and use that information to optimize queries.

I'm sure if you picked up a basic theory book you'd find plenty of other nitpicks for the syntax, the semantics, and the basic underlying model of SQL.

And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.

If you are interested in learning more, get Date's O'Reilly book "Database in Depth". It's very short, roughly 200 pages, and tells you all you need to know about data management theory.
Re:Drop it for something relational by richg74 · 2005-11-03 11:27 · Score: 1

SQL has too many defects -- it isn't relational, it is badly designed
I agree. One of the great (potential) advantages of relational data bases (unlike previous horrors like IMS -- shudder) is that they had a theoretical basis in relational algebra. SQL loses some of that advantage.
Chris Date has written a lot about the deficiencies of SQL (e.g., how joins work). Check out his book Database in Depth, published by O'Reilly.
Re:Drop it for something relational by leandrod · 2005-11-03 13:20 · Score: 2, Informative

Your comment makes no sense.

Not for those who haven't learnt data fundamentals.

How can a language be relational?

By defining and manipulating relations.

SQL a functional language

It is not. Lisp is, Scheme, Haskell, not SQL. Never was, nor intended to be.

designed to query a relational structure.

SQL tables are not relations. The very words relation and relational have been dropped from the ISO SQL standards since 1999 at least.

--
Leandro GuimarÃ£es Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Re:Drop it for something relational by leandrod · 2005-11-03 13:24 · Score: 1

In what way isn't it relational?

SQL tables aren't relations, SQL data types are way too limited, NULLs are broken, classes are equated to tables, there are pointers... the ISO SQL standards themselves have long given up on even using relational terms.

--
Leandro GuimarÃ£es Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Re:Drop it for something relational by spagetti_code · 2005-11-03 22:33 · Score: 4, Insightful

No doubt it has defects, but SQL has a strong theoretical underpinning in set theory. This has made it a very durable language and one that scales to sizes probably unimagined by Dr Codd when he outlined its roots in 1970 in his article "A relational model of data for large shared data banks".

Computings needs for well structured access and manipulation of large data sets has been well served by SQL.

A clear replacement has yet to emerge. There are pretenders to the throne, of which Tutorial D is certainly technically nice, XQuery is a mess and ODBMSs (and their query tools) really haven't caught on.

Its just that SQL passes a simple test - its good enough for the job and relatively ubiquitous. And standards do exist (that every major vendor breaks. sigh.).
Re:Drop it for something relational by leandrod · 2005-11-03 23:50 · Score: 1

SQL has a strong theoretical underpinning in set theory

Only that it forfeited it when IBM chose to populate System R with people who never understood Codd's work and refused to learn from him, in a move many interpreted as trying to create a failure in order to preserve the IMS/DB cash cow.

A clear replacement has yet to emerge.

Yep, but this is not a technical issue, only a market one.

Tutorial D is certainly technically nice

Only that no one defined yet an Industrial D. Rel may yet define it, perhaps, if it ever changes its current exclusive educational focus.

XQuery is a mess and ODBMSs (and their query tools) really haven't caught on.

Thanks God, this would be a 35-years regression into graph-based, pre-relational systems.

--
Leandro GuimarÃ£es Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
Re:Drop it for something relational by psmears · 2005-11-08 02:48 · Score: 1

1. SQL syntax doesn't look like relational algebra (see my long rant above). This clouds thinking and hides the simplicity of the underlying model.
Fair enough
2. Relations are sets. SQL allows duplicate rows, so its tables aren't sets, and therefore aren't relations. (This property alone is enough to make it "not relational" by the way).
This is true
3. Relation *attributes* (the column names) are also sets. SQL allows columns with the SAME NAME in a query result!!
Is this true (the first part)? i.e. can you (in a "pure" relational language) have a query like SELECT (SELECT favouritecolumn FROM staff WHERE name='Smith') FROM sometable to select the favourite columns of people called Smith?
4. SQL has no "table equality" operator. You'd think the first operator you'd implement for a data type, especially a fundamental data type, would be equality! Imagine a programming language with no integer equality for instance.
Yes, this is a shortcoming
5. Relations require each attribute to be drawn from a single type or domain. SQL allows NULLs, which are values not drawn from the column's type. And SQL gives you very little to help you work without NULLs. To add insult to injury, the default for columns is NULLable.
Wrong, if the column is NULLable then NULL is part of the column's type—it should be treated as a value distinct from any other value in that type. What is true is that SQL's treatment of NULLs is broken (and inconsistent between implementations).
6. (related) Relational algebra requires boolean logic. SQL uses three-valued logic because of NULLs. And it uses it *inconsistently*.
Yes, NULL-handling is broken.
7. The relational model does not specify a type system, it just requires one. Yet SQL specifies it's own particular type system (integers, chars, etc). What if you want to store XML or audio in one of your columns?
Then do so :-)
8. The relational model specifies nothing about physical implementation. Yet, almost every SQL product stores the columns of tables "together" in such a way that makes joins needlessly expensive.
What's that you say? "almost every SQL product"? So you admit that this is not a flaw of SQL itself? And that it's not even something that makes SQL non-relational? The fact that some (even all) implementations are lacking in some respect does not make for a flaw in the language. (SQL has enough flaws in of itself, there's no need to make them up :-)
9. SQL distinguishes between "base tables" and "views". The relational model requires them to be indistinguishable to the end user. Specifically, most SQL implementations don't let you update views! Pretty unbelievable. Imagine a programming language that didn't let you pass arguments to any function for instance.
Your comparison just doesn't make sense. You know perfectly well why "base tables" and "views" are distinguished, and this problem exists in a pure-relational model too.
10. SQL lets you do meaningless things like multiply the primary key values of two tables or add a weight to a height. This is related to the type system issues.
There is nothing about the relational model that stops one adding a weight to a height... as you say, this is more related to the choice of type system.
11. SQL confuses KEYs (logical) with INDEXes (physical implementation).
Fair enough.
12. (This one gets me all the time) SQL has an EXISTS operator (is this statement true for at least one value of this result?) but not a FOREACH operator (is this statement true for all values in this result?) .. I think this is related to lack of table equality.
This is disingenuous as well, since the EXISTS operator does not work as you suggest (it's "is this statement true for at least one tuple")—the corresponding operator would be FORALL which would involve checking a

--
Need to type accents and special characters in Windows? Use FrKeys

Hierarchical queries by Bogtha · 2005-11-03 10:38 · Score: 4, Interesting

Extremely useful when you need to produce a result tree instead of a result list (e.g. Slashdot's nested comments). Oracle does this with "CONNECT BY", there is also a PostgreSQL patch available. Of course there are hacks to do this, such as adding extra fields to keep track of where you are in the tree, but they are a real pain in the arse compared with using the information that's already present in the database.

--
Bogtha Bogtha Bogtha

Re:Hierarchical queries by StormRider01 · 2005-11-03 17:42 · Score: 1

Check out Common Table Expressions in MS SQL 2005, you can do hierarchical queries with them.
Re:Hierarchical queries by Sweep+The+Leg · 2005-11-04 03:17 · Score: 0

Oracle also has syntax for essentially putting together a stored hierarchy, but does not bring it back as a hierarchy. You can essentially glue it together into a flat result set and query over n levels of the hierarchy.

I would say though that it is not particularly useful on the front-end to have a hierarchy as your results. What data structure would you store this as? Even if you create a hierarchial data structure, they aren't particularly efficient and don't warrant themselves well to direct display on a webpage without some work anyway.

I would say the major issue for me is a standard way amongst almost all databases to query n levels deep in a hierarchy and bring back the results flat with the depth level and/or parent/child without me having to store all this info in the DB or do expensive queries to produce the results. No, recursive queries are terrible and not the answer and don't dare use a cursor.
Re:Hierarchical queries by IPFreely · 2005-11-04 07:00 · Score: 1

That's one I would love to see. Thee are several things in Oracle I would love to be able to use elsewhere, like the inline conditional IF-THEN-ELSE. I treat SQL like a programming language. It feels a lot like LISP once you get going. But the SQL interpreter needs to be flexable enough to allow lots of subqueries in lots of strange places. Oracle does. Many others do not.
A short list of things I would like to be able to do (don't know how many of these are available as features in various DBs)
1. PIVOT - Convert rows to columns. (new in MS-SQL 2005 or so I hear)
2. un-PIVOT? - Convert columns to rows.
3. Create rowset from an enumerated list. (ie. embed a list like (1,2,3,4,5) and use it as a five row single column table for use in a subquery, kind of like the "Where x IN ({list})" syntax.)
4. easier/standard method of finding exclusion sets (ie. find all items not in this set, above and beyond the "Not In" clause which only works on one key)

--
There is nothing so silly as other peoples traditions, and nothing so sacred as our own.
Re:Hierarchical queries by K-Man · 2005-11-05 17:37 · Score: 1

Do some searches for "recursive sql". It's in the standard, but nobody put together a full implementation, unless something's come out in the last few years when I stopped following the issue. Briefly, it's an extension of SQL to allow selection of recursive sets, i.e. sets which refer to themselves. An example is "the set consisting of the root comment, union the comments whose parents are in the set" to get the full comment hierarchy.

In general this feature should allow traversal of any connected graph, with cycles, etc. Oracle's hack only works on trees, and fails on cycles, last time I checked.

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger

No poetry by Fred_A · 2005-11-03 10:38 · Score: 3, Funny

You can't write poetry in SQL. So it remains an inferior language compared to Perl.

(yes, well, I ran out of ideas)

--

May contain traces of nut.
Made from the freshest electrons.

Re:No poetry by Tablizer · 2005-11-03 17:35 · Score: 1

You can't write poetry in SQL.

SELECT * FROM roses, violetes WHERE roses.color = 'red' AND violetes.color = 'blue'

(you just had to tempt me, didn't you)

--
Table-ized A.I.
Re:No poetry by plover · 2005-11-03 19:19 · Score: 5, Funny

SELECT * FROM roses, violets WHERE roses.color = 'red' AND violets.color = 'blue'
--- -- ---- All my base are belong to you.

2 row(s) returned.

--
John
Re:No poetry by indifferent+children · 2005-11-04 00:41 · Score: 1

SELECT * FROM brillig WHERE toves IN (SELECT toves FROM the_wabe WHERE (slithy and (action='gyre' or action='gimble')))

--
Censorship is telling a man he can't have a steak just because a baby can't chew it. --Mark Twain
Re:No poetry by mccoma · 2005-11-04 01:35 · Score: 4, Funny

roses are red
violets are blue
you just did a Cartesian Product
your DBA will be talking to you
thank you, thank you, shows at 7 and 10, remember to tip the waitstaff
Re:No poetry by Flwyd · 2005-11-04 04:58 · Score: 1

select me from datingpool where ibask in ('your', 'gaze') group by seven, oclock order by eight, oclock

insert into womb (select * from man natural join woman)

delete me from thisworld where mylife is null

--
Ceci n'est pas une signature.
Re:No poetry by Spock+the+Vulcan · 2005-11-04 07:27 · Score: 2, Funny

It is theoretically impossible to get the results you listed with that SQL query.
Re:No poetry by PureCreditor · 2005-11-04 08:45 · Score: 1

hahha good one =)

select * from roses join violets on roses.color != violets.color

?
Re:No poetry by plover · 2005-11-04 09:04 · Score: 2, Funny

It is theoretically impossible to get the results you listed with that SQL query.
And it's also theoretically impossible to get this far in life without a sense of humor.
I guess we were both wrong.

--
John
Re:No poetry by Tablizer · 2005-11-05 06:23 · Score: 3, Insightful

A Cartesian Join is how you generate every combination of poetry so that you can patent everything.

--
Table-ized A.I.

Check out LINQ... by 0kComputer · 2005-11-03 10:38 · Score: 3, Informative

If you want to get an idea of some cool SQL improvements, check out the
http://msdn.microsoft.com/netframework/future/linq /
LINQ (Language Integrated Query) project for c# 3.0. Some cool stuff tht i never really thought about.

For example, their select statements go backwords ie from table, select column1, n2, n3 etc... Seems kinda wacky at first, but it makes sense since you really should know what table your'e selecting from before you specify the columns.

ex.

public void Linq3() {
List products = GetProductList();

var expensiveInStockProducts =
from p in products
where p.UnitsInStock > 0 && p.UnitPrice > 3.00M
select p;

Console.WriteLine("In-stock products that cost more than 3.00:");
foreach (var product in expensiveInStockProducts) {
Console.WriteLine("{0} is in stock and costs more than 3.00.", product.ProductName);
}
}

--
Top 10 Reasons To Procrastinate
10.

Re:Check out LINQ... by Rayban · 2005-11-03 11:23 · Score: 1

The only thing worse than SQL itself is SQL embedded within another language.

*shudder*

--
æeee!
Re:Check out LINQ... by T-Ranger · 2005-11-03 11:32 · Score: 1

Except that its the columns that are relevent, and what your interested in, where as the tables, joins, etc are an implementation detail. I could start talking like Larry Wall and use nifty terms like "end weight", sufficent to say that with a syntax for fetching data, the data is the important part, and the important part should come first.
Re:Check out LINQ... by Usquebaugh · 2005-11-03 12:55 · Score: 1

Back in the day Burroughs had a 4gl called LINC, 'Language Interface Network Compiler'. Most devs substituted 'Laugh I Nearly Cried'.

Prehaps MS meant 'Laugh I Never Queried' or 'Laugh I Nearly Quit'
Re:Check out LINQ... by arkanes · 2005-11-04 01:32 · Score: 1

That would be true, except that when you're writing SQL, the important (and hard) part is describing what you want - which is the tables and the joins and the clauses and the ordering, not so much the column names.
Re:Check out LINQ... by PickyH3D · 2005-11-04 17:18 · Score: 1

Actually, one of the interesting things about LINQ is that it is not necessary to link it to SQL. It can be up against XML, or SQL, and I think a few other schemes (may even plainly linear arrays?).
My only problem with LINQ is the implementation of the var keyword and idea. Yes, it is dynamically linked to the proper data type at runtime, but with the inclusion of templates ("generics") in C# 2.0, then in C# 3.0, the var should be a generic and STATED as such, or else it should be plainly, and strongly typed as an ordinary loop would require in C# 1.0. This is just a technicality though because at the end of the day I assume the compiler is probably implementing some sort of generic's scheme for you if you use var (since you CAN use a strongly typed variable if you want).

Better NULL handling? by joto · 2005-11-03 10:39 · Score: 4, Insightful

For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names.

If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database.

Ok, I might not be a database buff. Actually, my experience with SQL is purely academical (although I've worked with object-oriented databases). But if I were to improve SQL, my attempts would be in the direction of making it into a more pure mapping of a relational database, not in adding yet more syntactic sugar.

Re:Better NULL handling? by hawkbug · 2005-11-03 10:54 · Score: 1

"If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database."

You're kidding, right? I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized. Say you had a questionaire on a web form. Let's say it had 100 questions. Let's say you had a checkbox next to each. When the user checked the box, the "bit" field you set up for question number 1 gets set to 1. Same applies to 2, it gets set to 1 from 0. You could have 100 columns, one for each question - one row for each user. I see nothing wrong with that. Sure, there are different ways to do it, but there is nothing wrong with doing it that way. Also, think of a tax form - imagine all the questions for each form. Then imagine having a table for each form. Just because it's a large number doesn't mean it's not normalized - some things are just large to begin with, and you don't want to fracture them too much.
Re:Better NULL handling? by Otter · 2005-11-03 11:05 · Score: 2, Interesting

I don't work with databases, just with enormous flat text and SAS files spit out by them, but -- isn't the proper way to do that with one table with user, question# (1-100), and response (0,1) and another with question# and question?
Your method may be technically correct (I could never get those normal things straight) but as the GP points out, it's an unwieldy, inflexible way to do it.

--
What I'm listening to now on Pandora...
Re:Better NULL handling? by diamondmagic · 2005-11-03 11:09 · Score: 1

That is what the array column type is for.

--
Wonder what the public key field is for?
Re:Better NULL handling? by hawkbug · 2005-11-03 11:09 · Score: 2, Interesting

You could easily do it the way you propose - the only thing is, you now have 100 rows where before you had 1 - which I'm not saying is the wrong way... but either way you're going to have 100 columns or 100 rows. Performance wise, I'm not sure which is better or if it matters.
Re:Better NULL handling? by Johnno74 · 2005-11-03 11:10 · Score: 5, Interesting

I can think of a lot of reasons to have 100 columns, it simply depends on what you're working with - and it is perfectly normalized

I've got mod points but I just have to give them up to reply to this....

You're wrong. I agree totally with the grandparent post.
You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other.

And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space.

A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
Re:Better NULL handling? by Anonymous Coward · 2005-11-03 11:19 · Score: 1, Interesting

I'm sorry to bitch but that's so so wrong. What happens when your questionaire has a question added?

Something like:
ALTER TABLE Quiz ADD COLUMN Answer101 BIT NULL

So now you have to deal with all the previouslly submitted quizzes only having the first 100. Same for how to do you delete quizzes? Add another quiz? Now you've got to deal with stuff like [Quiz12].[Question50].

How about this, sorry about errors, dinner's on the stove:

CREATE TABLE Quiz
{
ID INT IDENTITY(1, 1)
-- add any other attributes
)

CREATE TABLE Question
(
ID INT IDENTITY(1, 1),
Text VARCHAR(1000)
)

CREATE TABLE QuizQuestion
(
QuizID INT, --add FK
QuestionID INT, --add FK
QuestionNumber INT
)

CREATE TABLE QuizResponse
(
ID INT IDENTITY(1, 1),
QuizID INT, --add FK here
Submitted VARCHAR(1000)
-- add any other attributes
)

CREATE TABLE Answer
(
QuizResponseID INT, --add FK whatever here
QuestionID INT, --add FK whatever here
Answer VARCHAR(1000) --or BIT, INT, whatever
)

Now you can have as many quizzes, questions, or answers as you need. Yes, it's a little more complicated and odds are I missed a major component of what's needed, but you should see the flexibility that this solution would offer over:

CREATE TABLE Quiz1
{
Answer1 BIT,
Answer2 BIT,
Answer3 BIT, .....
AnswerXX BIT
}

Sorry to ramble, but I've just spent a great deal of time fixing issues exactly like what you describe.

In conclusion, no. Bad. Please don't do any database work.
Re:Better NULL handling? by hawkbug · 2005-11-03 11:19 · Score: 2, Insightful

"You DON'T need 100 columns, ever. If you have that many columns you should be breaking the table up into several tables with 1:1 joins. Seriously. There will always be some fields that aren't needed. Make the rows smaller by putting commonly used fields in one table, rarely used rows in the other."

I completely disagree. Breaking up the questionaire into seperate tables does nothing. I want every question every time, it's all one HTML page with checkboxes and links to the questions. If they were seperate tables, I'd just have a lot of joins every time the page loads, which isn't going to help performance. In my example, there are no rarely used rows - all rows are used equally.

"And your example of a questionaire (1 row per answer, one column per question) is not even close to normalised. What happens if there is a new question? you have to alter the schema. what happens if some questions are not answered? you'd have to have nulls, and wasted space."

No, I never said one row per question - I said one row per person - big difference. Also, the default values for the bit columns would be 0, since the questions are not answered. When they get marked as answered, they are changed to 1. No wasted space and every column is relevant in my example. EVERY time I want EVERY column, no exceptions. Yes, if you added a question you'd add a column. And give it a default value of 0, and then run this simple query "Update Questionaire SET NewQuestion = 0" and then you have no nulls, no wasted space.
Re:Better NULL handling? by bshensky · 2005-11-03 11:28 · Score: 1

Oh yes you do....the real world says so. Sad, but true.

I have worked with large, multi-column data loads from third-party sources, loading these into a single "staging" table that matches the third-party source, /then/ break the data into a set of smaller 1:1 tables in the local app, where we have the most control.

In those instances, there really is no easier way than to create a massive multi-columnar table that matches the source. You then use the native data load utility (like Oracle's SQL*Loader) to get the file into a table. A "staging" table.

I agree with you in principle, but not in practice. Where I am, we've got GM, Ford and Chrysler shoving data-load specs at us that would make a grown man weep - that whole "spec-for-the-least-common-denominator" thing. Solutions for shtuff like this ain't pretty, but ugly problems sometimes call for ugly solutions.

--
Makin' money, makin' friends, makin' whoopee and wearin' Depends
Re:Better NULL handling? by rho · 2005-11-03 11:33 · Score: 4, Insightful

If they were seperate tables, I'd just have a lot of joins every time the page loads, which isn't going to help performance.
That is not a SQL or database issue, it's an issue with your scripting language and Web server. This would be cached.
You certainly can have keep the table at 100 rows. But at that point, you might ask yourself why you're using a database at all. A flatfile will probably have less overhead, even with file locking issues. Especially considering the simplistic questionaire you're using as an example--a long line of 0s and 1s would do you.

--
Potato chips are a by-yourself food.
Re:Better NULL handling? by boxxa · 2005-11-03 11:42 · Score: 2, Interesting

I think its better performance wise to have more rows than columns. I develop a lot of PHP/MySQL applications, comerically and OSS and I find that using multiple small tables with more rows seems to load and run smoother than tables with a large number of columns. Maybe something with the indexing but I have no actual proof of what works better. On the original topic, I believe that MySQL has its downfalls compared to other database systems but for me personally, I find that I can work around the problems with it to build highly deployable PHP/MySQL web applications very cheaply. Hey, when its free, I can deal with it.

--
Bryan
Re:Better NULL handling? by Otter · 2005-11-03 11:42 · Score: 1

As someone else has already explained, your approach is using the database as a giant Excel table, not as a database. You could do exactly the same thing with a text file and get your performance advantage that way.

--
What I'm listening to now on Pandora...
Re:Better NULL handling? by hawkbug · 2005-11-03 11:46 · Score: 1

Ok, people are completely misunderstanding what I was saying - I'm not talking about answers or responses on a quiz. I'm talking about a list of the questions, and whether or not they are completed on a main page. Simple as that. Your way is definitely elegant, but my way is quick and to the point - and my example, the questions don't change - and if it was a one time thing, they would never change. I can't stress enough, the quiz itself is not in the table I gave as an example, JUST the user, and the questions numbers - each question number can have a 1 or a 0 as a value. If I were building the actual quiz table, I would do it similar to how you suggested.
Re:Better NULL handling? by Dixie_Flatline · 2005-11-03 12:12 · Score: 1

That it's possible to work around the problem doesn't mean that the problem is invalid.

Yes, you can come up with a way that you can organize data and tables so that this problem is easier to solve, but there's certainly a way that the data and the language can be made so that you don't have to fiddle with that sort of crap.

And that said, someone may have an instance where, for whatever reason, the data absolutely should be laid out in a table that large. They shouldn't be punished for using the system in that manner if the system itself doesn't restrict them from doing something like that.

Languages and databases are our TOOLS. If we have to work around their deficiencies that much, they're pretty lousy tools.
Re:Better NULL handling? by Naikrovek · 2005-11-03 12:23 · Score: 2, Interesting

if you put 0's in unanswered fields, then you'll have an awful lot of zeros, and since you're using bits, that's a lot of 'no' answers, which isn't what's really there.

any DBA worth his weight in salt knows that you ever see a single table with 100 columns then you have a major design issue.

splitting that up into several tables in the same database will offer a significant performance increase.
Re:Better NULL handling? by HawkingMattress · 2005-11-03 12:58 · Score: 1

A much better structure is to have a table of questions, and a table of responses (with something like a response id, and maybe an identifier on who answered the questionare) and a question-answer table with each row pointing at a response and a question, and giving the answer that person gave for that question.
An answer has to point to a specific question and makes no sense with just an answerid and a userid, so you should have the questionid in the answers table, it's a nonsense to asssociate them in another table in this instance. It takes more space, and requires an useless join. Unless the questions are just choices the user can choose from, like in a poll.
Re:Better NULL handling? by Nuttles1 · 2005-11-03 13:00 · Score: 1

"Who in their right mind needs a table with 100 columns"

I will tell you one case. In BILLING! As a business rule we have to have a record of exactly what was billed and this can be handled in one of two ways or both. The first way is to write a record that has everything you could possibly need to bill (LOTS OF FIELDS). Lots of redundent data, yes, but the data is easy to get to when needed. The other way is some sort of auditing. We do both. Auditing may seem like a cleaner way to implement this business rule in the database but if you really need to know beyond a shodow of a doubt what a specific piece of billing information was at the time of billing it is easier to go to the exact record used to bill with. Of course IDs are stored in this record to link to lets say the current customers record that allow access to current data if needed.

my experience with SQL is purely academical

To this I would first like to say, I loved being in school and I love book learning. BUT, I would like to add that things are not always what they seem in school. First of all Third Normal form is not the end all, sometimes you need to denormalize and that is ok. An instance where 3rd normal form is could possible not be the best solution is with speed, joining a 'large' number of tables in school may be cool, but it may just take too much time when you have 100s or 1000s of users pounding away at a database with tables of 100s of thousands or even millions of records.
Re:Better NULL handling? by HawkingMattress · 2005-11-03 13:02 · Score: 1

Hum after reading the GP post again he does indeed seem to be referring to static yes/no responses, sorry about that :)
Re:Better NULL handling? by neuroxmurf · 2005-11-03 13:23 · Score: 1

Without regard to whether or not the grandparent's example is a good one, I have here a database table with over 150 columns (about 200 or so). It's a database of every census tract in the US (one per row) with all of the data about that tract that the census collected, including population in many different categories, area, location, and so forth. This is a perfectly reasonable case of a 100+-column database.

To claim that no such cases exist only proves that you only play with toy databases, instead of working with real ones.
Re:Better NULL handling? by HawkingMattress · 2005-11-03 13:31 · Score: 3, Insightful

The schema you're talking about is wrong, wrong, wrong. You've made a schema based on how you want to display the information, not based on what this information is carrying.
How can you make a request to find which user responded to more than 3 questions, for example ? you'd have to test each column individually, and then you'd have to modify the request if you add or remove some columns. Furthermore, you have to change the schema itself each time you add or remove a question.
And finally, you don't use the relationnal part of the database, at all. Your schema supposes that there is a question table, with a questionid. And in your anwsers table, you have columns named after the ids, right ? Now how can the database check if you delete a question that there are no answers pointing to it ? it can't, because you're using the database as if it was a flat file. So the database can't check data integrity for you, because your schema isn't normalized at all. as Johnno74 says, you need at least a question table, a response table, and an user/question/response association table. If you don't do that, the database has zero advantages over a flat file.
Re:Better NULL handling? by schon · 2005-11-03 13:42 · Score: 2, Informative

people are completely misunderstanding what I was saying

And you're completely misunderstanding what they're saying.

I'm talking about a list of the questions, and whether or not they are completed on a main page. Simple as that.

If you're just displaying a list of questions that's not gonna change, why are you using SQL at all? Why are you not simply using a flat text file?

my way is quick and to the point

And wrong.
Re:Better NULL handling? by period3 · 2005-11-03 14:04 · Score: 1

An arbitrary restriction on the number of 'columns', or a suggestion "you don't need 100 columns" doesn't make sense. If I have 99 'columns' that are functionally dependent on another 'column', then I need 100 'columns'. Anything else would be unnormalized.
Re:Better NULL handling? by Johnno74 · 2005-11-03 14:21 · Score: 4, Insightful

Yeah, I know what you mean, I work in the real world too. A lot of the time we have to work with some awful database structures, and we just have to grin and bear it.

Doesn't make it right tho... If you are given a clean sheet to design a system as the grandparent post suggested (questionare) and you put the results in one table with a row for each question, then you deserve to work with fucked-up systems for the rest of your life.
Re:Better NULL handling? by tzanger · 2005-11-03 14:22 · Score: 1

I will tell you one case. In BILLING! As a business rule we have to have a record of exactly what was billed and this can be handled in one of two ways or both. The first way is to write a record that has everything you could possibly need to bill (LOTS OF FIELDS). Lots of redundent data, yes, but the data is easy to get to when needed.

What's wrong with something like Order Number, Quantity, Part Number, Description, Price, Discount? That's 6 columns, 2 which would be copies from your global part list (Price and Description) but the copies are absolutely necessary since price and descriptions may change as the master list's updated. Your order table becomes huge in terms of rows, but that's not bad, and that's what things like views can help with. All your data's there, you have another order info table wiht order number, customer info, etc. and another table with shipping info, and another with billing info...

Your post sounds like you're doing this for a living, so I am certain I'm missing something crucial here. I have, however, done this too for a large manufacturing company. Please help enlighten me as to why a billing table would have hundreds of columns.
Re:Better NULL handling? by Johnno74 · 2005-11-03 14:29 · Score: 1

To claim that no such cases exist only proves that you only play with toy databases, instead of working with real ones.

Don't be so pompus. Recently I worked with a database containing all the census results for New Zealand. Ok, its not as large as the US, but its still not trivial.

The DBA who built & loaded the database from the raw CSV data we got from the department of stats really knew his shit. The database was structured with a table containing the questions, and a table containing the unit measures (each data point data was given for several ranges - for neighbourhoods, for suburbs/towns, for cities and also nationally.

The census results data was in a table that had a question ID, a measure ID, a year, and the figure.
I forget how big the db was in the end, about 1.5gb or something. The strucure made it really easy to write the queries that extracted data for several detail levels (neighbourhood, suburb, city, national) which was how the reporting app worked.
Re:Better NULL handling? by superpulpsicle · 2005-11-03 15:42 · Score: 1

One day there should be a race to see if the best group of programmers and mathematicians can come up with a way to query off a text file faster than the fastest relational DB.
Re:Better NULL handling? by Nuttles1 · 2005-11-03 16:10 · Score: 1

I am not saying hundreds of fields, but around 30 or 40. In my reply I considered this close enough to the hundred(s) that the original message referred too. Enough so that the table is largely used simply for a snapshot of how things were at the time of the billing, other tables are used for current information. I was replying to the question from the viewpoint of a programmer from the 'real' world as opposed to 'academia'.
Re:Better NULL handling? by hawkbug · 2005-11-03 16:11 · Score: 0

"splitting that up into several tables in the same database will offer a significant performance increase."

Again, what don't you understand?? It will NOT give a performance increase by splitting my example into multiple tables, absolutely no way. For the millionth time, I'm getting every column every time. Anybody who says different didn't read my other posts explaining what what I gave as an example.
Re:Better NULL handling? by hawkbug · 2005-11-03 16:18 · Score: 1

Yes, my example is wrong for what *you* think I should do with it, but NOT wrong for what I use it for. Again, I don't need the database to check the integrity of anything of me. I won't be adding columns. It's as simple as that. Back to the original point - when somebody says a table with 100 columns is done wrong, they aren't factoring in every possible use for one. Am I using this specific table like a flat file? Yes. There is no rule that says I can't. It doesn't mean the schema doesn't work for the application I apply it to. If I were doing more with this table, you would be correct - but I gave an example of a table with a sole purpose of displaying checkboxes either yes or no, nothing more, nothing less.
Re:Better NULL handling? by sheldon · 2005-11-03 17:44 · Score: 2, Informative

Breaking up the questionaire into seperate tables does nothing.

Well from my experience, breaking up the questionaire into seperate tables does accomplish making maintenance of the app simpler. Which depending on whether or not you want to be stuck supporting the same dumb app for the rest of your life or not, is important.

Now you auto generate your questionaire based on a Questions table... and as an added bonus when someone decides they really need to ask Question #101, they can input it through a maintenance screen.

Your way, in order to add a new question, you have to modify the application.

Granted, querying the results get's to be a bit more complicated. You have to "pivot" the data into a matrix. It's not terribly hard to do. You can do it in your app code, or using a stored procedure you can parse the data and send back one answer per column.

A lot of reporting tools have this kind of function built in, so it's not that hard to utilize.
Re:Better NULL handling? by mikkom · 2005-11-03 22:15 · Score: 2

You are absolutely wrong on this issue. Any experienced developer knows that you should not have 100 columns on table or use the kind of one-time solution you are implying.

--
My quality social news site.com.
Re:Better NULL handling? by Flaming+Foobar · 2005-11-03 22:53 · Score: 1

Your way is definitely elegant, but my way is quick and to the point - and my example, the questions don't change - and if it was a one time thing, they would never change.
And this just shows your lack of experience. Here are some of the lessons I've learned in the past few years working extensively with databases and web apps:
1. Things always change, even when everyone things they won't.
2. Things always change, even when everyone is 100 % certain they won't.
3. Things simply change. Companies get bought, you get bought, your web site grows, your web site changes, in Soviet Russia your website changes you.
4. Quick n' dirty hacks will haunt you until you fix them.
5. The longer you wait, the more difficult it will be to fix them quick n' dirty hacks.
6. You're married to whatever software you write. No matter if you wrote it 5 or 10 years ago, working for some obscure company in another state. Once your quick n' dirty hack needs fixing, your phone WILL ring.

--
while true;do echo -e -n "\033[s\n\033[u\134_\033[B";done
Re:Better NULL handling? by indifferent+children · 2005-11-04 00:46 · Score: 1

Aside from performance, look at maintainability. If you use rows instead of columns for your questions, you can add questions without having to change your table structure! You could even have multiple kinds of survey in the same tables, one survey with 8 true/false questions and one survey with 3 t/f plus 4 numeric, etc. Just have a survey_template_id field in the top-level table, next to the subject's name.

--
Censorship is telling a man he can't have a steak just because a baby can't chew it. --Mark Twain
Re:Better NULL handling? by indifferent+children · 2005-11-04 00:49 · Score: 1

Go ahead, violate First Normal Form.

--
Censorship is telling a man he can't have a steak just because a baby can't chew it. --Mark Twain
Re:Better NULL handling? by indifferent+children · 2005-11-04 00:56 · Score: 1

Even with the way that you describe the problem (as I understand your post), it would probably be better to use rows than columns so that you script could iterate through a result set in a FOR loop to build the page, and can iterate again to see if answers were entered. This is small, simple code that doesn't require you to know the format of the survey at the coding stage (like 100 field names!)

--
Censorship is telling a man he can't have a steak just because a baby can't chew it. --Mark Twain
Re:Better NULL handling? by indifferent+children · 2005-11-04 01:00 · Score: 1

That sounds like an absolutely horrible billing system. To add a new product to your catalog of offerings, you have to change the table structure, and probably the code that uses that table. Ick! This problem has been solved thousands (millions?) of times, and most of those solutions are better (at least from a maintenance standpoint) than the one that you describe. And no, I am not an academic and haven't been for 12 years.

--
Censorship is telling a man he can't have a steak just because a baby can't chew it. --Mark Twain
Re:Better NULL handling? by Anonymous Coward · 2005-11-04 01:06 · Score: 0
Many people (myself included) start of as programmers, and only later start to work with databases. After a while, the developer notices two things about databases:
1. Tables have a fixed set of columns; adding a column to a table involves changing any application code that accesses it.
2. There are lots of "lookup" tables that typically have a code and a description.
Being a natural problem solver, and knowing that code re-use is a "good thing", the developer thinks he can do better. Quite often, there is no one more experienced around to advise against, and so the developer implements his ideas; this is so common that both ideas have a name: the One True Lookup Table (OTLT) and the Entity-Attribute-Value (EAV) model.

One True Lookup Table (OTLT)

The idea: instead of having lots of "lookup" tables like these:

create table order_status (status_code varchar2(10), status_desc varchar2(40) );

create table country (country_code varchar2(3), country_name varchar2(30) );

create table priority (priority_no number(1), priority_desc varchar2(40) );

...

... why not just have ONE lookup table like this?:

create table lookup (lookup_type varchar2(10), lookup_code varchar2(20), lookup_desc varchar2(100) );

Great! Now we only need one "maintain lookup" screen instead of 3 (or 33 or whatever).

The trouble is, the developer doesn't consider the disadvantages of this. Most importantly, we can no longer use foreign key constraints to protect the integrity of the data - unless one of the following conditions is true:
- lookup_code is unique within table lookup
- every child table uses 2 columns referencing (lookup_type,lookup_code) for its foreign keys
In most cases, neither of the above applies, and responsibility for data integrity resides solely in the application code. Show me such a database, and I'll show you some data where the code does not match anything in the lookup table.

Entity-Attribute-Value (EAV)

The idea: instead of "hard-coding" columns into the table like this:

create table emp (empno integer primary key, ename varchar2(20), sal number, job varchar2(10));

insert into emp (empno, ename, sal, job) values (1234,'ANDREWS',1000,'CLERK');

... why not have total flexibility like this:

create table emp (empno integer primary key );

create table emp_value (empno references emp, code varchar2(20), value varchar2(100));

insert into emp (empno) values (1234);

insert into emp_values ('NAME','ANDREWS');

insert into emp_values ('SAL','1000');

insert into emp_values ('JOB','CLERK');

Great! Now we are free to invent new "attributes" at any time, without having to alter the table or the application!

However, consider a simple query: "show the names of all employees who are clerks and earn less than 2000".

With the standard emp table:

select ename from emp where job='CLERK' and sal < 2000;

With the EAV tables:

select ev1.name

from emp_values ev1, emp_values ev2 emp_values ev3

where ev1.code = 'NAME'

and ev1.empno = ev2.empno

and ev2.code = 'JOB'

and ev2.value = 'CLERK'

and ev1.empno = ev3.empno

and ev3.code = 'SAL'

and TO_NUMBER(ev3.value) < 2000;

Not only is that much harder to follow, it is likely to be much slower to process too. And this is about the most simple of queries!

Conclusion

OTLT and EAV are "generic" approaches that are seductive to programmers, but are actually a bad idea for most databases. Resist the temptation!
Re:Better NULL handling? by vrai · 2005-11-04 01:13 · Score: 1

Slight alteration ...
You are absolutely wrong on this issue. Anyone who isn't a complete and total retard knows that you should not have 100 columns on table or use the kind of one-time solution you are implying.
... you were being far too polite in the face of such stupidity.
Re:Better NULL handling? by Daniel+Boisvert · 2005-11-04 01:27 · Score: 1

For example, if you want all 100 columns from a table, 'SELECT *' works quite nicely. However, if you want all but 1 of the 100 columns, be prepared to spell out 99 column names.

If this is your main problem with SQL, then you have other problems as well. Who in their right mind needs a table with 100 columns? If you have 100 columns, you seriously need to normalize your database.

There's another problem here. Using 'SELECT *' is a really really really bad idea. The first reason that comes to mind is that many languages return recordsets with indexed columns, not named ones. This means that if some DBA you've never met sticks a column into the DB at the wrong spot, your application is now either crashing because of mismatched data types (I assume that if you use 'SELECT *' you don't bother to check these either) or displaying all kinds of ugliness.

Let's assume you are working with a language that indexes recordsets by fieldname instead of by numerical index. Now if a fieldname in the DB changes (not that this should happen), you have to touch your code wherever that field is used instead of making a one-line change to your SQL specifying the new name.

You also get the great performance benefits of retrieving all new fields added to the DB if they're added correctly to the end of the table. You know how fast your query was running before? Now let's also retrieve a 2MB BLOB with each record. Have fun figuring out why your application is suddenly performing like utter crap.
Re:Better NULL handling? by arkanes · 2005-11-04 01:38 · Score: 1

On the other hand, using rows instead of columns complicates any interesting data manipulation you're going to do on your web quiz signifigantly, because now you need to use subqueries and aggregate functions to get anything usefull out of it. You can reduce this somewhat by using a view to flatten the table before working with it, but it's still annoying.
Regardless, the OPs point stands - these are not normalization issues, they are database and application design styles. It is perfectly reasonable to have 100 column tables, and still have a highly (even perfectly) normalized table structure.
It's worth mentioning that these sort of dynamic issues (including your example of multiple types of quiz) is a real weakness of the relational model compared to OO design - this is essentially polymorphism which databases manage poorly if they do it at all, but OO excels at.
Re:Better NULL handling? by arkanes · 2005-11-04 01:42 · Score: 1

Any experienced developer knows not to make stupid-ass statements like that one. I can think of at least a half dozen circumstances why you might want such a table (and I've worked with tables with upwards of 50 columns, all normalized). You're making ridiculous and stupid claims, like there will never ever be a data system such that there can be 100 required, important, related pieces of information per item. And that's both stupid and wrong.
Re:Better NULL handling? by Anonymous Coward · 2005-11-04 03:47 · Score: 0

I see unemployment in your near future, if you actually work with DB's. You have no clue.
Re:Better NULL handling? by Nuttles1 · 2005-11-04 04:49 · Score: 1

I was talking about the parent post, not you in my academic comment. Also, I don't think you understand the 'billing' table I am talking about. The billing is a snap shot of what was billed, current data is viewable elsewhere. In my earlier comment I was stating that IDs are saved so links can be made if necessary to current data.
Re:Better NULL handling? by Anonymous Coward · 2005-11-04 04:52 · Score: 0

No, every fresh kid out there tries to over-normalize their data as the best ting eVar... and then when the expert comes by to do perf analysis they see the horrible number of joins that is sapping the query speed, and they optimize it by denormalizing.

Joins are not free. They take time, and without an index it's a table scan for each. With any appreciable data set that can be quite expensive. Now multiply that by 10 because you've over-normalized commonly used data into 10 different tables instead of having 100 columns.

You only need/want an index for columns you actually use regularly in perf-sensitive scnearios. Indexes aren't free, they take up space and slow down updates.

I'm not just flaming either, if you know of a good way to avoid the join costs and index maintenance costs in tables that contain hundreds of thousands to millions of rows, then I have a well-paying job for you.
Re:Better NULL handling? by Anonymous Coward · 2005-11-04 05:13 · Score: 0

Query for what?

Google matches your description. But are the results what you wanted?
Re:Better NULL handling? by Zathrus · 2005-11-04 05:15 · Score: 3, Informative

On the other hand, using rows instead of columns complicates any interesting data manipulation you're going to do on your web quiz signifigantly

No, it simplifies them. RDBMS's make it very easy to do things on a row basis, not so much on a column basis. It is utterly trivial to get "usefull" [sic] data out of a table structure that's properly designed, but nightmarish to get it if it's not. For instance, if you have 100 columns how do you answer the question "what percentage of respondants answered at least 80% of all questions?". You can't easily. And if you add or remove a question then you will have to touch a great deal more code than if you had implemented the table sanely in the first place.

he OPs point stands - these are not normalization issues

Yes they are! This is practically a poster child of normalization (or lack thereof). In fact it's an utterly trivial example of normalization that doesn't actually raise any difficult issues over speed, accessibility, etc. where you often run into problems with normalization.

is a real weakness of the relational model compared to OO design

There's a vast difference between relational databases and object oriented databases. I've yet to see anyone (including Oracle) do objects-in-relational decently. And SQL is inherently a relational paradigm -- you wouldn't want to use it for an OODB because it would just be inappropriate.
Re:Better NULL handling? by Zathrus · 2005-11-04 05:33 · Score: 1

Am I using this specific table like a flat file? Yes. There is no rule that says I can't.

Abusing the crap out of a design method does not make it a valid design. You even say you're using this table like a flat file -- so you've invalidated the entire purpose behind putting it into a table. Use a flat file. It'll be faster in every regard, use less space in every regard, and be easier to interface with.

Right tool, right job. You propose some shit like this in the real world, you deserve to be fired.
Re:Better NULL handling? by SnapShot · 2005-11-04 06:29 · Score: 1

This reminds me of an old Mad Magazine cartoon I saw in my misspent youth. I only remember vague details but it was somthing like this.

Panel One - the set up:
Guy on moped and a guy on a huge, powerful, expensive Harley-Davidson style motorcycle.

Guy on HD says, "You wanna race!".

Guy on Moped says, "Sure, but I get to pick the track.".

Panel Two - the punch line:
Guy on moped and guy on HD are lined up at the starting line. The race "track" is about 6" long and ends at a brick wall. The guy on the big motorcycle is looking concerned. The guy on the moped is smiling.

The moral.
Define the parameters of your contest before making any assumptions about the winner.

--
Waltz, nymph, for quick jigs vex Bud.
Re:Better NULL handling? by hawkbug · 2005-11-04 06:41 · Score: 1

Thank you - that is exactly my point, NOBODY can know every situation and saying that a table with 100 columns is poorly designed is just plain ignorant and wrong.
Re:Better NULL handling? by jag164 · 2005-11-04 12:56 · Score: 1

What school did you go to? I want to put it on my list of schools not to hire from.
Re:Better NULL handling? by mikkom · 2005-11-04 22:53 · Score: 1

Just go ahead and give us a good example where you really think there is a need for 100 column table - all previous examples here have been really, really bad examples how you can make your life harder in the long run.

--
My quality social news site.com.
Re:Better NULL handling? by Breakfast+Pants · 2005-11-05 00:33 · Score: 1

A Hammer is pretty deficient at screwing in screws. The guy should be using a flat file.

--

--

WHO ATE MY BREAKFAST PANTS?
Re:Better NULL handling? by Anonymous Coward · 2005-11-05 05:05 · Score: 0

> If you have 100 columns, you seriously need to normalize your database.

You don't sound like you do much database work. Life is much more complicated than a text book. For our ERP system we have more than a dozen tables with over 300 columns. Especially when keeping track of inventory you need to keep lots of data about each and every item. On any inventory system as a bare minimum you have to keep the SKU, UPC, reorder qty, qty on hand, length, width, height, weight, price, unit cost(for COGS), sales income account, cost of goods sold account, active flag, and description. There are many things in business where you have to keep more than 100 pieces of information about each item. The data is normalized and correct. It, like the real-world it models is just complicated.
Re:Better NULL handling? by jayminer · 2005-11-05 12:06 · Score: 1

Lesser and lesser people are showing respect to the first rule of 1st NF:
Every data field must be atomic.
Re:Better NULL handling? by cafeman · 2005-11-07 15:47 · Score: 1

You've also got the following problem (which there may be a solution for that I'm not aware of using SQL).

Assume you have a survey with 100 questions. Assume you gave the survey to 1000 people. Each response constitutes a single record. So, you've got 101 (one column for the respondent identifier) columns by 1000 records. With the 100 column (question) schema, you can query and return all the answers for 50 responses.

So, with one query, you can find out the average response for questions 1, 10, and 25 for a subgroup of respondents.

With the vertical schema, you've got 100 records for each response. So, you've got 3 columns (one respondent identifier, one question lookup, one question answer) by 100*1000 records.

While it's now easily extensible, AFAICS you can't do the example above. Say I want to get the average response for questions 1, 10, and 25 for a subgroup of respondents. With another language like SAS it's trivial to pivot the records into columns for viewing and analysis, but with SQL, I need to execute at least 3 separate queries.

It's possibly my inexperience with SQL. But, a real-world example I once worked with was basically similar to the above, but with around 1,200 different columns. The DB used the vertical structure, which made for great storage and single-variable reporting, but it made simultaneous analysis of multiple variables extremely hard.

--
This is your life, and it's ending one minute at a time.
Re:Better NULL handling? by klausboop · 2005-11-08 14:12 · Score: 1

>You DON'T need 100 columns, ever.

It is highly unlikely that you'll need 100 columns in a normalized database. In a data warehouse, where your fact tables are denormalized to increase access speed, it is not terribly uncommon to see > 100 columnn tables.

--
Some of you already have those cute little shirts on that say disco sucks, right? That's not all that sucks.-Frank Zappa

ugh... by Anonymous Coward · 2005-11-03 10:39 · Score: 5, Insightful

Did you know there have been people working on a general algebra for data management for, what 40 years now? Did you know, this is basically a SOLVED PROBLEM? Ever heard of "D"? Or Tutorial D? The Third Manifesto?

You know, I totally understand why Fabian Pascal is always pissed off.

Here's something for you to chew on:

Why do programmers write this:

A + 3

when they want to add 3 to A? Why do we not write some lovely crap like:

OPERATE ON A WITH 3 USING ADDITION

why do we write:

(A + 3) * 2

and not:

OPERATE ON (OPERATE ON A WITH 3 USING ADDITION) WITH 2 USING MULTIPLICATION

Why do we do that?? Because algebraic notation is 1) declarative .. it tells the computer what you want, it doesn't tell it how to do it and in what order, and 2) algebraic notation is *completely general*. You can nest arbitrarily with parentheses, and you can clearly see what's a variable and what's a value and what's an operator. Easy to create, understand, and *optimize*.

Do you agree with me that the verbose syntax clouds your thinking? Keeps you from seeing the underlying operations? Makes it difficult to apply the basic algebraic skills you learned in high school? Makes it difficult for the compiler writer to do the same?

Now I ask you, why do we write:

SELECT * FROM Order

and not

Order

Why do we write:

SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id

and not

Order JOIN OrderItem

And here you are, dwelling on some little detail about projecting columns.. this is an easy one: use an "ALL BUT" operator for example:

RelvarWith100Attributes ALL BUT (Attribute100)

Once you see that relational algebra is just values, variables, and operators nested in any arbitrary way, just like arithmetic, you have opened the door a little more to understanding the fundamental theory of data management and how backwards and primitive "modern" data management is.

And let's not even get into all the crap that SQL gives us like duplicate rows, NULLs, brain-dead table-oriented storage, lack of 100% updateable views, lack of arbitrary constraints, (often) lack of composite types (why the hell do we splat objects into MULTIPLE COLUMNS?? They should be stored in ONE column). SQL also confuses logical and physical layers (keys vs. indexes), and has basically kept the database industry in the dark ages for decades now.

So the answer to your question is pretty simple: I would ditch SQL and use something that looks like relational algebra, which has been understood and documented for a probably longer than you've been *alive*. No offense.

Re:ugh... by B'Trey · 2005-11-03 11:22 · Score: 1

I don't disagree with a thing you say. However, I have to ask you - what's more likely? That the entire world ditches SQL and moves to a better system? Or that a new, updated version of SQL includes fixes for some of its most heinous sins? I think the answer clearly is the latter, which leads back to the question - what should we do to fix SQL?

--
"The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.
Re:ugh... by Anonymous Coward · 2005-11-03 11:52 · Score: 0

Do you still use FORTRAN?

How do you explain XQuery? That new Microsoft LINQ crap?

Look, we work in an industry that changes buzzwords every few months. Do you *really* think it's impossible to change SQL to something else? Or more precisely, to introduce products that don't depend entirely on SQL?

How about all these people that write Object-SQL mappers and claim that the benefit is to let you "change databases" without affecting the app? (Of course, truly relational database wouldn't NEED any complex mapping layer). Won't they be able to switch?

All it takes is one smart guy to create a relational DB and hype it up, and people will flock to it. Look at Ruby on Rails. Any real innovation there? No, just the best practices we've been preaching for years, wrapped up with a ribbon of hype. Now somebody just needs to create a relational DB that uses Ruby syntax and Ruby types and Rails fans will think "wow, how INNOVATIVE". How "POST RELATIONAL"!! Of course next year, they'll flock to something else, but at least the product will *exist*.

To answer the specific question, you find where SQL is not relational, and you fix it! For instance, don't allow duplicates in tables. Store columns in such a way that multi-table joins are just as cheap as single-table queries. Get rid of NULLs and implement missing data using views on underlying null-free tables. Let people create composite types. Create adapters that let people write code and queries in their favorite programming language. Make natural joins the default for joins. Add some brains to SQL implementations so they can derive and deduce properties of your data and make views updateable. Make the syntax simpler (make some of it optional). Add some of the missing relational operators. It's ain't rocket science (but it *is* science).
Re:ugh... by rho · 2005-11-03 12:01 · Score: 4, Insightful

SELECT * FROM Order JOIN OrderItem WHERE Order.order_id = OrderItem.order_id and not Order JOIN OrderItem
Why do you assume Order and OrderItem will be joined on order_id? They don't have to, you know. So you have a "Order JOIN OrderItem ON order_id" format. Except, why do you assume both columns will have the same name? So you now have a "Order JOIN OrderItem ON order_id AND order_id_submitted" format.
And before long, you've got an equally baroque language for describing a query as SQL is now. SQL's point is to have something reasonably human-readable. It's amazingly flexible, and easy for beginners to pick up with simple queries. Your New and Improved query language had better be leaps and bounds ahead, not just simpler to type.
(Especially since probably 90% of queries are written only once, and stored for future use in a script or as a stored procedure. Building a query can be arduous for complex data, but it doesn't have to be typed every time you use it.)

--
Potato chips are a by-yourself food.
Re:ugh... by oyenstikker · 2005-11-03 12:12 · Score: 1

I found SQL very hard as a beginner. I learned "connect to mydatabase user oyenstikker using lousypassword". Fine. I sounds like English. So I remembered the thoughts and not the syntax. The 2nd, 3rd, 4th, 7th, 12th, and 18th time, I typed "open mydatabase user oyenstikker password lousypassword", "connect mydatabase as oyenstikker with lousypassword", or something like that. If the language is an abstraction, it is easier to remember the syntax.

--
The masses are the crack whores of religion.
Re:ugh... by benwb · 2005-11-03 12:19 · Score: 1

The relation between tables should be encoded in the database metadata via foreign key relationships, so in the simple and most common case you do not need to specify column names for joins.
Re:ugh... by Clover_Kicker · 2005-11-03 12:34 · Score: 1

> Look, we work in an industry that changes buzzwords every few
> months. Do you *really* think it's impossible to change SQL to
> something else?

Looks that way. How long has Codd been prophesying from the wilderness?

Has anyone ever implemented one of these relational calculus languages?
Re:ugh... by rho · 2005-11-03 12:51 · Score: 1

Agreed, but you must still take into account queries that have little or nothing to do with the foreign keys. SQL is flexible in that way--SQL's replacement must be flexible in the same way, and at that point, what benefit is SQL's replacement bringing?

--
Potato chips are a by-yourself food.
Re:ugh... by rho · 2005-11-03 12:54 · Score: 1

That's some strange SQL you've got there.

--
Potato chips are a by-yourself food.
Re:ugh... by AuMatar · 2005-11-03 12:58 · Score: 1

I know a lot of people who still do use Fortran. I know a lot more who still use C (30 years old) and C++ (20 years old). In fact, the last 2 are the majority of programmers. We work on either Unix (30 years old) or Windows (15 years old).

The thing with the buzzword of the month is- after its month is up, it dies. The stuff that works sticks around for a long, long time.

--
I still have more fans than freaks. WTF is wrong with you people?
Re:ugh... by lscoughlin · 2005-11-03 13:05 · Score: 2, Funny

That's cute.

Now start inserting and updating.

maybe joining.

yeah -- sql is stupid, sure. But you're proposal here really isn't any brighter.

--
Old truckers never die, they just get a new peterbilt
Re:ugh... by tzanger · 2005-11-03 14:26 · Score: 1

Well, for starters, nobody in their right mind uses SELECT * for any kind of production environment. :-)
Re:ugh... by Tablizer · 2005-11-03 17:44 · Score: 1

I don't disagree with a thing you say. However, I have to ask you - what's more likely? That the entire world ditches SQL and moves to a better system? Or that a new, updated version of SQL includes fixes for some of its most heinous sins?

A very good question. I generally think we need two query languages: one for the "masses" who only occasionally do queries, and one for power users that is more math-like, meta-enabled, and easier to factor out repetative structures. If you spend all day with databases and queries, then it makes a difference. SQL gets uuuuugly for complex stuff. It is run-on-sentence COBOL-like at that stage.

--
Table-ized A.I.
Re:ugh... by commanderfoxtrot · 2005-11-03 22:48 · Score: 1

Looks like you've been using DB2. However, once you're past the "connect to" bit, life improves.

Even a small child could understand SELECT toy FROM shop WHERE price 10.00 AND colour = 'blue'

--
http://blog.grcm.net/
Re:ugh... by Anonymous Coward · 2005-11-03 23:29 · Score: 0

Why do we not write some lovely crap like:

OPERATE ON A WITH 3 USING ADDITION

Because we don't like COBOL.
Re:ugh... by arkanes · 2005-11-04 01:54 · Score: 1

And if that were all anyone ever used SQL for, it'd be in good shape. However, real work involves big tables and big databases and awkward datasets (occasionally made worse by over-educated Coddites who've decided that they know how to store your data, and the fact that you need to transform the data *every single time* you need to work with it is of no importance compared to the Grail of Third Normal Form), and the triple-self joined aggregate hierarchal sub-queries you need to output data the way you want it are not quite as simple as SELECT toy FROM shop.
If one of the posters from earlier on in the thread got ahold of that, it'd be more like SELECT toy FROM shop WHERE EXISTS (SELECT colour FROM shop_colors JOIN shop) AND EXISTS (SELECT dollars FROM currencies JOIN shop_prices JOIN shop) anyway.
Re:ugh... by Clover_Kicker · 2005-11-04 05:34 · Score: 1

However, real work involves big tables and big databases and awkward datasets (occasionally made worse by over-educated Coddites who've decided that they know how to store your data, and the fact that you need to transform the data *every single time* you need to work with it is of no importance compared to the Grail of Third Normal Form), and the triple-self joined aggregate hierarchal sub-queries you need to output data the way you want it are not quite as simple as SELECT toy FROM shop.
Nice rant. 9.5/10, needs more profanity.
Re:ugh... by slapout · 2005-11-04 07:23 · Score: 1

Why do we not write some lovely crap like:

OPERATE ON A WITH 3 USING ADDITION

You've obviously never programmed in Cobol. :-)

--
Coder's Stone: The programming language quick ref for iPad
Re:ugh... by Anonymous Coward · 2005-11-04 09:23 · Score: 0

C99(6 years old), C++98(7 years old), SUSv3(4 years old), Windows(XP-4 years old), and I'll add in Java 5.0(1 year old).
Re:ugh... by Anonymous Coward · 2005-11-05 13:50 · Score: 0

try q, http://kx.com/q/d/q.htm
Re:ugh... by Breakfast+Pants · 2005-11-07 02:33 · Score: 1

That is the point, it is easy to understand (it reads like english), but hard to write (you can't just write any valid english).

--

--

WHO ATE MY BREAKFAST PANTS?
Re:ugh... by pxpt · 2005-11-07 05:48 · Score: 1

Has anyone ever implemented one of these relational calculus languages?
Well... yes.
There is an educational language called 'LEAP' that implements a relational algebra type of language: LEAP Overview
Re:ugh... by Clover_Kicker · 2005-11-07 06:09 · Score: 1

Thank you, that looks interesting. I see that's a minix-style teaching tool, i.e. intentionally limited in features and scope.

I read Codd and Date as an undergrad many years ago, complaining about the state of the art in commercial SQL products, and advocating their own style of query language. At the time I found their arguments interesting, but less than convincing since they hadn't bothered to come up with an implementation. All these years later, it makes Duke Nukem Forever look good.

sort by... by Atzanteol · 2005-11-03 10:40 · Score: 1

I'd *love* it if "group by *" (or some such syntax) would just group by all the items in the select in order... It's so bloody tedious to have to re-write each item being selected for the group clause.

--
"Ignorance more frequently begets confidence than does knowledge"

- Charles Darwin

Re:sort by... by usrusr · 2005-11-03 11:36 · Score: 1

you are asking for duplicate elimination as in relational algebra.

easily achieved in SQL by using SELECT DISTINCT instead of SELECT.

i guess what you really mean is "group by everything which is not explicitly in some form of aggregation in the select statement" which is a more special case, ironically one that fits exactly into the "* except x" pattern of the original poster.

--
[i have an opinion and i am not afraid to use it]
Re:sort by... by Atzanteol · 2005-11-03 14:40 · Score: 1

Only, a "select distinct" won't give me the "count(*)" (which I want).

select fielda, fieldb, fieldc, fieldd, count(*) from foo where status=1 group by fielda, fieldb, fieldc, field;

Just a personal gripe I know, but it seems redundant to tell it what to group by when 99% of the time it's the same order as the fields are selected by...

--
"Ignorance more frequently begets confidence than does knowledge"

- Charles Darwin
Re:sort by... by Sweep+The+Leg · 2005-11-04 03:33 · Score: 0

SELECT DISTINCT is often very slow. On a lot of systems it will bring back ALL the results, then filter them. This is often terribly slow vs. joining results out on a well designed database.
Re:sort by... by usrusr · 2005-11-04 17:38 · Score: 1

which is the reason why it is not default.

of course the grandparent did not mean distinct but that's what he described and using group by instead of distinct is not supposed to have different performance except for on the most moronic systems

--
[i have an opinion and i am not afraid to use it]

Um.... by djSpinMonkey · 2005-11-03 10:41 · Score: 1

Wouldn't it not make sense to....

I think a solid grasp of English syntax would be helpful before moving on to SQL.

To respond to his example, though, I doubt there are many situations where excluding a handful of columns would be particularly useful. Any query including that many fields is almost certainly taking place within a program of some sort, which should be able to ignore extra columns pretty trivially. I'm sure you could come up with some tortuously Goldberg-esque situations where such syntax might be nominally handy (I've thought of a couple myself), but I suspect on the whole it wouldn't be worth the extra complication to the parser.

Of course, that's not to say SQL is flawless. I just don't have any brilliant ideas off the top of my head to improve it that haven't already been implemented.

Standard stored procedure/trigger language by Kunta+Kinte · 2005-11-03 10:53 · Score: 1

Using stored procedures or triggers in a database is a sure way to tie yourself to a particular database platform. For appications that have to be cross-platform, I usually try very hard to avoid triggers and stored procedures.

It would be great if there was a common language for store procedures. The RDBMS may support many languages, but that scripting language would be available on most platforms since it was part of the SQL standard.

Another feature missing is optimization hints. Oracle uses special SQL comments to hint to the database how a query should be optimized. Other databases have their own syntax and methods. There should be a way to set those hints in standard SQL.

--
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW

Re:Standard stored procedure/trigger language by JohanV · 2005-11-03 22:27 · Score: 1

There is a common language for stored procedures / triggers. SQL Persistent Stored Modules (SQL/PSM) is defined in ISO/IEC 9075-4:2003 and is currently implemented by at least IBM DB2, MySQL 5 and Mimer SQL. An implementation effort for PostgreSQL / EnterpriseDB is under way.

If your database does not support SQL/PSM and is not working on implementing it you have only your choice of database to blame for that.

Be practical by Commander+Spock · 2005-11-03 10:59 · Score: 0

If there are 100 columns, but you only want to select 99 of them, just select * and ignore the unwanted column. It'll be faster anyway.

Re:Be practical by neura · 2005-11-03 11:52 · Score: 4, Informative

Actually, it'd be faster if you listed out every column name. If you're talking about faster to write out the code for, you're obviously not writing a query for a program that's intended to be used much. There's absolutely no reason you should be deploying code containing a query that does "select *" or anything like it. You're making the database do the work of looking up the list of columns names every time that query runs. There are much more useful things to spend your caching space on (if you have any).

If you really can't stand to write queries containing the actual column names, you should be using some type of abstraction layer in whatever language you're writing your code in.

If you're not writing code and just making queries by hand to test the results, then you're even further off your rocker. (this also applies in general to the statement you made) Why would you ever NOT want to select that last value out of 100? is it going to keep your output from wrapping? (lol)

Also, those of you saying that you should never have 100 columns in your table, you're certifiable lunatics as well. If you have 100 columns that are used in every record and have very little or no duplication per row, there is no reason you should break this up into multiple tables!!! Then the database has to do joins, which again require more processing power and disk usage. It's also hard to maintain multiple tables when you really have one table after you normalize it.

For those of you that say this isn't normalized... I'm not even really sure how to answer that.... If you have several tables all with a strict 1:1 relationship, they should be in ONE table. Anything else is considered denormalized, not yet normalized. (aside from being just plain BAD!)

For those of you that say you'd never need that many columns in one table or split across multiple tables, however you'd like to think the world should work. I have an example of just that. My wife does genetic research, primarily statistical analysis of sequence data (in various forms, but that's the easiest way to sum it up). We've had discussions on this particular topic, where she had been told by someone else that she would get better performance in Oracle if she split her one table into several tables containing a smaller number of columns, each.

This is just simply not true. It also is a perfect example of a situation where you would actually need a large number of columns. There were specific bits of data that needed to be looked up quickly (like, 45'ish). You can't store it all in one column (or even just a few) and use regexes to find the bits you're looking for. You also don't want to be doing a lot of joins unless you really need to, you know.. when you actually have data that would fit into some form of normalization. Technically, you CAN do this stuff, but not if you want decent performance. If you didn't want decent performance, you could just leave the data in a text file and shell out a grep command. *sigh*

Anyway, enough ranting, but seriously people... Get a clue. Get some experience with these issues. Don't just pipe up because "hey, I've worked with databases and while I probably don't understand them very well, I don't know anybody else that understands them at all, so I'm kind of an expert!"
Re:Be practical by Anonymous Coward · 2005-11-03 18:30 · Score: 0

Like any rule, claiming that you'll ever need a 100+ column table has its exceptions, just like you'd worry about having a 100+ field object in an algorithmic programming language. You can certainly always reduce any huge number of columns to a smaller number using normalization techniques, but it doesn't always make sense. I'm not surprised you came up with an example so easily, although I'll probably never need that many columns. In most applications, though, it's generally a good sign that you're doing something wrong, and should definitely raise alarm bells.

More to the point, if you really need to routinely select all but X columns from a table, you're definitely doing something wrong, so the original example of 99 columns out of 100 is still stupid, as you noted.
Re:Be practical by Luyseyal · 2005-11-04 03:17 · Score: 1

You're making the database do the work of looking up the list of columns names every time that query runs.

Are you sure about that? Because when you spell them out, the database still has to verify you haven't asked for something it doesn't have. Obviously it needs the names in a SELECT *, but it doesn't need to error check them. Also, with databases that keep track of the version number of the table, I doubt they bother flushing the cache on "SELECT *" unless the schema changes. I mean, what would be the point since it knows if it's changed in advance?
I don't really know as I've never written a database or gone into optimization that far, so I'm asking you to elaborate for me, if you don't mind.
I agree with the 100 cols thing, although once you have that crap in there and you have users who are used to that model of thinking in writing their own reports, it's hard to get people to think otherwise. E.g., recently I had a bitch of a time convincing another employee that they should just use the stored proc I wrote directly rather than having us run the damn thing in a cron job every night to populate a column. But that's just one battle I won.
It's a crazy world out there and sometimes you just have to bite the bullet and add that 101st col while holding your nose!
-l

--
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!

Replace it by Lisp by RAMMS+EIN · 2005-11-03 11:06 · Score: 2, Interesting

I'd replace it by a special-purpose Lisp, and compose it like s-expressions. Mix and match query elements in a flexible manner, yet never run the risk of injections, because it all happens in a structured way. I've done things like this on a small scale (contact information database), and it works really nicely.

--
Please correct me if I got my facts wrong.

Re:Replace it by Lisp by davez0r · 2005-11-03 12:46 · Score: 1

i seem to remember reading about how the guys that do orbitz don't use a database, they use a GINORMOUS custom lisp data access shizzle
Re:Replace it by Lisp by Anonymous Coward · 2005-11-03 20:59 · Score: 0

Shizzle? Please, no technical terms.
Re:Replace it by Lisp by arkanes · 2005-11-04 02:29 · Score: 1

Heh, talk about Greenspunning....
In the specific case mentioned, though, it seems like a reasonable solution - the type of queries they are doing are very hard (not NP-complete), with severe time constraints, and the total data set is small enough to fit within physical ram (although those were very high end machines in 2001, when it was written). SQL and standard database access won't support those kind of limits, so writing your own data acess code makes sense in this case. Note that what they're doing isn't relation data manipultion (heresy, cry the relational gurus! All data is relational, if you look hard enough!) per se, and they'd only use a database in this case as heavyweight indexing engine.

Native Queries / DLING by Carl+Rosenberger · 2005-11-03 11:07 · Score: 1

In case you are using an object-oriented programming language, you may ask yourself if SQL really is a good choice. SQL is not object-oriented, it's use is usually neither typesafe nor compile-time checked in OO languages, it can't be refactored from an object-oriented IDE and it does not follow OO principles.

Food for thought:

Native Queries
http://www.db4o.com/about/productinformation/white papers/

DLINQ
http://msdn.microsoft.com/netframework/future/linq /

--
db4o - open source object database for Java and .NET

Re:Native Queries / DLING by Tablizer · 2005-11-03 17:17 · Score: 2, Funny

SQL is not object-oriented, it's use is usually neither typesafe nor compile-time checked

This sounds like the fuel for a massive OO-vs-Relational and/or Dynamic-vs-Static typing holy war geek battle. Been in them and seen them rage for 1000+ messages. Batten down the hatches and hide the women and children.

--
Table-ized A.I.
Re:Native Queries / DLING by teknomage1 · 2005-11-03 20:02 · Score: 1

A) You shouldn't be refactoring the database anyway. B) It is typesafe, it's garunteed to be of type 'row' or 'row_set'.

--
Stop intellectual property from infringing on me

SQL is only 1/2 the story by klui · 2005-11-03 11:08 · Score: 2, Interesting

No standard way to extract/load data. No standard way to get all tables in a database. Basically DDL is entirely separate and each type of database has its own way of doing things. Let's not talk about embedded SQL and optimizing queries (like Oracle hints.. *ugh*).

Re:SQL is only 1/2 the story by JohanV · 2005-11-03 22:33 · Score: 1

No standard way to get all tables in a database.
Yes there is. As defined in ISO/IEC 9075-11:2003 you can use: SELECT * FROM INFORMATION_SCHEMA.TABLES

This currently works in at least MS SQL Server, MySQL, PostgreSQL and Mimer SQL. This is not an omission in the SQL standard, only in certain implementations. Whine at the vendor of those implementations if you want that to change.
Re:SQL is only 1/2 the story by klui · 2005-11-04 03:35 · Score: 1

Unfortunately, a defined standard that's not used by everyone doesn't really make it good one. I'm currently using Oracle 9i.
Re:SQL is only 1/2 the story by Anonymous Coward · 2005-11-04 04:56 · Score: 0

Maybe you should upgrade to a real database?

Dates by omibus · 2005-11-03 11:30 · Score: 3, Interesting

1. Standard date functions and handling.
2. Allow for SELECT statement reordering. I should be able to have the FROM first. This would be a BIG help to SQL editors!
3. Column aliases. So if I have a column in the select that is ColA+ColB as "My Value", I can use the "My Value" in the WHERE, GROUP BY, and ORDER BY instead of having to restate the equation every time.

--
Bad User. No biscuit!

Re:Dates by fishybell · 2005-11-04 08:19 · Score: 1

re 3:
Create a view for that table.
The view would have a field "My Value" that return "ColA" || "+" || "ColB". You can even index "My Value" if you want.

--
><));>
Re:Dates by pesc · 2005-11-04 23:29 · Score: 1

1. Standard date functions and handling.

Actually, the SQL standard includes dates, intervals and functions to handle these types. Unfortunately, most SQL implementations use their own proprietary stuff. You might want to rephrase your wish to:

1. I want my SQL product to support the standard date functions.

Talk to your vendor...

--

)9TSS
Re:Dates by YomikoReadman · 2005-11-07 01:31 · Score: 1

Depending on your implementation of SQL, you don't even have to do that much. Oracle supports giving the columns an alias in the from clause, and you can use that alias all the way through the SQL statment.

--
I have no regrets, this is the only path.
My whole life has been "UNLIMITED BLADE WORKS"

UPDATE, INSERT look different by yamla · 2005-11-03 11:33 · Score: 2, Informative

It always bugs me that you write UPDATE and INSERT totally differently. I'd much rather see them essentially the same (update obviously would need a WHERE clause). Not a big deal, I admit, but it makes my life harder.

Fundamentally, though, what we need is much better interfaces to our applications. Having to convert C++ data into some format appropriate for SQL and back again is a pain in the behind. Every existing interface between the two that I've seen is crap. Heck, most of the C++ interfaces don't even let you use the std::string type. What are we, back in 1995? Forget vectors and maps, or the fancy boost multiindexed templates. Anyway, these really aren't problems for SQL to solve. And yes, there are object-oriented databases and the like. But relational databases are still pretty much universal. I just mention it because it is a pain in the behind.

--

Oceania has always been at war with Eastasia.

Re:UPDATE, INSERT look different by thebatlab · 2005-11-03 14:29 · Score: 1

I have to tip my hat to that suggestion. It's never made sense. I don't do any intensive database work but the stuff I do, it's annoying to have to write out the insert/update queries differently in the same function.

Better handling of relationships between tables? by Trepalium · 2005-11-03 11:34 · Score: 1

I'm no SQL expert, but it has always bothered me that you have to enter your relationships between tables twice -- once in the database construction itself via foreign keys, and again in each and every query that will make use of that relationship. Why can't I write: SELECT prefs.val FROM login,prefs WHERE login.username="myid" AND prefs.type="name" instead of SELECT prefs.val FROM login,prefs WHERE login.username="myid" AND prefs.type="name" AND prefs.userid=login.id (or the equiv if using JOINs instead) if I have the appropriate keys set up? Wouldn't that be an easier query to optimize? Wouldn't that create fewer resulting bugs?

Perhaps I'm ignorant to some important detail of databases, but I feel that it would be much easier to work with them if the server could represent the relationships between the tables in some meaningful way.

--
I used up all my sick days, so I'm calling in dead.

Infuriating by TTK+Ciar · 2005-11-03 11:38 · Score: 3, Insightful

Out of all the annoying issues, I've pulled out the most hair over unique id's, and INSERT vs UPDATE.

Most SQL implementations give you some way of assigning unique id's to newly INSERT'ed rows. It would be nice if there were a standardized way, but that's a side issue. Once rows have unique id's, you can identify rows to be updated by id. This is very fast and simple.

Except .. in order to find out what id the DBMS has assigned a row, I usually have to follow my INSERT with a SELECT, to read the id column. Slow and annoying. Sometimes the DBMS I am working with takes a few seconds to perform the INSERT, and ten minutes to perform the SELECT. That takes it beyond an optimization issue, and into a workability issue.

Also, if I do not yet know if a data record has been INSERT'ed, and I need to either UPDATE the existing record or INSERT a new one (say, with just a new timestamp), then I need to either attempt an UPDATE and then fall back on INSERT if the UPDATE fails (ew!) or attempt a SELECT and either INSERT or UPDATE depending on whether it returned any rows (ew!).

If SQL came up with a standardized way to associate unique id's with newly INSERT'ed rows, it would be very, very nice if the id column(s) assigned were returned to the client in the same packet as the message confirming that the INSERT succeeded. Nearly zero additional overhead, neat, fast, and easy.

To solve the UPDATE/INSERT issue, I'm less sure. Say, for instance, that I have a daemon which periodically scans the filesystems in a cluster of machines, and it wants to UPDATE the "exists" column of a given row identified by a ( hostname, mountpoint, path, filename) tuple with the current time, if that row already exists, or INSERT a whole new row for that file if it does not exist. Perhaps there could be a "WRITE" command which is just like INSERT but overwrites a row if it already exists? That seems like the wrong solution, too. In the meantime, I play with caches of hashes to unique id's and lose more hair.

-- TTK

Re:Infuriating by Anonymous Coward · 2005-11-03 12:05 · Score: 0

In Oracle Database, you have MERGE statement (INSERT OR UPDATE if the primary key exist)
Re:Infuriating by Forbman · 2005-11-03 12:28 · Score: 2, Informative

Well, of course it all depends on the database. For those DBs that use autogenerated field types (i.e., SERIAL in PostGres, AUTOINC attribute in many others) getting the generated ID for the record you inserted is...problematic at best. Your point is valid.

For other DBs that can use triggers and sequence generators (Oracle, PostGres, Interbase/Firebird), it can be a bit easier, as these DBs have ways to query the sequence generator for its current or next value (as long as you're still in the same transaction scope). In the case of Oracle, getting the Next value off of a sequence increments the sequence, so as long as you hold onto that value, it's going to be unique. No more silly "select max(id) from my table" queries after you insert a record...

In both cases, though, it generally requires some way that can peek at the record being added, in a state where it's "added" to the table but before it's locked down. Generally the most expedient method to do this is with a stored proc/function that has the actual INSERT statement in it, but returns the autogen'd field value, however you can get it, and return it to the layer you called the proc from.

It all depends on your database and your database access layer (e.g., ADO, ODBC, OleDB, direct-to-driver, etc).

MIDAS (database caching layer from Borland 5 Enterprise and later) really does this well. There are other similar products as well.
Re:Infuriating by erotic+piebald · 2005-11-03 13:47 · Score: 2, Informative

Maybe only Oracle does this?:

insert into tab (col1, col2, col3, ...) values (v1, v2, v3, ...)
returning expr1, expr2, ...
into var1, var2, ...

or

update tab
set col1=v1, col2=v2, col3=v3, ...
returning expr1, expr2, ...
into var1, var2, ...
Re:Infuriating by schon · 2005-11-03 14:07 · Score: 1

For those DBs that use autogenerated field types (i.e., SERIAL in PostGres, AUTOINC attribute in many others) getting the generated ID for the record you inserted is...problematic at best.

For Postgres, this is patently false. You can get the value of a sequence (serial) either before or after your INSERT.
Re:Infuriating by Evro · 2005-11-03 14:42 · Score: 1

In Postgres getting the "next value" is easy, you just get it before you insert it.

SELECT nextval('somesequence') as next_value; - usually takes something on the order of 0.0001 seconds

INSERT INTO table (index, something) VALUES (next_value, 'something'); - normal insert

--
rooooar
Re:Infuriating by eric2hill · 2005-11-03 16:56 · Score: 1

If I had mod points, you'd get 'em. I've been using this technique for several years now and it works flawlessly.

--
LOAD "SIG",8,1
LOADING...
READY.
RUN
Re:Infuriating by Anonymous Coward · 2005-11-03 18:15 · Score: 0

The UPDATE/INSERT thing is pretty annoying, I'll admit. There's a good reason for it, mainly dealing with implementation efficiency, but having a standardized, third form of the statement (that many DBs implement in their proprietary way) that would just let me write a row whether or not it currently exists would be handy.

As for the automatic serial numbering, though, I find it easiest to just not rely on the DB-based sequential numbering, but do it in my application instead. For one thing, serial columns aren't exactly standardized. For another thing, I like to reuse serial numbers when I delete records (so I can use smaller serial number widths when there's a lot of update/insert/delete traffic; admittedly, my usage is atypical), and monotonically incremental numbering doesn't allow that.

The part of my application that talks directly to the database is a long-lived server process, and maintains all the index numbers in a bitmap. The standard bitmap algorithms are used to find, allocate, and deallocate the identifiers. It's relatively efficient and it serves my need for reusable identifiers, although it requires careful coding and thinking about concurrency and error recovery issues.

Anyway, it's not for everyone. But a standard way to get around the problem you're talking about is to put the sequence number in its own table. Rather than selecting on your main table to find the sequence number you just added (potentially expensive), just read the value from your sequence number table, increment, and then use that number in your main insert. Something like:

SELECT seq_num FROM seq_table UPDATE seq_table SET seq_num = INSERT INTO data_table VALUES (, ...)

Using transactions with the appropriate isolation level, you can be assured to read out and increment a unique ID. This is essentially how some databases (like PostgreSQL) internally implements serial numbering anyway.

You can even do this without transactions if you're using something braindead like MyISAM tables, since each SQL statement is atomic, although then you have to worry about race conditions and repeatedly doing SELECT/UPDATE/SELECT.
Re:Infuriating by Electrum · 2005-11-03 19:46 · Score: 1

Except .. in order to find out what id the DBMS has assigned a row, I usually have to follow my INSERT with a SELECT, to read the id column. Slow and annoying.

What database requires this? Every database I've ever used has an efficient, documented method of getting the automatically generated ID.

To solve the UPDATE/INSERT issue, I'm less sure.

MySQL has this:

INSERT INTO table (...) VALUES (...) ON DUPLICATE KEY UPDATE column=...

It also lets you use UPDATE syntax for INSERTs:

INSERT INTO table SET name=...

In general, a good solution is to try an INSERT, then UPDATE if it fails due to a key constraint.
Re:Infuriating by moro_666 · 2005-11-03 22:00 · Score: 1

the link points to a few rows that are very very very incompatible with the expression threading and multiple processes :)

however if you do some locking and synchronization, it can be used .... it would look a bit better if it could work in 1 transaction so no locking would really be needed like this :

>begin
>allocate_id -> x
>insert (x,'fishy')
>commit

if commit won't come, the x value will be released again, if a another process between this processes allocate&insert comes by, it just gets the next value of the serial ...

on the other hand ofcourse, you can raise the value of the allocated serial each time you ask for it, but sometimes you really dont want it to be raised, just check what it would be ...

--

I'd tell you the chances of this story being a dupe, but you wouldn't like it.
Re:Infuriating by Daniel+Boisvert · 2005-11-04 01:12 · Score: 1

No more silly "select max(id) from my table" queries after you insert a record...

You don't actually do that, do you?

Please tell me you really query for the the exact data you just inserted with the addition of the id field, and aren't just crossing your fingers and hoping that somebody else didn't stick data into the table after your insert and before your select..
Re:Infuriating by Sigma+7 · 2005-11-04 02:37 · Score: 1

Well, of course it all depends on the database. For those DBs that use autogenerated field types (i.e., SERIAL in PostGres, AUTOINC attribute in many others) getting the generated ID for the record you inserted is...problematic at best.

Microsoft Visual Studio generated the equivalent of the following for a connection to an MSDE server:

INSERT INTO table (Value) VALUES (@Value); SELECT ID, Value FROM table WHERE (Id = @@IDENTITY)

It may be non-standard, but it's hardly problematic.

The only other language that is plagues with inability to do normally expected stuff would be C (itoa is implemented in the Windows library, but not in GLibC.) But if it weren't for C, we'd be programming in BASI, PASAL, and OBOL.
Re:Infuriating by brlewis · 2005-11-04 06:03 · Score: 1

the link points to a few rows that are very very very incompatible with the expression threading and multiple processes :)
What are you talking about? The value of nextval() will be different each call, and currval() is session-specific, so there are no race conditions.
Re:Infuriating by Nevyn · 2005-11-04 07:22 · Score: 1

What database requires this? Every database I've ever used has an efficient, documented method of getting the automatically generated ID.

Yeh, I was pretty surprised when I read a blog entry recently about this "feature" of Oracle. From what I could see it's basically identical to the PostgreSQL behaviour ... but you can't link to seq.next directly to the column and then select on seq.last_value.

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:Infuriating by Electrum · 2005-11-04 11:27 · Score: 1

but you can't link to seq.next directly to the column and then select on seq.last_value

Actually, you can. Create a trigger, then select seq.last_value. It's trivial to make a database wrapper that supports MySQL and Oracle using this approach. Have a function in your wrapper that returns the last inserted ID and pass it the table name (ignored in the MySQL implementation). Name all of your sequences the same as the table name with _seq appended.
Re:Infuriating by cortana · 2005-11-04 16:12 · Score: 1

I sense a dailywtf.com post coming... :)
Re:Infuriating by cortana · 2005-11-04 16:14 · Score: 1

The only other language that is plagues with inability to do normally expected stuff would be C (itoa is implemented in the Windows library, but not in GLibC.)
Use sprintf(3), surely?
Re:Infuriating by Sigma+7 · 2005-11-05 15:32 · Score: 1

Use sprintf(3), surely?

The instant you want to do something outside of base 8, 10, or 16, sprintf won't be sufficient. Also, sprintf is designed to format strings, which results in a bit more overhead (parsing the format specifier, and then creating the string.)

While it is true that I don't need itoa directly in day-to-day use, being as fast as possible is necessary in contests such as UVA, SPOJ, and ZJU - you still need to squeeze as much performance as possible your solutions, as optimizing compilers are not in common use on those sites.
Re:Infuriating by Nevyn · 2005-11-06 10:08 · Score: 1

See the link I posted, AIUI you can't use a trigger if you might ever need to put a value in manually ... whereas this "just works" in PostgreSQL (and AIUI in MySQL), by when you insert without a seq you get a new one and when you insert with a seq it uses that.

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re:Infuriating by Electrum · 2005-11-06 15:26 · Score: 1

See the link I posted, AIUI you can't use a trigger if you might ever need to put a value in manually

Are you sure? I don't have Oracle handy to test, but I am fairly certain the trigger could be written to replace the value only if it is not NULL.

wild card/regexs by i.r.id10t · 2005-11-03 11:43 · Score: 1

My complaint about it when I first started learning sql (ms flavor, only way offered at school at the time) and got to wildcards, where someone writing the spec chose % instead of * and so on. To top it off, I use sql so infrequently (months between writing it, then only a few statements) that this difference bites me in the butt just about every time I try to use a wildcard.

--
Don't blame me, I voted for Kodos

Re:Whats the problem? by profet · 2005-11-03 12:01 · Score: 0

oops... Columns != Rows... my mistake.

Babye karma..

Re:Better handling of relationships between tables by mooingyak · 2005-11-03 12:03 · Score: 1

I think the basic problem with that is what if you got two tables, let's say, logins and projects.

logins had an id and a name

projects has, among other things, submit by and coded by, both of which reference the logins table.

So if you do default joining, which field should it join by?
If you limit the default join to when there's one and only one reference to a given table, then existing code may break once you add a new reference.
And lastly, what if you really want a cartesian join? Granted I've needed this about twice out of a very large number of queries I've written, but it my usage doesn't necessarily reflect everyone else's.

--
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.

Winning the special olumpics and debating an AC... by RingDev · 2005-11-03 12:23 · Score: 4, Insightful

Lets look at something a little more realist: SELECT Lease.LeaseNum, Lease.LesseeNum, Invoice.InvoiceNum, Invoice.AmountBilled FROM Lease INNER JOIN Invoice ON Lease.LeaseNum = Invoice.InvoiceNum WHERE Lease.LeaseNum = "1234" ORDER BY LeaseNum, InvoiceNum Okay, that's pretty big to get some basic lease and invoice info. Now how you you write that? Lease.LeaseNum, Lease.LesseeNum, Invoic e.InvoiceNum, Invoice.AmountBilled Lease JOIN Invoice ON Lease.LeaseNum AND Invoice.LeaseNum Lease.LeaseNum = "1234" Lease.LeaseNum Invoice.InvoiceNum ??? All that's been accomplished is the removal of key words. I'm not seeing any benefit, and I'm seeing the pitfall of it being hard as hell to read. -Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

Java is your friend by lucm · 2005-11-03 12:28 · Score: 1, Interesting

In the world of J2EE you can use container-managed persistence on entity beans (a flavor of EJB). Such object will handle the interaction with the database for you so you don't have to worry about SQL. And it will also address transaction issues so no need to fiddle with manual rollbacks.

With CMP, instead of fooling around with SQL queries you deal with business objects. Why bother with "update inventory" or "insert into cashRegister" when you can call methods on objects, like inventoryItem.substract() or cashRegister.feed() ? This provides you with a good layer between the application and the database, so whenever one has to change you don't have to mess with both. All you got to do to have all this magic is to keep some XML configuration up to date, and if you have the big bucks you can even get JBuilder to do it for you!

SQL is sooo yesterday. Stop wasting time on technical issues: add business value to your applications instead. And with JBoss you don't even have to pay for a good EJB container. (Of course you'll need a good CPU and a little more RAM, but hey, magic has a price!).

--
lucm, indeed.

Re:Java is your friend by Dibblah · 2005-11-03 13:32 · Score: 1

You don't need a database in any of your examples. Anyone who's authored a proper application with a database behind it SHOULD be thinking in SQL, so that the database can do it's job. Otherwise, you're going to be reimplementing stuff that's already available at a lower level. Badly.
Re:Java is your friend by lucm · 2005-11-03 13:50 · Score: 1

Anyone who's authored a proper application with a database behind it SHOULD be thinking in SQL
This is a wrong statement. Data persistence does not have to be bound to the storage technology. Please take time to read about a technology before making so bold statements.
Have a look here.

--
lucm, indeed.
Re:Java is your friend by mr_rattles · 2005-11-03 15:05 · Score: 1

How well does this CMP work when you're pulling from a table with 150 million records? What if I want data from this table for each of my 10,000 customers? Do I have to hit the table 10,000 different times or can the container do joins in a smart enough way to get the data efficiently and reliably?

--
Erik http://yakko.cs.wmich.edu/~rattles
Re:Java is your friend by lucm · 2005-11-03 16:15 · Score: 1

A lot of techies think that SQL skills are necessary to get the best out of an application. This might be true on small projects with little coding required, but for big systems it is safer to rely on a good data model, a good XML mapping and a good EJB server that will make an optimal use of such mapping. See Amazon.com, which is a textbook example of J2EE technologies performance on large-scale systems.
You'd rather code yourself all the interactions with the storage media instead of trusting a technology that has been specifically designed to understand a data structure and make the best use of it? Then how can you trust a RDBMS to handle the interaction with the filesystem? And then, how could you trust the o/s to deal with the hardware? At some point you have to rely on the software.
I know, it's difficult to let go those lovely queries. But soon SQL will be deprecated, just like COBOL, RPG, Clipper, Fortran, QBasic and others. Either you take the new wave, or you become a "legacy expert" and wish hard that there will be another Y2K-ish scam someday.

--
lucm, indeed.
Re:Java is your friend by dshannon · 2005-11-04 00:20 · Score: 1

Now I'm really sad... EJB's? They're awful, for so many reasons.
Now Hibernate, especially with Spring transaction management is much easier to deal with for most cases. And it works very simply, and there's even a native SQL-like object query language called HQL to help. And it writes very efficient SQL queries (I've done a lot of investigation on this on objects backed by tables with tens of millions of rows) to the back end, is entirely DBMS-agnostic, and is apparently the basis for many of EJB 3.0's enhancements over 2.0.
Most J2EE devs don't get much choice on this stuff, but if you do, learn these 2 frameworks. They're even Open Source (and free-as-in-beer) so you can hardly go wrong, really.
HTH
Dan
Re:Java is your friend by lucm · 2005-11-04 01:27 · Score: 1

EJB's? They're awful, for so many reasons.
That's a very interesting point. It's just bad you give none of them reasons.

--
lucm, indeed.
Re:Java is your friend by brlewis · 2005-11-04 06:32 · Score: 1

I hope you're right that a lot of techies realize SQL skills are necessary. Got a pointer to any info about Amazon's use of EJB? I'll let go of my lovely queries when it's no longer true that they can be coded quicker and run faster than any of these "abstraction layers" designed to cram the relational model into a limited object-oriented model.

Re:Winning the special olumpics and debating an AC by Anonymous Coward · 2005-11-03 12:38 · Score: 3, Interesting

You're missing the point. You really have to study the theory, and you'd get something like this (the exact syntax is unimportant of course):

((Lease JOIN Invoice) WHERE LeaseNum = "1234")
[LeaseNum, LesseeNum, InvoiceNum, AmountBilled]
ORDER BY whatever

Why do I put the column names at the end? Because a projection operation applies to a *single relational result*, not to individual tables.

Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

Why do I leave off the Theta from the join (the equality test)? Because I *already* set the foreign keys on those tables. The DBMS should be able to *deduce* which columns to join on and generate an error if it can't.

Here's what's happening:

I used the JOIN operator on two base relation values, stored in variables with the names Lease and Invoice. I got a third (anonymous) relation with all of the combined columns.

I then applied the RESTRICT ("where") operator on that relation, along with a boolean expression, and I got a fourth relation with just the rows where that expression was true.

I then PROJECTED that relation to get a fifth relation with just the desired columns.

Note that ORDER BY is a non-relational operator. It turns the relational result (unordered set) into an an ordered array. So for a final step, the relation was 1) turned into a regular array and 2) ordered.

Of course I didn't actually perform those steps. I told the DBMS what I wanted, and *it* did the work.

Do you see how the SQL conceals the underlying algebra? And how it makes YOU do the work (in the join for instance)? You might not see it. Study the theory more, and you will. Compare with the A+3 example. Imagine a 12-table join. Imagine having to do a query like "where all rows of table X are a subset of all rows in table Y" right in the middle. An algebraic notation would make this MUCH easier. Just break it down, and apply the next operator to the result of the last. You do it all the time when programming your favorite language, why not in SQL?

And yes, the differences get MUCH deeper than this. SQL can't even represent all possible relational queries!

Yeah, INSERT/UPDATE sucks by MobyDisk · 2005-11-03 12:43 · Score: 3, Insightful

I have yet to encounter a DBMS that didn't have an efficient, straightforward way to get the ID after an insert, but YMMV. However, the INSERT/UPDATE issue is a fundamental syntactical problem and it really should be fixed. INSERT and UPDATE do almost the same thing, yet have completely different syntax.

INSERT INTO someTable (fld1,fld2) VALUES ("foo","bar")
UPDATE someTable SET fld1=foo, fld2=bar

It is REALLY annoying when you have to write some code that generates a SQL statement because you must code for two completely different syntaxes. Someone replied about the Oracle MERGE which seems like a nice way to go.

Fortunately though, there are lots of good frameworks around SQL that make it so that writing SQL is becoming a thing of the past. I would like to see SQL treated like HTTP - nobody writes HTTP. It's a protocol. Let it be. Just use the tools. I guess it will never go away though...

Re:Yeah, INSERT/UPDATE sucks by Zathrus · 2005-11-04 05:49 · Score: 1

Oracle MERGE which seems like a nice way to go.

Oracle's MERGE INTO construct is a nice thing for certain cases, but it certainly doesn't help your frustration with INSERT vs UPDATE -- MERGE INTO basically fuses the two of them, plus a SELECT statement. The very different syntax between them remains.

The nice thing about MERGE INTO, at least from an OCI standpoint, is that you can reuse bind names. So if you're simply trying to do a "make it so" statement (not caring if it's an update or insert w/ a unique key; you just want the data to be there damnit) then you use the same bind names in all three places (select clause, insert clause, update clause) and you only have to do the binds and provide the data once -- not three times. And the whole thing is done over on the DB, which reduces round trips and should speed things up.

there are lots of good frameworks around SQL that make it so that writing SQL is becoming a thing of the past

Such as? I'm honestly unfamiliar with any of them. The perpetual issue is that SQL (and RDBMS in general) are set oriented, while most programmers think in purely algebraic terms. The two don't play nice together.

would like to see SQL treated like HTTP - nobody writes HTTP. It's a protocol.

But it's not a protocol -- it's a language. SQL is infinitely more flexible than HTTP. Maybe you meant HTML or XML -- it's a lot closer to either of those than it is to HTTP. And even then they're completely different -- they're hierarchical, which SQL is inherently not.

Disallow DELETE or UPDATE without WHERE by dtfinch · 2005-11-03 12:58 · Score: 1, Interesting

I feel lucky to have not been burned by this, but you can bet a lot of people have. If you really want to delete everything, you should have to type something like "delete from table where true" or "truncate table".

The SELECT syntax is usually good enough for me, but the capabilities of most implementations leave much to be desired. They only support a small subset of what the syntax suggests.

Re:Disallow DELETE or UPDATE without WHERE by Nurgled · 2005-11-03 20:09 · Score: 1
I agree wholeheartedly. My main beef with SQL is that it is optimised for doing the wrong thing in so many cases:
- It's easier to select all rows from a table than it is to select a specific one.
- It's easier to select all columns from a table than it is to select the subset you are interested in.
- It's easier to delete all rows in a table than it is to delete a specific one.
- It's easier to clobber all rows with an update than to update a specific one.
One thing I do more often than anything else operating on a single row based on its primary key. Why is that so hard?
Re:Disallow DELETE or UPDATE without WHERE by pooly7 · 2005-11-04 06:50 · Score: 1

If you use MySQL : --safe-updates, --i-am-a-dummy, -U at the command line http://dev.mysql.com/doc/refman/5.0/en/mysql-comma nd-options.html

Replace it by using MUMPS. by Anonymous Coward · 2005-11-03 12:59 · Score: 0

I'd use MUMPS like god intended.

Bullshit, at least partial by rpresser · 2005-11-03 13:00 · Score: 1

And these aren't just "theoretical" problems, I run into them every day because I know there's something "more" out there. Here's a simple query you should try to do in one line of SQL: "give me a list of all customers who bought every product in product line X". Someone who knows relational theory just thinks up the solution (you just need to create a list P of all products in product line X, and pull out the list of orders where P is a subset of the order items, then join with the list of customers). Someone who only knows SQL will immediately run for the application layer, where you can't just *declare* your problem and have the app solve it, you literally have to write loops and procedural code to solve the problem.

This is what I thought of before even reading your second sentence:

SELECT Customer.* FROM Customer C
WHERE (SELECT COUNT(DISTINCT O.ProductId)
FROM Orders O
INNER JOIN Products P1 ON O.ProductID = P.ProductID
WHERE O.CustomerID=C.CustomerID
AND P1.ProductLine = 'X')
= (SELECT COUNT(*) FROM Products P2 WHERE P2.ProductLine= 'X') )

And I have never studied relational theory formally; only SQL (and, I admit, Joe Celko's posts).

Re:Bullshit, at least partial by Anonymous Coward · 2005-11-03 16:42 · Score: 0

Celko does have a lot of nice tricks and to get a lot out of SQL you should know them (and more importantly, you should know which algebraic queries they are simulations of). But they are tricks, not fundamental.

See, your query is answering a different question, that happens to have the same answer. Or put it another way, *you* did an optimization that the DBMS should do automatically.

The point is, it's nearly impossible to do a straight table equality check in SQL (looking through some books, I see it *is* possible with 2 levels of nested subselects, but I wouldn't dare copy it all here, and it certainly isn't intuitive or readable, and I'd hate to see what the query optimizer does with it).

I want to ask the DB:

Xyz WHERE (some query arbitrary query) = (some other arbitrary query)

where the queries return tables, not integers.

If the DB internally optimizes it into a COUNT() = COUNT(), fine. But I shouldn't have to come up with the "trick". I should just be able to state the query, translated into relational algebra, and the DB supply the answer. Same with recursive/hierarchical queries, nested sets, etc.

mind reader by chochos · 2005-11-03 13:16 · Score: 3, Funny

SELECT * FROM whatever the hell my customers have in mind

--
Go hug some trees.

dot-qualified column names are unique by Julian+Morrison · 2005-11-03 13:26 · Score: 1

Why do I not qualify every column name? Because relational attributes *must* always be unique and unambiguous.

Result, C naming syndrome.

LeaseNumber and Lease.Number are functionally equivalent, they're both unique, however the latter lends itself to convenient simplification in contexts where it wouldn't create ambiguity. Otherwise you end up repeating yourself a lot. LongTableNamePrefixFoo, LongTableNamePrefixBar, LongTableNamePrefixBaz etc.

Re:dot-qualified column names are unique by Anonymous Coward · 2005-11-03 13:42 · Score: 1, Interesting

1) Relation values can be anonymous, just like "3" is anonymous when it isn't assigned to a variable. What's the table name in that case? In other words, what does this mean:

SELECT 1 AS A, 2 AS A, 3 AS A

The fact that SQL allows this is a flaw. You have to constantly think about the names of your columns, even when they are hidden under "SELECT *".

2) Your point would be fine if SQL consistently used qualified table names, but it doesn't. It doesn't *return* them in a qualified state for example.

3) What qualifier should be used in a derived table (a "view").. the name of the view or the original table? If it uses the original table, that violates the requirement that views be indistinguishable from tables. If it uses the view, then the "full name" of the column has changed from what you specified in the query, and you still can't put two columns with the same "unqualified" name in the same view.

4) What name should be used in a self-join?

5) Relational theory just doesn't allow multiple attributes with the same name in a single relation, because there's no way to to tell them apart (relation headers are UNORDERED SETS, not arrays).

The syntax of what SQL does isn't as important, it's the broken inconsistent semantics.
Re:dot-qualified column names are unique by RingDev · 2005-11-03 16:09 · Score: 1

I actually use the flaw you mention in 1) here and in other arguements you've posted as a tool to force result sets to duplicate data and to create "empty" values. Could I achieve the same thing doing it your way, likely. But from what I've seen of your arguement it would take a new table containing the force information to create the joins I need. I don't see anything specificly wrong with your theory, but I've looked at it for all of 10 minutes here. It'd be interesting to see how it could handle subqueries and more complexe joins. I've got a couple multi-page queries at work that it might be nice to run through.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

Insert / Update combination by justanyone · 2005-11-03 13:45 · Score: 1

Perhaps, instead of replacing SQL, let's just add a clause to the update statement (or insert statement):

insert into table tabname (field1, field2) values (1, 2) [ WITH UPDATE [ALL,FIRST,ADD,NONE] ON FIELD1=1 ];

Thus, if the row exists with field1=1, it will update that row to with value field2=2.
If there are multiple rows where field1=1, define either update all of them (uses 'ALL'), update the first one found (uses 'FIRST'), insert one more (uses 'ADD'), or updates none of them (uses 'NONE').

Alternately, you could add this syntax to the UPDATE verb:

UPDATE TABLE tabname set field=1, field2=2 where field3=3 [ WITH [ NO ] INSERT ]

This would do the select for field3=3. If one or more records were found, update per request. If no records were found, insert a record with field1=1, field2=2 (field3 would be NULL since it's not specified in the 'set' subclause, only in the 'where' clause).

This solves your problem, provides two interesting methods.

--
Unitarian Church: Freethinkers Congregate!

Join syntax stinks by justanyone · 2005-11-03 13:53 · Score: 1

I've never been a fan of the join syntax. 'Inner' and 'outer' and 'left' and 'right' and 'full' joins? Let's for gosh sakes use the simple set-theory math terms we all understand.

Instead of:

select A.f1, b.f1, b.f2 from A join B on A.f1 = b.f1

do:

<B>select A.f1, b.f1, b.f2 from intersection join of A, B on A.F1 = B.F1 select A.f1, b.f1, b.f2 from union join of A, B on A.F1 = B.F1 select A.f1, b.f1, b.f2 from union minus intersection join of A, B on A.F1 = B.F1 select A.f1, b.f1, b.f2 from union minus A intersection join of A, B, C on A.F1 = B.F1 = C.F1 select A.f1, b.f1, b.f2 from union minus B intersection join of A, B, C on A.F1 = B.F1 = C.F1 select A.f1, b.f1, b.f2 from union A, B minus B intersection of A, B, C on A.F1 = B.F1 = C.F1</b>

--
Unitarian Church: Freethinkers Congregate!

Re:Join syntax stinks by poopdeville · 2005-11-03 16:57 · Score: 1

'Left', 'Right', 'Inner', and 'Outer' are set theory terms.

--
After all, I am strangely colored.

Is this a test? by planetjay · 2005-11-03 14:03 · Score: 1, Funny

USE Database;
SELECT * FROM table WHERE column !="99";

Did I get the job? Or for that matter DID YOU?

Re:Is this a test? by Tablizer · 2005-11-03 17:30 · Score: 1

USE Database;
SELECT * FROM table WHERE column !="99";

Indeed a syntactically creative solution to the "all columns except" question in the opening. In my pet query language (see message on c2 above) I actually made the column list a table. There is a shortcut syntax that turns SQL SELECT-like expressions into such a (virtual) table so that you don't have to keyboard alot if you don't want.

But, one could do such "column math" by making a "column table", and then doing relational algebra on the (virtual) column table to remove columns not wanted, and then feed that table into the next sub-query. Rather than re-invent set math just for columns, it just uses the existing relational engine to do it. I think it is cool and like bragging about it. I know, "get a life". I call the language SMEQL, and the "ME" means "meta enabled", meaning that queries can calculate/produce queries (to some extent).

--
Table-ized A.I.

Slightly offtopic by EraserMouseMan · 2005-11-03 14:38 · Score: 1

But I've been getting into MDX lately (data cube technology). I'll tell ya, if you want to do super-fast reporting of hierarchical data, cubes are the way to go. And MDX is a very feature rich cube querying language. It's such a breeze to do a query in MDX that would take all kinds of sub-queries, inner/outer joins and calculated fields to do with SQL. If you're finding yourself wracking your brain with complex SQL for reports you should really check out cubes and MDX. The learning curve will be well worth it!!

Object oriented by countach · 2005-11-03 14:57 · Score: 1

The biggest thing needed is the ability to return rows of differing column makeup for object oriented purposes. You need to be able to go:

SELECT FROM base-class

and get back all the fields from sub-classes so that you can populate entire objects from them. Real object oriented databases let you do that.

Re:Object oriented by Anonymous Coward · 2005-11-03 16:10 · Score: 0

SQL-99 spec includes a specification for defining a type, which can then be used as a column. so there's already some movement towards providing OO support in SQL.

There are extensions by pestilence669 · 2005-11-03 15:32 · Score: 2, Interesting

Most SQL dialects include some sort of exclusion operator.

SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;

or

SELECT * FROM A EXCEPT SELECT "B" FROM A;

Other engines do it differently. I think one of the best things about SQL is that it's a loose standard. You can easily choose the engine that works best for you... unless you are from the Cult of Microsoft (SQL Server). DB/2, Oracle, and even Sybase have very cool features that make queries much more powerful.

While SQL is hard to use at times (remembering double outer joins), it's that way for a reason. You don't want to be as easy to use as VB, for instance. Being forced to think in terms of lists and cartesean products forces you to think about speed and abstraction.

SQL is as easy as it should be, IMO. Specializing the access modifiers will only add to the complexity and make query optimization an impossibility. If you don't care about speed, then your needs probably aren't serious enough for a full blown SQL RDBMS. Text, XML, or even MS Access could be better suited.

Complaining about SQL is like complaining about Linear Algebra. These systems exist for exceptionally good reason. They are constrained to reduce or eliminate unsolvable situations.

Re:There are extensions by HappyDrgn · 2005-11-07 06:47 · Score: 1

Most SQL dialects include some sort of exclusion operator.

SELECT * FROM A INTERSECT SELECT * FROM A LIMIT 99;

Except Oracle, which has no 'limit' exclusion operator! This really bugs.

SELECT tbl.col1, tbl.col2 FROM (SELECT rownum col1, col2 FROM table) tbl WHERE tbl.col1 BETWEEN 1 AND 99;

....could be a lot shorter with limit.

Replace SQL With Prolog by Anonymous Coward · 2005-11-03 16:18 · Score: 1, Interesting

Prolog was designed to work with the relations that relational databases store and is thus the natural language for RDBMS. It's also easy to express hierarchical or network relationships in Prolog.

Actually Prolog can do anything an RDBMS can do, but transactions (along with the ACID properties: atomicity, consistency, isolation, durability) are missing. It would be simplest to add transaction processing to existing Prolog implementations and kiss the current RDBMS/SQL implementations goodbye.

Re:Replace SQL With Prolog by ghakko · 2005-11-05 21:28 · Score: 2, Insightful
Here's a semi-realistic example: suppose I have an intrusion detection system, and I'm logging packets into a table. I now want a tally of inbound traffic by source address and protocol, but want to ignore loopback and any source addresses from which I get fewer than 10 packets.
SELECT INET_NTOA(src), prot, SUM(len) FROM packets WHERE src INET_ATON('127.0.0.1') GROUP BY src, prot HAVING COUNT(*) >= 10;
When rewriting this query as Prolog:
?- bagof(tally(Src1, Prot1, Len1), (setof((Src, Prot), packet(Src, _, Prot, _, _, _, _, _, _, _, _, _, _), Srcs), length(Matches, N), N >= 10, member((Src1, Prot1), Matches), \+ inet_ntoa(Src1, '127,0.0.1'), bagof(Len, packet(Src1, _, Prot1, Len, _, _, _, _, _, _, _, _, _), Lens), sumlist(Lens, Len1)), Tallies).
Notice that in Prolog:
1. There's no natural way to express aggregates. To do what GROUP BY does, one has to nest all-solutions predicates like bagof/3 and setof/3.
2. There's no syntax for named fields.
3. The query is no longer concise and not easy to understand.
This just covers SELECT queries. Transactional INSERT and UPDATE queries would be more complicated, because of the way backtracking works in Prolog.
In short, I think Prolog is too general a language to be useful for queries on relational data. One really needs purpose-designed syntax to accommondate common queries.

Copying rows with slight modifications by toddbu · 2005-11-03 16:20 · Score: 1

We often run into the problem of wanting to copy a row in a table back into the same table (or an identical table in another database) but only make a minor modification to it. For example, let's say I have a system for creating orders. When I want to generate an invoice, I want to copy all the rows from that order into an identical invoice ("you ordered three widgets at $1.99"). The reason to preserve the original order is that you may want to reuse it to generate more invoices later.

So what I'm looking for is something like a "select into" where I can override the values of specific columns, perhaps with a lookup from another table. It should also be smart enough to handle auto_increment/identity type of columns so that you don't get conflicts.

--
If you don't want crime to pay, let the government run it.

Re:Copying rows with slight modifications by Anonymous Coward · 2005-11-04 00:24 · Score: 0

You can do this already I believe in MS SQL Server 2000

INSERT INTO Invoice (InvoiceID, ProductCode, Rate, Description)
SELECT InvoiceID, ProductCode, Rate * 2, Description
FROM Invoice

syntax free by Dan+Farina · 2005-11-03 16:36 · Score: 1

The worst part about SQL is that it was designed at a time when there was this notion that you should be able to read computer languages like English.

As we all know, unfettered application of this principle leads to some terrible languages (see BASIC, COBOL). SQL is one such language: every extension, every new functionality that looks "sql-like" (eg, not a user function) has to be integrated into the grammar. Furthermore, although perhaps not oft considered by SQL veterans, there are key words that, in a very real sense, mean the same thing. WHERE and HAVING in a SELECT are such an example. They both operate on intermediate tables (in the conceptual sense) and accept or reject a tuple, but before or after grouping. There are also inconsistencies in the order with which one supplies operations: in the case of SELECT, WHERE, GROUP BY (a two part keyword, of which the second part is context sensitive! AGH!), the key word leads the parameters. But when you have to do a JOIN, UNION, or INTERSECT you are using infix notation.

My conclusion is that SQL-the-language is a veritable hellhole.

An earlier poster mentioned using S-expressions, and I have to say that I am in complete agreement. SQL queries are usually functional in nature, generally not generating side effects (besides triggers). Then the grammar would be more or less zero size and vendors extending functionality would not result in a huge grammar, just a huge library of functions that are called identically grammatically. It would also make extensions easier to write.

Now, why do I suggest syntax free? Simply because syntax is hard to get right. Syntax-free languages have the handy property of being easy to convert to since they so trivially represent abstract syntax trees, and that means that anyone with a good idea, luck, and some talent in language design could mess around in defining their own language.

Re:syntax free by Dan+Farina · 2005-11-03 16:39 · Score: 1

Specific to the poster's problem: It would also be easy to provide a macroing system that would allow you to "invent" functionality originally not thought of, such as negative masking column names. Once you open up macroing, you have the full power of metaprogramming, and nothing is beyond your reach.
Re:syntax free by Tablizer · 2005-11-03 18:11 · Score: 1

The worst part about SQL is that it was designed at a time when there was this notion that you should be able to read computer languages like English. As we all know, unfettered application of this principle leads to some terrible languages (see BASIC, COBOL).

Actually one of the earliest production relational languages, BS-12 (Business System 12), used prefix notation and was rather functional in syntax. It was influenced by a "lab language" called ISBL that used the following infix operators:

'*' = Natural Join
':' = Selection
'%' = Projection
'+' = Union
'-' = Difference
'.' = Intersection
'=' = Assignment of relation to a name
'LIST' = Display relation (result)

It appears the IBM suits thought the English-like approach was more marketable to other suits. Like Microsoft, they often tried to please the check signers more so than the programmers/techies. It is not really the fault of the "era", but of the same ol' sh8t.

--
Table-ized A.I.

C2 wiki on SQL overhauls and fixes by Tablizer · 2005-11-03 17:03 · Score: 1

At the C2 wiki some of us have been thinking about this question for years. Here are some key topics on it, including my pet SQL replacement suggestion:

http://www.c2.com/cgi/wiki?SqlFlaws

http://www.c2.com/cgi/wiki?TqlRoadmap

--
Table-ized A.I.

Re:C2 wiki on SQL overhauls and fixes by Tablizer · 2005-11-03 17:12 · Score: 1

I forgot to wet your appitite with an example of SMEQL:

Get the top 6 earners in each department based on an Employees table with the columns: empID, dept, empName, and salary.

srt = orderBy(Employees, (dept, salary), order) top = group(srt, ((dept) dept2, max(order) order)) join(srt, top, a.dept=b.dept2 and b.order - a.order <= 5)
Unfortunately, we don't yet have a working prototype.

--
Table-ized A.I.

Scrap it entirely by BlindRobin · 2005-11-03 17:09 · Score: 1

and start over. SQl is an abomination. Even well constructed SQL is an UGLY, syntacticly obscene, maintenance pain. Code to generate SQL is usually pretty nasty too bahhhh...

Re:my request by Anonymous Coward · 2005-11-03 17:12 · Score: 0

I was going to do something like that once. Using 'c' library routines so you could use something with switches that had a good man page. I know there's the mysql command but you still have to type lengthy sytanx to do most tasks.

Your Frustration Is Misplaced by Anonymous Coward · 2005-11-03 17:29 · Score: 0

There's always a quick way to fetch an automatically-generated ID. Look it up in your reference manual. You're wasting time and looking database-ignorant by posting your missive.

But IMO it is best to assign IDs explicitly yourself. That requires some coding and an additional table where the next-available ID for each table is stored.

Infix, no by Tablizer · 2005-11-03 17:53 · Score: 1

I have decided that prefix notion is superior to infix (as used by Tutorial-D). The problem with infix notation is that not all operators have only 2 parameters. If some have more than 2 params, then you have to either introduce goofy syntax adjustments, or mix infix with prefix. If you use all prefix then you don't have to worry about mixing. Plus, prefix can be implemented as regular sequential functions in most languages so that one can emulate the query language in a native programming language more easily.

Consider the case where you have an operation that has 2 required parameters and one optional parameter. That means that sometimes it is written like:

p1 op p2

and sometimes like:

op(p1, p2)

so that we can have:

op(p1, p2, p3)

when you decide to use the 3rd param.

--
Table-ized A.I.

insert vs update by TheLink · 2005-11-03 20:00 · Score: 1

It seems stupid to me that insert would have a different format from update.

e.g. (field1,field2,field3) values ('1','2','blah') vs field1='1', field2='2', field3='blah'. Kinda dumb.

Plus why wasn't there a command to "merge/replace/put" rows into a table much earlier?

SQL's badly designed, but looks like we'll have to live with it for decades to come.

--

Too many replies beneath your current threshold

Who needs 100 columns? by obtuse · 2005-11-03 20:24 · Score: 1

Why do I need 99/100 columns? Perhaps a poor example, but the obvious answer is: Because some nitwit built this database and I just have to live with it.

Do you think most SQL queries are executed by people who created the database in the first place, or even have any control over how it is designed? Don't be silly. It's the usual problem with software. Computers are so flexible, and commercial software is so inflexible, that a huge amount of energy is expended trying to get data from point A to point B when there is no direct route. Hence Perl.

--
Assembly is the reverse of disassembly.

some solutions by Matje · 2005-11-03 20:52 · Score: 2, Informative

fwiw a few solutions.

in MySQL, the statement REPLACE INTO will perform an update or an insert, depending on whether the primary key value exists in the table. It performs exactly like your WRITE command would.

in MySQL, you can perform a SELECT LAST_INSERT_ID() to get the last inserted value.
in MSSQL, use a SELECT @@IDENTITY to get the same. (check in the docs whether you need @@IDENTITY OR @SCOPE_IDENTITY or the third version, I always forget).

Re:some solutions by Vorx · 2005-11-04 09:54 · Score: 1

@@SCOPE_IDENTITY is IMO a better option because if your table has triggers attached that do data operations (think audit logging) then @@IDENTITY can change, but @@SCOPE_IDENTITY won't.

--
Yes this is my real UID. No, it was not bought from EBay.

I can't believe no one's said the obvious... by nagora · 2005-11-03 22:03 · Score: 1

Fricking lasers! That's what would improve SQL - frickin' lasers!

--
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"

Add units to numerical operands by Anonymous Coward · 2005-11-03 22:46 · Score: 0

SQL lacks units for numerical operands. If a column contains widget size in centimetres and I query for widget size with contraints in inches than the query "works" without errors but I get the wrong answer. This is a major problem for interoperability of archive DBs in science

The International Virtual Observatory Alliance (http://www.ivoa.net/) has a SQL-like language called Astronomical Data Query Language and this is supposed to support units.

MACRO, FILTER, UDF, PARAMETERIZED VIEWS by heson · 2005-11-03 23:48 · Score: 1

Standard way to make abstraction layers. I think they exist under names like: parameterized views and user defined functions. Nested selects are hairy to debug when they get long.

Note that the layers exist only before the query optimizer, no extra performance cost beyond that.

Re:Winning the special olumpics and debating an AC by oldCoder · 2005-11-04 01:27 · Score: 1

So what texts do you recommend we study to learn this stuff?

And how will using full Relational Algebra change our existing ORM systems? And will it impact the object model required of the programming languages? And are there examples of existing commercial software packages that offer a Relational Algebra interface? How much money could we make if we decided to build one?

And who could do it faster, MS, Oracle or Open Source?

And how do we justify buying it to management?

--

I18N == Intergalacticization

I would leave it alone (except maybe standardize) by jonadab · 2005-11-04 02:05 · Score: 1

Programmers shouldn't be writing logic directly in SQL anyhow; that's what we have *programming* languages for. SQL is a data language, like XML or YAML, a language that computer programs should be using to transmit data amongst themselves. However, the programmer shouldn't be hardcoding the SQL directly into the program logic. That should be handled by a library, possibly a multi-layer stack of libraries (e.g., a subclass of Class::DBI running over DBI with a DBD connector for the database in question).

Then you write your program logic in a programming language (preferably a multiparadigmatic VHLL), which will always have much more expressiveness than trying to do the equivalent thing in a data language like SQL.

--
Cut that out, or I will ship you to Norilsk in a box.

Standardization by JediTrainer · 2005-11-04 02:52 · Score: 2, Insightful

All I want is for every database to have the same functions; a standardized way to do the basics.

Every vendor seems to have their own ways to define (or arbitrarily break standard) date functions (add/compare/convert/get current timestamp), string manipulation (like uppercasing, substrings and concatenation), getting the generated id from an inserted row (identity/serial/auto_increment), limiting the number of rows returned (TOP or FIRST?), getting a subset of rows (ie a standard way to get rows 100-150 that works with most DBs) or even getting a list of tables or viewing the schema.

Trying to make an app portable across DBs is next to impossible, and that's not even counting stored procedures or different behaviours for the same syntax (like NULL handling across the various functions). This is very irritating and should have been fixed long ago. Instead, we get this crap which makes the differences we see across different web browsers look like child's play.

--

You can accomplish anything you set your mind to. The impossible just takes a little longer.

Clause Order by PizzaFace · 2005-11-04 03:59 · Score: 1

SQL would be easier for people to learn if the order of a statement's clauses was the same as the order in which they are processed. Specifically, move the SELECT (projection) clause to the end of the statement, instead of the beginning:

FROM ...
WHERE ...
GROUP BY ...
HAVING ...
SELECT ...

Declarative languages are fine, but it always helps a programmer to be able to imagine how a statement will be processed, because order of evaluation makes a difference.

relations & relational operators by rfisher · 2005-11-04 04:04 · Score: 1

A RDBMS query language needs to be structured around relational variables & relational operations. Such a language makes it easy to do things that SQL's limited syntax make much too difficult.

Now, if we could just have RDBMSs that actually did the M part for you. Or at least one that isolated the management so only the DBA saw it & presented a clean, normalized view to the outside world.

Quick list by metamatic · 2005-11-04 04:04 · Score: 1

1. End all fixed width field requirements. (Not so much a SQL misfeature as a database misfeature, but still...)

2. Implement standard data types for dates, currency, names, and so on. Give them some smarts, so you can read a date and have it return the actual correct value, not some string that could be formatted in any random way and has a missing time zone.

3. Make the grammar less fussy. Half the problem I have with writing SQL queries is that when I'm done, I have to mess with the pieces of the query until I get them in the right order that the system will accept them. Or even better...

4. Get rid of that stupid English-like syntax and give us a proper notation based on Scheme or Ruby or something.

5. Standardize the bits that aren't standardized yet but every real database needs, like referential integrity constraints.

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak

Re:Winning the special olumpics and debating an AC by snarfwarg · 2005-11-04 04:25 · Score: 1

I write it as follows:

[Modify dsname] JOIN ONE LEASE TO MANY INVOICE VIA LeaseNum to InvoiceNum
MODIFY dsname JOIN INNER

SET DS LEASE
FIND LeaseNum "1234"

PRINT -
Lease.LesseeNum,Invoice.InvoiceNum,Invoice.AmountB illed -
BY Lease.LeaseNum - !automatically preceeds print item list
BY Invoice.InvoiceNum Suppressed

Is that simpler?

--
It's not what you Warg, it's how you Snarf

Re:Better handling of relationships between tables by Anonymous Coward · 2005-11-04 04:47 · Score: 0

Yes, having default relationships between tables seem really useful. Let's invent something to handle that ... we can call it a "view".

Problem Not SQL, It's DB Implementations by davemabe · 2005-11-04 05:13 · Score: 0, Redundant

SQL is great - it's trying to get all the database vendors to support it consistently. Ever tried doing in Oracle what is a piece of cake in MySQL:

SELECT * FROM table LIMIT x,y

You can easily page through results using that LIMIT function. Try doing the same in Oracle. I'm told it can be done, but it ain't pretty.

Re:Problem Not SQL, It's DB Implementations by pooly7 · 2005-11-04 06:54 · Score: 2, Informative

Try :
SELECT * FROM table LIMIT y OFFSET x
That would make it even usable with PostGreSQL and most DB, since the x,y is a mysql extention.

Re:Winning the special olumpics and debating an AC by RingDev · 2005-11-04 05:19 · Score: 1

That looks more like a sequential process as opposed to a single request. Interesting approach though.

-Rick

--
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs

automatic index creation by brlewis · 2005-11-04 05:41 · Score: 1

I'd just as soon give the DB a time interval to analyze and decide by itself what indexes to create based on the SELECT/INSERT/UPDATE/DELETE operations that happen in that interval.

In Solvat Russia... by Anonymous Coward · 2005-11-04 05:42 · Score: 0

SQL improve you!

denormalized questionnaire by brlewis · 2005-11-04 05:51 · Score: 2, Insightful

No wasted space and every column is relevant in my example. EVERY time I want EVERY column, no exceptions.

Then your example is contrived. In the real world somebody would ask, "How many people checked item 100?" or "How many people who checked item 33 also checked item 99?"

term "data persistence" shows ignorance by brlewis · 2005-11-04 06:23 · Score: 2

Thanks for the pointer. It confirms what had previously been my sneaking suspicion. Referring to a database as "data persistence" shows ignorance of at least the reporting capabilities of a real database, and likely other things as well. These "object persistence" abstractions will make a 2-day project out of a report that would take 2 minutes to write in SQL. E.g., you wanted to see what magazines people in zip code 90210 subscribe to in order of popularity.

Re:term "data persistence" shows ignorance by lucm · 2005-11-04 07:31 · Score: 1

Referring to a database as "data persistence" shows ignorance of at least the reporting capabilities of a real database, and likely other things as well
According to Merriam-Webster, a database is "a usually large collection of data organized especially for rapid search and retrieval (as by a computer)". I don't see reporting capabilities, and I don't see why "data persistence" is not a valid description.
I suspect that when you say "real database" you apply a selective filter over the pool of existing products, and that you keep only the products having peripheral software. Which does not make the other products less of a database. Also, if reporting capabilities were included in a database, why would Oracle license Oracle Reports separately?
This being said, I find it obvious that either you have not been involved in a major software development project, or that you have been involved in poorly managed projects only. Applying a layer of abstraction between the application and the database is a very good practice and it saves lots of work whenever the application or the database has to be modified.
What do you do if someday for a commercial purpose you must change your database vendor? With your strategy you have to rewrite loads and loads of code, as each vendor has specific SQL features. But if you have an abstraction layer (like EJB) you basically change the drivers and, maybe, update the XML schema.

--
lucm, indeed.
Re:term "data persistence" shows ignorance by brlewis · 2005-11-04 07:53 · Score: 1

By "real database" I mean nothing more than that subset of SQL that is portable between PostgreSQL, DB2 and Oracle, and easily adaptible to Transact-SQL (MSFT, Sybase). I gave a simple and concrete example of something that takes two minutes to implement in SQL and 2 days to implement using EJBs. You retorted with speculation about my experience. Answer my example first.

Changing database vendors would involve straightforward porting, not rewriting. But even if you did change database vendors five times and did a rewrite each time, it would still make more sense to redo the 2 minutes worth of work six times rather than spend two days and be done with it.
Re:term "data persistence" shows ignorance by lucm · 2005-11-04 08:30 · Score: 1

SQL that is portable between PostgreSQL, DB2 and Oracle, and easily adaptible to Transact-SQL...
Easily adaptable? No it is not. You don't have table heritance or array fields in Oracle, which you have in Postgres; how do you "port" the queries? And T-SQL is quite different from the Oracle or Postgres SQL, even a junior programmer can tell that queries on one database are not easily adaptable to another. Which is why I suspect that you don't have much experience.
I gave a simple and concrete example of something that takes two minutes to implement in SQL and 2 days to implement using EJBs
No you did not. Your evaluation of how long this kind of implementation takes is wrong, so all your logic (which is based on that evaluation) is flawed. A simple EJB can be done in less than two minutes with JBuilder or any other J2EE IDE. Most of the code is generated by the software, not by the programmer.
An EJB is much easier to maintain than a SQL query because with EJBs you don't have to rewrite at all if you change the back-end, as long as the business logic is the same.

--
lucm, indeed.
Re:term "data persistence" shows ignorance by brlewis · 2005-11-07 03:32 · Score: 1
1. You ignored "subset" in my post. PostgreSQL hierarchical tables are not part of that subset of SQL and I never use them.
2. You restrict yourself to "simple" EJBs. Yes, these can be made in 2 minutes. Doing real-world work such as in my example cannot be done in 2 minutes with EJBs, but can be done in 2 minutes if you know SQL.

Select pains by Anonymous Coward · 2005-11-04 06:42 · Score: 0

Yes, it's a bear that we can't do a "Select * -Column5 from tableA".

But really, why should select only operate on column names? Why couldn't it operate on datatypes as well? Why Can't I run "Select INTEGER from tableA" to get all integer columns, or "Select -DateTime from TableA" to get all BUT date columns? We could either make the datatypes keywords that couldn't be used as column names, or precede datatypes in queries with a special character so that they can be differentiated from column names.

Also, why not be able to specify column names by a partial string? Say we have a bunch of fields that contain'Customer', why not do "Select %CUSTOMER% from tableA"?

How about being able to query columns by there order, including ranges? "Select 1 to 5 from tableA" That might be handy.

How about being able to easily select just the columns in an index, or the primary_key column, or foreign_key columns. Right now you have to look up the index information and then write the select yourself. That's clunky, why not write "Select * from tableA where COLUMN is foreign key"?

Tables by LordMyren · 2005-11-04 07:11 · Score: 1

I never liked tables. Too much impedance mismatch. I would get rid of tables.

Easy... by frission · 2005-11-04 07:19 · Score: 1

Just talking about this to a co-worker yesterday. I hate the fact that I can't do this:
select (first_name||' '||last_name) as name
from customers
where name like 'z%'
instead i have to do this:
select (first_name||' '||last_name) as name
from candidates
where (first_name||' '||last_name) like 'z%'
even thought I JUST defined the alias! granted, i've only ever used Oracle, I don't know how other DBs handle the above.

Re:Easy... by Anonymous Coward · 2005-11-04 08:26 · Score: 0

Create a view?

Did you mean ... by vlad_petric · 2005-11-04 07:44 · Score: 1

DELETE * ?

--

The Raven

What's the 1%? by brlewis · 2005-11-04 08:03 · Score: 1

I've been doing SQL since 1997, and every time I've used GROUP BY I've used exactly the non-aggregate expressions in the SELECT clause. Can you give an example where it would be something else? I too would like GROUP BY DEFAULT or somesuch.

Re:What's the 1%? by Atzanteol · 2005-11-04 08:43 · Score: 1

I sorta hedged my bet there. I can't think of anytime in the near past when I've done it. I was thinking maybe sometime back when I was doing a lot of report queries (large complicated joins) I may have ordered things without a care to how they were in the SELECT.

But if things were "auto" grouped it would be just one less clause to modify.

--
"Ignorance more frequently begets confidence than does knowledge"

- Charles Darwin
Re:What's the 1%? by TopSpin · 2005-11-04 09:24 · Score: 2, Insightful

I've used exactly the non-aggregate expressions in the SELECT clause.

This is non-optional. Something is either an aggregate expression or it isn't, so why are we expected to explain this in the statement? Probably because aggregates aren't 'first class' in terms of SQL. Aggregates are functions.

Can you give an example where it would be something else? I too would like GROUP BY DEFAULT or somesuch.

This is my idea:

SELECT a, b, c
AGGREGATE x, y, z
FROM foo

GROUP BY vanishes. Non-aggregates expressions are valid only in SELECT while aggregate expressions are valid only in AGGREGATE. Less ambiguous and no redundant expressions. Easier to use also; comment out the AGGREGATE clause and you've got a basic statement. Quickly remove group tuple elements with comments.

Other ideas:

Binary aggregate functions: OR(), for example. Clever techniques using these can eliminate lumps of code elsewhere. Example, given the values 1, 1, 5, 1, aggregate OR() yields 5, aggregate AND() yields 1. Very useful for analysis of 'detail' rows that have individual states.

Reorder the basic clauses: FROM should be first.

Eliminate HAVING: This is redundant with subselects.

--
Lurking at the bottom of the gravity well, getting old
Re:What's the 1%? by Anonymous Coward · 2005-11-07 00:31 · Score: 0

You can GROUP BY elements NOT in the SELECT Clause
Rare as heck but I have done so in the last few months

Re:Better handling of relationships between tables by Anonymous Coward · 2005-11-04 08:13 · Score: 0

Or we could call it a natural join. :)

Standard CREATE / ALTER statements? by Trevin · 2005-11-04 09:17 · Score: 1

Is there a standard way to create or alter a table? The databases I've worked with (MySQL, Postgres, Oracle) all seem to use similar syntax but have slight variations, besides the fact that they use different names for the various column types, and certain column types only exist in one or a few DBMS's. And that doesn't even take into account certain database abstractions that are DBMS-specific.

Efficiency is important by Anonymous Coward · 2005-11-04 10:11 · Score: 0

select * -columnt

to select 99 out of 100 does not make much sens. Data is in blocks so the whole block will be fetched.
Removing a single data from it is not really efficient ...

SQL is not for high level programming - it is for massive, effective data insertion and selection.

Standardization by Anonymous Coward · 2005-11-04 10:46 · Score: 0

I'd lock up all the "SQL compliant" vendors in a room, and make them fight it out until there was only one "SQL compliant" variation left.

Then again, MS Access would probably cheat, sneak an Uzi into the arena, and doom the world. :-(

Re:Winning the special olumpics and debating an AC by cortana · 2005-11-04 14:11 · Score: 1

Surely your local university has some kind of basic databases course. If you are not a student yourself then visit the course info page (or email the lecturer) and find out what text books they recommend.

use hibernate by JulianLibertine · 2005-11-04 17:12 · Score: 1

consider hibernate.....I tune sql by mapping, and its been ages since I wrote by hand a query....well that's of course if using java

Too easy: just add RegExes throughout by Anonymous Coward · 2005-11-04 20:34 · Score: 0

SELECT /^(desc,cost,qty).*$/ FROM invoices WHERE /^qty.*$/ IS NOT NULL;

1:1 relationships. by oneiros27 · 2005-11-05 12:41 · Score: 1

If you have several tables all with a strict 1:1 relationship, they should be in ONE table. Anything else is considered denormalized, not yet normalized. (aside from being just plain BAD!)

Although there are very few times when it makes sense to have a table spread across multiple tables, there are times when it is better to have 1:1 relationships in tables.

If you have a large table that gets used for more than one purpose, and one of those purposes uses relatively few columns, and it occurs more frequently than the other purposes, it may be better to have its information in a seperate table. You could also use a multi-column index, but only if the table has a significant number of writes per read, it may be a problem.

Next, with our two purpose concept again -- if the purpose that is called more often uses few columns, and those columns are all fixed width, you may see advantages in moving them to a seperate table.

And, once again -- two purposes, and the security levels are different for who has access to the data -- I might split up the data, if the sensitive data is relatively small in relation to the entire table, so that I only have to protect the smaller amount of data. (I know -- you're thinking 'But I can use views, or set up rights in the database', but you're forgetting that you also have to protect the backups, and it's a whole lot easier to keep track of one tape, as opposed to a whole rack of tapes, depending on the amount of data you're dealing with.)

So ... are there times I'd use tables for a one to one relationship when it wasn't a sparesely populated table? Yes. Are there times when it's used and it probably shouldn't have been? Yes, but every situation is different, and if someone can back it up with benchmarking using real data, or has a damned good reason for doing so, it's better than blindly always doing one thing without understanding all of the implications.

--
Build it, and they will come^Hplain.

Re:Winning the special olympics and debating an AC by hereticmessiah · 2005-11-05 19:54 · Score: 1

The book you're looking for is "Database Systems: A Practical Approach to Design, Implementation and Management by Thomas Connolly, Carolyn Begg, et al. It cover practically everything from theory to practice. My copy dates back to '98 and I don't think I'd ever give it away. Its coverage of relational algebra is very good.

Ignore the negative reviews in Amazon, by the way. It was one of my undergraduate degree texts and had no problems with it. And of course, considering that the negative reviews seem to be pissed at it because (a) it covers the theory properly; (b) it's written by a bunch of Scots, so it's "British" and that's apparently bad - I don't know how we survive over on this side of the pond with all those terrible American books ;-); (c) it isn't like a Dietel & Dietel book, and that's somehow a bad thing; (d) they were looking for Databases and SQL for Dummies and discovered that this wasn't it. The only relevant criticism made is that it's a little wordy. Ah well.

--
I don't like trolls and mod against me if you like, but I'd prefer if you'd reply.

Re:Better handling of relationships between tables by Old+Wolf · 2005-11-06 17:21 · Score: 1

Create a view that has those tables joined in the way you want. Then execute your queries against that view.

Just plain scope_identity() by Anonymous Coward · 2005-11-06 18:11 · Score: 0

@@SCOPE_IDENTITY is IMO a better option because if your table has triggers attached that do data operations (think audit logging) then @@IDENTITY can change, but @@SCOPE_IDENTITY won't.

I always use scope_identity() which has never let me down. Doesn't the @@ mean the variable is global? Where I work we have had major issues with @@identity (ironically, some of Microsoft's autogenerated VB.NET code uses @@identity which still causes problems).

Re:Better handling of relationships between tables by Trepalium · 2005-11-07 18:14 · Score: 1

Well, like I said, I'm no SQL expert. The only experience I have with SQL is some very simple web projects, and the in-house inventory/order tracking software which was written by someone who overused stored procedures and joins, and never used views or foreign keys. Since the front end for the order tracking software was written in MS Access (which is responsible for enforcing all security), it's not really a surprise. It might mean my view on SQL is a little warped.

--
I used up all my sick days, so I'm calling in dead.

Just have the Vendors follow the standard by Anonymous Coward · 2005-11-07 20:18 · Score: 0

Yes, Heirarchical Queries are part of the SQL 2003 standard.

The single biggest problem with SQL is that the vendors don't follow the standard and all have their own, often broken, ways of doing things in a half-assed manner.

There are lots of things about the syntax that are annoying (required redundancy on group-by and aggregates; enough keywords they often conflict with good names, ugly case folding rules, etc) --- but all of those things could be tollerated or hidden behind some API if only there were a system that supported the standard.

As it is now; 90% of "sql programming" is screwing around with some vendor's attempt at lock-in.

baby, bathwater by jdowland · 2005-11-07 21:50 · Score: 1

I'd bin it and start again, aiming to create a more complete standardised language. If you look at the SQL standard, there's not even a common means for creating tables and databases. I can only assume that's by design, but it does prevent you from writing vendor agnostic SQL.

271 comments