Diagramming Tool For SQL Select Statements

Bring a database down? by Shados · 2008-08-03 12:55 · Score: 5, Informative

No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

So at worst, you're slowing down your own localhost development database engine for everyone else trying to access it (read: no one).

Not much for the DBA to scream about...

Re:Bring a database down? by Anonymous Coward · 2008-08-03 13:12 · Score: 5, Informative

> No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.
A real ACID-compliant database, no. MySQL, maybe.
> Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?
Unfortunately sometimes you do need to run new queries against production servers. Of course, with a real database like MSSQL or Oracle, you can see how a query will execute, what path the optimizer will follow, and what the cost of the query will be.
Re:Bring a database down? by MBCook · 2008-08-03 13:39 · Score: 5, Informative

Explain/describe exists in MySQL, it's just very hard to do.
Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster). Is it possible to take MySQL down? Easily. It can be surprisingly easy to lock the server completely. Even when you select off one set of tables (A) and want to insert into another set (B, possibly in a different schema/DB) it is possible to have things locked. It's very easy. We haven't seen a crashing bug in MySQL in a while (fun: a query that formatted dates with the date format function could reliably crash MySQL 4.0 or 4.1, I don't remember which).
Does explain help? No. On Oracle it may help. In Postgres it seems to help. I have no experience with MSSQL. In MySQL you have to watch out. While it can be useful, it is very limited.
Its row counts can be horribly useless. It can list 1.2 million rows when in fact it can take a fraction of a second to get the data because it's all in an index in memory.
Worse: it will run the query for you. Under some circumstances (using a subquery can do it, using more than one level of subquery is almost guaranteed to do it) it will just run the inner query and then use that to produce results. This means that describe/explain can lock the database and take hours to return (if you had a query that was bad enough and didn't kill the describe/explain). It's all the fun of running the real query, without the results actually presented to you.
Note: We're using 5.0 (since 5.1 isn't production ready yet). Some of this may be fixed.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:Bring a database down? by Anonymous Coward · 2008-08-03 13:41 · Score: 3, Funny

Terrabyte? A planet byte?
Re:Bring a database down? by NerveGas · 2008-08-03 14:24 · Score: 3, Funny

So, you don't put an untested query on a production server. Great. What happens when someone changes data in such a way that your query now explodes? :D
In the last case I had to deal with that, one boneheaded programmer had his code set to send him an email if it couldn't find a good match in the DB. Someone changed the data, and with the amount of traffic, his code, spread across our web serving farm, had injected almost a million messages into the email queues. Programmers are awesome.

--
Oh, you're not stuck, you're just unable to let go of the onion rings.
Re:Bring a database down? by russotto · 2008-08-03 14:33 · Score: 5, Informative

> Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster).

Oh yes, Oracle can be brought to a grinding halt (even on substantial hardware) by a big nasty query. It may not be crashed, but it's nonresponsive. Especially annoying when there is no need for the cartesian product; Oracle's pessimizer just chose to do one when something else was MUCH more appropriate. Alas this tool would not catch that situation (but EXPLAIN PLAN does).
Re:Bring a database down? by killjoe · 2008-08-03 14:55 · Score: 2, Informative

Depends on your database. I know I have been able to bring SQL server down with a query.
Try this...
begin transaction;
update rows set a=b where x=y;
commit transaction;
On your workstation this could run really fast because you only have ten records. On the production database server this could crush the server if you had a few million records affected.
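One common way to soften a statement like that is to apply it in small key ranges and commit each range separately, so no single transaction holds locks or log space for millions of rows. The sketch below is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for the production server, the `id` primary key and the `update_in_batches` helper are hypothetical, and whether batching actually helps depends on the engine and its locking model.

```python
import sqlite3

def update_in_batches(conn, batch_size=1000):
    """Apply UPDATE rows SET a = b WHERE x = y in small committed chunks."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM rows").fetchone()
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        with conn:  # one short transaction per key range, committed on exit
            conn.execute(
                "UPDATE rows SET a = b WHERE x = y AND id BETWEEN ? AND ?",
                (start, start + batch_size - 1),
            )
        batches += 1
    return batches

# Hypothetical table matching the statement in the comment, plus an id key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (id INTEGER PRIMARY KEY, a INT, b INT, x INT, y INT)")
conn.executemany(
    "INSERT INTO rows (a, b, x, y) VALUES (0, ?, ?, 0)",
    [(i, i % 2) for i in range(1, 5001)],
)
conn.commit()

batches = update_in_batches(conn, batch_size=1000)
(changed,) = conn.execute("SELECT COUNT(*) FROM rows WHERE a = b").fetchone()
print(batches, changed)
```

The trade-off is that the change is no longer atomic: other sessions can see a half-updated table between batches, which may or may not be acceptable.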

--
evil is as evil does
Re:Bring a database down? by GoofyBoy · 2008-08-03 15:05 · Score: 2, Informative

>In practice, there's no difference.
To a DBA it's a big difference.
1. Just a massive slow down - log in (on SQL Server there is a Dedicated Administrator Connection; I don't think I've had problems connecting to a problem Oracle db as long as I can get on the OS, partly because sessions are processes) and just kill the process. The DB should clean everything up. (as long as it's not a toy db; I'm looking at you MySQL.)
2. A crash - then you have to go through a whole number of steps to bring it up and then verify the data is ok, then let everyone back in. There may be a backup involved if you are unlucky. You definitely want to figure out what happened.

--
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Re:Bring a database down? by mbourgon · 2008-08-03 15:19 · Score: 2, Interesting

AHAHAHAHAHAHAHAHAHA. Since I do run terabyte-sized databases, I'll contradict you - poor queries _can_ tank a server, even with small tables, if the query is poor enough. While it technically may be running, if nobody else can access it, then for practical purposes the server is down. And never underestimate the ability of one user with enough knowledge to be dangerous, to spread that selfsame query across as many people as possible.

--
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
Re:Bring a database down? by Shados · 2008-08-03 15:40 · Score: 2, Informative

Maybe my poor-query-writing skills are bad :) Because I've seriously -tried- before... cross joins on 100+ tables, each containing several dozen gigs of data, totalling multiple terabytes... the scheduling was good enough to give the query very low priority, leaving the server ok.
If you use (in SQL Server at least) the default settings, that will basically render your database useless... but if you use the newer locking strategies from 2005 (which had been available in Oracle for ages), the tables won't be locked, and everything will be fine. Laggish for sure, but the server definitely won't tank.
Somewhere I used to work, I would even run millions of inserts, continually, on our staging server (which was shared among 50 or so devs) because I was testing an ETL routine... the server was slower for sure when I messed up and did a multi cross join on my insert source on a Friday evening, but it never brought it down.
Re: Bring a database down? by scdeimos · 2008-08-03 17:46 · Score: 2, Insightful

> No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.
You've obviously not tried anything simple on MS-SQL, like expanding a varchar(4) column to nvarchar(10) on a table with a few million rows. MS-SQL spins its wheels filling-up the transaction log until it overflows, then rolls it all back again. A 4GB log file, filled with a 250meg table (and no indexes because they were already dropped)?
In the end we had to drop all FK refs, select * into another table, drop the original table then select * (with conversions) into newTableWithOriginal's name and reset all the FK's. *shakes head*
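The copy-and-swap workaround described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual script: it runs against SQLite via Python's sqlite3 module rather than MS-SQL, the `codes` table is hypothetical, the foreign-key drop and re-create steps are omitted, and SQLite only records the declared column type without enforcing the length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (id INTEGER PRIMARY KEY, code VARCHAR(4));
    INSERT INTO codes (code) VALUES ('ab'), ('cd'), ('ef');

    -- Build the replacement with the wider column, copy the rows across
    -- with an explicit conversion, then swap the names.
    CREATE TABLE codes_new (id INTEGER PRIMARY KEY, code NVARCHAR(10));
    INSERT INTO codes_new (id, code) SELECT id, CAST(code AS TEXT) FROM codes;
    DROP TABLE codes;
    ALTER TABLE codes_new RENAME TO codes;
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(codes)")}
(count,) = conn.execute("SELECT COUNT(*) FROM codes").fetchone()
print(columns, count)
```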
Re:Bring a database down? by Venik · 2008-08-03 19:03 · Score: 2, Insightful

SQL code is usually developed on some small server or the DBA's own workstation. The dev database is representative of the prod version only in structure and not in size. So this type of error sometimes goes unnoticed until the code is migrated to the prod environment. The effect of such errors varies depending on server architecture. The most sensitive are HA cluster environments, where the clustering engine overreacts and starts failing things over, exacerbating the problem.
Re:Bring a database down? by Craig Ringer · 2008-08-03 19:08 · Score: 5, Informative

craig:~$ psql
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
craig=> set statement_timeout=1000;
SET
craig=> SELECT generate_series(0,100000000000000000);
ERROR: canceling statement due to statement timeout
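For engines without a statement_timeout setting, a similar guard can sometimes be built on the client side. The following is a rough sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, where `set_progress_handler` lets a callback abort the running statement, and the `run_with_timeout` helper and the endless query are made up for illustration.

```python
import sqlite3
import time

def run_with_timeout(conn, sql, timeout_s):
    """Run sql, aborting it once timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a truthy return value makes it abandon the running statement.
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # Cancelled, analogous to the psql ERROR above. A real helper
        # should tell an interrupt apart from other operational errors.
        return None
    finally:
        conn.set_progress_handler(None, 0)

conn = sqlite3.connect(":memory:")
endless = (
    "WITH RECURSIVE series(n) AS "
    "(SELECT 0 UNION ALL SELECT n + 1 FROM series) "
    "SELECT COUNT(*) FROM series"
)
result = run_with_timeout(conn, endless, timeout_s=0.2)
quick = run_with_timeout(conn, "SELECT 1", timeout_s=0.2)
print(result, quick)
```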
Re: Bring a database down? by lucm · 2008-08-05 15:10 · Score: 2, Insightful

> myspace.com has been supplanted by Facebook.
Facebook being more popular than mySpace has nothing to do with the database back-end. If you need more big customers for SQL Server 2005, they are easy to find: Barnes & Noble, HMV online music store, NASDAQ (over 5000 transactions/sec).
So basically your statement that SQL Server is a toy database might have attracted a few claps 6 or 7 years ago on Slashdot, but the reality is that SQL Server is a robust product finding its way in many markets. As one could say: "2001 just called, they want their SQL Server rant back".
> And DB2 is the granddaddy that is being trusted by ALL banks.
I do not have data about banks and databases, however I suspect that since many smaller banks are still using OS/400, there must be a lot of DB2 out there. For the bigger banks, I sincerely doubt there is any RDBMS lying around, except for OLAP or e-banking, in which case DB2 won't be in the contenders (e-banking is the land of Oracle, SQL Server and sadly Interbase). For the real backbone, big money usually sticks to big iron, which usually means hierarchical databases.
According to Gartner, the current market share is the following: Oracle 47%, IBM 21%, Microsoft 17%, with Microsoft closing the gap on IBM every year. And let's not forget that IBM's 21% includes Informix/Cheetah, Cloudscape, etc, not only DB2.
> And banks are the most thorough corporate IT customers.
This is an urban legend. Banks are not even the most conservative IT customers. I've been involved in three e-banking projects, and it amazed me how careless those people can be with data integrity and maintenance.
The most impressive IT customers I met are insurance companies and space industry companies (not defense contractors). I've made my way into many datacenters, and only in insurance companies did I see figures about the heat dissipation of network cable.

--
lucm, indeed.

Still no cure for cancer? by hkz · 2008-08-03 12:59 · Score: 5, Informative

A link to an alpha project on Sourceforge that was created three days ago and doesn't even have its own website? That apparently outputs LaTeX tables instead of something readable without having to compile it first, like HTML, SVG, or even indented text? I know it's silly to expect every story to be about a cure for cancer, but come on...

Re:Still no cure for cancer? by LSD-OBS · 2008-08-03 13:23 · Score: 5, Informative

Yup, not cool.
Word to the wise: if you're going to actually start advertising a project, please make sure you have some binaries built for some common relevant platforms, and make sure you have some decent information online even if it's just an ugly page with screenshots or examples of what it does.
In this case, we're talking about some scripts written in Python. At least let people know this on the front page, and list the project dependencies! i.e., GraphViz, or whatever.
This way, your potential users won't immediately discard it due to a lack of compelling information, and your potential (future) developers can see how far you've got and maybe get inspiration to chip in and help!
That said, this sounds like it should be a great tool for beginner or intermediate SQL users, and I look forward to throwing a few of our mammoth 12-table-join queries at it for fun.

--
Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
Re:Still no cure for cancer? by Blakey Rat · 2008-08-04 03:24 · Score: 2, Informative

Ditto. I downloaded it to take a look and see how good it was at parsing T-SQL, since we have a few saved T-SQL queries with WHILE loops in them. I gave up after seeing it's... nothing. Just a Python script. It requires Graphviz, Python, and Pyparsing (even though it comes with pyparsing!? WTF!), and even more damning is that you can't use it for ad-hoc queries, the query has to be saved into a file first.
Someone slap a GUI on this that lets you paste in a query, and bundle all the requirements along with the package, and then we might have something. Right now, I'll just stick with MS SQL Server's query grapher.

--
Comment of the year

Re:How do these stories get picked? by larry bagina · 2008-08-03 13:09 · Score: 3, Insightful

Posted by kdawson

--
Do you even lift?

These aren't the 'roids you're looking for.

Looking for a problem? by Craig Ringer · 2008-08-03 13:14 · Score: 5, Insightful

Execution of SQL statements can require the RDBMS to perform nested loops over parts of the query execution.

This can be an issue if the DBMS is forced to do something like perform a sequential scan of one table for each record matched in another table. That gets expensive *fast*.
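To put rough numbers on how fast that gets expensive, here is a toy model in plain Python rather than a real query executor. The two functions and the sample data are invented for illustration: one rescans the inner "table" for every outer row, the other builds a lookup on the inner table once, which is loosely what an index or hash join buys you.

```python
def nested_loop_join(outer, inner):
    """Join on key by rescanning inner for every outer row; count rows visited."""
    visited, matches = 0, []
    for o_key, o_val in outer:
        for i_key, i_val in inner:  # full "sequential scan" of inner
            visited += 1
            if o_key == i_key:
                matches.append((o_val, i_val))
    return matches, visited

def hash_join(outer, inner):
    """Build a lookup on inner once, then probe it for each outer row."""
    visited, lookup = 0, {}
    for i_key, i_val in inner:
        visited += 1
        lookup.setdefault(i_key, []).append(i_val)
    matches = [
        (o_val, i_val)
        for o_key, o_val in outer
        for i_val in lookup.get(o_key, [])
    ]
    return matches, visited

# Hypothetical data: 200 outer rows joined against 1,000 inner rows.
outer = [(k, f"order-{k}") for k in range(200)]
inner = [(k, f"customer-{k}") for k in range(1000)]

nl_matches, nl_visited = nested_loop_join(outer, inner)
hj_matches, hj_visited = hash_join(outer, inner)
print(nl_visited, hj_visited)
```

Both strategies return the same rows; only the amount of inner-table reading differs, and it grows with the product of the table sizes in the nested-loop case.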

There are many other possible performance issues, of course.

However, I don't see how SQL parsing can tell you much about the performance characteristics of the query. The database's query optimiser makes choices about how to execute the query, and is free to change its mind depending on configuration parameters, available resources, system load, disk bandwidth, present indexes, statistics gathered about data in the table, etc. PostgreSQL's planner for example does make heavy use of table statistics, so query plans may change depending on the quantity and distribution of data in a table.

Any decent database can already tell you how it will execute a query (and usually give you a performance readout from an actual execution of the query). There are plenty of GUI tools for displaying the resulting query plan output graphically. PgAdmin-III can do it, for example.

A simple SQL parser can have no idea about what indexes are configured, the distribution of the data, how much working memory the database has available for sorts and joins, etc. The database knows these things - and can already tell you how it will, or did, execute a query - so why not let it do its job?
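As a small illustration of the point that the plan depends on things the SQL text does not contain, the sketch below asks the database for its plan before and after an index exists. It assumes SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL's EXPLAIN, and the `orders` table, the index name and the `plan` helper are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, total REAL)"
)
query = "SELECT * FROM orders WHERE customer_id = 42"

before = plan(conn, query)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)   # identical SQL text, different plan

print(before)
print(after)
```

The query string never changes between the two calls, so a tool that only parses the SQL would draw the same diagram both times.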

The whole project doesn't make much sense.

Re:Looking for a problem? by SQLGuru · 2008-08-03 13:25 · Score: 3, Insightful

I use diagrams as a tuning tool, but only to look for paths that don't make sense or alternate paths through tables or for "dead-ends"......but these are things that a computer can't really tell you because they require an understanding of the data.
But you're right, the explain plan is the single most useful tool for tuning a query. If you understand how the engine is going to execute the query you know what areas you can affect. And tuning is manipulating those effects in a way that makes the query faster.
Layne

So, an alpha project for what exactly? by Mycroft_514 · 2008-08-03 13:16 · Score: 5, Interesting

Doesn't name WHICH RDBMS, and then you throw SQL at it? So what? For DB2 we have a thing called "Visual Explain" which NOT ONLY does this (and is free, provided by IBM), but also shows you other things like which index is being used for each step, etc.

This is news? This isn't even worth a second look!

Re:So, an alpha project for what exactly? by Hackerlish · 2008-08-03 13:18 · Score: 2, Informative

Watch out! Anyone pointing out how a kdawson story isn't news gets moderated down as a troll. I can't even work out how this got out of the firehose.

EXPLAIN by Craig Ringer · 2008-08-03 13:22 · Score: 5, Insightful

I don't see what this has over EXPLAIN and an appropriate graphical display tool like PgAdmin-III. There are large numbers of tools that display graphical query plans - and unlike this simple SQL parser, they know how the database will actually execute the query once the query optimiser is done with it.

Furthermore, a simple SQL parser has no idea about what indexes are present, available working memory for sorts and joins, etc. It can't know how the DB will really execute the query, without which it's hard to tell what performance issues may or may not arise.

See comment 24461217 for a more detailed explanation of why this whole idea makes very little sense.

Re:EXPLAIN by Craig Ringer · 2008-08-03 14:23 · Score: 2, Informative

Another comment here revealed part of why someone might think a tool like this was useful:
In MySQL, EXPLAIN apparently works more like PostgreSQL's EXPLAIN ANALYZE (and related features in other RDBMSs). MySQL's EXPLAIN actually executes the query rather than just running it through the query planner. The documentation even warns that data modification is possible with EXPLAIN in some circumstances.
If your database gives you no way to ask the query planner what it will do without actually executing the query, something like this begins to look faintly useful. Personally, though, I can't imagine voluntarily using such a database.

Re:your programmers shouldn't be writing SQL by lastchance_000 · 2008-08-03 13:27 · Score: 2, Insightful

Quis custodiet ipsos custodes?

Re:Or... by mino · 2008-08-03 13:33 · Score: 2, Informative

> That, and not using medium and low duty databases like MSSQL and MySQL can go a very long way to keeping users happy.

Honestly, to describe MSSQL as "medium and low duty" is pretty rich. You'd best believe I'm happy to bash MS as much as the next guy but SQL Server is a high-performing, highly maintainable, high-availability database and doesn't deserve to be mentioned in the same sentence as MySQL.

Hell, MSSQL might actually be the only truly good product MS make -- in fact, it probably is. It's not a toy and people who assume it is, just because it comes from MS (I'm not saying this is what you're doing, but people DO do this) just show that they don't know what they're talking about.

Existing tools by Craig Ringer · 2008-08-03 13:34 · Score: 2, Interesting

Most PostgreSQL users don't seem to use the existing, and superior, tools like EXPLAIN, EXPLAIN ANALYZE, PgAdmin-III's graphical explain, etc. I'm sure the same is true for users of many other databases.

It's not like these tools are particularly difficult to use or understand. No training is required, though being willing to think and read a little documentation helps if you want to get the most out of them. Understanding at least vaguely how databases execute queries is handy for any database user anyway. The same understanding is required to get anything useful out of this just-posted tool.

Anyway, as I've noted elsewhere the existing tools for this do a much better job due to integration with the RDBMS and superior knowledge of how the DB will execute the query.

In tablespace, no one can hear you scream... by fahrbot-bot · 2008-08-03 13:44 · Score: 3, Funny

> If you sit close to the DBAs, you can hear them screaming...

I've noticed that when things go horribly wrong, you don't actually have to sit that close. To be fair, as a Unix SA who has to deal with Windoze systems, I've done my fair share of screaming. :-)

--
It must have been something you assimilated. . . .

Are you serious? by SpasticWeasel · 2008-08-03 13:51 · Score: 2, Informative

So SQL Server has had a graphical execution plan view forever, and it's better than this lameness. But of course it's not free, and we all know that free software is better, even when it sucks. Seriously, compare this to the real tools included with a serious RDBMS, and I have to question why this was even posted. It's almost farcical.

--
No sooner do I get over one, then you put a better one right next to me. Bastards.

Re:Are you serious? by weicco · 2008-08-03 18:29 · Score: 2, Insightful

Oh damn. I had to buy a whole PC set to run Debian. I guess Linux isn't free after all.

--
You don't know what you don't know.

Re:WTF by ahmusch · 2008-08-03 13:52 · Score: 3, Informative

Really? Most of us would call recursive SQL "looping" SQL, and something like this in Oracle is recursive:

SELECT LPAD(' ', 2*LEVEL, ' ') || ename empName,
       dname,
       job,
       sys_connect_by_path(ename, '/') cbp
FROM emp e, dept d
WHERE e.deptno = d.deptno
CONNECT BY PRIOR empno = mgr
ORDER SIBLINGS BY job;

Heck, even ANSI finally got into recursive SQL using the WITH clause:

with TransClosedEdges (tail, head) as (
    select tail, head from Edges
    union all
    select e.tail, ee.head
    from Edges e, TransClosedEdges ee
    where e.head = ee.tail
)
select distinct * from TransClosedEdges;
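For anyone who wants to try the transitive-closure query, here is a minimal runnable sketch using SQLite through Python's sqlite3 module. The three-edge sample graph is invented, SQLite wants the RECURSIVE keyword spelled out, and the sketch swaps UNION ALL for UNION so that duplicate rows are dropped during the recursion (an assumption made here so the query would also terminate on a graph with cycles).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Edges (tail TEXT, head TEXT)")
# Hypothetical sample graph: a -> b -> c -> d
conn.executemany(
    "INSERT INTO Edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d")],
)

closure = conn.execute("""
    WITH RECURSIVE TransClosedEdges (tail, head) AS (
        SELECT tail, head FROM Edges
        UNION  -- not UNION ALL: dropping duplicates lets cyclic graphs terminate
        SELECT e.tail, ee.head
        FROM Edges e JOIN TransClosedEdges ee ON e.head = ee.tail
    )
    SELECT tail, head FROM TransClosedEdges ORDER BY tail, head
""").fetchall()

print(closure)
```

Every pair in the result means "head is reachable from tail", which is the kind of question that otherwise needs a procedural loop.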

Now let's imagine queries with multiple levels of nesting using such clauses - after all, any SELECT statement can generally be used in any FROM clause.

Now, perhaps you're Chris Date or Fabian Pascal and are truly concerned with the completeness of SQL as implementing the relational model. For the rest of us, however, recursive SQL can answer interesting questions without getting into the nastiness of procedural code.

Oh, and considering the default join in virtually any SQL database is a nested-loop join, I'd say all databases loop by default. And a statement as innocuous as:

select * from a, b, c;

can absolutely crater CPU and I/O performance. If each has 1,000 rows and there's not enough memory, a naive nested loop does 1,001,001 table scans (1 of a, 1,000 of b, 1,000,000 of c). Hope your disk is fast.
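The growth is easy to check on a small scale. The sketch below is an illustration only: it uses SQLite via Python's sqlite3 module with 20 rows per table instead of 1,000, and the scan count models a naive nested loop with nothing cached rather than what any particular engine really does. By the same count, 1,000 rows per table gives 1 + 1,000 + 1,000,000 scans, roughly the million mentioned above.

```python
import sqlite3

n = 20  # small stand-in for the 1,000 rows per table in the comment
conn = sqlite3.connect(":memory:")
for name in ("a", "b", "c"):
    conn.execute(f"CREATE TABLE {name} (v INT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(n)])

# With no join condition, every row of a pairs with every row of b and c.
(row_count,) = conn.execute("SELECT COUNT(*) FROM a, b, c").fetchone()

# Scans of each table under a naive nested loop with a outermost:
# a once, b once per row of a, c once per (a, b) pair.
naive_scans = 1 + n + n * n

print(row_count, naive_scans)
```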

I'm screaming from the summary. by sootman · 2008-08-03 14:15 · Score: 2, Funny

Can we have that in English please? Possibly with a diagram?

--
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.

Re:your programmers shouldn't be writing SQL by Samah · 2008-08-03 14:43 · Score: 2, Interesting

Generally what happens on my project is that the team (headed by an analyst) decides on the best design for the task, then subtasks are delegated to developers based on their level of skill with PL/SQL and/or Java.
Business logic (for the most part) is done on the server-side with PL/SQL packages, while the application itself is a Java fat client running on a Citrix cluster.
Before you make statements about keeping business logic separate from the database, this situation works well for this application, as it allows for less client-server communication, easier handling of commits and rollbacks, and much faster data access. A bonus is that when a severity 1 case is raised that is related to business logic, it doesn't require a long system outage.
The production server has read-only access for standard developers, and a logged full access account for support (and senior developers).
Every code change is reviewed by one or more senior developers to ensure it won't break existing functionality or contain (as you put it) "crapness".
From your comments I take it you are a DBA and have had bad experiences with poor programmers. In your case, maybe what you've suggested is a decent option for you, but I really don't think you should be stating it as the "right way".
As always, YMMV.

--
Homonyms are fun!
You're driving your car, but they're riding their bikes there.

Re:your programmers shouldn't be writing SQL by cduffy · 2008-08-03 16:09 · Score: 2, Interesting

1. more readable code, there is less of it

Counterargument: Less readable code, as it's split into two places.

2. easier to maintain - change in the database and the change happens realtime, no need for a new release (if you're doing binaries)

Counterargument: Harder to maintain - more upgrades will require the database to be revved as opposed to only the application, and synchronization between the two becomes more of an issue.

(Granted, IRL there needs to be robust infrastructure for database upgrades and downgrades no matter what -- but making previously code-only minor patches impact both components doesn't necessarily make things easier).

3. better access control in many situations. sometimes you want to get at data but don't want the users to have that kind of access. you can run a function as a higher level user but allow lower level users select access to the function.

Yup; that is indeed a good reason to use stored procedures or views.

4. faster/more accurate. in general, your DBA will write a better/faster query than your programmers.

Of course the DBA will write better queries; that's why I advocate making DBA review mandatory for code changes impacting the data access layer. In shops with a good DBA, the programmers will come to the DBA first when they have a complex query to write anyhow; that's what happens where I'm at presently. (Our DBA is a rockstar, incidentally poached from my last employer, and very well-respected; at that last job, however, we had a CEO's-college-buddy incompetent before we had the rockstar, and I'd have hated to see him hold the power your workflow would grant).

5. One less thing for your programmers to worry about. it means they can focus on writing the application (which is their job remember).

From the perspective of the programmers writing the data access layer (you're doing a proper tiered application with business logic and data access broken off from each other, right?), they need to worry about interfacing with the DB no matter what; your proposal reduces their scope considerably (by making the code they maintain effectively into a collection of nearly-opaque stubs referencing logic stored elsewhere), but certainly doesn't eliminate the relevant work from development's domain.

I'm largely playing Devil's Advocate here: What you're advocating is a good workflow, but I think that calling it the only good workflow is a serious misrepresentation -- the problems it addresses can be resolved through other means, and at least some of the benefits are two-sided.

Re:Mod parents down!!! by Craig Ringer · 2008-08-03 19:19 · Score: 2, Informative

I think you might've missed the point.

The term SELECT statement generally refers to the whole statement, including FROM, WHERE, HAVING, etc clauses.

This is pretty clear in context, as it'd be nonsensical to produce a graphical explain tool for the result field list in the SELECT clause itself.

That's why the parent said SELECT statement, not SELECT clause.

As it happens the same issues regarding the need for planner knowledge etc are true for DML like INSERT, UPDATE and DELETE. It's not about SELECT at all, but rather any non-DDL query.

Slashdot Mirror
