Slashdot Mirror


Diagramming Tool For SQL Select Statements

alxtoth writes "Snowflake is a new BSD-licensed tool that parses SQL Select statements and generates a diagram. It shows parts of the underlying SQL directly in the diagram. For example: x=30, GROUP BY (year), SUM (sales), HAVING MIN (age) > 18. The primary reason for the tool was to avoid Cartesian joins and loops in SQL written by hand, with many joined tables. The database will execute such a statement, if syntactically correct, resulting in runaway queries that can bring the database down. If you sit close to the DBAs, you can hear them screaming... "

17 of 156 comments (clear)

  1. Bring a database down? by Shados · · Score: 5, Informative

    No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

    Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

    So at worse, you're slowing down your own localhost development database engine for everyone else trying to access it (read: no one).

    Not much for the DBA to scream about...

    1. Re:Bring a database down? by Anonymous Coward · · Score: 5, Informative

      No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

      A real ACID-compliant database, no. MySQL, maybe.

      Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

      Unfortunately sometimes you do need to run new queries against production servers. Of course, with a real database like MSSQL or Oracle, you can see how a query will execute, what path the optimizer will follow, and what the cost of the query will be.

    2. Re:Bring a database down? by MBCook · · Score: 5, Informative

      Explain/describe exists in MySQL, it's just very hard to do.

      Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster). Is it possible to take MySQL down? Easily. It can be surprisingly easy to lock the server completely. Even when you select off one set of tables (A) and want to insert into another set (B, possibly in a different schema/DB) it is possible to have things locked. It's very easy. We haven't seen a crashing bug in MySQL in a while (fun: a query that formated dates with the date format function could reliably crash MySQL 4.0 or 4.1 (don't remember which).

      Does explain help? No. On Oracle it may help. In Postgres it seems to help. I have no experience with MSSQL. In MySQL you have to watch out. While it can be useful, it is very limited.

      It's row counts can be horribly useless. It can list 1.2 million rows when in fact it can take a fraction of a second to get the data because it's all in an index in memory.

      Worse: it will run the query for you. Under some circumstances (using a subquery can do it, using more than one level of subquery is almost guaranteed to do it) it will just run the inner query and then use that to produce results. This means that describe/explain can lock the database and take hours to return (if you had a query that was bad enough and didn't kill the describe/explain). It's all the fun of running the real query, without the results actually presented to you.

      Note: We're using 5.0 (since 5.1 isn't production ready yet). Some of this may be fixed.

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    3. Re:Bring a database down? by Anonymous Coward · · Score: 3, Funny

      Terrabyte? A planet byte?

    4. Re:Bring a database down? by NerveGas · · Score: 3, Funny

      So, you don't put an untested query on a production server. Great. What happens when someone changes data in such a way that your query now explodes? :D

      In the last case I had to deal with that, one boneheaded programmer had his code set to send him an email if it couldnt' find a good match in the DB. Someone changed the data, and with the amount of traffic, his code, spread across our web serving farm, had injected almost a million messages into the email queues. Programmers are awesome.

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    5. Re:Bring a database down? by russotto · · Score: 5, Informative

      Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster).

      Oh yes, Oracle can be brought to a grinding halt (even on substantial hardware) by a big nasty query. It may not be crashed, but it's nonresponsive. Especially annoying when there is no need for the cartesian product; Oracle's pessimizer just chose to do one when something else was MUCH more appropriate. Alas this tool would not catch that situation (but EXPLAIN PLAN does).

    6. Re:Bring a database down? by Craig+Ringer · · Score: 5, Informative

      craig:~$ psql
      Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
      craig=> set statement_timeout=1000;
      SET
      craig=> SELECT generate_series(0,100000000000000000);
      ERROR: canceling statement due to statement timeout

  2. Still no cure for cancer? by hkz · · Score: 5, Informative

    A link to an alpha project on Sourceforge that was created three days ago and doesn't even have its own website? That apparently outputs LaTeX tables instead of something readable without having to compile it first, like HTML, SVG, or even indented text? I know it's silly to expect every story to be about a cure for cancer, but come on...

    1. Re:Still no cure for cancer? by LSD-OBS · · Score: 5, Informative

      Yup, not cool.

      Word to the wise: if you're going to actually start advertising a project, please make sure you have some binaries built for some common relevant platforms, and make sure you have some decent information online even if it's just an ugly page with screenshots or examples of what it does.

      In this case, we're talking about some scripts written in Python. At least let people know this on the front page, and list the project dependancies! ie, GraphViz, or whatever.

      This way, your potential users won't immediately discard it due to a lack of compelling information, and your potential (future) developers can see how far you've got and maybe get inspiration to chip in and help!

      That said, this sounds like it should be a great tool for beginner or intermediate SQL users, and I look forward to throwing a few of our mammoth 12-table-join queries at for much fun.

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
  3. Re:How do these stories get picked? by larry+bagina · · Score: 3, Insightful

    Posted by kdawson

    --
    Do you even lift?

    These aren't the 'roids you're looking for.

  4. Looking for a problem? by Craig+Ringer · · Score: 5, Insightful

    Execution of SQL statements can require the RDBMS to perform nested loops over parts of the query execution.

    This can be an issue if the DBMS is forced to do something like perform a sequential scan of one table for each record matched in another table. That gets expensive *fast*.

    There are many other possible performance issues, of course.

    However, I don't see how SQL parsing can tell you much about the performance characteristics of the query. The database's query optimiser makes choices about how to execute the query, and is free to change its mind depending on configuration parameters, available resources, system load, disk bandwidth, present indexes, statistics gathered about data in the table, etc. PostgreSQL's planner for example does make heavy use of table statistics, so query plans may change depending on the quantity and distribution of data in a table.

    Any decent database can already tell you how it will execute a query (and usually give you a performance readout from an actual execution of the query). There are plenty of GUI tools for displaying the resulting query plan output graphically. PgAdmin-II can do it, for example.

    A simple SQL parser can have no idea about what indexes are configured, the distribution of the data, how much working memory the database has available for sorts and joins, etc. The database knows these things - and can already tell you how it will, or did, execute a query - so why not let it do its job?

    The whole project doesn't make much sense.

    1. Re:Looking for a problem? by SQLGuru · · Score: 3, Insightful

      I use diagrams as a tuning tool, but only to look for paths that don't make sense or alternate paths through tables or for "dead-ends"......but these are things that a computer can't really tell you because they require an understanding of the data.

      But you're right, the explain plan is the single most useful tool for tuning a query. If you understand how the engine is going to execute the query you know what areas you can affect. And tuning is manipulaing those effects in a way that makes the query faster.

      Layne

  5. So, an alpha project for what exactly? by Mycroft_514 · · Score: 5, Interesting

    Doesn't name WHICH RDBMS, and then you throw SQL at it? So what? For DB2 we have a thing called "Visual Explain" which NOT ONLY does this, but is free, provided by IBM, but also shows you other things like whch index is being used for each step, etc.

    This is news? This isn't even worth a second look!

  6. EXPLAIN by Craig+Ringer · · Score: 5, Insightful

    I don't see what this has over EXPLAIN and an appropriate graphical display tool like PgAdmin-III. There are large numbers of tools that display graphical query plans - and unlike this simple SQL parser, they know how the database will actually execute the query once the query optimiser is done with it.

    Furthermore, a simple SQL parser has no idea about what indexes are present, available working memory for sorts and joins, etc. It can't know how the DB will really execute the query, without which it's hard to tell what performance issues may or may not arise.

    See comment 24461217 for a more detailed explanation of why this whole idea makes very little sense.

  7. In tablespace, no one can hear you scream... by fahrbot-bot · · Score: 3, Funny

    If you sit close to the DBAs, you can hear them screaming...

    I've noticed that when things go horribly wrong, you don't actaully have to sit that close. To be fair, as a Unix SA who has to deal with Windoze systems, I've done my fair share of screaming. :-)

    --
    It must have been something you assimilated. . . .
  8. Re:WTF by ahmusch · · Score: 3, Informative
    Really? Most of us would call recursive SQL "looping" SQL, and something like this in Oracle is recursive:

    SELECT LPAD(' ', 2*LEVEL, ' ' ) || ename empName, dname, job,
    sys_connect_by_path( ename, '/' ) cbp
    FROM emp e, dept d
    WHERE e.deptno = d.deptno
    CONNECT BY PRIOR empno = mgr
    ORDER SIBLINGS BY job;

    Heck, even ANSI finally got into recursive SQL using the WITH clause:

    with TransClosedEdges (tail, head) as
    ( select tail, head from Edges
    union all
    select e.tail, ee.head from Edges e, TransClosedEdges ee
    where e.head = ee.tail
    )
    select distinct * from TransClosedEdges;

    Now let's imagine queries with multiple levels of nesting using such clauses - after all, any SELECT statement can generally be used in any FROM clause.

    Now, perhaps you're Chris Date or Fabian Pascal and are truly concerned with the completeness of SQL as implementing the relational model. For the rest of us, however, recursive SQL can answer interesting questions without getting into the nastiness of procedural code.

    Oh, and considering the default join in virtually any SQL database is a nested-loop join, I'd say all databases loop by default. And a statement as innocuous as :

    select * from a, b, c;

    Can absolutely crater cpu and I/O performance. If each has 1,000 rows and there's not enough memory, there's 1,000,001 table scans. Hope your disk is fast.

  9. Comment removed by account_deleted · · Score: 5, Interesting

    Comment removed based on user account deletion