Diagramming Tool For SQL Select Statements
alxtoth writes "Snowflake is a new BSD-licensed tool that parses SQL Select statements and generates a diagram. It shows parts of the underlying SQL directly in the diagram. For example: x=30, GROUP BY (year), SUM (sales), HAVING MIN (age) > 18. The primary reason for the tool was to avoid Cartesian joins and loops in SQL written by hand, with many joined tables. The database will execute such a statement, if syntactically correct, resulting in runaway queries that can bring the database down. If you sit close to the DBAs, you can hear them screaming... "
No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.
Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?
So at worse, you're slowing down your own localhost development database engine for everyone else trying to access it (read: no one).
Not much for the DBA to scream about...
A link to an alpha project on Sourceforge that was created three days ago and doesn't even have its own website? That apparently outputs LaTeX tables instead of something readable without having to compile it first, like HTML, SVG, or even indented text? I know it's silly to expect every story to be about a cure for cancer, but come on...
Posted by kdawson
Do you even lift?
These aren't the 'roids you're looking for.
Execution of SQL statements can require the RDBMS to perform nested loops over parts of the query execution.
This can be an issue if the DBMS is forced to do something like perform a sequential scan of one table for each record matched in another table. That gets expensive *fast*.
There are many other possible performance issues, of course.
However, I don't see how SQL parsing can tell you much about the performance characteristics of the query. The database's query optimiser makes choices about how to execute the query, and is free to change its mind depending on configuration parameters, available resources, system load, disk bandwidth, present indexes, statistics gathered about data in the table, etc. PostgreSQL's planner for example does make heavy use of table statistics, so query plans may change depending on the quantity and distribution of data in a table.
Any decent database can already tell you how it will execute a query (and usually give you a performance readout from an actual execution of the query). There are plenty of GUI tools for displaying the resulting query plan output graphically. PgAdmin-II can do it, for example.
A simple SQL parser can have no idea about what indexes are configured, the distribution of the data, how much working memory the database has available for sorts and joins, etc. The database knows these things - and can already tell you how it will, or did, execute a query - so why not let it do its job?
The whole project doesn't make much sense.
Doesn't name WHICH RDBMS, and then you throw SQL at it? So what? For DB2 we have a thing called "Visual Explain" which NOT ONLY does this, but is free, provided by IBM, but also shows you other things like whch index is being used for each step, etc.
This is news? This isn't even worth a second look!
I don't see what this has over EXPLAIN and an appropriate graphical display tool like PgAdmin-III. There are large numbers of tools that display graphical query plans - and unlike this simple SQL parser, they know how the database will actually execute the query once the query optimiser is done with it.
Furthermore, a simple SQL parser has no idea about what indexes are present, available working memory for sorts and joins, etc. It can't know how the DB will really execute the query, without which it's hard to tell what performance issues may or may not arise.
See comment 24461217 for a more detailed explanation of why this whole idea makes very little sense.
Quis custodiet ipsos custodes?
That, and not using medium and low duty databases lile MSSQL and MySQL can go a very long way to keeping users happy.
Honestly, to describe MSSQL as "medium and low duty" is pretty rich. You'd best believe I'm happy to bash MS as much as the next guy but SQL Server is a high-performing, highly maintainable, high-availability database and doesn't deserve to be mentioned in the same sentence as MySQL.
Hell, MSSQL might actually be the only truly good product MS make -- in fact, it probably is. It's not a toy and people who assume it is, just because it comes from MS (I'm not saying this is what you're doing, but people DO do this) just show that they don't know what they're talking about.
Most PostgreSQL users don't seem to use the existing, and superior, tools like EXPLAIN, EXPLAIN ANALYZE, PgAdmin-III's graphical explain, etc. I'm sure the same is true for users of many other databases.
It's not like these tools are particularly difficult to use or understand. No training is required, though being willing to think and read a little documentation helps if you want to get the most out of them. Understanding at least vaguely how databases execute queries is handy for any database user anyway. The same understanding is required to get anything useful out of this just-posted tool.
Anyway, as I've noted elsewhere the exiting tools for this do a much better job due to integration with the RDBMS and superior knowledge of how the DB will execute the query.
I've noticed that when things go horribly wrong, you don't actaully have to sit that close. To be fair, as a Unix SA who has to deal with Windoze systems, I've done my fair share of screaming. :-)
It must have been something you assimilated. . . .
So SQL Server has had a graphical execution plan view for ever, and it's better than this lameness. But of course its not free, and we all know that free software is better, even when it sucks. Seriously, compare this to the real tools included with a serious RDBMS, and I have to question why this was even posted. It's almost farcical.
No sooner do I get over one, then you put a better one right next to me. Bastards.
Heck, even ANSI finally got into recursive SQL using the WITH clause:
Now let's imagine queries with multiple levels of nesting using such clauses - after all, any SELECT statement can generally be used in any FROM clause.
Now, perhaps you're Chris Date or Fabian Pascal and are truly concerned with the completeness of SQL as implementing the relational model. For the rest of us, however, recursive SQL can answer interesting questions without getting into the nastiness of procedural code.
Oh, and considering the default join in virtually any SQL database is a nested-loop join, I'd say all databases loop by default. And a statement as innocuous as :
Can absolutely crater cpu and I/O performance. If each has 1,000 rows and there's not enough memory, there's 1,000,001 table scans. Hope your disk is fast.
Can we have that in English please? Possibly with a diagram?
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
Generally what happens on my project is that the team (headed by an analyst) decides on the best design for the task, then subtasks are delegated to developers based on their level of skill with PL/SQL and/or Java.
Business logic (for the most part) is done on the server-side with PL/SQL packages, while the application itself is a Java fat client running on a Citrix cluster.
Before you make statements about keeping business logic separate from the database, this situation works well for this application, as it allows for less client-server communication, easier handling of commits and rollbacks, and much faster data access. A bonus is that when a severity 1 case is raised that is related to business logic, it doesn't require a long system outage.
The production server has read-only access for standard developers, and a logged full access account for support (and senior developers).
Every code change is reviewed by one or more senior developers to ensure it won't break existing functionality or contains (as you put it) "crapness".
From your comments I take it you are a DBA and have had bad experiences with poor programmers. In your case, maybe what you've suggested is a decent option for you, but I really don't think you should be stating it as the "right way".
As always, YMMV.
Homonyms are fun!
You're driving your car, but they're riding their bikes there.
Comment removed based on user account deletion
Counterargument: Less readable code, as it's split into two places.
Counterargument: Harder to maintain - more upgrades will require the database to be revved as opposed to only the application, and synchronization between the two becomes more of an issue.
(Granted, IRL there needs to be robust infrastructure for database upgrades and downgrades no matter what -- but making previously code-only minor patches impact both components doesn't necessarily make things easier).
Yup; that is indeed a good reason to use stored procedures or views.
Of course the DBA will write better queries; that's why I advocate making DBA review mandatory for code changes impacting the data access layer. In shops with a good DBA, the programmers will come to the DBA first when they have a complex query to write anyhow; that's what happens where I'm at presently. (Our DBA is a rockstar, incidentally poached from my last employer, and very well-respected; at that last job, however, we had a CEO's-college-buddy incompetent before we had the rockstar, and I'd have hated to see him hold the power your workflow would grant).
From the perspective of the programmers writing the data access layer (you're doing a proper tiered application with business logic and data access broken off from each other, right?), they need to worry about interfacing with the DB no matter what; your proposal reduces their scope considerably (by making the code they maintain effectively into a collection of nearly-opaque stubs referencing logic stored elsewhere), but certainly doesn't eliminate the relevant work from development's domain.
I'm largely playing Devil's Advocate here: What you're advocating is a good workflow, but I think that calling it the only good workflow is a serious misrepresentation -- the problems it addresses can be resolved through other means, and at least some of the benefits are two-sided.
I think you might've missed the point.
The term SELECT statement generally refers to the whole statement, including FROM, WHERE, HAVING, etc clauses.
This is pretty clear in context, as it'd be nonsensical to produce a graphical explain tool for the result field list in the SELECT clause its self.
That's why the parent said SELECT statement not SELECT clause .
As it happens the same issues regarding the need for planner knowledge etc are true for DML like INSERT, UPDATE and DELETE. It's not about SELECT at all, but rather any non-DDL query.