Slashdot Mirror


Diagramming Tool For SQL Select Statements

alxtoth writes "Snowflake is a new BSD-licensed tool that parses SQL Select statements and generates a diagram. It shows parts of the underlying SQL directly in the diagram. For example: x=30, GROUP BY (year), SUM (sales), HAVING MIN (age) > 18. The primary reason for the tool was to avoid Cartesian joins and loops in SQL written by hand, with many joined tables. The database will execute such a statement, if syntactically correct, resulting in runaway queries that can bring the database down. If you sit close to the DBAs, you can hear them screaming... "

156 comments

  1. Bring a database down? by Shados · · Score: 5, Informative

    No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

    Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

    So at worse, you're slowing down your own localhost development database engine for everyone else trying to access it (read: no one).

    Not much for the DBA to scream about...

    1. Re:Bring a database down? by Anonymous Coward · · Score: 5, Informative

      No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

      A real ACID-compliant database, no. MySQL, maybe.

      Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

      Unfortunately sometimes you do need to run new queries against production servers. Of course, with a real database like MSSQL or Oracle, you can see how a query will execute, what path the optimizer will follow, and what the cost of the query will be.

    2. Re:Bring a database down? by MBCook · · Score: 5, Informative

      Explain/describe exists in MySQL, it's just very hard to do.

      Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster). Is it possible to take MySQL down? Easily. It can be surprisingly easy to lock the server completely. Even when you select off one set of tables (A) and want to insert into another set (B, possibly in a different schema/DB) it is possible to have things locked. It's very easy. We haven't seen a crashing bug in MySQL in a while (fun: a query that formated dates with the date format function could reliably crash MySQL 4.0 or 4.1 (don't remember which).

      Does explain help? No. On Oracle it may help. In Postgres it seems to help. I have no experience with MSSQL. In MySQL you have to watch out. While it can be useful, it is very limited.

      It's row counts can be horribly useless. It can list 1.2 million rows when in fact it can take a fraction of a second to get the data because it's all in an index in memory.

      Worse: it will run the query for you. Under some circumstances (using a subquery can do it, using more than one level of subquery is almost guaranteed to do it) it will just run the inner query and then use that to produce results. This means that describe/explain can lock the database and take hours to return (if you had a query that was bad enough and didn't kill the describe/explain). It's all the fun of running the real query, without the results actually presented to you.

      Note: We're using 5.0 (since 5.1 isn't production ready yet). Some of this may be fixed.

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    3. Re:Bring a database down? by Anonymous Coward · · Score: 3, Funny

      Terrabyte? A planet byte?

    4. Re:Bring a database down? by haltenfrauden27 · · Score: 0
      I mean, what people mean when they say "bring it down" is basically what you said - "slow it down a lot." In practice, there's no difference.

      This reminds me of something Bill Gates used to say about Windows. Something to the effect that if users were more patient many "crashes" would actually take care of themselves.

    5. Re:Bring a database down? by NerveGas · · Score: 3, Funny

      So, you don't put an untested query on a production server. Great. What happens when someone changes data in such a way that your query now explodes? :D

      In the last case I had to deal with that, one boneheaded programmer had his code set to send him an email if it couldnt' find a good match in the DB. Someone changed the data, and with the amount of traffic, his code, spread across our web serving farm, had injected almost a million messages into the email queues. Programmers are awesome.

      --
      Oh, you're not stuck, you're just unable to let go of the onion rings.
    6. Re:Bring a database down? by russotto · · Score: 5, Informative

      Is it possible to bring Oracle down? I would think so, it would just take a lot (note: assuming normal hardware, not a large high-power cluster).

      Oh yes, Oracle can be brought to a grinding halt (even on substantial hardware) by a big nasty query. It may not be crashed, but it's nonresponsive. Especially annoying when there is no need for the cartesian product; Oracle's pessimizer just chose to do one when something else was MUCH more appropriate. Alas this tool would not catch that situation (but EXPLAIN PLAN does).

    7. Re:Bring a database down? by Anonymous Coward · · Score: 0

      Trollly trolly trollllly
      likes to play with
      dolly dolly dolly
      hee hee hee
      ha ha ha
      I can do a dance
      I'm not wearing pants
      Bye Bye Bye!

    8. Re:Bring a database down? by Anonymous Coward · · Score: 1, Funny

      so you're a linux user. what else would you like to tell us?

    9. Re:Bring a database down? by Shados · · Score: 1

      In that case, your web farm and/or email server dies, while the database server is still purring along :)

    10. Re:Bring a database down? by killjoe · · Score: 2, Informative

      Depends on your database. I know I have been able to bring SQL server down with a query.

      Try this...

      begin transaction

      update rows set a=b where x=y

      commit transaction.

      On your workstation this could run really fast because you only have ten records. On the production database server this could crush the server if you had a few million records effected.

      --
      evil is as evil does
    11. Re:Bring a database down? by Shados · · Score: 1

      Not from where I'm standing. The query itself will be slow if it affects a lot of rows (if I use exactly the query you're giving, and that X is indexed, it will be near instant, no matter how many rows I have =P), but the server will purr along just fine with other queries (assuming its configured to use snapshot mode, else it could lock away other queries... but who doesn't use snapshot mode since its been available in SQL Server anyway? Especially after people been bitching about it from the day it was available in Oracle...)

    12. Re:Bring a database down? by GoofyBoy · · Score: 2, Informative

      >In practice, there's no difference.

      To a DBA its a big difference.

      1. Just a massive slow down - login (SQL Server there is a Dedicated Administrator Connection, don't think I've had problems connecting with a problem Oracle db as long as I can get on the OS (partly because sessions are processes)), and just kill the process. The DB should clean everything up. (as long as its not a toy db; I'm looking at you MySQL.)
      2. A crash - then you have to go through a whole number of steps to bring it up and then verify the data is ok, then let everyone back in. There may be an backup involved if you are unlucky. You definitely want to figure out what happened.

      --
      The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
    13. Re:Bring a database down? by Anonymous Coward · · Score: 0

      so you're a linux user. what else would you like to tell us?

      I also like wearing women's clothes.

    14. Re:Bring a database down? by mbourgon · · Score: 2, Interesting

      AHAHAHAHAHAHAHAHAHA. Since I do run terabyte-sized databases, I'll contradict you - poor queries _can_ tank a server, even with small tables, if the query is poor enough. While it technically may be running, if nobody else can access it, then for practical purposes the server is down. And never underestimate the ability of one user with enough knowledge to be dangerous, to spread that selfsame query across as many people as possible.

      --
      "Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
    15. Re:Bring a database down? by popeyethesailor · · Score: 1

      You can use Database Resource Management to control maximum execution time for queries in Oracle; it's designed to protect such scenarios.

      However, the "pessimizer" as you call it is not a magic wand - it's just decent heuristics for selecting the optimal path for data retrieval. It's pretty complex; and has taken decades to come to this level. It's damn good if you understand it and configure your instance correctly. Where it doesnt work, you always have had other options.

      And no, I dont work for Oracle.

    16. Re:Bring a database down? by Shados · · Score: 2, Informative

      Maybe my poor queries writing skills are bad :) Because I've seriously -tried- before... cross joins on 100+ tables, all of which containing several douzen gigs of data, totally multiple terabytes...the scheduling was good enough to give the query very low priority, leaving the server ok.

      If you use (in SQL Server at least) the default settings, that will basically render your database useless... but if you use the newer locking strategies from 2005 (which had been available in Oracle for ages), the tables won't be locked, and everything will be fine. Laggish for sure, but the server definately won't tank.

      Somewhere I used to work for, I would even run millions of inserts, continually, on our staging server (which was shared among a 50 or so devs) because I was testing an ETL routine... the server was slower for sure when I messed up and did a multi cross join on my insert source on a friday evening, but it never brought it down.

    17. Re:Bring a database down? by Anonymous Coward · · Score: 0

      If you are worried about adversly effecting system performance look at the query plan estimate before running the thing for hints that you may have missed something (Not only join problems but missing indexes, statistics...etc)

      If it still takes too long to run you can always cancel it or use your platforms query governer if you need to be that paranoid.

      Tools are great but in this case it seems like your platform tools should provide better coverage for detecting any problems.

    18. Re:Bring a database down? by reybrujo · · Score: 1

      Incidentally, your query crashes the referenced tool. Talk about a really nasty query ;-)

    19. Re: Bring a database down? by scdeimos · · Score: 2, Insightful

      No single query will ever bring a (real) RDBMS down. Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.

      You've obviously not tried anything simple on MS-SQL, like expanding a varchar(4) column to nvarchar(10) on a table with a few million rows. MS-SQL spins its wheels filling-up the transaction log until it overflows, then rolls it all back again. A 4GB log file, filled with a 250meg table (and no indexes because they were already dropped)?

      In the end we had to drop all FK refs, select * into another table, drop the original table then select * (with conversions) into newTableWithOriginal's name and reset all the FK's. *shakes head*

    20. Re: Bring a database down? by freedom_india · · Score: 1

      ...and that is why you should switch to DB2.
      What the heck are you doing with MS-SQL Server? Don't you know its for developers and kids?
      Trying to use SQL Server in production is like trying to cut your toenails with a straight razor: You may end up cutting your toenails ultimately, but you are likely to bleed yourself to death before that.

      --
      "Doing what i can, with what i have." ~ Burt Gummer
    21. Re:Bring a database down? by Tablizer · · Score: 1

      One feature that I dearly wish RDBMS vendors would provide is a time-limit clause. Example:

          SELECT * FROM foo TIMELIMIT 30

      This would ensure that the statement does not run for more than 30 seconds.
           

    22. Re:Bring a database down? by Forbman · · Score: 1

      Hmm... but what if X is a clustered index (or some other index type which physically orders the data on the physical storage...)? if you're updating the clustered index, you could wildly, inadvertently crush your server due to the disk I/O as data has to get reordered on the drive...

      Not the biggest fan of clustered indexes in Sql Server/Sybase... Oh well, I suppose in one of the next few releases of SQL Server that MS will figure out how Oracle does some more of their low-level black magic, and they will quietly deprecate clustered indexes... (that is, keep them in spirit, but change their implementation as a New Innovation!!!)...

    23. Re:Bring a database down? by Threni · · Score: 1

      I'm a fan of clustered indexes in SQL Server 2000, as they are generally fast than non-clustered indexes. Never managed to get them to crash, either. What am I doing wrong?

    24. Re:Bring a database down? by Venik · · Score: 2, Insightful

      SQL code is usually developed on some small server or the DBA's own workstation. The dev database is representative of the prod version only in structure and not in size. So this type of errors sometimes go unnoticed until the code is migrated to the prod environment. The effect of such errors vary depending on server architecture. The most sensitive are HA cluster environments, where the clustering engine overreacts and starts failing things over, exacerbating the problem.

    25. Re:Bring a database down? by Craig+Ringer · · Score: 5, Informative

      craig:~$ psql
      Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
      craig=> set statement_timeout=1000;
      SET
      craig=> SELECT generate_series(0,100000000000000000);
      ERROR: canceling statement due to statement timeout

    26. Re:Bring a database down? by Anonymous Coward · · Score: 0

      I'm far from a SQL guru, but isn't that sort of situation the place to store user search results (vBulletin-esque) and query for the stored search before hitting the live data?

    27. Re:Bring a database down? by ultranova · · Score: 1

      Since I do run terabyte-sized databases, I'll contradict you - poor queries _can_ tank a server, even with small tables, if the query is poor enough. While it technically may be running, if nobody else can access it, then for practical purposes the server is down.

      This sounds like the server is doing potentially unbound amount of I/O or processing with a lock held. Otherwise the other queries should still run, just slightly slower due to increased load in the server. A query, no matter how poor, shouldn't be able to monopolize resources; it should share them equally with all other running queries, and merely take its sweet time to return.

      In other words, it sounds like a bug.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    28. Re:Bring a database down? by Shados · · Score: 1

      This type of error would go unnoticed? Err? If the query returns -the wrong data- (as a cross join by mistake would), and it goes unnoticed, something worse than your database crashing on you is waiting.

    29. Re:Bring a database down? by Slashcrap · · Score: 1

      Is it possible to bring Oracle down?

      Yes. Run Oracle 10.2.0.1 on SLES10 and wait for 49 days until the value returned by the times() syscall wraps around. Oracle shits itself. It can take up to 247 days on other distros.

      I'm sure they've fixed it since, but holy shit that's a stupid bug. What was that other software that crashed after 49 days? Windows 95. Unbreakable indeed.

    30. Re: Bring a database down? by lucm · · Score: 1

      > What the heck are you doing with MS-SQL Server? Don't you know its for developers and kids?

      You should call Myspace.com and let them know that they made a stupid database choice.

      Seriously, DB2 is so 1992.

      --
      lucm, indeed.
    31. Re:Bring a database down? by 192939495969798999 · · Score: 1

      It's not that you bring it "down", you just slow it down to the point of a noticeable performance hit to everyone else. I've seen queries written by people just smart enough to be very dangerous, where the query consumes 100% of resources for an hour (when if it was written correctly, the query would take 2 seconds). That's as good as "down" in some cases.

      --
      stuff |
    32. Re:Bring a database down? by hey! · · Score: 1

      The question is: is there a semantically equivalent query that DOESN'T overload the system?

      A cartesian product isn't necessarily a bad choice, if the product fits in memory. I personally haven't had this problem with Oracle, so I'd be interested in an example.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    33. Re:Bring a database down? by hey! · · Score: 1

      Well, a modern RBDMS is practically an operating system. This means that they way you bring it down doesn't involve the kinds of things a tool like this tells you. You probably need to do something procedural, involving a mix of tasks the RDBMS can't handle efficiently.

      Of course, it is always possible to tune things in a disastrous way. Oracle is an RDBMS that is highly tunable, which means that a lot of people make really bad choices, for example tuning things in a way that require greater memory without telling Oracle to use a larger amount of memory. Seen that one a number of times.

      In any case, the original point of RDBMSs were to decouple the specification of the data you want from the procedure used to get it. That's why the closure property is so important in relational algebra. An optimizer transforms a query or subquery into a set of semantically equivalent operations, choosing from a number of possible alternatives what it thinks might be the best one. For this reason, the graph structure of the query is not very useful in predicting what the database will do. It may not even look at some of the tables mentioned in the query, because it can get the answer from an index, or in some cases it may detect that a set of joins are redundant.

      The original goal of the RDBMS is this: to generate a plan that will be good enough, faster than a good programmer could come up with a better plan. The plan won't be better than the programmer's plan, but you'll have the data in hand while the programmer is still designing, coding and testing his program. This makes ad hoc queries economical to do.

      Over the years, optimizers have gotten better. It makes no sense to try to rewrite a query to run faster, at least without examining the explicit plan chosen by the optimizer. Even so, you can't count on that plan staying the same over time. The optimizer has runtime knowledge that the programmer does not (there is a new index on that field; index A has less entropy than index B, etc.).

      Most of the errors programers make in SQL are semantic. From a performance standpoint, this is still true. Sometimes a programmer generates SQL constraints that are unnecessarily complex, in a way that the optimizer can't deduce without reading his mind. Even if the optimizer should detect this, it might not. On the other hand, the programmer may in fact defeat the optimizer by issuing lots of simple SQL and assembling something procedurally that could be done in SQL.

      The upshot is that the best practice, IMO, is to "think in SQL". Let SQL do the work for you, that's what it's there for. Subsequently if there is a performance problem, try to simplify your SQL to eliminate redundancy (almost always better than making a query more complex to give hints to the optimizer). After that, consider both DBA type solutions to the problem and application design solutions.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    34. Re:Bring a database down? by Anonymous Coward · · Score: 0

      I also like wearing women's clothes.

      I never wanted to be a linux user! I wanted to be...

      A Lumberjack!

    35. Re:Bring a database down? by johnlcallaway · · Score: 1

      The statement about being 'no need for a Cartesian product' reminds me of a guy who said inner joins should never be used because they can bring a system down. Because he wrote one once that was a poor query and it did just that.

      Just because a person doesn't know when to use something properly, or uses it improperly from time to time, doesn't mean it has no use.

      I've used Cartesian joins before. Not very often, but I do recall using them in the past for very specific requirements. If memory serves me, it was a table that only had one row in it, and data in that row changed from time to time.

      --
      I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
    36. Re:Bring a database down? by CountBrass · · Score: 1

      Actually no. SQLServer 2000's mail integration is synchronous so if the mail server is down it hangs the database. Good one eh!

      --
      Bad analogies are like waxing a monkey with a rainbow.
    37. Re:Bring a database down? by russotto · · Score: 1

      The question is: is there a semantically equivalent query that DOESN'T overload the system?

      Yes, in the particular cases I ran into; I was able to reformulate the queries to avoid the problem. They were SELECT DISTINCT on a big hairy set of tables; by making that query into a subquery with SELECT ALL, and then doing a SELECT DISTINCT on the subquery, the problem was resolved.

    38. Re:Bring a database down? by Alpha830RulZ · · Score: 1

      Hm-m-m-m. I do this on a daily basis with one of our instances. We use SQL Server as a data preparation engine for a predictive modeling operation. This typically involves mass updates of the type you're specifying. I'm working on a set of data right now, with table sizes up to 45M rows. Some of the longer updates will run for 3 to 5 minutes, but that's hardly locking up the box. If you poorly construct the query, it will run for a couple of hours, but I don't call that bringing the database down, it's just stupid query development.

      For all the bashing of SQL Server, in 5 years of hard use over probably a dozen different instances, I have -never- seen it crash, or corrupt data, including indexes. We have an enitre line of business supported on SQL Server (online data searches for various sources), and it's been rock solid for us.

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
    39. Re:Bring a database down? by zevans · · Score: 1

      Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?

      What about, say, the ad-hoc area for Decision Science and other speculative data-mining? These analysts do multidimensional joins all day every day. This tool strikes me as very useful in getting a better grip on the consquences of the latest cockamamy scorecard idea, BEFORE it hits the server.

      --
      "... and more and more now there are all kinds of electronic goodies available" -- Pink Floyd 1972
    40. Re: Bring a database down? by Blakey+Rat · · Score: 1

      MS SQL Server is good and relatively cheap. The type of problem the grandparent mentioned exists in ALL DBMSes from ALL vendors. But there's nothing that DB2 can do that MS SQL can't, and MS SQL has great data-flow tools that come along with it to make actual use of the data.

      Now, Microsoft's graphical DB tools (SQL Server Management Studio) has gone backwards in many ways from the older Query Analyzer/Enterprise Manager toolset, but oh well.

    41. Re:Bring a database down? by Shados · · Score: 1

      Data mining should be done on the datawarehouse's OLAP cubes, and the queries you do aren't pure SQL at that point (with all the MDX and whatsnot), so the tool become useless.

    42. Re:Bring a database down? by Doctor+Faustus · · Score: 1

      Now, it could seriously slow down a production server, but... you're not pushing untested SQL on a production server now, are you? Right? Riiiiiiiiiiight?
      Not as part of a system, no, but if I just need answers, I run untested queries on production servers all the time.

    43. Re:Bring a database down? by Anonymous Coward · · Score: 0

      Hear, hear. I've never had a runaway query bring down my iSeries.

    44. Re:Bring a database down? by leenks · · Score: 1

      Err, there are ways of configuring decent RDBMS to have this kind of behaviour, albeit typically on a per user basis - just like you can limit the number of parallel queries a particular user can execute.

    45. Re:Bring a database down? by leenks · · Score: 1

      This is why there are "reference" systems - typically the same size and spec (or within the same ballpark) that contain the same data as the live system (or again, some representative sample of it) which are used for staging / integration / pre-release testing.

    46. Re:Bring a database down? by ahmusch · · Score: 1

      Oracle's "low-level black magic" is multi-version concurrency control, where it's segregated redo and undo operations, eliminating locks as a scarce resource in the system.

      And Oracle tracks clustering factor of every index. If you really want data to be clustered about an index in Oracle, one creates the table with the clause ORGANIZATION INDEX and the table is physically created as a B-tree index structure.

    47. Re:Bring a database down? by drharris · · Score: 1

      That's great, but isn't this a feature of the psql client? How about providing a server-side method for same?

    48. Re:Bring a database down? by dex.pdx · · Score: 1

      Killing other queries that seems like a bad idea... Why not index and and rate all the words in all the documents... A dictionary (think Websters) table with a many to many indexed relationship would I think improve your results. Just split all the words in the document (upon insert) sort for unique with count (turn in into a hash) then under the documents "meta" table you link to the "word id" from the dictionary table and sort by "count" (number of times the word occurs in the document + maybe rate of popularity for the document) That way the frequency for any common dictionary word is already calculated for in a search-able way for any document uploaded. Save the full-text search for deep searches.

    49. Re:Bring a database down? by Tim+C · · Score: 1

      Oracle... we use it for some backend stuff at work, and frankly I think a monkey with a pen and paper could be faster.

      Someone's done something wrong (probably stupidly wrong), as I've used Oracle on a system running stupidly complex queries against a table containing ~60 million rows, and while some queries can take a while most are surprisingly fast (sub-second or two). Yes, it's been designed and optimised to be fast - but that's what I meant about someone at your place having done something wrong...

    50. Re:Bring a database down? by rtaylor · · Score: 1

      No. That is a feature of the server and can be configured in postgresql.conf or on a per user basis (ALTER USER SET statement_timeout = ...).

      Just beware it applies to all statements. You may want to check those long running reports or maintenance queries.

      --
      Rod Taylor
    51. Re:Bring a database down? by Anonymous Coward · · Score: 0

      Oh I agree. My personal theory is they hired a consultant who didn't know the the f*ck he was doing, but whatever... I just use the webapp that connects to it and listen to customer complaints about our products getting timeouts to our central server... The monkey comment was just my personal frustration in our setup coming out ;)

    52. Re:Bring a database down? by dex.pdx · · Score: 1

      Well the phrase search could be implemented by first using indexed "word relations and scores" to slim down the search set: -> break phrase into "words" -> do join query for documents containing those words -> search for phrase inside of documents that contain those words I'm certainly not an expert in search engines or even relation databases for that matter. It just struck me as bad practice to-do query killing. Maybe something on the query level would work in your case: limit -> page -> timeout Limit your results to 50 per set if set exceeds limit implement paging and if the query exceeds a timeout (implemented in web app) the application kills the query on the client side and returns a meaningful "Query timeout, please try a less generic search term" message.

    53. Re:Bring a database down? by dindi · · Score: 1

      Dev, Testing, Staging, Prod ...

      Technically Staging is identical to production environment with the same amount of data. Testing has less powerful hardware, but many times just as much data to deal with.

      BTW with our more or less 100GIG database we run tests on WAY slower machines than what the actual DB server is. We do not have staging or testing, but one environment where we mess around. Development is pretty much the same, but with smaller DBs...

    54. Re:Bring a database down? by Venik · · Score: 1

      The point is, buggy SQL code running on a powerful production server with a huge database may manifest itself in a completely different way, as opposed to when running on a slow dev box in a minimal db environment. With growing volumes of data and shrinking IT budgets sometimes even the largest companies find themselves with inadequate testing and staging hardware.

    55. Re: Bring a database down? by freedom_india · · Score: 1

      myspace.com has been supplated by Facebook.
      And DB2 is the granddaddy that is being trusted by ALL banks.
      Give me a bank which does not store its data on DB2, and i will concede this.
      And banks are the most thorough corporate IT customers.

      --
      "Doing what i can, with what i have." ~ Burt Gummer
    56. Re:Bring a database down? by mgblst · · Score: 1

      You clearly don't use SQL Server, it has a great propensity to run out of memory and crash (OK, not great, but every now and again).

    57. Re:Bring a database down? by Anonymous Coward · · Score: 0

      >> Even on a terrabyte of data or more, doing a crazy multi-hundred-table cross join, you're not going to bring it down.
      Ummmm define "down". I define it as the point to where no useful work can be accomplished and I've seen it happen. It's a matter of scale - you run one of those silly-assed queries with enterprise amounts of data and the DBS starts building temp tables and locking rows (hey - I did say silly-assed didn't I?) and the DBS is going down.

    58. Re: Bring a database down? by dedazo · · Score: 1

      This is the database that added IDENTITY columns in 2003, correct? The same one that took about a decade to finally work correctly outside of the yay-cobol-million-row-scrollable-rowset model used in the OS/390 version, right?

      I think I'll stick with MSSQL and Oracle, thanks.

      --
      Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
    59. Re:Bring a database down? by daveclearyinmd · · Score: 1

      I work with customers doing Business Intelligence against Data Warehouses, and I can assure you that it is quite easy to develop runaway SQL Queries. Most of the queries we see are generated by some tool, and are not humanly readable or understandable. It is the nature of Business Intelligence that these queries are adhoc, untested. In my own data warehousing system, I frequently run into poor optimizer choices. Usually it involves nested queries, where common sense tells you the optimizer should run the innermost query first, store the temp results, then join in the outer query; but for some reason, the DB chooses to do the outer query first, and re-executes the 'static' inner query at every loop. I've had queries run for hours; then I extracted the inner query to an explicit temp table first, and the total query runs in seconds. Oracle also has a bad habit of choosing to do a hash join (instead of nested loops), only to find out that the hash join exceeds available RAM, and the temp results have to page to disk. Again, queries that should run in seconds thrash the machine for hours...

    60. Re: Bring a database down? by lucm · · Score: 2, Insightful

      > myspace.com has been supplated by Facebook.

      Facebook being more popular than mySpace has nothing to do with the database back-end. If you need more big customers for SQL Server 2005, they are easy to find: Barnes & Nobles, HMV online music store, NASDAQ (over 5000 transactions/sec).

      So basically your statement that SQL Server is a toy database might have attracted a few claps 6 or 7 years ago on Slashdot, but the reality is that SQL Server is a robust product finding its way in many markets. As one could say: "2001 just called, they want their SQL Server rant back".

      > And DB2 is the granddaddy that is being trusted by ALL banks.

      I do not have data about banks and database, however I suspect that since many smaller banks are still using OS/400, there must be a lot of DB2 out there. For the bigger banks, I sincerely doubt there is any RDBMS laying around, except for OLAP or e-banking, in which case DB2 won't be in the contenders (e-banking is the land of Oracle, SQL Server and sadly Interbase). For the real backbone, big money usually sticks to big iron, which usually means hierarchical databases.

      According to Gartner, the current market share is the following: Oracle 47%, IBM 21%, Microsoft 17%, with Microsoft closing the gap on IBM every year. And let's not forget that IBM's 21% includes Informix/Cheetah, Cloudscape, etc, not only DB2.

      > And banks are the most thorough corporate IT customers.

      This is an urban legend. Banks are not even the most conservative IT customers. I've been involved in three e-banking projects, and it amazed me how careless those people can be with data integrity and maintenance.

      The most impressive IT customers I met are insurance companies and space industry companies (not defense contractors). I've seen my way in many datacenters, and only in insurance companies did I see figures about the heat dissipation of network cable.

      --
      lucm, indeed.
    61. Re: Bring a database down? by freedom_india · · Score: 1

      For the bigger banks

      How about HSBC, Statestreet & Barclays? Both live on IBM.
      I lived with many large banks for over a decade, especially with HSBC.
      IBM is something they swear by. Not just because its well known, but because it is so good and thorough.
      Mid level banks, your figures are correct: Oracle and then SQL Server.
      I worked for a mid-level bank in CT which migrated from mainframe to Services-Bases Arch and uses Oracle as a back-end.
      Plus, security of data and privacy lawsuits terrify large banks more. I had to go through a FBI Security clearance cycle before statestreet allowed me to work on their scrambling data program.

      --
      "Doing what i can, with what i have." ~ Burt Gummer
  2. Might come in handy by Anonymous Coward · · Score: 0

    I know, it would have caught this


    SELECT *
        FROM Event E
        LEFT JOIN Document D ON D.DocumentTitle = 'Daily Note'
      WHERE D.DocumentTitle IS NULL

    Hint: LEFT JOIN runs in O(N) time where NOT IN runs in O(N2) time on this system.

    1. Re:Might come in handy by VGPowerlord · · Score: 1

      I'm surprised your RDBMS doesn't complain that only one table is referenced in the ON statement.

      Then again, I've never tried an ON statement that only references one table...

      --
      GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
    2. Re:Might come in handy by Anonymous Coward · · Score: 0

      "Don't prevent your users from doing stupid things as that will also prevent your users from doing cleaver things."

      I have never needed a left cartesian product, but I have needed a left join and a cartesion product in the same query. The easiest way to get it is ON 1 = 1.

      No, it doesn't complain, and I'm glad it doesn't.

      Thanks for noticing.

  3. Still no cure for cancer? by hkz · · Score: 5, Informative

    A link to an alpha project on Sourceforge that was created three days ago and doesn't even have its own website? That apparently outputs LaTeX tables instead of something readable without having to compile it first, like HTML, SVG, or even indented text? I know it's silly to expect every story to be about a cure for cancer, but come on...

    1. Re:Still no cure for cancer? by Anonymous Coward · · Score: 0

      That was submitted to Slashdot by the project's creator, no less...

    2. Re:Still no cure for cancer? by magarity · · Score: 1

      an alpha project on Sourceforge that was created three days ago
       
      ... and won't go much farther because even a casual reading of the explain plan can plainly show what the project is trying to track down. It sounds like a complete duplication of effort.

    3. Re:Still no cure for cancer? by LSD-OBS · · Score: 5, Informative

      Yup, not cool.

      Word to the wise: if you're going to actually start advertising a project, please make sure you have some binaries built for some common relevant platforms, and make sure you have some decent information online even if it's just an ugly page with screenshots or examples of what it does.

      In this case, we're talking about some scripts written in Python. At least let people know this on the front page, and list the project dependancies! ie, GraphViz, or whatever.

      This way, your potential users won't immediately discard it due to a lack of compelling information, and your potential (future) developers can see how far you've got and maybe get inspiration to chip in and help!

      That said, this sounds like it should be a great tool for beginner or intermediate SQL users, and I look forward to throwing a few of our mammoth 12-table-join queries at for much fun.

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    4. Re:Still no cure for cancer? by Blakey+Rat · · Score: 2, Informative

      Ditto. I downloaded it to take a look and see how good it was at parsing T-SQL, since we have a few saved T-SQL queries with WHILE loops in them. I gave up after seeing it's... nothing. Just a Python script. It requires Graphviz, Python, and Pyparsing (even though it comes with pyparsing!? WTF!), and even more damning is that you can't use it for ad-hoc queries, the query has to be saved into a file first.

      Someone slap a GUI on this that lets you paste in a query, and bundle all the requirements along with the package, and then we might have something. Right now, I'll just stick with MS SQL Server's query grapher.

  4. Or... by yttrstein · · Score: 1

    You can do what I've done and seen done a number of times, and write a hunk of middleware that parses SQL statements for runaways and send back a warning to the user. That, and not using medium and low duty databases lile MSSQL and MySQL can go a very long way to keeping users happy.

    1. Re:Or... by mino · · Score: 2, Informative

      That, and not using medium and low duty databases lile MSSQL and MySQL can go a very long way to keeping users happy.

      Honestly, to describe MSSQL as "medium and low duty" is pretty rich. You'd best believe I'm happy to bash MS as much as the next guy but SQL Server is a high-performing, highly maintainable, high-availability database and doesn't deserve to be mentioned in the same sentence as MySQL.

      Hell, MSSQL might actually be the only truly good product MS make -- in fact, it probably is. It's not a toy and people who assume it is, just because it comes from MS (I'm not saying this is what you're doing, but people DO do this) just show that they don't know what they're talking about.

    2. Re:Or... by yttrstein · · Score: 1

      Let me know when MySQL is able to do a real hot dumb without locking tables, and I'll stick it right up there with Sybase on the spot. :)

    3. Re:Or... by yttrstein · · Score: 1

      Whoops! Hot dump that is. Wow. That was a great slip. But just to be very specific here, I was of course not bashing MSSQL or MySQL, but rather putting them in their proper place. Neither one of them has anywhere near the tools that Oracle or DB2 (IBM's implementation) do, but each does definitely have their place in the continuum of databases.

    4. Re:Or... by mino · · Score: 1

      Just out of curiosity, what features/tools do Oracle and DB2 have that MS SQL doesn't have, which keep it out of that 'level' in your opinion?

      I'm not disagreeing, just geniunely curious.

    5. Re:Or... by GoofyBoy · · Score: 1

      There were lots in SQL Server 2000

      With 2005 you really need to hit the high end stuff of Oracle/DB2.

      One example is Oracle RAC, multiple servers acting as one database using one set of data files. SQL Server has only really simple failover, I think they want you to use Windows Clustering(?).

      Native geospacial datatype is not in 2005 but I think is in 2008.

      Not run on Unix and the like. Yes, for some shops its required. Depending on the OS version, you hit file size and RAM limits.

      Pet Peeve - What's up with having a generic dbo, but then each user has its own schema?

      --
      The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
    6. Re:Or... by mino · · Score: 1

      Thanks GoofyBoy, interesting.

      I must admit I don't know that much about RAC -- but it doesn't really seem to me THAT different from clustering a SQL Server. I don't see a heap of difference between and Oracle Clusterware infrastructure and an active-active Windows Cluster Service (or whatever the hell they're calling it now, the damn thing changes names every 38 seconds) setup running SQL Server Enterprise (ditto).

      Fair enough on the other points -- they're not really things I'd consider as stopping it being an 'enterprise database' but that may be just because I've never hit those problems myself -- never done geospatial work, and running it on Windows-only has never REALLY been a problem wherever I've worked, but obviously YMMV. I don't know about the filesize and RAM thing though -- pick your tools for the job, I guess. Don't run SQL Server on 32-bit Windows Server on a FAT filesystem, any more than you would try and run a huge Oracle install on 32-bit Linux with (some crappy Linux filesystem).

      And as for the user/schema thing, yeah that's weird. But it's gone away in 2005. :)

    7. Re:Or... by triffid_98 · · Score: 1
      I am not a DBA, but one of the more useful differences is that you can perform block inserts against each of the RAC nodes independently. That means that you can perform your big table loading scripts in parallel instead of running them against one machine and mirroring it.

      Oh, and if you want to enforce query timeouts, that is supported in the user profile via CPU_PER_CALL (non-conforming queries are terminated and resources released)

      I must admit I don't know that much about RAC -- but it doesn't really seem to me THAT different from clustering a SQL Server

    8. Re:Or... by mino · · Score: 1

      You can do something similar to the divided-inserts in a SQL Server cluster, assuming your table is partitioned appropriately, but I agree that it doesn't sound like the same thing... and is a lot more fiddly.

      Oh, and if you want to enforce query timeouts, that is supported in the user profile via CPU_PER_CALL (non-conforming queries are terminated and resources released)

      In SQL Server, you can use "sp_configure 'query governor cost limit', xxxx" to do something very similar. It's elapsed time, not CPU time, but has the same basic effect.

  5. How do these stories get picked? by Hackerlish · · Score: 0, Troll

    Poorly Written Summary. Something not many people many if any will care about. Yesterday we the 'Language Translation Error' One-liner and now this? How did this story get picked?

    1. Re:How do these stories get picked? by larry+bagina · · Score: 3, Insightful

      Posted by kdawson

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

  6. When did infinite loop become an "invention" by bsharma · · Score: 0

    They have been there from the days of "Halting problem" (of Turing machines - before electronic computers were invented)

  7. Troll, but I'll bite anyway by Anonymous Coward · · Score: 0

    WITH A AS (
    SELECT 1
    UNION ALL
    SELECT 1
    ) SELECT *

    runs forever.

    1. Re:Troll, but I'll bite anyway by Anonymous Coward · · Score: 0

      Is this an SQL statement (ANSI/ISO)? I don't think so.

    2. Re:Troll, but I'll bite anyway by Anonymous Coward · · Score: 0

      Except for the fact that it's been so long since I used that construct that my syntax is probably off, it is in SQL'93.

    3. Re:Troll, but I'll bite anyway by Craig+Ringer · · Score: 1

      The WITH clause was apparently introduced in SQL:1999. It's still not particularly widely supported, though.

  8. Looking for a problem? by Craig+Ringer · · Score: 5, Insightful

    Execution of SQL statements can require the RDBMS to perform nested loops over parts of the query execution.

    This can be an issue if the DBMS is forced to do something like perform a sequential scan of one table for each record matched in another table. That gets expensive *fast*.

    There are many other possible performance issues, of course.

    However, I don't see how SQL parsing can tell you much about the performance characteristics of the query. The database's query optimiser makes choices about how to execute the query, and is free to change its mind depending on configuration parameters, available resources, system load, disk bandwidth, present indexes, statistics gathered about data in the table, etc. PostgreSQL's planner for example does make heavy use of table statistics, so query plans may change depending on the quantity and distribution of data in a table.

    Any decent database can already tell you how it will execute a query (and usually give you a performance readout from an actual execution of the query). There are plenty of GUI tools for displaying the resulting query plan output graphically. PgAdmin-II can do it, for example.

    A simple SQL parser can have no idea about what indexes are configured, the distribution of the data, how much working memory the database has available for sorts and joins, etc. The database knows these things - and can already tell you how it will, or did, execute a query - so why not let it do its job?

    The whole project doesn't make much sense.

    1. Re:Looking for a problem? by SQLGuru · · Score: 3, Insightful

      I use diagrams as a tuning tool, but only to look for paths that don't make sense or alternate paths through tables or for "dead-ends"......but these are things that a computer can't really tell you because they require an understanding of the data.

      But you're right, the explain plan is the single most useful tool for tuning a query. If you understand how the engine is going to execute the query you know what areas you can affect. And tuning is manipulaing those effects in a way that makes the query faster.

      Layne

    2. Re:Looking for a problem? by Anonymous Coward · · Score: 0

      Execution of SQL statements can require the RDBMS to perform nested loops over parts of the query execution.

      While looping to execute the query plan may have a significant impact on performance, this isn't the same as looping being part of the language itself (although it apparently can in later, less widely-supported versions of the SQL spec).

      As long as the number of iterations for each loop is more or less fixed (known data set, static constraints), any looping performed by the RDBMS is exactly equivalent to unrolling the loop into a single, massively long, completely deterministic operation--which means the language is part of the set of regular languages, and therefore not Turing complete.

  9. So, an alpha project for what exactly? by Mycroft_514 · · Score: 5, Interesting

    Doesn't name WHICH RDBMS, and then you throw SQL at it? So what? For DB2 we have a thing called "Visual Explain" which NOT ONLY does this, but is free, provided by IBM, but also shows you other things like whch index is being used for each step, etc.

    This is news? This isn't even worth a second look!

    1. Re:So, an alpha project for what exactly? by Hackerlish · · Score: 2, Informative

      Watch out! Anyone pointing out how a kdawson story isn't news gets moderated down as a troll. I can't even work out how this got out of the firehose.

    2. Re:So, an alpha project for what exactly? by Mycroft_514 · · Score: 1

      I have karma to burn, and I don't take kindly to anyone (even the powerrs that be) modding down when what I say is based upon fact!

      Look to my credentials, only been a DBA for 20+ years!

    3. Re:So, an alpha project for what exactly? by blantonl · · Score: 1

      Look to my credentials, only been a DBA for 20+ years!

      Yes, but a DB2 developer.

      And remember, you don't look "to" credentials, you look "at" them.

      --
      Lindsay Blanton
      RadioReference.com
    4. Re:So, an alpha project for what exactly? by Matchstick · · Score: 1

      And remember, you don't look "to" credentials, you look "at" them.

      PHLEBAS the Phoenician, a fortnight dead,
      Forgot the cry of gulls, and the deep seas swell
      And the profit and loss.
                                                          A current under sea
      Picked his bones in whispers. As he rose and fell
      He passed the stages of his age and youth
      Entering the whirlpool.
                                                          Gentile or Jew
      O you who turn the wheel and look to windward,
      Consider Phlebas, who was once handsome and tall as you.

  10. i smell a cheap trick... by JonnyChaos · · Score: 1

    posted by the admin of the project? the spam tag is accurate... "yes this is an open source clearing house, no we will not all rapidly sign up to your cute little project." though, i would be willing to be this is a Masters Thesis project and alxtoth is hoping to get some fast-tracking going on...

    --
    we were somewhere just out of Barstow when the patent trolls attacked.
  11. your programmers shouldn't be writing SQL by timmarhy · · Score: 1
    all your apps should only be able to access the DB as unprivileged users with resource limits to prevent crashing, and they should only be able to run stored functions which someone qualified at sql creates for the application guys.

    this way the programmers are prevented from infecting the database from their crapness

    --
    If you mod me down, I will become more powerful than you can imagine....
    1. Re:your programmers shouldn't be writing SQL by lastchance_000 · · Score: 2, Insightful

      Quis custodiet ipsos custodes?

    2. Re:your programmers shouldn't be writing SQL by KeithJM · · Score: 1

      For things like reports, your developers have to write complex SQL. You can argue that it shouldn't be a developer, instead it should be done by a "development DBA" or whatever, but essentially whoever writes the SQL IS the developer for reports. Even experienced DBAs can leave out a join in a complex (10 or more table) query, and it often isn't found if it's only run against a development and/or QA database with limited data and no real load. Cartesian products should be found if anyone actually reads the output in QA, but that doesn't always seem to be the case.

    3. Re:your programmers shouldn't be writing SQL by TheRealMindChild · · Score: 1

      Sure, things that actually use the database for production shouldn't be trying to do dirty things with it, but developers, whether dedicated "DBA's", or the poor shop with only one tech guy of any kind, need to be able to "play" with the database to be able to tweak it and ... well... do anything of meaning other than retrieve data. Sometimes this can be dangerous, but this is why they are testing on a development server... right?

      --

      "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
    4. Re:your programmers shouldn't be writing SQL by timmarhy · · Score: 1
      no. have your DBA/database guy write a function to return the datasets you need and use THAT to make the report. I understand some shops can't afford to have both and in that case you must work with what you got, but SQL in the code is bad bad bad.

      this is one reason i've always been critical of OSS and it's lack of a real report developing environment like reporting services.

      --
      If you mod me down, I will become more powerful than you can imagine....
    5. Re:your programmers shouldn't be writing SQL by Samah · · Score: 2, Interesting

      Generally what happens on my project is that the team (headed by an analyst) decides on the best design for the task, then subtasks are delegated to developers based on their level of skill with PL/SQL and/or Java.
      Business logic (for the most part) is done on the server-side with PL/SQL packages, while the application itself is a Java fat client running on a Citrix cluster.
      Before you make statements about keeping business logic separate from the database, this situation works well for this application, as it allows for less client-server communication, easier handling of commits and rollbacks, and much faster data access. A bonus is that when a severity 1 case is raised that is related to business logic, it doesn't require a long system outage.
      The production server has read-only access for standard developers, and a logged full access account for support (and senior developers).
      Every code change is reviewed by one or more senior developers to ensure it won't break existing functionality or contains (as you put it) "crapness".
      From your comments I take it you are a DBA and have had bad experiences with poor programmers. In your case, maybe what you've suggested is a decent option for you, but I really don't think you should be stating it as the "right way".
      As always, YMMV.

      --
      Homonyms are fun!
      You're driving your car, but they're riding their bikes there.
    6. Re:your programmers shouldn't be writing SQL by cduffy · · Score: 1

      You can tout that as "the right way", but there's still no reason this has to be a technical-design issue rather than a process-design issue -- and while my background is as an OSS groupie, I've been the OSS groupie at enough proprietary shops (ie. the party responsible for dealing with upstream on projects used as underlying infrastructure for actually running the proprietary software we built) that I can say with a fair bit of confidence that the approach you're espousing just isn't all that popular in The Real World.

      Isolate your queries into a specific set of packages and require DBA review before changes to those packages can be promoted to the production branch; there are plenty of ways to keep developers from messing up production by writing bad queries less heavy-handed than the strict access-control approach given, and you should have a revision control infrastructure enforcing code review anyhow.

      Re "lack of a real report developing environment" -- DataVision? Agata Report?

    7. Re:your programmers shouldn't be writing SQL by timmarhy · · Score: 1
      Oh i'm well aware it isn't popular in the real world, because it usually cuts developers grass.

      it's better in the following ways:

      1. more readable code, there is less of it

      2. easier to maintain - change in the database and the change happens realtime, no need to a new release (if your doing binaries)

      3. better access control in many situations. sometimes you want to get at data but don't want the users to have that kind of access. you can run a function as a higher level user but allow lower level users select access to the function.

      4. faster/more accurate. in general, your DBA will write a better/faster query than your programmers.

      5. One less thing for your programmers to worry about. it means they can focus on writing the application (which is their job remember).

      I've had people fight me tooth and nail on this before but they always end up coming around. a lot of the time it's also about job security - people worry taking a task off their hands threatens their jobs.

      --
      If you mod me down, I will become more powerful than you can imagine....
    8. Re:your programmers shouldn't be writing SQL by cduffy · · Score: 2, Interesting

      1. more readable code, there is less of it

      Counterargument: Less readable code, as it's split into two places.

      2. easier to maintain - change in the database and the change happens realtime, no need to a new release (if your doing binaries)

      Counterargument: Harder to maintain - more upgrades will require the database to be revved as opposed to only the application, and synchronization between the two becomes more of an issue.

      (Granted, IRL there needs to be robust infrastructure for database upgrades and downgrades no matter what -- but making previously code-only minor patches impact both components doesn't necessarily make things easier).

      3. better access control in many situations. sometimes you want to get at data but don't want the users to have that kind of access. you can run a function as a higher level user but allow lower level users select access to the function.

      Yup; that is indeed a good reason to use stored procedures or views.

      4. faster/more accurate. in general, your DBA will write a better/faster query than your programmers.

      Of course the DBA will write better queries; that's why I advocate making DBA review mandatory for code changes impacting the data access layer. In shops with a good DBA, the programmers will come to the DBA first when they have a complex query to write anyhow; that's what happens where I'm at presently. (Our DBA is a rockstar, incidentally poached from my last employer, and very well-respected; at that last job, however, we had a CEO's-college-buddy incompetent before we had the rockstar, and I'd have hated to see him hold the power your workflow would grant).

      5. One less thing for your programmers to worry about. it means they can focus on writing the application (which is their job remember).

      From the perspective of the programmers writing the data access layer (you're doing a proper tiered application with business logic and data access broken off from each other, right?), they need to worry about interfacing with the DB no matter what; your proposal reduces their scope considerably (by making the code they maintain effectively into a collection of nearly-opaque stubs referencing logic stored elsewhere), but certainly doesn't eliminate the relevant work from development's domain.

      I'm largely playing Devil's Advocate here: What you're advocating is a good workflow, but I think that calling it the only good workflow is a serious misrepresentation -- the problems it addresses can be resolved through other means, and at least some of the benefits are two-sided.

    9. Re:your programmers shouldn't be writing SQL by timmarhy · · Score: 1
      Ok your argument makes sense that the DB could change something and it'd be invisible to the programmers, and impossible for them to work around but that comes down to common communication, and if things were that bad this would be the LEAST of your worries ;)

      i'm certainly not advocating patch culture either, but it's invenitable that you'll need to. in the case of most places i've worked you have a centralized database which is far easier to update than all the clients

      --
      If you mod me down, I will become more powerful than you can imagine....
  12. EXPLAIN by Craig+Ringer · · Score: 5, Insightful

    I don't see what this has over EXPLAIN and an appropriate graphical display tool like PgAdmin-III. There are large numbers of tools that display graphical query plans - and unlike this simple SQL parser, they know how the database will actually execute the query once the query optimiser is done with it.

    Furthermore, a simple SQL parser has no idea about what indexes are present, available working memory for sorts and joins, etc. It can't know how the DB will really execute the query, without which it's hard to tell what performance issues may or may not arise.

    See comment 24461217 for a more detailed explanation of why this whole idea makes very little sense.

    1. Re:EXPLAIN by Anonymous Coward · · Score: 0

      Yeah this story seems to be ignorant of a simple explain plan which is available in every major rdbms that I'm aware of. It will show you cartesian joins, nested loops, use of indexes and partitions, etc. There are some differences with regards to which execution path will be taken (like cost or rule based execution decisions, hints, etc), but the analyst should be aware of this and use the explain plan to tweak their queries for optimal results. Tools like SQL Analyzer or TOAD display the explain plans in a reasonably user friendly and convenient manner, or a script can be built to dump it into a nested or execution ordered text file for analysis.

      If the rdbms doesn't have some sort of explain tool you're in deep trouble anyway. Particularly it isn't always clear which index will be used if you specify in your join or where clauses multiple indexed fields without looking at the explain.

      This tool just looks like it produces pretty pictures.

    2. Re:EXPLAIN by Craig+Ringer · · Score: 2, Informative

      Another comment here revealed part of why someone might think a tool like this was useful:

      In MySQL, EXPLAIN apparently works more like PostgreSQL's EXPLAIN ANALYZE (and related features in other RDBMSs). MySQL's EXPLAIN actually executes the query rather than just running it through the query planner. The documentation even warns that data modification is possible with EXPLAIN in some circumstances.

      If your database gives you no way to ask the query planner what it will do without actually executing the query, something like this begins to look faintly useful. Personally, though, I can't imagine voluntarily using such a database.

    3. Re:EXPLAIN by daveime · · Score: 1

      AFAIK, THe MySQL doesn't actually select the rows, but does determine which, if any, of the available indices it WOULD use if the query was run, using the cardinality stats stored in the indices themselves.

      So you can see from the EXPLAIN that he had indices "a","b", and "c" available, and chose "a" as it would probably return the fewest rows if the query was run.

      Sometimes it can cause problems, as his optimizer tends to always choose the index that returns the fewest rows, but experience has shown me that sometimes this doesn't always equate to the fastest response time.

      Lest we forget that if an index is not stored in memory, he's still going to have the I/O overhead of loading the MYI (index file) from disk anyway.

    4. Re:EXPLAIN by dedazo · · Score: 1

      So in other words this is useful when your DBMS toolset sucks.

      --
      Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
  13. Existing tools by Craig+Ringer · · Score: 2, Interesting

    Most PostgreSQL users don't seem to use the existing, and superior, tools like EXPLAIN, EXPLAIN ANALYZE, PgAdmin-III's graphical explain, etc. I'm sure the same is true for users of many other databases.

    It's not like these tools are particularly difficult to use or understand. No training is required, though being willing to think and read a little documentation helps if you want to get the most out of them. Understanding at least vaguely how databases execute queries is handy for any database user anyway. The same understanding is required to get anything useful out of this just-posted tool.

    Anyway, as I've noted elsewhere the exiting tools for this do a much better job due to integration with the RDBMS and superior knowledge of how the DB will execute the query.

  14. Halting problem by Craig+Ringer · · Score: 1

    If it was, such a query analysis tool would be provably incapable of handling all queries because of the halting problem.

    Thankfully most SQL dialects are limited to expressing queries that can be executed in finite time with a defined end point.

    1. Re:Halting problem by Craig+Ringer · · Score: 1

      Yes - exactly my point. Please read my post and its parent.

  15. Is this by jchawk · · Score: 1

    They day of the python? There were 2 different tools written in python.

    I'm not flaming, I just thought it was interesting as I loved programming in python when in college!

    1. Re:Is this by cduffy · · Score: 1

      I'm not flaming, I just thought it was interesting as I loved programming in python when in college!

      "When in college"? Why not now?

      Python jobs aren't hard to find.

    2. Re:Is this by jchawk · · Score: 1

      Different career direction. I was working for a ISP in college and ended up a systems types guy rather then programmer and haven't done much programming since then, other then shell scripting / etc...

    3. Re:Is this by cduffy · · Score: 1

      Heh; I consider myself a systems-type guy, and most of what I do is debugging or extending other peoples' code in C, writing my own tools in Python or shell, and helping more specialized folks (Java developers, pure sysadmins) debug cross-functional problems.

    4. Re:Is this by jchawk · · Score: 1

      Funny how it works out isn't it?

  16. In tablespace, no one can hear you scream... by fahrbot-bot · · Score: 3, Funny

    If you sit close to the DBAs, you can hear them screaming...

    I've noticed that when things go horribly wrong, you don't actaully have to sit that close. To be fair, as a Unix SA who has to deal with Windoze systems, I've done my fair share of screaming. :-)

    --
    It must have been something you assimilated. . . .
    1. Re:In tablespace, no one can hear you scream... by Anonymous Coward · · Score: 0

      Windoze? LOL, good one!

  17. Mod parent up. by Animats · · Score: 1

    That's right. Mod parent up. Any tool that only looks at the SELECT statement, without knowing about the indices or what the optimizer is doing, is nearly useless.

  18. Are you serious? by SpasticWeasel · · Score: 2, Informative

    So SQL Server has had a graphical execution plan view for ever, and it's better than this lameness. But of course its not free, and we all know that free software is better, even when it sucks. Seriously, compare this to the real tools included with a serious RDBMS, and I have to question why this was even posted. It's almost farcical.

    --
    No sooner do I get over one, then you put a better one right next to me. Bastards.
    1. Re:Are you serious? by Shados · · Score: 1

      Not free as in Freedom, but definately free as in free beer. The SQL Management Studio Express version has the execution plan stuff, and it works quite peachy, for free.

    2. Re:Are you serious? by Anonymous Coward · · Score: 0

      Not exactly free as in beer when you have to buy an expensive operating system to run it and pay for most uses of it.

    3. Re:Are you serious? by weicco · · Score: 2, Insightful

      Oh damn. I had to buy a whole PC set to run Debian. I guess Linux isn't free after all.

      --
      You don't know what you don't know.
  19. Re:WTF by ahmusch · · Score: 3, Informative
    Really? Most of us would call recursive SQL "looping" SQL, and something like this in Oracle is recursive:

    SELECT LPAD(' ', 2*LEVEL, ' ' ) || ename empName, dname, job,
    sys_connect_by_path( ename, '/' ) cbp
    FROM emp e, dept d
    WHERE e.deptno = d.deptno
    CONNECT BY PRIOR empno = mgr
    ORDER SIBLINGS BY job;

    Heck, even ANSI finally got into recursive SQL using the WITH clause:

    with TransClosedEdges (tail, head) as
    ( select tail, head from Edges
    union all
    select e.tail, ee.head from Edges e, TransClosedEdges ee
    where e.head = ee.tail
    )
    select distinct * from TransClosedEdges;

    Now let's imagine queries with multiple levels of nesting using such clauses - after all, any SELECT statement can generally be used in any FROM clause.

    Now, perhaps you're Chris Date or Fabian Pascal and are truly concerned with the completeness of SQL as implementing the relational model. For the rest of us, however, recursive SQL can answer interesting questions without getting into the nastiness of procedural code.

    Oh, and considering the default join in virtually any SQL database is a nested-loop join, I'd say all databases loop by default. And a statement as innocuous as :

    select * from a, b, c;

    Can absolutely crater cpu and I/O performance. If each has 1,000 rows and there's not enough memory, there's 1,000,001 table scans. Hope your disk is fast.

  20. SQL injection detection by rgovostes · · Score: 1

    I've thought about such a tool for detecting SQL injection. Essentially, you have a whitelist of SELECT statement "diagrams" stored somewhere. Before running a query, you generate a diagram of the current statement and check it against the diagram.

    Of course a better investment would be to write your code the right way first...

    1. Re:SQL injection detection by Shados · · Score: 1

      Besides, for legacy applications that were coded like shit, there's already a bunch of tools that will scan them (either the application, black box style, though mostly for the web, or the code, white box style) to find sql injection vulnerabilities.

      Still sad that even to this day, if you go to your favorite programming language XYZ forum, half of the newbies use concatenated strings, because a large amount of tutorials on the net do it that way...

  21. I'm screaming from the summary. by sootman · · Score: 2, Funny

    Can we have that in English please? Possibly with a diagram?

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
  22. Comment removed by account_deleted · · Score: 5, Interesting

    Comment removed based on user account deletion

  23. Re:Mod parents down!!! by lucm · · Score: 1

    > Any tool that only looks at the SELECT statement, without knowing about the indices or what the optimizer is doing, is nearly useless.

    This is wrong. The indices or optimizer have very little to do with the SELECT; at the time of processing the SELECT clause, most database engines already are done with the FROM, JOIN, WHERE, GROUP and HAVING clauses. At this point there will be little gain to add/drop indices from the query plan, unless the platform does support included fields. As such, SELECT is like ORDER BY, a final operation that has little weight in the overall plan (unless there is a huge amount of fields, but even then...)

    For long queries with complex joins (like recursion), a diagram tool for SELECT can be very helpful, because it can help you validate that you actually get what you ask for. I would see this product as a logical optimization tool, not a physical one.

    --
    lucm, indeed.
  24. What crap by Anonymous Coward · · Score: 0

    Nobody with any sense is going to use this because it's under that fucking evil BSD license, right Linuxistas?

    Bite me.

  25. Re:Mod parents down!!! by Craig+Ringer · · Score: 2, Informative

    I think you might've missed the point.

    The term SELECT statement generally refers to the whole statement, including FROM, WHERE, HAVING, etc clauses.

    This is pretty clear in context, as it'd be nonsensical to produce a graphical explain tool for the result field list in the SELECT clause its self.

    That's why the parent said SELECT statement not SELECT clause .

    As it happens the same issues regarding the need for planner knowledge etc are true for DML like INSERT, UPDATE and DELETE. It's not about SELECT at all, but rather any non-DDL query.

  26. ANSI Joins by 8bit · · Score: 1

    Just use ANSI style joins. You're guaranteed never to accidentally do a cartesian join.

    For example:
    Normal join - "SELECT * FROM table1 t1 JOIN table2 t2 ON (t1.column = t2.column)"
    Cartesian join - "SELECT * FROM table1 t1 CROSS JOIN table2 t2"

    Big difference, no mistakes no matter how many tables/columns you're pulling in to your query.

    --

    --Roy
    1. Re:ANSI Joins by Shados · · Score: 1

      SELECT * FROM Table1 t1 join table2 t2 on t1.column = t1.column.

      WHOOPS. Its such a easy mistake to make, too, especially if you use silly non-significant aliases like T1, T2, T3 instead of shortening the name to something meaningful.

  27. Wanna slashvertise your v0.01 software? by Anonymous Coward · · Score: 0

    Build a website for it. (Don't have "Homepage" link to an empty directory.) Take screenshots in a decent resolution. Offer an open forum.

  28. Re:Mod parents down!!! by lucm · · Score: 1

    > I think you might've missed the point.
    > The term SELECT statement generally refers to the whole statement, including FROM, WHERE, HAVING, etc clauses.

    I did not miss the point:
    >> For long queries with complex joins (like recursion), a diagram tool for SELECT can be very helpful

    I simply disagree that the product is useless without knowing about the indices. Before one should start reviewing query plans and figuring out what index is important, one must make sure the query makes sense. Logical before physical.

    A good example of this is a query with a LEFT JOIN. Applying filters in the JOIN or in the WHERE clause might look similar (it is with INNER JOIN), but you may get very different results with LEFT since the non-matching rows from the left table are added to the result AFTER the filter on the JOIN is applied. With a graphical tool, you can see it immediately and reformulate your query accordingly.

    --
    lucm, indeed.
  29. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  30. Grammar nazis to the rescue by Anonymous Coward · · Score: 0

    Ermm, yes, but 'windward' is a direction. Let me see... 'Never Eat Shredded Wheat' - nope, don't see 'credentials' on the compass, so I don't think they count as a direction, so I suppose that makes them something we look 'at'. Nice quote though.

    1. Re:Grammar nazis to the rescue by Mycroft_514 · · Score: 1

      Actually looking "to" is correct within the original context. But you go ahead and think what you think.

  31. Re:Mod parents down!!! by Anonymous Coward · · Score: 0

    Without knowing the indexes how are you going to know what it will do on the disk? The indexes are how the planner decides what data is in or out many times.

    If you had a disk/memory with 0ns seek time then yes this makes more sense (as you do not need indexes as much). But most issues are with either some sort of loop (which this tool sounds like it can help with), or a poor plan (ie borked indexes).

    SELECT *
    FROM X
    WHERE Y=99

    on a system where there is 100 rows that will probably come back fast.
    on a system where there is 1 billion rows it will take ages. (seconds depeding on if the data is up in memory and how big the rows are)
    on a system where there is 1 billion rows and an index on Y it will come back in a reasonable time.

    Without knowing the indexes you do not know if you are getting a full table scan (a large loop) or if the planner can get you to 90% the way to the data.

    Indexes sound bad up front but are a wonderful tool. I have seen them abused too. However, almost all things in comp sci can abused.

    I as a DBA am more worried about will the query finish FAST and not lock up crazy amounts of the database while doing it. Both things help minimize deadlocks and raise customer satisfaction.

    Also indexes help with the locking issue. What parts of the DB are locked so the data is consistant. If you can narrow it down to row level; you get speed from other statments in your system.

    I too in my day have written the big crazy statment. This tool probably would help disect what something like that does. But that is NOT MAINTAINABLE. If you need a tool to tell you what it is doing then your doing it wrong. I usually take a step back and break it down into a bunch of smaller simpler statments. This lets *ME* know what it is doing and someone eles who has to maintain it do so. Breaking it down also removes the nested loop problem quite nicely.

    The problem is that sql lets you put statments inside of statments. Putting those in the wrong places can be devistating in performance. The rule is pretty easy. You can put select statments in the select clause or where clause areas but be prepared for it to take awhile to finish. Put them in the FROM clause where they belong.

    When I have a perf issue I almost always start with the disk reads/writes which is almost always an index is messed. THEN I move to logic. A better tool is the one that samples how long different ones take so you can figure out which queries are messed. It is usually pretty obvious what is messed up at that point when an index doesnt take care of it. It is a data layout issue (where does you data live) or a data flow design issue (how are you asking for the data). Many times it is not even the queries themselves. It is the fact that a programmer decided to call that one procedure that takes .1 seconds to execute, 200k times in a row. Instead of using his brain and writting a new proc that returns all of the data he needs in a table. You can spend all your time getting it down to .05 or fix the real issue that the design is broken/missing something.

  32. You missed a keyword... by argent · · Score: 1

    You missed the parenthesized word in "No single query will ever bring a (real) RDBMS down."

  33. Re:Mod parents down!!! by lucm · · Score: 1

    > The problem is that sql lets you put statments inside of statments. Putting those in the wrong places can be devistating in performance. The rule is pretty easy. You can put select statments in the select clause or where clause areas but be prepared for it to take awhile to finish. Put them in the FROM clause where they belong.

    The SQL optimizer will actually do that for you. It can replace a correlated subquery by a JOIN if this appears to me more optimal.

    > When I have a perf issue I almost always start with the disk reads/writes which is almost always an index is messed. THEN I move to logic.

    I completely agree. But that does not mean that the OP tool is pointless, it is just not suited for performance analysis. Hence my point: use it to review the logic of a query, then move to the physical layer with other tools. In most IT companies those two steps are not done by the same people anyway.

    > I too in my day have written the big crazy statment. This tool probably would help disect what something like that does. But that is NOT MAINTAINABLE. If you need a tool to tell you what it is doing then your doing it wrong. I usually take a step back and break it down into a bunch of smaller simpler statments.

    I agree. In the ideal situation, this is what one should do. Sadly we do not live in an ideal world, and sometimes one must deal with huge queries. For those sad moments, the OP tool is good. That's the point.

    --
    lucm, indeed.
  34. I've done projects like that by Slashdot+Parent · · Score: 1

    And these type of edicts from up on high tend to really bite you in the behind over time. You wind up with hundreds upon hundreds of stored procedures, and nobody knows which ones are even in use any longer. One project will wind up requesting a change that affects another project, and it basically excludes any O/R mapping tools. It's just one huge mess.

    Your best bet is to insist that your developers are just a little clued in. How hard is it to say, "As long as your queries always have an indexed field in the where clause, and your joins are all on the primary key, fire away. Anything other than that needs DBA approval." Or some other reasonable policy depending on how much data are in the database.

    --
    They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
  35. How many users does it take to bring down Oracle? by ekeko · · Score: 1
    This is a question I always pose to Oracle users. They usually will think: 100-200 users.

    My reply is: # of CPUs on the RDBMS server.

    So for a 8-way SUN server, you just need 8 user *sessions* submitting a 3 table cartesian product join, each table with 1 million rows.

    Who many users does it take to bring down a server?

    It could just be ONE human, issuing the same query 8 times in a row...

  36. Query result analysis by Craig+Ringer · · Score: 1

    Yes, it could be useful for examining the output of the query in non-performance terms. For complex queries I can easily see how that could be useful. That may, in fact, be the whole idea behind the tool - to help reduce or eliminate execution of grossly incorrect queries that don't do what the user wants. Tools like EXPLAIN aren't as useful for that, either, as the query looks quite different after the query optimiser is done with it. Additionally EXPLAIN output usually drops detail about specific fields, so it's not really possible to predict the results of the query from explain output alone. That said, it's quite possible to spot issues like a join without a filter.

  37. Re:How many users does it take to bring down Oracl by Shados · · Score: 1

    Indeed. I was quite careful to say "no -SINGLE- query..."

    Well, unless you're running the database on a Pocket PC or something =P

  38. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  39. Re:WTF by weicco · · Score: 1

    Oh, and considering the default join in virtually any SQL database is a nested-loop join, I'd say all databases loop by default. And a statement as innocuous as : select * from a, b, c; Can absolutely crater cpu and I/O performance. If each has 1,000 rows and there's not enough memory, there's 1,000,001 table scans. Hope your disk is fast.

    I tested this on SQL Server 2005 Express and either I am doing something wrong or your statement is false.

    I created three tables Table1, Table2, Table3. Each one has two columns id INT PRIMARY KEY IDENTITY, str NVARCHAR(MAX). I ran an query that inserted 1000 rows in each of those three tables. Then I ran 2 queries concurrently: SELECT * FROM Table1,Table2,Table3

    Well, my PC slowed down a bit and queries have been running some minutes now but nothing dramatic happened. I ran third query SELECT * FROM Table1 and it executed quite fine. I made some INSERTs and UPDATEs and those went well also.

    So results are: some slowness, no CPU cratering (but both cores are under 100% load), mem use is almost endurable and one unhappy boss because I used company time to do this test.

    --
    You don't know what you don't know.
  40. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  41. Re:Mod parents down!!! by Anonymous Coward · · Score: 0

    What he meant was, if you only have the SQL statement, without accessing the CATALOG to know what is indexed, you can only go so far. Of course the optimizer will need even more information than just WHAT is indexed -- sizes of tables, distribution of values, etc. Oracle/DB2 require you to update the index statistics periodically, otherwise the optimizer may choose a bad plan.