Slashdot Mirror


Refactoring SQL Applications

stoolpigeon writes "My past as a DBA probably makes me a bit biased, but the reaction I've seen the most when a database application isn't performing as well as would be liked seems to focus on the database side of things. The search for a solution usually seems to center around tuning db parameters, the building (or removal) of indexes and, if the budget allows, throwing hardware at the problem. In their new work, Refactoring SQL Applications, Faroult and L'Hermite bring a much wider range of options to the table. There is a lot in this little book for the developer charged with fixing an existing application and I think a lot of good information that could save one from making a number of headache-inducing mistakes on a new application." Keep reading for the rest of JR's review. Refactoring SQL Applications author Stephane Faroult with Pascal L'Hermite pages 293 publisher O'Reilly Media, Inc. rating 9/10 reviewer JR Peck ISBN 978-0-596-51497-6 summary Good for a developer charged with fixing an existing application. The book is divided into eight chapters; the first two deal with how to approach a problematic application in general. In the preface the authors say, "This book tries to take a realistic and honest view of the improvement of applications with a strong SQL component, and to define a rational framework for tactical maneuvers." I found this to be true throughout the entire book and was impressed by how well the examples, suggestions and problems echoed my real-life experience. This book is first and foremost practical. There is really almost nothing in the book that does not come immediately to bear upon the problem at hand. I've seen others do a lot less with many more pages.

The examples and benchmarks are compared across three different popular relational database management systems. They are MySQL, Oracle RDBMS and Microsoft SQL Server. I thought that this brought up a couple interesting issues that are not directly addressed in the book. First is that the authors are talking about how to improve performance, not comparing platforms, but the numbers are there and may be of some interest to people who would like to compare them. Secondly, I've met a number of people over the years who get quite animated about insisting that a good DBA does not need to know any certain solution, but rather just the fundamentals. I think Faroult and L'Hermite put this idea to rest, though unintentionally. In order to discuss how to best understand what exactly is happening and how best remedy issues, they show that it is necessary to have an understanding of platform specific issues and tools. This is true on two levels. The first is that the location of use of the built in tools for each platform are different. The second is that what works for one platform does not necessarily work for another.

For example, Chapter Two "Sanity Checks" contains a section on parsing and bind variables. The authors compare performance when queries are hard coded, with new prepared statements on each iteration (firm coded) and with one prepared statement and changing the parameter value on each iteration in a loop (soft coded). On Oracle and SQL Server the performance was poorest with hard coded, better with firm coded and best with soft coded. MySQL did best with soft coded as well but actually took a performance hit moving from hard coded to firm coded. This had to do with differences in how MySQL server caches statements. The authors took the time to rewrite their code from java to C in order to ensure that the issue was not related to language or driver issues. This is not to say that one can ignore RDBMS and SQL fundamentals, but rather that to get top performance requires knowledge of platform specific issues. This also comes out again when dealing with optimizers.

With that in mind, the authors recommend that readers have a solid understanding of SQL and some programming language. Most examples are SQL and code is given in Java and PHP. There are also examples that illustrate SQL extensions showing procedures, functions, etc. written for all three RDBMS products covered. The authors stick primarily to standard SQL but do make note and at times show examples of how things will look in each of the other databases. This information is current and reflects the most recent versions of the each product.

The fourth chapter, "Testing Framework" is incredibly useful. The authors cover generating test data and then checking correctness of outcomes through comparison. This is really useful information for anyone working to improve an application, or writing one for the first time. I think it also a large part of why this book could really appeal to new and experienced developers as well as the developer working on existing or brand new applications. I think there is a good chance that only the most extremely experienced developer would find nothing new here, or at least some new way to approach a problem. New developers can learn quite a bit and avoid some bad habits and assumptions without having to gain that information the hard way. And then the tools for generating random data, large amounts of data and comparing results will provide excellent opportunities for learning and real world application.

The next three chapters cover dealing with specific types of issues and how to improve performance. The last chapter then quickly describes a scenario of just how the authors step into real world situations and start to attack a problem. This is followed with two appendices. The first is scripts and samples, the second tools that are available to help in finding issues and resolving them. Some of the authors tools use SQLite, which is discussed briefly in the chapter on creating test data as some of the tools depend upon it.

I think that it has been a while since I've read a book that could have such a rapid return on investment. There are many suggestions and insights that should enable anyone to squeeze better performance out of just about any database application. While the focus is on the application side, there is plenty that requires understanding and work on the database side as well. There is discussion of the parameters and hardware I mentioned at the start of this review. But rather than the only options, they are one part in a much larger and systematic approach.

The authors relate that often refactoring for this type of application comes into play when something that used to work does not work any more. This can often lead to an environment of high pressure and emotion. The desire for a rapid resolution can lead to casting about in the dark for a quick fix or a feeling that cost is no longer as significant since a fix must be had now. The authors argue, and I agree, that this is exactly when a rational, disciplined process of tracking down and fixing issues is the most valuable. I agree. The issue is of course that someone in a position to do something must have the ability to take that approach. This book will get one well on the way to being in that place. Of course it can't take a brand new developer or DBA an expert. Much like a degree it can give them some fundamental tools that will allow them to take full advantage of experience as it comes rather than just crashing and burning.

If I could I'd have any developer on a database centric application read this, and DBAs as well. There is a lot here for both sides to learn about just how much they depend upon and impact one another. This may be an idealistic dream, especially for larger shops where often the relationship between those two groups is adversarial, but I think that such an approach could only make life much better for everyone involved. For anyone looking to enter this world on either side of the DBA or developer equation, this may make a nice addition to their education. For that individual wearing both hats this could be a life saver. In this small book they will learn many things to look out for as well as gain exposure to some of the similarities and differences in what are arguably the top three relational database management systems right now.

You can purchase Refactoring SQL Applications from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

159 comments

  1. Should have included PostgreSQL and DB2 by bogaboga · · Score: 4, Insightful

    This study would have carried more weight if it had included PostgreSQL and IBM's DB2. These two databases do more serious work than MySQL although many believe MySQL is more widely deployed.

    1. Re:Should have included PostgreSQL and DB2 by MBGMorden · · Score: 4, Interesting

      I have to agree. Maybe I'm unfairly biased and it's just past performance (both relayed by others and experienced by myself), but I don't trust MySQL for anything more complex than a backend for a simple website. If I want a reliable open source database for a serious project, I'd go for PostgreSQL in a heartbeat.

      I actually was looking at KnowledgeTree recently as a possible solution for a document management system for our organization (we have a clunky old system and some others here are pushing SharePoint as a replacment . . .), but it's use of MySQL pretty much stopped that dead in it's tracks. I know they'd prefer MS SQL Server as an option here. I could *probably* talk with them if it supported PgSQL. But MySQL isn't even an option to discuss for something this important.

      --
      "People who think they know everything are very annoying to those of us who do."-Mark Twain
    2. Re:Should have included PostgreSQL and DB2 by Quintilian · · Score: 2, Funny

      Exactly, MySQL is nothing but a toy database.

      You're right! I wanted to catalog all my LEGO sets and G.I.Joes and it was just useful enough.

    3. Re:Should have included PostgreSQL and DB2 by bogaboga · · Score: 3, Insightful

      Exactly, MySQL is nothing but a toy database.

      This is the problem with most slashdotters. Most of them put up unsupported comments. What I would like you to do is to support your claims by pointing us to websites that have made the "mistake" of first running MySQL and later discovering the "light" in adopting PostgreSQL or otherwise.

      Alternatively, you could websites that use MySQL; which websites can be branded as "toy websites" by extension.

    4. Re:Should have included PostgreSQL and DB2 by Anonymous Coward · · Score: 0

      How so? I have a database with over 35 million records and no problems.

    5. Re:Should have included PostgreSQL and DB2 by not+already+in+use · · Score: 4, Insightful

      Puh-leez. You don't come to slashdot for objective discussion. You come here to mingle with folks who enforce your inherently held "truths." You come here to post the same Microsoft jokes that haven't been funny for 10 years and get modded +1, Funny. You come here to ridicule the less technically inclined in an attempt to inflate your hollow ego. You come here to partake in a circle-jerk of technical non-conformance. You come here for many things, none of which involve objective, reasonable discussion.

      As an aside, some of the toy websites that use mySQL include Flickr, Facebook, Wikipedia, Google, Nokia and YouTube.

      --
      Similes are like metaphors
    6. Re:Should have included PostgreSQL and DB2 by hondo77 · · Score: 1

      The book is not a study, it is trying to teach refactoring concepts with the idea that you can take and apply them to any SQL project. Surely you can figure out how to apply an Oracle or MySQL example to PostgreSQL, yes?

      --
      I live ze unknown. I love ze unknown. I am ze unknown.
    7. Re:Should have included PostgreSQL and DB2 by vlm · · Score: 3, Insightful

      You forgot the other often repeated/never researched traditional slashdot claims seen in every mysql comment section, such as mysql doesn't do transactions and doesn't do replication, both of which are necessary for each and every database install ever done past present or future... (Of course it has those features for about half a decade, maybe more, certainly since around the 4.0 range, but that never slows down the folks that repeat those claims)

      Then there are the re-occurring claims that mysql is useless because it doesn't have some bizarre feature that you might personally think is useful, therefore any database without it is useless for everyone doing anything, like perhaps direct file importation of COBOL ISAM punch cards, or an internal database representation for complex four dimensional vectors. You know, the stuff everyone uses.

      Then there are the posts explaining how a failing hard drive on an old gateway 2000 vaporized the filesystem and/or bad ram caused endless kernel lockups, and the mysql software was running on that bad hardware, and correlation always implies causation, so mysql must be bad too.

      Finally I expect several posts about how they found an obscure bug in the beta 3.23 version back around eight years therefore they'll never use it again because that is the only software that has ever had a bug.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    8. Re:Should have included PostgreSQL and DB2 by aoteoroa · · Score: 1

      This study would have carried more weight if it had included PostgreSQL and IBM's DB2

      MySQL is easy to use, and understand and is therefore perfect for a book on general database concepts. Years ago books on programming concepts had their examples written in C. It didn't matter that C was not the best tool for every job it was just a language that most programmers understood.

    9. Re:Should have included PostgreSQL and DB2 by einhverfr · · Score: 3, Interesting

      Well... I have been using Postgresql since back WHEN MySQL didn't do transactions.... I still don't trust MySQL's transactions or the new strict mode. At the same time, I have watched PostgreSQL do an absolutely terrific job of running horrendously written queries optimally. Here are two criticisms I have about using MySQL for real application work, especially when you are distributing that application (and thus have little control over how users set up their db's):

      1) MySQL transactions are built into the table engines, and by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional. This means that if you are building an inhouse app, you can trust it more than you can if you are distributing your software. In short, if you are distributing software you can't guarantee that it is running on a system with transactions without a great deal of headache........ The same goes for referential integrity enforcement.

      2) Strict mode can be turned off by any application. This means that the more recent data integrity checks cannot be relied upon. This is an issue on both inhouse and distributed software because it adds quite a bit of overhead to the QA process internally, and can add support headaches in software for distribution.

      MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port. It is quite a bit worse than PostgreSQL for anything else.

      --

      LedgerSMB: Open source Accounting/ERP
    10. Re:Should have included PostgreSQL and DB2 by Tiro · · Score: 1

      I have the same qualms about deploying Wordpress, which requires MySQL. Not to mention that the MySQL commercial license costs $600.

    11. Re:Should have included PostgreSQL and DB2 by MBGMorden · · Score: 3, Interesting

      I'm not holding anything against it that regard. The simple fact is that I've had two fairly low traffic MySQL databases become corrupted beyond the point of being usable within the last 3 years. The hardware wasn't at fault here (nor was it old or outdated). Now luckily, this was for something that while important, wasn't "OMG somebody's head's gonna roll!" critical (namely, it was the quarantine database for amavisd-new on a mail filter, and then later an internal message/call tracking system that we'd wrote).

      For stuff like that, where you can stand to lose the data, or at worst, roll to a backup, then MySQL has it's uses. However, our document management system for example contains tons of documents that we must legally keep archived and available (Government institution - we have to have it available for FOIA requests). We also have for instance land appraisal software keeping databases of property taxing information that we need to bill at the end of the year (with about $50 million annually riding on that - if we don't get those bills out our whole budget shuts down). I just don't trust that type of thing to MySQL. Not to mention that the "nobody ever got fired for buying Microsoft" mentality does kick in. If the database fails and I have to restore from backup, then if it's MS SQL Server or Oracle then your bosses will usually not fault me(as long as I have good backups in place, which I do). If something that critical fails and I used MySQL on the project, I very well might be looking for a new job.

      --
      "People who think they know everything are very annoying to those of us who do."-Mark Twain
    12. Re:Should have included PostgreSQL and DB2 by poot_rootbeer · · Score: 1

      This study would have carried more weight if it had included PostgreSQL and IBM's DB2. These two databases do more serious work than MySQL although many believe MySQL is more widely deployed.

      "Study"? This is a book review.

      Thanks for getting the "WHAT ABOUT POSTGRES" comment that must accompany every Slashdot story submission that mentions MySQL out of the way early, though.

    13. Re:Should have included PostgreSQL and DB2 by MyLongNickName · · Score: 1

      Oddly enough, I am also looking at KnowledgeTree. Very inexpensive and well put together system. What exactly is your problem with mysql in this instance?

      I'd also be interested in hearing more about your view of KnowledgeTree as a whole. I was very impressed with its Office integration and overall ease of use compared with more expensive products.

      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
    14. Re:Should have included PostgreSQL and DB2 by Splab · · Score: 4, Informative

      GP is right, MySQL is a toy database, advanced toy but still a toy database.

      The absolutely most important thing for a database is data integrity, the ability to trust in your system - when it says "Yeah I saved that for you", it should take catastrophic events to lose it again.

      MySQL treats data in a best effort way, if what you asked it to do doesn't fly with current config, it reverts to something that looks right enough and go with that.

      Consider a database setup, admin installs MySQL with default creates some tables, runs it for a while, decides he needs more log space (transaction), he adjusts the settings and restart MySQL. It starts, everything is peachy. Transactions are running, being committed, he adds more tables, and then suddenly shit hits the fan, he does a rollback, MySQL says ok, but lo and behold, the data is still there...

      So what went wrong? When he changed the transaction log size MySQL during start up realized an inconsistency between the actual log file size and the wanted, MySQL can't expand this file on the fly so InnoDB is disabled, MySQL now reverts to MyISAM (I am not kidding, this is what MySQL will do). Any subsequent calls to begin and commit transaction will be accepted with an OK. Any tables created afterward will be accepted, even with explicit engine syntax MySQL will just issue query ok, 1 warning.

      Now the warning will tell you that the InnoDB engine wasn't available, so MySQL chose MyISAM instead - however, most aren't aware of this behavior, especially since most programming languages does not support this.

      A database should at no point _ever!_ say "OK" to a request for something that can't be handled. If I say begin transaction and something isn't right I want my database to shout on top of its binary lungs that something is wrong and my data isn't safe.

    15. Re:Should have included PostgreSQL and DB2 by einhverfr · · Score: 4, Insightful

      It isn't quite that simple, but I suppose one of my earlier (and later abandoned) projects qualifies.

      I set up HERMES (a CRM suite written in PHP4) originally on MySQL and eventually discovered that the lack of transactions, etc. were a serious problem (this was back in 1999). I tried to move it over to PostgreSQL and discovered that PostgreSQL was really hard to administer (this was back in 1999). I ended up doing all my prototyping on MySQL, then converting the schemas to PostgreSQL using mysql2pgsql.pl because this was the only way I could get the data protections I needed (back in 1999).

      Now, both MySQL and PostgreSQL have come a long way in the nearly-a-decade since then. MySQL has added transactions (for some table types not installed by default), foreign keys (for some table types not installed by default), strict mode (which can be circumvented on the app level), and a the planner has gotten much better. On the other hand, nearly every one of my issues with PostgreSQL has been resolved too. 8.3 has some really impressive new features from a developer perspective, and 8.4 will have even more. I haven't had to do prototyping on MySQL since PostgreSQL 7.3 came out.

      I still stand by the statement that "compared to PostgreSQL, MySQL is a toy," and I would expect the gap between them to continue to widen. However what was limited to light content management db's in 1999 (MySQL), has become better able to handle a wider range of single-app dbs. MySQL is still no reasonable choice for an enterprise-wide database management solution especially where critical data is involved, but there are an increasing number of special cases where it is an option, in particular when compared to Firebird's embedded version, SQLite, and stuff like Sybase's SQL Anywhere. Comparing MySQL to MS SQL though only comes out favorably for MySQL where MS SQL is quite a bit more than is needed. PostgreSQL OTOH can in most cases compare favorably to Oracle, DB2, and MS SQL.

      So the other half of the statement needs to be "but there are some cool things you can do with a toy db...."

      --

      LedgerSMB: Open source Accounting/ERP
    16. Re:Should have included PostgreSQL and DB2 by Fulcrum+of+Evil · · Score: 4, Insightful

      What I would like you to do is to support your claims by pointing us to websites that have made the "mistake" of first running MySQL and later discovering the "light" in adopting PostgreSQL or otherwise.

      It's a toy database because when things aren't set up properly, they don't fail. Instead, they succeed silently and corrupt data (see using the wrong file format for your tablespace). Also, the developers are a treat - "we don't need transactions, do integrity checks in the app", followed by "we now have transactions, aren't we cool". Do they have triggers yet? Meanwhile, I have postgres, which works just fine.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    17. Re:Should have included PostgreSQL and DB2 by MBGMorden · · Score: 1

      I've had a few MySQL databases become corrupted in production systems. I've not had any corruption in either MS SQL Server or PostgreSQL databases that have been in use longer and are used much more heavily. More or less just a case of "once bitten twice shy".

      As to KnowledgeTree specifically, I didn't use it extensively, but it did look promising. I did have some minor issues defining permissions on certain items, but that was probably just a matter of learning curve. The only downside I'd state was that the system did seem a tad on the slow side. Now, that might not be a fair statement given that I was running it on my workstation for testing purposes rather than a big server, but my desktop machine is still pretty decent (Pentium 4 3.0ghz Dual core with 2GB RAM) and with just me on the system it was just barely above what I'd deem "acceptable". Again though, I'm sure that on something with a bit more horsepower it would pick up in performance.

      --
      "People who think they know everything are very annoying to those of us who do."-Mark Twain
    18. Re:Should have included PostgreSQL and DB2 by Splab · · Score: 1

      Seriously, if you believe MySQL to be safe you have no business with database applications, google it - heck, just read the linked sites from sibling posts.

      Transactions are only supported by specific engines and even when you think you are running the right engine MySQL might surprise you (usually when you need a rollback the most). Read up on it, your data is being corrupted!

    19. Re:Should have included PostgreSQL and DB2 by MyLongNickName · · Score: 1

      Yeah, the speed seems to be the only thing I found lacking. I attributed that to the workstation I have this installed on as well. If it turns out that it runs better on a server, we will likely begin using it.

      Do you have another system that you are looking at?

      --
      See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
    20. Re:Should have included PostgreSQL and DB2 by A+nonymous+Coward · · Score: 1

      Or alternatively, you could find sights detailing the problems with MySQL. I got bitten by one, where if a table was defined with any timestamp, any row update would automatically update the timestamp whether asked for or not. They thought it a feature; I thought it a disaster that took hours to track down. Then I found that it has no validation, taking dates such as 2005-01-42.

      I have no idea whether it still has these particular failings, but by the time I wised up and started looking for web sites detailing these bogosities, I was thoroughly fed up with it and will never touch it again.

    21. Re:Should have included PostgreSQL and DB2 by jellomizer · · Score: 1

      Suboptimal SQL procedures can be slow on any system.

      Once I wrote a stored procedure in my first draft (Quick and easy to code) It took 45 minutes to run...

      The DBA optimized using server tools brought it down to about 30 minutes.

      Then I went back checked for all the bottle necks and fixed them from longest to shortest. It was able to run in 20 seconds. Yea to took more lines and by no way as an elegant SQL call. But for a 13,500% speed improvement lets put elegance to the wast side.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    22. Re:Should have included PostgreSQL and DB2 by Hecatonchires · · Score: 1

      Wow. 35 million! That's heaps!

      Is that the sum of rows in all tables? # of members? Transactions? User audit entries?

      --

      Yay me!

    23. Re:Should have included PostgreSQL and DB2 by Leolo · · Score: 2, Interesting

      I write applications that use MySQL that get installed on servers at the cient's premises. I'm also the one doing with the installation and MySQL config.

      Reponding to your points :

      1. If the client were to insist on handling the MySQL part, and screwed it up, it would cease to be my problem. Or rather, I would point at the instalation and tell them were they fucked up;
      2. About turning off strict-mode. If your applications are turning off strict-mode, then don't be supprised if you break data integrity. If your clients are writing apps that turn off strict-mode and mess up data integrity, that isn't really your problem, is it?

      Or how about we put it another way: it will always be possible for someone to mess up the data. These are human issue, not a software issue. Deal with them that way.

    24. Re:Should have included PostgreSQL and DB2 by einhverfr · · Score: 2, Interesting

      If the client were to insist on handling the MySQL part, and screwed it up, it would cease to be my problem. Or rather, I would point at the instalation and tell them were they fucked up;

      Ok, so your point is that this is fine as long as you install MySQL, make sure that Innodb, etc. is installed, etc. Fine. I don't want that responsibility.

      About turning off strict-mode. If your applications are turning off strict-mode, then don't be supprised if you break data integrity. If your clients are writing apps that turn off strict-mode and mess up data integrity, that isn't really your problem, is it?

      On the other hand, it isn't a support headache I want, since I might be hired to come in and fix it. I would rather be spending my paid development time elsewhere.

      However, on that latter point, for inhouse apps where several apps share a db, you have a bigger issue, and that is that you really need to have someone auditing every app to ensure it DOESN'T turn off strict mode. Why bother with that issue? Why not use a db which ALWAYS performs these sorts of checks and where they can't be turned off?

      --

      LedgerSMB: Open source Accounting/ERP
    25. Re:Should have included PostgreSQL and DB2 by DiegoBravo · · Score: 1

      While I mostly agree with your points, I definitively don't understand why MySQL still creates tables without Foreign Key support, if you don't add the silly "engine=innodb" keyword. Please don't reply with "backward compatibility for broken applications/schemas"...
      ...
      Server version: 5.0.67-0ubuntu6 (Ubuntu) ...
      mysql> create table testtable ( xx integer);
      Query OK, 0 rows affected (0.00 sec)

      mysql> show create table testtable; ...CREATE TABLE testtable( ...
      ) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

    26. Re:Should have included PostgreSQL and DB2 by larry+bagina · · Score: 1

      MySQL 5 has triggers, but they're limited compared to pgsql (and everyone else). Maybe it's been fixed, but it used to be that they were only invoked on tables you updated directly, not on tables affected by foreign key constraints or affected by trigger code.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    27. Re:Should have included PostgreSQL and DB2 by Anonymous Coward · · Score: 1, Interesting

      Maybe when PgSQL becomes CASE-INSENSITIVE on WHERE clauses. I mean really, am I gonna have to fork it, give it a better, catchier name, and get rid of case sensitivity just to get a solid open source DB that doesn't drive me nuts??

      PoStGrEsQl = PostGreSQL UNLESS I EXPLICITLY SAY IT DOESN'T. Not the other way around. </rant>

    28. Re:Should have included PostgreSQL and DB2 by daveime · · Score: 1

      MySQL has always worked fine for me ... anecdotal I know, but a lot of coders I've met tend to have the mentality "I can throw whatever query I want at the system, and if it doesn't work, it's a DBA problem".

      Thankfully I am both the programmer and DBA for our system, so I don't have to worry about a.n.other dumbass making a query that joins 27 tables with combinations of LEFT, RIGHT and INNER joins that end up running a WHERE clause of 1 million billion gazillion fufillion shabady-u-illion ... yen ... sorry rows, got carried away thinking about sharks and laser beams.

      Rules of thumb, which I repeat ad nauseum to any new junior I have to take under my wing :-

      Don't use LEFT or RIGHT joins unless absolutely necessary - and even then, one or the other, NOT both in a single query.

      Don't use ORDER BY unless absolutely necessary ... once you get above about 4MB, he's going to start using filesorting rather than in memory sorting.

      Don't request every column of every table just to get a count of the total records ... do a separate minimal query that hopefully can use index only, and never touch the actual table, and then the main query to fetch a subset of rows using LIMIT N,M.

      Use DESCRIBE on every query ... if you don't see the estimated row counts that look like N x 1 x 1 then the query is no good ... something that is N x M x P can quickly explode into hundreds of millions of possible combinations before you know it, even if N,M,P are small magnitude like 1000 rows per table.

      Multi column indexes are your friend, but even then, a little care in the order of the columns can make all the difference in the speed of the index search ... think cardinality, and order your multi column index so that the most rows can be elimiated at each step. Likewise, don't use a column that might infer a range search (timestamps etc) at the start of the multi-column, as you'll lose all the benefit of the later columns (he simply won't use them). Flags and status columsn first, timestamps and other ranges of values later.

      A little care and attention can turn a 45 minute query into a 15 second query ... believe me, I've done this kind of optimization myself.

    29. Re:Should have included PostgreSQL and DB2 by ta+bu+shi+da+yu · · Score: 1

      Well, for a start, this bug doesn't exactly inspire confidence.

      Even less inspiring was this quote from the former founder that his "main reason for leaving was that I am not satisfied with the way the MySQL server has been developed, as can be seen on my previous blog post. In particular I would have like to see the server development to be moved to a true open development environment that would encourage outside participation and without any need of differentiation on the source code. Sun has been considering opening up the server development, but the pace has been too slow."

      Of course, you should be downright worried by the following:

      "The reason I am asking you to be very cautious about MySQL 5.1 is that there are still many known and unknown fatal bugs in the new features that are still not addressed.

      To prove my points, here is some metrics and critical bugs for 5.1:

      * We still have 20 known and tagged crashing and wrong result bugs in 5.1 35 more if we add the known crashing bugs from 5.0 that are likely to also be present in 5.1.
      * We still have more than 180 serious bugs (P2) in 5.1. Some of these can be found here.
      * We have more than 300 known and verified less critical bugs that are not going to be addressed soon. (The total reported number of bugs to the MySQL server is of course much larger)"

      --
      XML is like violence. If it doesn't solve the problem, use more.
    30. Re:Should have included PostgreSQL and DB2 by julesh · · Score: 2, Interesting

      1) MySQL transactions are built into the table engines, and by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional. This means that if you are building an inhouse app, you can trust it more than you can if you are distributing your software. In short, if you are distributing software you can't guarantee that it is running on a system with transactions without a great deal of headache........ The same goes for referential integrity enforcement.

      It's easy enough to set up a database creation script that ensures the tables have the necessary support. Sure, if you're working on tables that somebody else has created you can have issues, but in your experience how often does that happen? And if it does happen, how often do you have a choice as to what dbms is in use? In my 10+ years as a consultant I've only ever had to do that once, and never with mysql (because the client's existing system I had to integrate with used informix).

      2) Strict mode can be turned off by any application. This means that the more recent data integrity checks cannot be relied upon. This is an issue on both inhouse and distributed software because it adds quite a bit of overhead to the QA process internally, and can add support headaches in software for distribution.

      On the whole, this is probably a good thing. If the application is under your control, you can use whichever mode you want. If you're relying on somebody else's application, forcing it to use strict mode when it wasn't written for this environment could introduce subtle bugs. Now, if you were to argue that the _existence_ of these different modes of operation was an issue, then I'd probably agree. But given the existence of the modes (and that's unfortunately a necessity for backwards compatibility reasons) the ability of the application to change the mode to the one it expects is crucial.

      MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port.

      Of course, you are describing something like 90%+ of all database applications there. Yes, this book should cover MySQL: it's a very popular database that is perfectly adequate for most uses. Sure, there are applications where it shouldn't be used, but that hasn't stopped it from becoming extremely widely deployed and being used as the database of choice by web developers everywhere (which is probably the target market for this book).

    31. Re:Should have included PostgreSQL and DB2 by julesh · · Score: 2, Interesting

      I'm not holding anything against it that regard. The simple fact is that I've had two fairly low traffic MySQL databases become corrupted beyond the point of being usable within the last 3 years. The hardware wasn't at fault here (nor was it old or outdated).

      I'm not sure what you're doing wrong here, but I think many of us have been running a lot more MySQL databases than that and never experienced corruption. Myself, I have been maintaining on average about 20 MySQL instances spread across 3 different servers for the last 10 years, and have never[1] experienced corruption that wasn't tied to a hardware failure. Sometimes after a system crash I need to do a myisamchk --recover, but even that's rare. I've never needed to do myisamchk --sort-recover, although I understand there are circumstances where this is necessary.

      So my experience has been that mysql is, essentially, rock solid. It could be that we're using it for different applications and it works for mine but not yours, but I've also never seen any reports of problems with mysql corruption that cannot be fixed automatically by the tools mysql provides.

      [1]: OK, this isn't quite true. There was one instance of corruption that was tied to a linux kernel version that had a bug where dirty buffers were sometimes corrupted prior to being flushed to disk. Hard to pin that one on MySQL either, though.

    32. Re:Should have included PostgreSQL and DB2 by hesaigo999ca · · Score: 1

      Dude, he goes for the most used and easily understood, not what is on the way out.
      He wants to stay current with today's business model, I have yet to hear of a webfarm making available either of the dbs you mentioned....the 2 main ones available from any ISPs like Godaddy etc....are MySql and SLQServer from M$.

    33. Re:Should have included PostgreSQL and DB2 by einhverfr · · Score: 2, Interesting

      On the whole, this is probably a good thing. If the application is under your control, you can use whichever mode you want. If you're relying on somebody else's application, forcing it to use strict mode when it wasn't written for this environment could introduce subtle bugs. Now, if you were to argue that the _existence_ of these different modes of operation was an issue, then I'd probably agree. But given the existence of the modes (and that's unfortunately a necessity for backwards compatibility reasons) the ability of the application to change the mode to the one it expects is crucial.

      It is a good thing in some, limited circumstances (where data integrity is not critical). Where data integrity is critical, this feature means you need to have careful policies of auditing ALL applications that hit the db to make sure they never, under any circumstance, turn off strict mode. This really requires a code audit.

      You say I am describing 90% of all database applications. Fair enough, if you are looking at this from the application, rather than the information, perspective. For a single-app db it isn't bad and it scales reasonably well for simple queries. However, where it breaks down is for real enterprise information management. Say what you will, the criteria I mentioned don't apply to that environment.

      One of the things we have worked hard to do for the LSMB project is to ensure other applications can safely access the database with appropriate data integrity constraints. Given that we came from the SQL-Ledger codebase, that has been a real headache, and will probably take us another year to make a reality. However, it is a powerful feature if your customer can write add-ins to the db app in a different language without having to worry as much about data integrity issues even where it is an accounting app.

      --

      LedgerSMB: Open source Accounting/ERP
    34. Re:Should have included PostgreSQL and DB2 by grepya · · Score: 1

      Maybe I'm unfairly biased...

      Correct. You are unfairly biased.

    35. Re:Should have included PostgreSQL and DB2 by thePowerOfGrayskull · · Score: 1

      Now, both MySQL and PostgreSQL have come a long way in the nearly-a-decade since then. MySQL has added transactions (for some table types not installed by default), foreign keys (for some table types not installed by default)

      Just a point of clarification - InnoDB supports transactions, FKs, etc and is installed by default (at least in the windwos version, and in the most popular linux distros.

    36. Re:Should have included PostgreSQL and DB2 by Anonymous Coward · · Score: 0

      but it's use
      in it's tracks

      "its".

    37. Re:Should have included PostgreSQL and DB2 by tepples · · Score: 1

      So the other half of the statement needs to be "but there are some cool things you can do with a toy db...."

      Like sell toys?

    38. Re:Should have included PostgreSQL and DB2 by tepples · · Score: 1

      Just a point of clarification - InnoDB supports transactions

      Until MySQL decides it wants to ALTER TABLE es_inventory ENGINE=MYISAM without failing your transactions, as Splab explained.

    39. Re:Should have included PostgreSQL and DB2 by Slashdot+Parent · · Score: 1

      If I want a reliable open source database for a serious project, I'd go for PostgreSQL in a heartbeat.

      It's funny that you say that. I would choose MySQL over PostgreSQL for any high-availability application because of Postgres's piggish refusal to include replication and clustering as part of the package.

      I find it hilarious that when MySQL developers pig-headedly refused to include transactions in the database saying that was an application-level problem, the PostgreSQL community cried loud and shrill foul. Hilarious, because now I believe that the PostgreSQL developers are eating crow for taking the same attitude toward replication and clustering. I could never use Postgres in a production system because there is no high-availability solution.

      And, no, I'm sorry, but Slony is not replication--it's an ugly hack misusing triggers. And pgpool is not clustering, considering I can't even add a node to the cluster without locking out all database clients. I'm sorry, but that is not acceptable.

      MySQL has really figured out the high-availability thing, as far as open-source databases go. You should read about their clustering solution sometime.

      --
      They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
  2. Problem is not the SQL writers..... by Lumpy · · Score: 5, Insightful

    But with management.

    when I spent a few years as a DBA it was common to be told to not work on that project any more as soon as it produced usable data. That means as soon as you have a working prototype you are required to drop it and start the next project. Many times after you get a working prototype you then go back and refine it so that it's faster and uses less resources.

    Management is the blame. Unrealistic deadlines for DBA's and if you are honest with them and give a report that you have data they think it's good to go. I actually got wrote up once for taking one of the old procedures we had and rewriting it so that it worked much faster and the resource hog it was was reduced to the point that others could use the DB while it ran. I was told I was wasting time.

    --
    Do not look at laser with remaining good eye.
    1. Re:Problem is not the SQL writers..... by Samalie · · Score: 3, Interesting

      Agreed COMPLETELY.

      I work as a DBA as well, and the moment the prototype produces reliable data, its immediately off to the next project. Only time I ever get to go back and tweak code is if some random variable that was not thought of was missed in the original design, or a bug, forces me back into the code.

      I've got some code out there that I know beyond a shadow of a doubt is horribly inefficient...but I'm not given the time and opprotunity to correct that.

      --
      09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
    2. Re:Problem is not the SQL writers..... by CodeBuster · · Score: 3, Insightful

      This experience speaks to a more general issue that I have with non-technical MBA types who tend to reduce everything to a dollars and cents issue without fully appreciating or even being able to fully appreciate either the technical OR the financial consequences of their decisions. They assume that their MBA piece-of-paper mail-order diploma makes them oh-so-much smarter than anyone else who doesn't have one, when in fact the smartest people tend to study mathematics, physics, engineering, other hard science, or even philosophy while the intellectual light-weights study social science and get their MBA. If anyone is actually a waste of time and resources then it is the middle management social climbers who produce a lot of hot air using the latest "management techniques" that they read about in a trade magazine on an airline flight or heard about at a conference held in a cheap hotel ballroom.

    3. Re:Problem is not the SQL writers..... by Anonymous Coward · · Score: 1, Insightful

      I actually got wrote up once for taking one of the old procedures we had and rewriting it so that it worked much faster and the resource hog it was was reduced to the point that others could use the DB while it ran. I was told I was wasting time.

      You were wasting time. You only have so many hours in your life; why spend them working for those losers?

    4. Re:Problem is not the SQL writers..... by afidel · · Score: 1

      DBA time is expensive but nearly as expensive as Oracle licenses so our DBA gets plenty of time to analyze and tune SQL where he can, but we are mostly a COTS house, we do minimal in house development.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:Problem is not the SQL writers..... by Alpha830RulZ · · Score: 1

      Oh, please. If there are technical or financial consequences, then they are capable of being expressed in a spreadsheet that an MBA can understand. The weak link here is communication skills. Which, sad to say, are generally worth more in the marketplace than being able to solve partial differential equations, because they are rarer.

       

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
    6. Re:Problem is not the SQL writers..... by Anonymous Coward · · Score: 0

      Completely disagree.
      It is the managers job to push on and speed up the process, it is the DBA/developers job to speak up and tell management when the product is incomplete. If management tries to brush you off you speak louder, simple as that.

    7. Re:Problem is not the SQL writers..... by CodeBuster · · Score: 1

      Oh, please. If there are technical or financial consequences, then they are capable of being expressed in a spreadsheet that an MBA can understand

      If the MBA was shown such a spreadsheet that had already been created by someone else then yes, but my point was that they often fail to see how important technical factors effect the cost outcomes until someone else with the requisite background knowledge brings it to their attention and even then they may choose to ignore it because their massive egos have difficult adjusting to the fact that someone else, even worse if they are not a fellow MBA, saw something that they didn't. It is not so much a matter of "wouldn't understand it if they saw it", but rather "doesn't think of it".

    8. Re:Problem is not the SQL writers..... by Alpha830RulZ · · Score: 1

      Why do you expect them to understand the technology factors? That's apparently your job. If they choose to ignore your input, either you don't understand something about what they are asking for, or they are about to learn a painful lesson.

      It may well be that your input is being ignored because you come off as a condescending and arrogant prick. Labeling someone else as having a "massive ego", apparently because they don't listen as well as you would like them to is not a strategy that will help you win friends and influence people.

      I have both an accounting/business and computer science/development background. It's interesting to watch the two cultures bash each other. "Massive egos" are not solely exhibited by the MBAs.

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
    9. Re:Problem is not the SQL writers..... by CodeBuster · · Score: 1

      either you don't understand something about what they are asking for, or they are about to learn a painful lesson.

      In my experience it has generally been the later rather than the former.

      It may well be that your input is being ignored because you come off as a condescending and arrogant prick.

      I generally come off as the thoughtful type when speaking with my coworkers and I allow them to ask me if they don't understand something or want clarification, but otherwise I presume that they know what they are doing even if I know that they don't. The MBAs I have known seemed perfectly fine with walking off a cliff rather than having someone else suggest that their financial analysis could be wrong.

      "Massive egos" are not solely exhibited by the MBAs.

      I don't think that any category has the sole monopoly on that one.

  3. compare SQL to Code by trybywrench · · Score: 2

    I'd like to see some work done on the balancing act of how much to do in code and how much to do in SQL. My coworker can put SQL statements together that if printed on an 8.5x11 would fill the whole sheet if not run over. Me, on the other hand, I tend to break up huge sql statements into a set of smaller ones and then use code to do some of the work that could possible have been done in SQL. I don't have the time to find out what works best on my own but I do have the time to read about it.

    btw, how come tech books don't come on tape/cd?

    --
    I came to the datacenter drunk with a fake ID, don't you want to be just like me?
    1. Re:compare SQL to Code by Saint+Stephen · · Score: 4, Informative

      On a REAL database, like Oracle, the query optimizer will factor common expressions, eliminate unused branches, and in general execute your SQL in completely different manner than what you write.

      Doing things in a "relational calculus" way, where you specify what to be done (i.e., with SQL) is superior to doing things in a "relational algebra" way (individual statements correlated by procedure code).

      I've written some queries that were a dozen pages long for a individual statement, mostly because I use a python-like style where the indentation specifies the structure and thus you can string together monstrous subexpressions and not get confused. The DBA was like "you're not running that on MY box," but it ran super fast because of the query optimizer.

      That's what I mean when I say MySql is a Toy, compared to DB/2, Oracle, or SQL Server. The query optimizer.

    2. Re:compare SQL to Code by OG · · Score: 1

      btw, how come tech books don't come on tape/cd?

      Only on Slashdot do you find someone who wants to listen to Natalie Portman talk SQL.

      Joking aside, I doubt I'd find tech books on tape all that useful. Without diagrams, code examples, etc., you lose quite a bit of the value, IMO.

    3. Re:compare SQL to Code by stoolpigeon · · Score: 1

      I think the emphasis here is on writing the best sql so you can write the best code. Removing unneeded iteration on either side can be a huge benefit. Repeated calls to a database can be expensive - in numerous ways - so I think they aim the reader towards a state where more work is done with less trips.
       
      I think that it is also safe to say that many of the tools they give for testing performance would be very useful in nailing down just where the issue is. It's not an issue of finding what works best - but rather figuring out in each case where the trouble is at.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    4. Re:compare SQL to Code by avandesande · · Score: 1

      If you are competent you can accomplish 99-100% of your business logic in the database.

      --
      love is just extroverted narcissism
    5. Re:compare SQL to Code by stoolpigeon · · Score: 3, Informative

      Chapter 5 - "Statement Refactoring" includes, according to the author, "...how to analyze SQL statements so as to turn the optimizer into your friend, not your foe." It's solid and probably points people towards writing things that work just as you describe.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    6. Re:compare SQL to Code by he-sk · · Score: 2, Insightful

      That's possibly a VERY bad idea. Even with small queries it's possible to create huge intermediate result tables and loading all that data into your application will make it crash. And if that doesn't happen, breaking a complex SQL statement into separate parts robs the SQL query optimizer of useful information. Your code limits the choices for an optimum evaluation plan, but how close is your code to the optimum plan that can be achieved?

      Having said that, the optimizers can't work magic. I sometimes split up SQL statements to force a certain strategy, when the result of the optimized plan is garbage.

      --
      Free Manning, jail Obama.
    7. Re:compare SQL to Code by Anonymous Coward · · Score: 0

      Yes, because many databases can now run Java stored procedures. :P

      Seriously, though...writing all your business logic in SQL and stored procedures may give faster execution, but (IMHO) the development may be slower. Last I knew, the database world was far behind the java/.net/c++/general-purpose language world in the things that ease code maintenance, reuse and legibility:

      * rich IDEs
      * refactoring plugins/tools
      * unit testing tools and plugins.

      Also, the vendors' stored procedure languages may lack the rich language paradigms present in general purpose languages, which promote reuse or easy code maintenance in a variety of ways.

      I also found that the development resources (examples, mailing lists, forums, etc.) were more numerous and varied for general purpose languages.

      Plus, it's easier to staff a shop with 100 java/.net/c++ devs than PL/SQL (or similar) devs. They are simply more general-purpose skillsets.

    8. Re:compare SQL to Code by avandesande · · Score: 1

      The problem is that 'in code' the modules driving the business rules are far and away from the data driving the rules- requiring you to spend an inordinate amount of time ferrying the data back and forth from the db to the code modules.

      The real problem is that 'everything is a nail' and most developers do not have a solid handle on data design and database coding concepts.

      --
      love is just extroverted narcissism
    9. Re:compare SQL to Code by Anonymous Coward · · Score: 0

      btw, how come tech books don't come on tape/cd?

      Would you want to sit through a University mathematics, physics, engineering, or computer science lecture in which the professor talks without using a blackboard or projector? If so, you're unusual. I certainly wouldn't - I am almost completely unable to understand even the simplest formula in spoken form.

    10. Re:compare SQL to Code by jadavis · · Score: 1

      "relational algebra" way (individual statements correlated by procedure code)

      Relational Algebra is not procedural or imperative. Any Relational Calculus expression can be written as a logically-equivalent Relational Algebra expression, and vice versa.

      And they can both be optimized into whatever procedure will execute fastest on a given machine.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  4. Performance Tuning is Not Refactoring by puppetman · · Score: 4, Insightful

    I have the misfortune of working with a database that is primarily a couple of tables with key-value pairs (not a traditional database model).

    There is only one column that can be indexed, and it has to be done with a full text index.

    Every once in a while, there is a discussion about moving this mess to something more traditional. I was excited to read the review on this book, but as I read through the review, it seemed like this was more of a "performance tuning guide".

    Re-factoring a database is a lot more involved - changing tables, stored procedures, maybe even the underlying database.

    The term Database Application is fuzzy and poorly defined. Is it the front end? The stored procedures? The database tables? I would consider a database application to be any part of the code that stores, retrieves, removes or modifies data stored in a database, and the entities that have been defined to store that data.

    Using that definition, this book is about tuning, not refactoring.

    1. Re:Performance Tuning is Not Refactoring by Anonymous Coward · · Score: 0

      The definition of refactoring, according to Martin Fowler, is, "the process of changing a software system in such a way that it does not alter the external behaviour of the code yet improves its internal structure." Performance tuning is not refactoring because, with luck, it makes the system behave in a measurably different way.

      I'd guess that the reviewer and perhaps the author is not using Fowler's definition, as is evident in the statement, "The authors relate that often refactoring for this type of application comes into play when something that used to work does not work any more." Refactoring should never fix software that doesn't work as this requires a change in behaviour, something that refactoring explicitly attempts to avoid.

    2. Re:Performance Tuning is Not Refactoring by stoolpigeon · · Score: 1

      It discusses how to change client code - which is definitely not database tuning. There are database tuning techniques involved, but really it is much more than that. I tried to express that in the review but maybe I didn't do as well as I would have liked.
       
      Here is how Wikipedia defines refactoring, "Code refactoring is the process of changing a computer program's internal structure without modifying its external functional behavior or existing functionality. This is usually done to improve external or internal non-functional properties of the software, such as code readability, simplify code structure, change code to adhere to a given programming paradigm, improve maintainability, improve extensibility, or increase execution performance."
       
      This book is full of examples on how to do that on the client side and on the database side. When I think of tuning, I think of the items mentioned in the first paragraph of the review - changing db parameters, adding or removing indexes and adding hardware. The book does touch on those but spends more time going over how to rewrite code inside and outside of the database - so I wouldn't say it is just tuning.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    3. Re:Performance Tuning is Not Refactoring by stoolpigeon · · Score: 2, Informative

      Let me explain what I meant by "doesn't work any more". For example a query that originally took 30 seconds now take 3 hours. It still 'works' from a functional perspective but from a business perspective may have become completely useless. Refactoring can make it work again. I should have been more clear on what I meant there.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
    4. Re:Performance Tuning is Not Refactoring by David+Gerard · · Score: 3, Funny

      Well, you can hardly trust Wikipedia, given it runs on MySQL.

      --
      http://rocknerd.co.uk
    5. Re:Performance Tuning is Not Refactoring by Fulcrum+of+Evil · · Score: 1

      Re-factoring a database is a lot more involved - changing tables, stored procedures, maybe even the underlying database.

      This particular case is easy: identify one form of data storage (for instance, a customer record) in the tables, build a schema, change the clients, and migrate the data. You will need a history of used queries if your codebase is murky to identify all the users of this data. Follow this process for 1-3 items at a time until you empty the old table.

      If you have a dev environment, pull the table apart and identify all the uses of the table, make a schema that supports them, and migrate your code to use the DB. You'll probably break some obscure excel spreadsheet along the way, so do the actual migration in stages (as above) and don't worry if you can only shrink the table by 99% - making it an obscure minor thing is almost as good as killing it outright.Just don't let anyone add use cases to it.

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    6. Re:Performance Tuning is Not Refactoring by fgaliegue · · Score: 1

      Then you may want to "try out" this book:

      http://books.slashdot.org/article.pl?sid=06/06/07/1458232

      "Incidentally", it was written by... Stéphane Faroult. I've read it a few times, and used its lessons (there are no other words for it, really) to prove by figures that the redesign of the data model that I suggested could improve the performance by a factor of 10.

      Before reading that book, I knew that the data model was broke, but couldn't explain why. This book told me why. We use Oracle, but the lessons taught in this book apply to ANY (R)DBMS.

    7. Re:Performance Tuning is Not Refactoring by puppetman · · Score: 1

      Very good comment. What's funny is that when I hear the term "refactoring" it usually means, "Lets clean up the code so that it's more compatible with new features we need to add".

      If the code is cleaned up, then new features added, then it really is refactoring.

      But I would suspect that new features are added while the "refactoring" is going on, and thus it's not really refactoring.

    8. Re:Performance Tuning is Not Refactoring by puppetman · · Score: 1

      Fair enough. Tuning is a fuzzy term as well. Tuning can imply hints in what you've mentioned (indexes, parameters, hardware, etc), but it can also imply hints, views, etc, etc, which could technically be called refactoring.

      You're right - it's hard to summarize a book - I'll take a flip through it the next time I'm at a bookstore that carries it.

      Thanks for the reply.

    9. Re:Performance Tuning is Not Refactoring by complete+loony · · Score: 1

      Ah, that nasty anti-pattern. Definitely not a good model.

      If you need to store data that you can't define at design time, you're still better off creating an actual table, or set of tables. Yes it can make the rest of the application that references these values more complex, but it pays off in the long run.

      Another nasty anti-pattern we have to deal with is the compound value stored in a single column. The worst example I've seen involves storing XML strings to define the state of an object. The only reliable way to manipulate the XML is to serialise it to a class, modify the class properties and serialise it back again. Nasty nasty stuff. You can't report on it, you can't write simple update statements to fix something.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
  5. Anonymous Coward by Anonymous Coward · · Score: 0

    There's not Postgres in the book, so, it will be usefull for me?

  6. Server performance is important, but... by Ukab+the+Great · · Score: 3, Insightful

    I've found that the biggest issues with SQL applications (writing rich clients) is not in performance turning of server/sql but in dealing with ORM issues, where to draw the line between how much work the client does vs. how much the server does, reconciling changes made in memory with data in tables, concurrency, database architecture designed to cope in advance with poorly thought-out requirements you're given, etc. I'd hope that book on refactoring SQL *applications* would touch on these issues.

    1. Re:Server performance is important, but... by anomalous+cohort · · Score: 1

      The review of this book doesn't make it obvious to me. Is this book really about refactoring or is it about query tuning?

      IMHO, the former doesn't really need to be db vendor specific. Refactoring should encompass all code and not just the SQL. Looking for ways to refactor from an ORM perspective makes sense such as lazy evaluation and strategic caching.

      Query tuning is an important topic with which there are already plenty of resources devoted to it.

  7. Shoot the developers by kafros · · Score: 2, Interesting

    I am a developer, and my experience has shown that if you use one of: Oracle, SQLServer,
    PostgreSQL, DB2 and application performance is poor, 99% of the time it is poor design from our
    (developer's) side.

    Developers without good understanding of Relational Databases and SQL often produce problems that
    cannot be solved by indexes, or throwing transistors at them.

    It is so nice to see a "custom" made map implemented in the database using temporary tables instead of
    using the language's built-in map functionality :-)
    sorting arrays using the database gets extra points (no kidding, I have seen this!)

    1. Re:Shoot the developers by Bright+Apollo · · Score: 2, Informative

      I would cautiously agree with the developers being less educated on SQL than they should be. The trouble seems to be in the different mind-sets required to solve the application problems. SQL is a declarative language, and it operates on your set of data as described. The host language, for what of a better term, is C or Java or some other iterative language, and it operates on each individual member of the set, stepwise. If you primarily think "stepwise" or algorithmically, you're already framing your problems as challenges to be met with an iterative approach. Learning to recognize which parts of the application require declarative, functional, or procedural solutions is where good developers becomes great.

      I disagree with any statement that pits developers against DBAs. I have found a few DBAs that have more ability than myself in SQL, but not many and not recently. I don't rely on them to work with my data, and then don't rely on me to manage four hundred schemas across two dozen machines. A good separation of duties always helps.

      -BA

    2. Re:Shoot the developers by BigJClark · · Score: 1


      Custom made map implementation? You mean a view? :)

      This is why I like our new company architecture model. Essentially the application developers only see a view. The view should closely match the "screen" of the application. The view is designed, and optimized by architects, and then, of course, all these views are available in our DBMS, managed by our DBA's. Never should developers make complex queries; they just query the view. Although some definitely have the touch, the majority routinely do things incorrect or inefficient. The architects are a group of highly trained individuals who optimize the heck out of the view, and of course the DBA's ensure uptime, backup and DBMS tuning.

      Rarely does the DBMS have to be tuned, after its go.

      Its a great model and it has allowed a much less dependence on developers, who are the equivalent of educated transient gypies ;)

      --

      Hi, I Boris. Hear fix bear, yes?
    3. Re:Shoot the developers by kafros · · Score: 1

      I will assume that you are joking
      Your database is only queried? No insert/update/delete? If yes who does these? Ah developers :-)

      Database model design is not the only source of problems. A table locking where you should not lock
      (everything) can bring a good designed system to it knees

      As for the map example on my previous post, imagine a Java TreeMap, or a .NET Dictionary
      implemented as a two column temporary table to be dropped after use - yeah some of my previous
      colleagues did these things :-(

    4. Re:Shoot the developers by BigJClark · · Score: 1


      Ahhh, no I'm not joking :) Sorry for being vague. Inserts/Updates/Deletes are handled from the view as well. We have an instead-of trigger on the view that manages our "business rules" and the base tables. Locking is completely handled by the Oracle, and as a DBA, I know that the Oracle manages this by issuing a RX lock, which allows other sessions to insert/update/query on the resource(table). Perhaps a small example.

      A java front-end requires customer vital data: DOB, first/last name, city. This information for all customers is stored in a view. Lets say this view is composed of two tables. Table person inner joined with table status. When the java developer wishes to query existing information, he queries the view. When he wishes to insert a new customer, the instead-of trigger kicks in, and manages inserting into the base table (Oracle handles view insertion nicely, its possible to insert into a view w/o an instead-of trigger, but this example is exceedingly simple, and not how I would model it).

      Essentially what has happened, is there is a communication layer between business and the tech side. Its easy for a business analyst to say, "I wish we could add postal code" and add a new column in a our view, that maps to an existing column in the underlying table. It makes changes incredibly easy, and dare I say, agile ;)

      I can go into much more detail if you would like. I have time; another benefit is like I said above, its quick/easy to do changes, which leaves people like me with a lot of time on their hands :)

      --

      Hi, I Boris. Hear fix bear, yes?
    5. Re:Shoot the developers by kafros · · Score: 2, Insightful

      So you have code in two places? Application layer (lets say JAVA) and DB layer (triggers)?

      How do you debug, log, source control, deploy?

      I found triggers to be a pain in the ass on the problems above
      (Unless you just have one big customer and nothing else)

    6. Re:Shoot the developers by einhverfr · · Score: 2, Interesting

      You throw transistors at your developers? ;-)

      Actually I agree with you. One of the big wins on the LedgerSMB project was the approach of moving the all main queries into user defined functions (and then using them as named queries). One of the nice things about this approach is that dba-types (like myself) can address performance when it crops up rather than dealing with the ABOLUTELY horrendous approaches found in the SQL-Ledger codebase..... (Assembling queries as little bits of text strings, throwing them all together, and doing REALLY braindead things like simulating HAVING clauses in the middleware...)

      For example, we had an issue where one large user (thousands of invoices per week) was having issues paying them without spending 8 hours waiting for the page to load. We were able to get this down to something like 10 minutes across 5 pages. No added hardware, just reviewing why the old code sucked, moving in new code, and then optimizing it when its performance sucked.

      --

      LedgerSMB: Open source Accounting/ERP
    7. Re:Shoot the developers by oldhack · · Score: 1

      Hey, that sorting using DBMS temp table sounds like a novel solution when external sorting is called for.

      --
      Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
    8. Re:Shoot the developers by Anonymous Coward · · Score: 0

      Your right. We should just move to object databases and take you out of the loop.

    9. Re:Shoot the developers by BigJClark · · Score: 1


      That is exactly how we do it. It also grants us a very distinct separation from the two, meaning in a few years we can rollout a brand spanking new UI, and ever have to touch the DB code.

      1) Debugging/logging through Oracle is one of the largest pain in the arses I've ever undertaken. I don't even bother to debug code through toad anymore, its primarily just trace statements, that when "debug" mode is turned on, it traces pretty much every single operation, much like java stack tracing.

      2) We manage source control through Oracle Designer, which isn't as nice as some of the fancy java source control apps, but it does handle versioning, which is all I really care about.

      3) Deploying in PLSQL is just like Java, if not easier. As we have all of our tiers, when the decision to rollout to production environment, we schedule an outage, open our Oracle server in restrict mode, execute the rollout scripts, recompile the entire database, and then reopen the Oracle. As we are contractually obligated to be 24/7, we have to work closely with our clients (in this case, the Government, I won't get any more specific) and generally it should take no longer than 5min.



      But you hit the nail on the head with triggers. Triggers are nasty little beasties that can cause bugs like you wouldn't believe. Having old triggers fire at obscure times, under changing circumstances is such a headache. It requires a strict adherence to naming conventions, standards, and above all, intelligent people to preventing trigger abuse.

      --

      Hi, I Boris. Hear fix bear, yes?
    10. Re:Shoot the developers by blinky · · Score: 1

      Agree, once we had a case of developers (cobol) were sorting the data using "Bubble sort" in the main part of the application. The DBA had been through the various tricks (analyzing tables, indexes etc) for a 5% gain, till we discovered what the developer had done.

      One of the big issues is a lot of developer who know OOD know bugger all about databases. What i've found is the the quick to develop apps usually the worst for performance. I come from a development DBA background along with Java and C - the trick is knowing what should be done in each tier and making the best use of the tier.

    11. Re:Shoot the developers by MillionthMonkey · · Score: 1

      Everyone's always down on bubble sort. It's a shame- bubble sorting is easy to write and it's actually the best sorting method to use when your list has ten elements or less.

      Plus by the time it has a million elements you'll probably be on your next job.

    12. Re:Shoot the developers by daveime · · Score: 1

      When you are sorting 10 items or less, it's kind of academic what you use. The choice between 10ms and 11ms is an easy one to make.

      But thinking that "you'll probably be on you next job by the time it has a million elements", isn't a terribly good work ethic. What if the requirements explode while you are still on that job ... you're gonna look like a twat then, after your 10ms becomes 10 minutes and the boss is leanign on you for a fix.

      Anyway, swap sort FTW ... also easy to code, and way more efficient for > 10 items.

    13. Re:Shoot the developers by julesh · · Score: 1

      You throw transistors at your developers? ;-)

      Yeah. I started with BC10s and 2N7000s, but it wasn't helping much. I found to get results, it has to be one of these.

    14. Re:Shoot the developers by oliderid · · Score: 1

      Yes totally agree.

      As a developer I used to have a very poor understanding of basic things like indexing columns and optimizing SQL queries.

      A significant fraction of the workload were wrongly put on the application shoulder.Or worst multiple queries were launched instead of just one. Man how wrong I was...

      I remember a case where a query used to take "minutes", with the proper optimizations it became a mere second or so.

      Now I've got some sorts basic rules I try to respect.

      • If you have to parse query results in your code, this is almost always because of a poorly written SQL query or a bad Database design.
      • If you need multiple queries to retrieve a list/value...Go back to the design board.
      • Index columns, idiot.
      • "Join" is your friend, but don't abuse it.
      • Think before to code.
    15. Re:Shoot the developers by Abcd1234 · · Score: 1

      Developers without good understanding of Relational Databases and SQL often produce problems that
      cannot be solved by indexes, or throwing transistors at them.

      It is so nice to see a "custom" made map implemented in the database using temporary tables instead of
      using the language's built-in map functionality :-)
      sorting arrays using the database gets extra points (no kidding, I have seen this!),

      Dude, that isn't a lack of understanding regarding SQL. That's complete, utter incompetence.

    16. Re:Shoot the developers by MillionthMonkey · · Score: 1

      But thinking that "you'll probably be on you next job by the time it has a million elements", isn't a terribly good work ethic. What if the requirements explode while you are still on that job ... you're gonna look like a twat then, after your 10ms becomes 10 minutes and the boss is leanign on you for a fix.

      WHOOSH ;)

  8. Use views by bytesex · · Score: 2, Insightful

    I think it's usually best to have views (whether with rows that are the result of code, or with a pure 'select' definition, or materialized ones) define what your application 'sees', so that you can always change the underlying datastructure. That way refactoring becomes a bit more easy.

    --
    Religion is what happens when nature strikes and groupthink goes wrong.
    1. Re:Use views by avandesande · · Score: 1

      I do something similar by using a 'facade' pattern to assemble several modules' output into one stored procedure.

      --
      love is just extroverted narcissism
    2. Re:Use views by Buzz_Litebeer · · Score: 1

      I have not found this to be true. It sounds good in theory, but a view cannot be optimized in some cases and depending on the architecture can also have conversion problems that simply do not exist otherwise.

      If you are working at the problem strictly from the point of view of an application, oftentimes your perspective is correct, but if your looking at raw performance issues, and outright bugs, sometimes views can have a significance performance issue. Also, views often only grab subsets of data, and when you are doing heavy data manipulation you may end up joining the view against a table that is in the view itself causing slow downs.

      --
      If you don't vote, you don't matter, so don't waste your time telling me your opinion
    3. Re:Use views by bytesex · · Score: 1

      I know what you're saying; that's why I explicitly included views that are the result of code (can be done in postgres and oracle) and materialized views (oracle). Especially materialized views can make you forget about performance issues; the data is just available as you want it to be as if it were a proper table, indexed and all, and if your original data has been abstracted and normalized correctly, you don't have the headache of pushing hacks into your original data purely for performance reasons either.

      --
      Religion is what happens when nature strikes and groupthink goes wrong.
    4. Re:Use views by he-sk · · Score: 2, Interesting

      I have to agree. In DB theory we have learned that you should normalize your data for a good database design. However, materialized views can give HUGE performance gains, by eliminating multistep joins between tables. You can't built customs indexes for queries that have these joins when the index condition is not in adjacent tables and you always have to deal with large intermediate results.

      If the app is read-only and performance is critical the best strategy is to used materialized views built from normalized source tables. If the database doesn't have materialized views (PostgreSQL) you can simulate with CREATE TABLE foo AS SELECT * FROM bar WHERE ...

      --
      Free Manning, jail Obama.
    5. Re:Use views by bjourne · · Score: 2, Informative

      That's silly. When you change the data model you must change the views too. Then you could as well have changed how the application uses the database instead and avoid a whole layer of indirection. Plus, views are read-only so the client application still needs direct access to the tables to update data. Views are useful and very under appreciated, but not in the way you suggest.

    6. Re:Use views by Thundersnatch · · Score: 1

      Views are updatable in MS SQL Server, and probably Oracle and DB2. THere are some restrictions of course (no aggregates for example are allowed in the view). We use this fetaure all the time to hide schema changes from older code that we don't want to mess with.

    7. Re:Use views by daveime · · Score: 1

      Normalize ... how I hate that word, as so many people who are obsessed with normalization tend to lack basic common sense, and can't see beyond next week in terms of flexibility and expandability of the system.

      I remember one fix I had to do was on a simple address table. The original guy had implemented ZIP codes as an integer, because "everyone knows that US ZIP codes are numeric, and 5 digits maximum, right ?". Hey, that's such an efficient use of space, it must be good ?

      Then we had to expand our list of countries to include UK also ... imagine the shock when he realised that UK ZIP code (Postal Codes), can look like "EC13 1XY".

    8. Re:Use views by bytesex · · Score: 1

      1) That's nothing to do with normalization.
      2) So you had to pour over data from one column into a temporary varchar one, delete the original column and then rename the temporary one to the original one ? Poor you. Look how they make you work.

      --
      Religion is what happens when nature strikes and groupthink goes wrong.
    9. Re:Use views by daveime · · Score: 1

      1) Yes, I KNOW it has nothing to do with normalization, but too many people equate normalization with "the most efficient use of a suitable datatype for the data it will contain", without thinking that maybe next week, the data might change.

      2) Yes, and with a bit of forethought, the column probably would have been varchar in the first place. See point 1) for clarification. Oh, and try doing that on InnoDB on a live system, and come back to me in 5 hours when he's finished rebuilding the indexes, and your boss is screaming bloody murder that the system is slow.

    10. Re:Use views by mcmonkey · · Score: 1

      Then we had to expand our list of countries to include UK also ... imagine the shock when he realised that UK ZIP code (Postal Codes), can look like "EC13 1XY".

      I'm surprised you didn't have issues with zip codes before that point. Nobody noticed the 4 digit zip codes when the leading zeroes got lost?

  9. Actually by voltheir · · Score: 1

    While they focus on MySQL in their examples, these are the same authors of "the Art of SQL", which tackles how to formulate application schema in a optimal way. It's actually a very good read - and if anyone had read and understood their first book, they would have no need for this second one, regardless of what RBDMS they are using. Also, mod parent Flamebait please - because that's all it is.

    1. Re:Actually by stoolpigeon · · Score: 1

      They don't focus on MySQL - and I don't think the gp (or whatever it was) said they did - but just so it's clear. They do everything evenly between the three. But no - they do not do this for DB2 or PostgreSQL. I don't know how or why they chose those 3 - but that is what they cover. Though I have to imagine much of this will, as in The Art of SQL carry over to any RDBMS.

      --
      It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
  10. Anything by Stephane Faroult... by tcopeland · · Score: 1

    ...is probably a good read. He has a lively writing style; it kind of reminds me of Bertrand Meyer's "Object Oriented Software Construction". Anyhow, I've got both this book and Faroult's "The Art of SQL"... both are excellent.

    1. Re:Anything by Stephane Faroult... by ptudor · · Score: 1

      Wow, a useful comment.

  11. Re:useful step in performance the authors missed by Anonymous Coward · · Score: 1, Funny

    It doesn't matter what language you use, the DB will always be the bottleneck in DB applications, duh!

  12. SQL Tuning is also pretty good by johnjaydk · · Score: 1
    In cases where the problem is query performance, I've had pretty good results with the techniques in "SQL Tuning" by Dan Tow.

    This, of course, only works if the rest of the database setup is more or less ok ;-)

    --
    TCAP-Abort
    1. Re:SQL Tuning is also pretty good by einhverfr · · Score: 1

      Sometimes problems are remarkably difficult to address, especially in PL/(Pg)?sql environments.

      Let me give you an example.

      I was doing a rewrite of a bulk processing portion of an application and we were running into performance issues. Eventually we ended up talking to some REAL PostgreSQL experts, reviewed the affected functions, etc. The function had a few warning issues because we had to pass into it really large (5000x5) text arrays into the function.

      At one point we all discovered that we could create array creation routines in PostgreSQL that performed really badly. So we thought maybe there was fundamental problem with the interface. So I asked on IRC and got an answer that this was not likely to be the issue with parsed text arrays and that I should look elsewhere. I created tests and we were able to demonstrate that we had all (including the experts) gone down the wrong path in troubleshooting the performance problem.

      Eventually we were able to prove that the problem had to do with what was happening inside the array loop and address the issue there (turned out the issue was that it was causing cache misses and that we ended up with something which was entirely inefficient inside PL/PGSQL). Eventually we got it solved and cut the execution time from 45 minutes to 90 sec.

      --

      LedgerSMB: Open source Accounting/ERP
    2. Re:SQL Tuning is also pretty good by einhverfr · · Score: 1

      Just to note the expert that was consulted actually gave us the correct answer at first but then we all went down the wrong path. :-)

      --

      LedgerSMB: Open source Accounting/ERP
    3. Re:SQL Tuning is also pretty good by snowtigger · · Score: 1

      Although Oracle specific, I would also add "Effective Oracle by Design" by Thomas Kyte to the list of good books about database application design and performance optimisation.

      I read this book while working with Oracle, but now apply the concepts to my current job with Postgres.

  13. "to the table by Anonymous Coward · · Score: 0

    "Faroult and L'Hermite bring a much wider range of options to the table."

    Ah! I get it! "To the table", like SQL table! You funny!

  14. Much better ways to do complex sql by avandesande · · Score: 2, Informative

    I have developed a design pattern using in memory data table objects that can satisfy the most complex requirements without using any dynamic code.
    It also allows the queries and business logic to be broken into discreet chunks that are easily optimized and debugged.

    --
    love is just extroverted narcissism
    1. Re:Much better ways to do complex sql by Anonymous Coward · · Score: 0

      Can you please elaborate on that and if possible, post the code?

    2. Re:Much better ways to do complex sql by MillionthMonkey · · Score: 1

      Dude, he said it was discreet.

    3. Re:Much better ways to do complex sql by Abcd1234 · · Score: 1

      And let me guess. This margin is too narrow to contain a description of your magic design pattern?

  15. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  16. Oh if only... by RulerOf · · Score: 4, Funny

    Only on Slashdot do you find someone who wants to listen to Natalie Portman talk SQL.

    SELECT * FROM Memes WHERE Reference LIKE '%Portman%' AND LIKE '%naked%' AND LIKE '%petrified%' ORDER BY SlashdotCommentScore, HotGrits;

    27,154,947 Rows Returned.

    --
    Boot Windows, Linux, and ESX over the network for free.
    1. Re:Oh if only... by daveime · · Score: 1

      Because you used '%string%', you forced the query to do a full table scan, i.e. he couldn't use any indexes, and then you sorted 27 million rows, forcing a filesort rather than in-memory sort.

      Remind me never to offer you a job as a DBA.

    2. Re:Oh if only... by RulerOf · · Score: 1

      Remind me never to offer you a job as a DBA.

      Please don't! While I'd be tempted by the substantial (I hope) pay increase, I much prefer systems administration. My general interaction with SQL is via WMI and MS SQL via VBScript (e.g. putting WMI query results into an MSSQL database).

      I knew someone would probably point out my terrible SQL query, though ;)

      --
      Boot Windows, Linux, and ESX over the network for free.
    3. Re:Oh if only... by Anonymous Coward · · Score: 0

      What dialect of SQL lets you omit the field for the second and third LIKE comparisons?

  17. That's where I go first by Onyma · · Score: 1

    Perhaps it speaks ill of my skills but usually the first place I look is at my application / query structure. Most modern DBs have enough raw speed to pull off large tasks in amazingly short amount of times. Typically I look towards how I am managing the tool first as this frequently yields the desired gains and nets me more experience as well. Because slow queries always deal with large amounts of data, small "mistakes" have exponential effects... thus small improvements offer the same reward.

    Only when that fails to I go further into modifications of the hardward / DB engine, etc. as required.

    --
    Play me online? Well you know that I'll beat you. If I ever meet you I'll "/sbin/shutdown -h now" you. -Weird Al, kinda.
  18. The best way to refactor an SQL Application.... by Anonymous Coward · · Score: 0

    ..is to rewrite is in a language that actually lends itself to applications.

    AC because I know people will regard this as a troll. It is not - I have seen countless DBAs who think that SQL is actually an application language that should be used for all manner of things it is really hopeless at....

    This even included:
    - email
    - CSV processing
    - error reporting

    Said systems were always incredibly difficult to manage normally and next to impossible to manage by people who did not write them.
    Now I know not all "applications" being dicussed are actually APPLICATIONS as such and are more just slightly complex procedures, but that is not what is implyed by the words chosen here.

    1. Re:The best way to refactor an SQL Application.... by einhverfr · · Score: 1

      see http://www.sqlonrails.org/ for why you are wrong. SOR is a great app framework!

      Actually, though there are some things that SQL is very, very good at, but you are right that people often do the wrong sorts of things in it. My favorite approach is to use SQL stored procs to do named queries and then put processing the results of those queries into the middleware or thick client level.

      --

      LedgerSMB: Open source Accounting/ERP
  19. PostgreSQL and the book by einhverfr · · Score: 1

    I don't know. Are the ideas worth testing on PostgreSQL? I am willing to bet they are.

    --

    LedgerSMB: Open source Accounting/ERP
  20. Bad Applications by rossz · · Score: 1

    Recently I was asked to improve the performance of a MySQL based PHP web application. After turning on query caching and tuning the settings I was left with looking at the queries involved. It turned out the application was really the problem, not the queries. Just loading the main page involved several hundred queries. For example, settings were saved in a table. Instead of loading all of the settings with a single query, it grabbed them one at a time. It wasn't like they had a few hundred variables and only used a few. There were a couple of dozen and they loaded all of them individually. This kind of bad design was consistent though out the application. Turning on query caching made a huge difference, but the thing was packed with similar problems that could not be overcome with a simple fix like that. A complete rewrite was really the only answer -- and wasn't going to happen.

    --
    -- Will program for bandwidth
  21. Confusing volume with data integrity by mcrbids · · Score: 5, Insightful

    The typical argument goes something like: 'MySQL suxorz - nobody uses it for serios work' followed by: 'Yeah? well explain that to =HIGH VOLUME SITE=!'

    Such responses show a misunderstanding of what serious work is being discussed.

    MySQL does a fabulous job with simple, high-volume transactions, exactly the type seen by Yahoogle/Flicker/Blogsites. They need to sore simple data (EG text) and be able to retrieve it quickly, and for these uses, MySQL is probably a better bet than Postgres or DB2.

    But 'serios work' means thing like strong, ACID compliant transactions, row-level locking, strong integrity of field types, and a query scheduler that holds its own when you combine inner, outer, nested, subqueries mashing together a dozen or more tables with millions/billions of records/combinations.

    Postgres will do this, MySQL won't. MySQL isn't bad because of this, it's just a tool not well suited to this specific job. I use MySQL for website CMS, I use Postgres for financial applications.

    Does your dishwasher suck because it does a piss-poor job cleaning your socks? Use the right tool for the job.

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Confusing volume with data integrity by Angvaw · · Score: 1

      I agree 100%. Unfortunately, to many folks, the distinction isn't quite so clear as dishwasher vs. laundry machine. As a friend once told me, without any knowledge of our requirements, "Your company is pissing away tens of thousands of dollars on Oracle, when you could use MySQL for free!"

    2. Re:Confusing volume with data integrity by mcrbids · · Score: 2, Insightful

      PS: Your company is pissing away tens of thousands of dollars on Oracle, when you could use PostgreSQL for free!

      And no, I haven't read your requirements, but I'd be intrigued to find out what needs Oracle answers that PostgreSQL can't!

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
    3. Re:Confusing volume with data integrity by Alpha830RulZ · · Score: 3, Funny

      And no, I haven't read your requirements, but I'd be intrigued to find out what needs Oracle answers that PostgreSQL can't!

      See, I have this budget that I need to use up, or I lose the budget, and then my pay grade goes down, and I don't get to keep my secretary and this office with the nice window...

      Or, I have this Oracle DBA, and I can't convince him to learn any other platform, because he sez it's bad for his career, and he's my brother in law...

      Other than that, I like PostgreSQL real well, too. MS SQL Server is a pretty good low cost solution for a lot of smaller uses, too, if your company insists on spending money.

      --
      I was taught to respect my elders. The trouble is, it's getting harder and harder to find some.
    4. Re:Confusing volume with data integrity by berend+botje · · Score: 2, Informative

      Well, if parent is using fuzzy full-text searches he's well better off with Oracle.

      Even non-fuzzy full-text searches on Postgresql are a pain. Yes, they do work great, but the syntax is an abomination.

      I loathe Oracle as much as the next guy, but even MySQL does a better job at fuzzy string matching! Really.

    5. Re:Confusing volume with data integrity by berend+botje · · Score: 1

      I ment to say: ...but even MySQL does a better job [b]than Postgresql[/b] at fuzzy string matching.

    6. Re:Confusing volume with data integrity by berend+botje · · Score: 1

      Man, I need some coffee...

    7. Re:Confusing volume with data integrity by skarphace · · Score: 1

      What do you find so hard? You have to install an extra module, but it'll only take a few minutes of your day. This is Levenshtein matching, but it's just an example. There are plenty others.

      SELECT firstname, lastname, unaccent(c.lastname) ~* unaccent('Smith') AS match FROM people WHERE (unaccent(c.lastname) ILIKE 'Smith' OR c.lastname % 'Smith') ORDER BY match DESC

      http://www.postgresql.org/docs/8.3/static/fuzzystrmatch.html

      --
      Bullish Machine Tzar
    8. Re:Confusing volume with data integrity by Anonymous Coward · · Score: 0

      Does your dishwasher suck because it does a piss-poor job cleaning your socks? Use the right tool for the job.

      My dishwasher does a great job on keyboards!

    9. Re:Confusing volume with data integrity by marhar · · Score: 1

      but I'd be intrigued to find out what needs Oracle answers that PostgreSQL can't!

      I like PostgreSQL a lot, but RAC (real application clustering) lets us scale our Oracle database nicely by adding boxes to the existing cluster.

  22. Re:useful step in performance the authors missed by Anonymous Coward · · Score: 0

    Why?

    Sorry don't get it.

  23. also, please reread this: by einhverfr · · Score: 2, Interesting

    MySQL is a good db for single-app databases, where data integrity is not a tremendous issue or where you are deploying a separate MySQL instance on a different port. It is quite a bit worse than PostgreSQL for anything else.

    From your description you are using MySQL for a single-app database where you run a dedicated instance of MySQL for your app. That is not the usage case I was describing, which is a central RDBMS serving out the same data to a myriad of different applications. If you are trying to go beyond ONE app for your data, you should look to PostgreSQL instead of MySQL.

    --

    LedgerSMB: Open source Accounting/ERP
  24. Hardware cheaper than refractor by ConfessWitch · · Score: 1

    "and, if the budget allows, throwing hardware at the problem." These days its actually quite the opposite. Having a full database cached in memory does wonders for performance.

  25. Here's an easy solution... by GuyverDH · · Score: 1

    I've found that the easiest way to make applications run great is to give the developers systems that are at least 2 generations older than what will be used in production (with the latest software, patches, drivers, etc)...

    Then, hold them to making the application perform as you want it to, on that hardware. They don't get paid (their final lump amount) until said application performs as you'd like on the 2 gen old hardware.

    Then, when you migrate to the production hardware, it's quite a bit faster, and doesn't contain the bloat that could have crept in, if the developers had been given access to hardware identical to the production hardware.

    --
    Who is general failure, and why is he reading my hard drive?
  26. I Only Know Oracle by bloobamator · · Score: 5, Informative

    I only know Oracle but I've known it since version 5.0. Intimately. I haven't read the book but I read the review. Here are a few tips I've learned over the decades that you might find useful, just in case they aren't covered in the book:

    1) You have to establish a general rule of thumb for each production db whereby any one sql that consumes more than x% of the db resources needs to be tuned. The value of x varies from db to db. If it cannot be tuned below x% then it needs to be refactored.
    2) Learn to use stored outlines. If you can get them to work they will save your ass and make you look like a total hero.
    3) Never turn your back on the optimizer. Really. Even for simple queries, even with the deepest stats.
    4) Bind variables are a necessity for high-repetition sql. Bind variables are something you might want to avoid for reports queries for which the optimal plans depend on the user input values. This is because a sql's plan is cached along with it the first time it is parsed, and if you use bind variables then the first plan you get is the plan you will always get so long as the sql remains in the shared pool.
    (You can sometimes work around this issue by turning off bind variable peeking, but consider doing it on a per-session basis instead of changing it system-wide. Scary!)
    5) Nowadays a 32GB SGA is no big thing. Get yourselves a ton o' RAM and set up a keep pool in the buffer cache to pin your most important reporting indexes and tables. Partition your big reporting tables and maintain a sliding window of the most recent partition(s) in the keep pool.
    6) No sorting to-disk. Ever. If you cannot let the session have the PGA it needs to sort the query in memory then the SQL needs to be "refactored".
    7) Once you have eliminated most of the physical disk reads it then becomes all about the buffer gets (BG's). When disk reads are low the high-logical-BG queries immediately become the new top SQL. This is because logical BG's are all CPU and your db is now cpu-bound, which is right where you want it. So from this point it's back to item #1 and we prune and tune (thanks KG!)

    I could go on all day. Perhaps I should write a book?

    --
    "Crude and slow, clansman. Your attack was no better than that of a clumsy child."
  27. Use views as transition states... by SuperKendall · · Score: 1

    I think it's usually best to have views (whether with rows that are the result of code, or with a pure 'select' definition, or materialized ones) define what your application 'sees', so that you can always change the underlying datastructure.

    As another poster noted, I don't see much benefit from that over just changing what the app sees. At least not long term.

    I agree with Joel On Software Joel, in that you should set up views as you stated to present the best view possible for the application - then change all your code to use that view, and when it does make the view the real table you are going against.

    Otherwise I have seen way too many views abused and grown to monsters. If you always use them as transition states, then they can never get out of hand as once they are the real tables you proceed to make new views, and the schema evolves.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  28. Re:Shoudnt developer concentrate on app-logic? by Anonymous Coward · · Score: 0

    This study would have carried more weight if it had included PostgreSQL and IBM's DB2. These two databases do more serious work than MySQL although many believe MySQL is more widely deployed.

    Shouldn't developer concentrate on app-logic? The db should just take care of every thing. Then what are the db vendors offering...1000 different parameters to learn & tune?

  29. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  30. netezza does it for you by Anonymous Coward · · Score: 0

    checkout netezza.com 's data warehouse appliance. No refactoring required, it just runs fast

  31. Also available on BWB for 31.98 + free shipping US by misterjava66 · · Score: 1

    Its also available on BetterWorldBooks for $31.98 with free shipping to the US

    And, as always, save the world, fund global literacy,

    details: http://www.betterworldbooks.com/Info-Our-Impact-m-51.aspx

    Over 300 titles of used (=cheap) on SQL

    http://www.betterworldbooks.com/List.aspx?Category_ID=3804&&s=23329287

    I'm a dba. I think I will actually buy and read this one though. :-)

  32. "don't install innodb"? by tepples · · Score: 1

    by default (last I checked, and meaning you don't install innodb, etc), the tables will not be transactional.

    Why would one assume InnoDB is turned off? MySQL binaries in Ubuntu 8.04's repository and the MySQL essentials installer for Windows include InnoDB. So does Go Daddy's hosting platform, if one believes this page deprecating Berkeley DB in favor of InnoDB.

    Strict mode can be turned off by any application.

    Per the MySQL manual: You could try not handing out the SUPER privilege like candy. This way, other apps can't SET GLOBAL sql_mode, and when they try to SET SESSION sql_mode, it won't affect any other apps' connections.

  33. SQLite ignores referential integrity by tepples · · Score: 1

    If you are looking for a single-app database try SQLite (mentioned in the tools section by the authors) http://sqlite.org/whentouse.html

    SQLite's web site states that SQLite silently ignores REFERENCES (foreign key) constraints.

  34. What version of MySQL? by tepples · · Score: 1

    still, I'd prefer not to handle mysql structural data, as it mangles text removing spaces

    What version of MySQL are you talking about? This page claims that MySQL 5.0.3 and later preserve trailing spaces in VARCHAR columns, and all versions preserve trailing spaces in TEXT columns.

    Now, for the citation needed folks: http://sql-info.de/mysql/gotchas.html

    From the linked page: "Update: Apparently this behaviour will be corrected in the planned 5.0 release", and it was.

  35. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  36. Their use is not recommented. by tepples · · Score: 1

    The SQLite source tree contains source code and documentation for a C program that will read an SQLite database, analyze the foreign key constraints, and generate appropriate triggers automatically.

    The searches I tried in Synaptic on Ubuntu 8.04 LTS didn't appear to turn up any program to make sqlite triggers. Was this program added to Ubuntu only recently, or is it not in Ubuntu at all? And then the download page scares people away from downloading the source code packages: "Their use is not recommented. The SQLite developers do not use them. You should not use them either" (spelling and emphasis preserved).

    For MySQL you generally needed InnoDB for referential integrity

    But at least MySQL with InnoDB comes packaged for Ubuntu.

  37. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  38. So it's genfkey. Thanks. by tepples · · Score: 1

    The READ.ME for this is at http://www.sqlite.org/cvstrac/fileview?f=sqlite/tool/genfkey.README You can access this functionality directly from the sqlite3 shell tool (type sqlite3 at the CLI). The syntax is .genfkey ?--no-drop? ?--ignore-errors? ?--exec?

    Thank you for pointing out genfkey. Now that I have a keyword to go on, I can understand how one would "recompile" foreign key constraints. Don't get me wrong, I've been using SQLite 3 in a Python-based internal web site for about a year now; I just wanted to get straight what it does and does not do.