Slashdot Mirror


The Art of SQL

Graeme Williams writes "One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does. The Art of SQL is the opposite of a cookbook – or rather it's about cooking rather than recipes. It's not a reference manual, although there's plenty to refer back to. It's an intermediate level book which assumes you know how to read and write SQL, and analyzes what SQL does and how it does it." Read on for Graeme's review.

The Art of SQL
author: Stéphane Faroult with Peter Robson
pages: xvi + 349
publisher: O'Reilly Media
rating: 9
reviewer: Graeme Williams
ISBN: 0-596-00894-5
summary: An excellent way to improve your approach to SQL

I guess it's normal for an intermediate text to present a number of serious examples, the idea being that the code from an example can be applied to roughly similar problems with roughly similar solutions. I think Faroult's goal is both more abstract and more ambitious. He wants to expand your ability to navigate among and analyze alternative SQL statements with more confidence and over a larger range. This isn't so much a book about SQL as it is about thinking about SQL.

There's almost no chance that the SQL examples in the book will be directly applied to a real problem. The examples are relevant at one remove: What does thinking about this example tell me about thinking about my current problem? So the book doesn't come with downloadable samples. There's no point.

The first few chapters of the book lay a foundation for the rest. As each brick in this foundation is placed, it sometimes feels as though it's placed firmly on your head. Think about indexes ... whack! Think about join conditions ... whack! These chapters have very few examples – the goal is to force you to think through queries from first principles. It's more effective (and less painful) than it sounds.

These introductory chapters cover how a query is constructed and executed, including how a query optimizer uses the information which is available to it. Faroult discusses the costs and benefits of indexes, and the interaction of physical layout with indexes, grouping, row ordering and partitioning. He also explains the difference between a purely relational query and one with non-relational parts, and how such a query can be analyzed in layers. Chapter 4 is available on the book's web page. It will give you a good idea of the style of the book, but not of the level of SQL discussed – the longest example in the chapter is just 15 lines.

Chapter 6 presents and analyzes nine SQL patterns, from small result sets taken from a few tables, to large result sets taken from many tables. The chapter falls roughly in the middle of the book, and feels like its heart. Prior chapters have built up to this one, and subsequent chapters are elaborations on particular topics. The theme of the book, to the extent that it has one, is that details matter. Different SQL statements can be used to produce the same result, but their performance will be different depending on details of the data and database. A change to the database structure, such as adding an index, might improve performance in one set of circumstances, but make it worse in another. The case analysis in this chapter will make you more sensitive to details in query design and execution.
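As a minimal sketch of the point that different statements can produce the same result, the Python snippet below runs three formulations of one question against an in-memory SQLite database. The `customers` and `orders` tables and their rows are invented for illustration and do not come from the book. Which formulation runs fastest depends on the engine, the indexes and the data, which is exactly the kind of detail the chapter is said to analyze.

```python
import sqlite3

# Hypothetical schema and data, made up for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 40.0), (12, 3, 500.0);
""")

# Three ways to ask for "customers with at least one order over 100".
queries = {
    "in_subquery": """
        SELECT name FROM customers
        WHERE id IN (SELECT customer_id FROM orders WHERE total > 100)
        ORDER BY name""",
    "exists": """
        SELECT name FROM customers c
        WHERE EXISTS (SELECT 1 FROM orders o
                      WHERE o.customer_id = c.id AND o.total > 100)
        ORDER BY name""",
    "join": """
        SELECT DISTINCT c.name FROM customers c
        JOIN orders o ON o.customer_id = c.id
        WHERE o.total > 100
        ORDER BY c.name""",
}

# Run each formulation and collect the names it returns.
results = {label: [row[0] for row in conn.execute(sql)]
           for label, sql in queries.items()}

for label, names in results.items():
    print(label, names)
```

All three return the same rows here. The differences only show up in how each engine chooses to execute them.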

The authors almost never mention particular database products. Their justification is that any absolute statement would be invalidated by the next release, or even a different hardware configuration, and anyway, that's not the business they're in. But sometimes this can go too far. The phrase "A clever optimizer ... will be able to" is too hypothetical by half. Is this an existing hypothetical query optimizer, or a vision of a future optimizer? Or the optimizer of one hypothetical database product and not of another? I suspect that Faroult knows and is simply being coy. It's just unhelpful not to tell us what existing databases will do, even if it depends on the release or the hardware.

Faroult does this because he's not much interested in telling you what actually happens when a particular SQL statement is executed by a particular database. If the authors wanted a cute title for the book, I'm surprised they passed over The Zen of SQL Maintenance. When you look at an SQL statement, Faroult wants you to see what other SQL statements would do under other circumstances. He literally wants you to see the possibilities.

The second half of the book continues the analysis of chapter 6 into special cases, such as OLAP and large volumes of data, monitoring and resolving performance issues, and debugging problematic SQL.

Chapter 7 discusses tree-structured data, like an employee table with a column for the employee's manager. Faroult likes his own solution best, but presents an alternative approach by Joe Celko clearly enough for you to explore that as well.

Chapter 8 includes a series of examples of SQL and PHP. For anyone like me who spends more time in various programming languages than in SQL, this chapter is a small gem. It nicely illuminates the care needed in deciding what happens in code and what happens in SQL.

Chapter 9 addresses locking and concurrency, as it applies to both physical and logical parallelism. Transactions are included, but the discussion is just one part of a 20-page chapter and seems thin.

The Art of SQL is very clearly written. Whether it is "easy" will depend on how comfortable you are with SQL. This book is targeted at (page xi) "developers with significant (one year or, preferably, more) experience of development with an SQL database", their managers and software architects. I have months of experience spread over a decade or more, so I'm nominally outside the target audience. I found the SQL examples and discussion clear once I had a chance to let them sink in. If you're working with SQL regularly, they'll be perfectly clear.

The graphs let down the otherwise high quality of the book. For example, Figure 5-3 shows a rate (higher is better) but the legend says "Relative cost" (higher is worse). Figures 9-1 through 9-3 on facing pages 228 and 229 show response time histograms for three different query rates but don't show what the rates are. The x-axis of Figure 10-1 seems to be calendar time, but it's decorated with a stop watch icon. And as a representative of rapidly aging boomers with rapidly deteriorating eyesight, could I beg book designers not to put figure legends in a smaller font than the text of the book? Diagrams should be simple and clear, not something to puzzle over.

This is a book to conjure with, but it's not a book for everyone. Some people may find it too abstract, with too much discussion of too few examples. If you're completely new to SQL, the book will be hard going. If you have very many years of experience with SQL, it's just possible that you won't find anything new in the book, although I expect you'll find a lot to think about. For anyone in between, The Art of SQL is an excellent way to improve the way you attack problems in database and query design.

You can purchase The Art of SQL from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

23 of 225 comments

  1. BN vs Amazon by beavis88 · · Score: 3, Informative

    I know Amazon has software patents and all, but this (and just about every other book I see reviewed here) is ~20% cheaper at Amazon than it is at BN...

  2. art by Lord Ender · · Score: 4, Insightful

    If you think SQL is an "art," you are a hack. Designing proper databases and the SQL to use them optimally falls under the domain of science/engineering. 95% of developers see relational databases simply as a means for a persistent data store, but that's not what it was designed to do. If you don't know engineering (what you do when designing functional systems*) from art (painting pictures, etc) you should have gone to a better college.

    See this page for a start on the science of databases.

    *Yes, I know creativity is usually involved when designing things. That doesn't make it art.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
  3. Great Statement by lbmouse · · Score: 3, Funny

    "for SQL there's a bigger gap between what the code says and what the code does"

    I couldn't agree more. Sometimes while working in SQL I really wish I had a time machine and a rubber hose.

  4. Bummer, trees by plopez · · Score: 3, Insightful

    Chapter 7 discusses tree-structured data

    Looks like no discussion of many-to-many relationships. This would make any book on databases and sql queries of limited value, not much more than a beginner book.

    Trees are of limited value, they only exist in special circumstances. If you stick to tree structured data relations then you will almost always have to do weird hacks that may threaten data integrity.

    While many-to-many *seems* harder, as a data model M:M is often a much better practical solution. As well as modeling the reality of the situation in a much more accurate manner.

    My $.02

    --
    putting the 'B' in LGBTQ+
    1. Re:Bummer, trees by dbdweeb · · Score: 3, Insightful

      You ARE just being facetious right?

      Tree structures are everywhere in computing... Like file systems... Like the DOM for every web page you have ever looked at is represented by a tree structure.

      As regards the coverage of M:M... Another post pointed out that it IS covered in the book.

      As regards the usage of M:M... That's just for high level conceptual modeling right? Surely you are not actually going to implement that way but will instead insert an intersect object AKA associative table, right? Database Programming 101 thoroughly covers this topic.

      Not only do DUH velopers need to stop thinking of the RDBMS as just a bucket to hold stuff, they desperately need to know SQL and aspire to database programming beyond cutting more code. And even more significantly, they need to understand the importance of a good ERD so they don't fall into the trap of trying to implement a M:M.
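For readers unfamiliar with the "intersect object AKA associative table" mentioned in the comment above, here is a minimal sketch in Python with SQLite. The `students`, `courses` and `enrollments` tables are hypothetical examples, not taken from the book or the thread. The many-to-many relationship is stored as two one-to-many relationships that meet in the middle table.

```python
import sqlite3

# Hypothetical schema: students and courses are many-to-many,
# resolved through the associative table "enrollments".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(id),
        course_id  INTEGER NOT NULL REFERENCES courses(id),
        PRIMARY KEY (student_id, course_id)  -- one row per pairing
    );
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO courses VALUES (1, 'Databases'), (2, 'Compilers');
    INSERT INTO enrollments VALUES (1, 1), (1, 2), (2, 1);
""")

# Everyone enrolled in one course: join through the associative table.
roster = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM students s
    JOIN enrollments e ON e.student_id = s.id
    JOIN courses c ON c.id = e.course_id
    WHERE c.title = 'Databases'
    ORDER BY s.name
""")]

# The same table answers the reverse question: courses for one student.
ann_courses = [row[0] for row in conn.execute("""
    SELECT c.title
    FROM courses c
    JOIN enrollments e ON e.course_id = c.id
    JOIN students s ON s.id = e.student_id
    WHERE s.name = 'Ann'
    ORDER BY c.title
""")]

print(roster, ann_courses)
```

The composite primary key is a design choice in this sketch: it keeps the same pairing from being recorded twice.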

  5. One year of SQL is significant experience? by Osty · · Score: 3, Insightful

    Perhaps that's what's wrong with database development these days (just check out The Daily WTF, as it seems they have a SQL example every other day). When a single year of experience is considered "significant" and "experienced", it's no wonder there are so many crap DBAs out there. We look for people with 5+ years of C# experience (ha! Good luck finding someone with more than 5 years experience ...) for intermediate-level developer positions. There's no way someone with only a year of SQL experience would qualify for an intermediate-level DBA position.

    Just as background, I've been doing development on SQL Server for 6 years now (from SQL 7 to SQL 2005). I'm still learning, still finding ways to improve my code's cleanliness and performance, still finding new things I can do in SQL. For example, SQL 2005 finally has CTEs, making it only the second database to implement that ANSI SQL99 standard. CTEs make it very easy to do things that were painfully hard before, like walking a tree or implementing a recursive algorithm over sets of data.

    After my fourth year of working with SQL, I'd have been willing to say I had "significant" experience with SQL. Four years is arbitrary -- it really depends on how much you work with it day to day. Someone may have "significant" experience after only two years, while someone else may not be significantly experienced until he's worked with SQL for eight years. If you had to put a number of years on what would constitute significant experience, I'd err on the safe side and go with three or four years. Certainly not just one year.
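The tree walk that the comment above credits to CTEs can be sketched as follows, using Python and SQLite rather than SQL Server. The `employees` table with a `manager_id` column is a made-up example. Note one dialect difference: SQLite accepts the `WITH RECURSIVE` spelling, whereas SQL Server writes a recursive CTE with plain `WITH`.

```python
import sqlite3

# Hypothetical employee table where each row points at its manager.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alice', NULL),
        (2, 'Bob',   1),
        (3, 'Carol', 1),
        (4, 'Dave',  2);
""")

walk_sql = """
    WITH RECURSIVE reports(id, name, depth) AS (
        -- Anchor member: start at the root (no manager).
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: add everyone managed by a row found so far.
        SELECT e.id, e.name, r.depth + 1
        FROM employees e
        JOIN reports r ON e.manager_id = r.id
    )
    SELECT name, depth FROM reports ORDER BY depth, name
"""

org_chart = conn.execute(walk_sql).fetchall()
for name, depth in org_chart:
    print("  " * depth + name)
```

The whole hierarchy comes back from one statement, with no loop in the calling code.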

  6. Developers and SQL by DebianDog · · Score: 5, Insightful
    As a DBA, if developers would read... oh.... I dunno... just Chapter 1... the basics of SQL... of this book... any SQL book really AND understand "the basics"... My job would be 100 times easier!

    I spend much of my time explaining why a 5 page SQL statement "that takes a long time" is NOT A DATABASE PROBLEM!
    /rant

  7. Re:Useless to all but theoraticians by stoolpigeon · · Score: 4, Insightful

    There are differences on the different platforms, but there is a standard and standard syntax ought to work in any rdbms. When it doesn't (access is the first example that comes to mind) that is a sign that what you are working with is not as good a system as it should be. One of the things I really like about postgres is that it is very standards compliant.
     
    There is a transact sql book that I use frequently on multiple database systems. A small amount doesn't carry over, due to syntax differences. But the ideas on how to deal with sets of information in sql carry over. It appears that this book does that intentionally. And it should be useful in a very practical way if it is at all like the description.

    --
    It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
  8. SQL says what to do by booch · · Score: 5, Insightful

    there's a bigger gap between what the code says and what the code does

    That's stated incorrectly. With SQL, the code says what to do, but it does not say how to do it. That's the difference between "normal" procedural code and languages like SQL.

    --
    Software sucks. Open Source sucks less.
  9. Art is about creativity, not rote coding by Graboid · · Score: 5, Insightful

    Ahhh - but the best scientists are artists as well. (In fact, scientists and mathematicians often have more in common with artists than engineers).

    Sure, the mechanics of programming is rather dull and boring, but large scale system design often requires considerable creativity that is much better done by people not constrained by artificially perceived IT limitations.

    Coding J2EE isn't an art, but designing/building a massive neural net or complex, distributed game/simulation is. MySpace, Google, eBay, etc weren't conceived by 'classic' engineers, but, rather, by creative people who understood how technology can enable new paradigms.

    1. Re:Art is about creativity, not rote coding by gowen · · Score: 4, Funny

      Ah. You were doing so well, and then you said "paradigm".

      --
      Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  10. SQL Books by Municipa · · Score: 3, Insightful

    Pretty much every book on SQL I've seen only gives you obvious examples and covers the most simple uses. Every project I've worked on (for about 10 years) where there is pre-existing SQL written, almost all of it is written inefficiently. I'm not sure this book explains this kind of thing. But I've found 99%+ of the time you don't need to use a cursor, and it's almost always slower.

    SQL can do a lot more than most programmers ever try to do with it. There are a lot of clever tricks you can use exploiting its set based nature. The only place I've seen clever solutions beyond simple insert/delete/update statements is some of the trade magazines; the one for MS SQL Server sometimes has some very neat examples. These trade magazines have examples and ideas presented using the SQL language of a particular database, but it's almost always portable without much work. I consider myself pretty good at SQL and even I find it's hard to learn more to get to the point where I can design clever SQL more frequently. Anyone else find that too?

    Another thing I've noticed is on some open source projects (and perhaps some closed source ones), particularly web based ones, there is displayed at the bottom the number of database queries used to generate the page. They are often 10 or more, which almost always seems ridiculous. I think there just aren't all that many people out there who understand what SQL can do, how it's different than procedural languages and how to use it beyond a simplistic straight forward approach. Hopefully this book helps explain that - I'll probably browse a bit the next time I'm in a book store.
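The contrast drawn in the comment above, between row-at-a-time code and a single set-based statement, can be sketched like this in Python with SQLite. The `authors` and `posts` tables and both function names are hypothetical. The query counter is there only to mimic the per-page query counts the comment describes.

```python
import sqlite3

# Hypothetical tables for a page that lists posts with their authors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ann'), (2, 'ben');
    INSERT INTO posts VALUES (1, 1, 'Indexes'), (2, 2, 'Joins'), (3, 1, 'Locks');
""")

def page_row_by_row(conn):
    """Procedural style: one query for the posts, then one more per post."""
    queries = 1
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id").fetchall()
    page = []
    for author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()[0]
        queries += 1
        page.append((title, name))
    return page, queries

def page_set_based(conn):
    """Set-based style: let a single join do the matching."""
    page = conn.execute("""
        SELECT p.title, a.name
        FROM posts p
        JOIN authors a ON a.id = p.author_id
        ORDER BY p.id
    """).fetchall()
    return page, 1

loop_page, loop_queries = page_row_by_row(conn)
join_page, join_queries = page_set_based(conn)
print(loop_queries, join_queries, loop_page == join_page)
```

With three posts the loop issues four queries where the join issues one, and the gap grows with every extra row on the page.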

  11. Spoken like a hacker, rather than a pro by PCM2 · · Score: 4, Insightful
    It's great to see a book that tells me SQL can do pretty much anything - but I pretty much already knew that. This book might be good for THEORY, but for actually getting useful and applicable information, the review leaves me wondering who would be a worthwhile reader.

    And yet, if you get out and talk to some of the real-world database consultants who get called in to clean up other people's messes, one of the complaints you hear again and again is that too many so-called DBAs learned their trade on a specific product, rather than understanding why databases work the way they do.

    Optimizations that you introduce into your applications to cater to specific products' features (or work around their shortcomings) may be a fact of life, but they make for poor design choices. You should know what you're doing first -- which means a good understanding of database theory -- and layer all that syntactic hot-rod stuff on later.

    --
    Breakfast served all day!
  12. Re:Useless to all but theoraticians by kfg · · Score: 5, Insightful

    . . .the review leaves me wondering who would be a worthwhile reader.

    Software engineers and Database Administrators.

    An intuitive "hacker's" understanding of physics is perfectly sufficient to construct a go-cart out of 2x4s and baby coach wheels, but automotive engineers find that a knowledge of "theory" is rather useful in getting practical work done.

    In fact if your software does not have a solid grounding in theory it may well be worse than useless, as software is nothing more than applied science. The computer is a mathematics engine. Nothing less, nothing more.

    If you do not understand the underlying structure of your high level language and the low level mathematical theory below that, you are liable to make grievous mistakes in first selecting your high level tools, then in the specific models that you implement with your code and then in your code itself.

    And be utterly clueless that you have done so.

    KFG

  13. bookpool by stoolpigeon · · Score: 4, Informative
    --
    It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
  14. Re:SELECT * FROM first_post; by Anonymous Coward · · Score: 5, Funny

    +---------+
    | You     |
    +---------+
    | Fail It |
    +---------+
    1 row in set (0.08 sec)

  15. Re:Where's the news? by PCM2 · · Score: 3, Informative
    Could Slashdot not post book reviews to the main section??

    I like book reviews.

    Homepage preferences are your friends.

    --
    Breakfast served all day!
  16. Theory not a dirty word by fm6 · · Score: 4, Insightful
    This book might be good for THEORY, but for actually getting useful and applicable information, the review leaves me wondering who would be a worthwhile reader.
    SQL theory is useful and applicable. It's just not complete: you also need the specifics of whatever SQL implementation you're using. For that you need to go to books about the specific RDBMS you're using. You can't expect a general SQL book to cover every implementation of the language, any more than you expect Stroustrup to tell you how to work with Visual C++.

    Not every programmer needs to be a computer scientist, but they do need to learn a little theory now and then. That's especially true when you're working with relational databases, which are full of weird abstractions and subtle performance issues. Not having looked at this particular book, I can't say whether it's overkill for what most SQL people do. I can say that most database hackers don't seem to know as much theory as they should.

  17. Cheaper isn't everything by PCM2 · · Score: 5, Insightful

    In fact, if you have access to a local, independently-owned bookseller in your area, you should be buying your books there instead of online.

    Stacey's Books in San Francisco doesn't give me Amazon's 34 percent discount -- in fact, it gives me 10 percent -- but it is a wonderful resource and not one I'd like to see disappear.

    That's not hyperbole either. This year we've seen two classic, quality Bay Area bookstores close their doors: Cody's on Telegraph Avenue in Berkeley and A Clean, Well-Lighted Place for Books on Van Ness in San Francisco. These were not holes in the wall; they were spacious, carried a lot of stock and had served their communities well for years. (And believe me, the Bay Area in general buys a lot of books.)

    The reality is that the book market is changing. Superstores like Borders and Barnes and Noble have a lot to do with it, and so does Amazon. Another factor is the overall decline in book sales to the American public. People walk into Borders to buy DVDs of Friends and they pick up a paperback of Harry Potter at the same time. That's not the model I want my booksellers to be based around; I want to support local businesses that understand their communities and are dedicated to selling books.

    This is not to knock Amazon, or Borders or B&N for that matter; in communities where those are the only option, it's better to have someplace to buy books than no place at all. I still buy plenty of stuff at Amazon. But for books, I vote with my wallet.

    --
    Breakfast served all day!
  18. procedural programming by jbgreer · · Score: 4, Insightful

    "One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does."

    Well, certainly one difference between SQL and a conventional procedural programming language is that SQL isn't procedural, it's declarative. One describes the data a query should produce, rather than stating a set of steps necessary to achieve a desired result.

    jbgreer

    --
    The Norton Anthology of English Literature, 4th Ed., Vol 2
  19. Re:Useless to all but theoraticians by truthsearch · · Score: 5, Insightful

    I disagree, but only to a small extent. I have extensive experience with MS SQL, Oracle, and MySQL. The basics of retrieving information are the same across all, but change very much when working on large systems. Select queries have to be written very differently on each system when tables get huge. For example, Oracle scripts with cursors are often much faster than regular joins if you know your data well. Yet on MS SQL cursors are the slowest way to go. On MySQL using temp tables in memory often outperforms outer joins, but not in the same cases as MS SQL.

    When working in the extremes the strengths and weaknesses of each system have to be considered.

  20. Re:sql vs. procedural by cruachan · · Score: 3, Interesting

    Firstly most production databases contain some denormalization. Indeed the art of designing a real database is knowing where and when to denormalize data. How much denormalization is required is dependent upon the database, access paths and application usage and is rarely more than a few fields or a table or two. Nevertheless real production databases that have been correctly denormalized often run orders of magnitude faster than those that rigidly stick to third normal form throughout.

    Secondly what you are asking for is generally straightforward in any real dialect of SQL. Select distinct works fine, as do various scenarios with subselects and group by / having clauses (having is the most overlooked of the standard SQL clauses and its use generally signifies you are using code written by someone who knows what they are doing).

    However if you have a good dbms to hand that implements user defined functions then usually the best way is to create a function that returns the uid of the record from the multiple recordset you require (i.e. last payroll record for employee x) and use that in the where clause.

    OTOH if you are stuck with MySQL then the first step you have to take is upgrade to Postgres :-)

  21. Re:SQL fun by cruachan · · Score: 3, Interesting

    Don't think I completely agree. True, writing SQL to second-guess the optimizer in detail is deadly, and pointless with modern RDBMSs anyway (but Oracle 5, where you really had to, isn't that many years ago). Nevertheless having a feel for how optimizers work is good. For instance, setting up your joins on indexed fields, or being aware of where the optimizer will use a full table scan and when that is a problem. One of my favourite tricks, for example, is to use an index to avoid a table access - which can pay mega dividends on large datasets. For example, suppose we have a table which contains employee data and is indexed on an ID. I know that I regularly require a further field from this table - say insurance number. By setting an index on ID and Insurance Number, the optimizer saves a record access for each instance when Insurance Number must be retrieved. That's a simple example, but the theme can be extended quite significantly.
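The index trick in the comment above can be sketched with SQLite's `EXPLAIN QUERY PLAN`, driven from Python. The `employee` table, its columns and the index names are assumptions made for this illustration. The sketch also drops the single-column index before creating the two-column one, purely so that the planner's choice is unambiguous; the comment itself does not say to do that. Other engines report their plans differently, so treat the plan text as SQLite-specific.

```python
import sqlite3

# Hypothetical employee table; id is deliberately NOT the primary key,
# so that SQLite must go through an ordinary index to find a row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER, name TEXT, insurance_number TEXT, notes TEXT);
    CREATE INDEX idx_employee_id ON employee (id);
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(i, f"emp{i}", f"INS-{i:05d}", "x" * 50) for i in range(1000)],
)

def plan(conn, sql, params=()):
    """Return SQLite's query-plan detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail

lookup = "SELECT insurance_number FROM employee WHERE id = ?"

# With an index on id alone, the index finds the row but the table
# must still be visited to fetch insurance_number.
plan_before = plan(conn, lookup, (42,))

# Index both columns: the query can now be answered from the index alone.
conn.execute("DROP INDEX idx_employee_id")
conn.execute("CREATE INDEX idx_employee_id_ins ON employee (id, insurance_number)")
plan_after = plan(conn, lookup, (42,))

value = conn.execute(lookup, (42,)).fetchone()[0]
print(plan_before)
print(plan_after)
print(value)
```

The second plan should mention a covering index, which is SQLite's way of saying the table access was avoided. The wider index is not free, since it takes more space and adds work to every insert and update.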