The Art of SQL
Graeme Williams writes "One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does. The Art of SQL is the opposite of a cookbook – or rather it's about cooking rather than recipes. It's not a reference manual, although there's plenty to refer back to. It's an intermediate level book which assumes you know how to read and write SQL, and analyzes what SQL does and how it does it." Read on for Graeme's review.
The Art of SQL
author: Stéphane Faroult with Peter Robson
pages: xvi + 349
publisher: O'Reilly Media
rating: 9
reviewer: Graeme Williams
ISBN: 0-596-00894-5
summary: An excellent way to improve your approach to SQL
I guess it's normal for an intermediate text to present a number of serious examples, the idea being that the code from an example can be applied to roughly similar problems with roughly similar solutions. I think Faroult's goal is both more abstract and more ambitious. He wants to expand your ability to navigate among and analyze alternative SQL statements with more confidence and over a larger range. This isn't so much a book about SQL as it is about thinking about SQL.
There's almost no chance that the SQL examples in the book will be directly applied to a real problem. The examples are relevant at one remove: What does thinking about this example tell me about thinking about my current problem? So the book doesn't come with downloadable samples. There's no point.
The first few chapters of the book lay a foundation for the rest. As each brick in this foundation is placed, it sometimes feels as though it's placed firmly on your head. Think about indexes ... whack! Think about join conditions ... whack! These chapters have very few examples – the goal is to force you to think through queries from first principles. It's more effective (and less painful) than it sounds.
These introductory chapters cover how a query is constructed and executed, including how a query optimizer uses the information which is available to it. Faroult discusses the costs and benefits of indexes, and the interaction of physical layout with indexes, grouping, row ordering and partitioning. He also explains the difference between a purely relational query and one with non-relational parts, and how such a query can be analyzed in layers. Chapter 4 is available on the book's web page. It will give you a good idea of the style of the book, but not of the level of SQL discussed – the longest example in the chapter is just 15 lines.
Chapter 6 presents and analyzes nine SQL patterns, from small result sets taken from a few tables, to large result sets taken from many tables. The chapter falls roughly in the middle of the book, and feels like its heart. Prior chapters have built up to this one, and subsequent chapters are elaborations on particular topics. The theme of the book, to the extent that it has one, is that details matter. Different SQL statements can be used to produce the same result, but their performance will be different depending on details of the data and database. A change to the database structure, such as adding an index, might improve performance in one set of circumstances, but make it worse in another. The case analysis in this chapter will make you more sensitive to details in query design and execution.
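As a rough illustration of the point that a structural change such as an index alters how the same statement is executed, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are hypothetical and do not come from the book, and the plan text SQLite reports is specific to SQLite and varies between versions.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = 42"


def plan(connection, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


# Same statement, same result; only the access path is expected to change.
plan_before = plan(conn, QUERY)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY)

rows = conn.execute(QUERY).fetchall()
print(plan_before)
print(plan_after)
```

Whether the index is a net win still depends on the data and the workload, since every insert and update now has to maintain it as well.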
The authors almost never mention particular database products. Their justification is that any absolute statement would be invalidated by the next release, or even a different hardware configuration, and anyway, that's not the business they're in. But sometimes this can go too far. The phrase "A clever optimizer ... will be able to" is too hypothetical by half. Is this an existing hypothetical query optimizer, or a vision of a future optimizer? Or the optimizer of one hypothetical database product and not of another? I suspect that Faroult knows and is simply being coy. It's just unhelpful not to tell us what existing databases will do, even if it depends on the release or the hardware.
Faroult does this because he's not much interested in telling you what actually happens when a particular SQL statement is executed by a particular database. If the authors wanted a cute title for the book, I'm surprised they passed over The Zen of SQL Maintenance. When you look at an SQL statement, Faroult wants you to see what other SQL statements would do under other circumstances. He literally wants you to see the possibilities.
The second half of the book continues the analysis of chapter 6 into special cases, such as OLAP and large volumes of data, monitoring and resolving performance issues, and debugging problematic SQL.
Chapter 7 discusses tree-structured data, like an employee table with a column for the employee's manager. Faroult likes his own solution best, but presents an alternative approach by Joe Celko clearly enough for you to explore that as well.
Chapter 8 includes a series of examples of SQL and PHP. For anyone like me who spends more time in various programming languages than in SQL, this chapter is a small gem. It nicely illuminates the care needed in deciding what happens in code and what happens in SQL.
Chapter 9 addresses locking and concurrency, as it applies to both physical and logical parallelism. Transactions are included, but the discussion is just one part of a 20-page chapter and seems thin.
The Art of SQL is very clearly written. Whether it is "easy" will depend on how comfortable you are with SQL. This book is targeted at (page xi) "developers with significant (one year or, preferably, more) experience of development with an SQL database", their managers and software architects. I have months of experience spread over a decade or more, so I'm nominally outside the target audience. I found the SQL examples and discussion clear once I had a chance to let them sink in. If you're working with SQL regularly, they'll be perfectly clear.
The graphs let down the otherwise high quality of the book. For example, Figure 5-3 shows a rate (higher is better) but the legend says "Relative cost" (higher is worse). Figures 9-1 through 9-3 on facing pages 228 and 229 show response time histograms for three different query rates but don't show what the rates are. The x-axis of Figure 10-1 seems to be calendar time, but it's decorated with a stop watch icon. And as a representative of rapidly aging boomers with rapidly deteriorating eyesight, could I beg book designers not to put figure legends in a smaller font than the text of the book? Diagrams should be simple and clear, not something to puzzle over.
This is a book to conjure with, but it's not a book for everyone. Some people may find it too abstract, with too much discussion of too few examples. If you're completely new to SQL, the book will be hard going. If you have very many years of experience with SQL, it's just possible that you won't find anything new in the book, although I expect you'll find a lot to think about. For anyone in between, The Art of SQL is an excellent way to improve the way you attack problems in database and query design.
You can purchase The Art of SQL from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
I know Amazon has software patents and all, but this (and just about every other book I see reviewed here) is ~20% cheaper at Amazon than it is at BN...
If you think SQL is an "art," you are a hack. Designing proper databases and the SQL to use them optimally falls under the domain of science/engineering. 95% of developers see relational databases simply as a means for a persistent data store, but that's not what it was designed to do. If you don't know engineering (what you do when designing functional systems*) from art (painting pictures, etc) you should have gone to a better college.
See this page for a start on the science of databases.
*Yes, I know creativity is usually involved when designing things. That doesn't make it art.
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
"for SQL there's a bigger gap between what the code says and what the code does"
I couldn't agree more. Sometimes while working in SQL I really wish I had a time machine and a rubber hose.
Chapter 7 discusses tree-structured data
Looks like no discussion of many-to-many relationships. This would make any book on databases and sql queries of limited value, not much more than a beginner book.
Trees are of limited value; they only exist in special circumstances. If you stick to tree-structured data relations then you will almost always have to do weird hacks that may threaten data integrity.
While many-to-many *seems* harder, as a data model M:M is often a much better practical solution, as well as modeling the reality of the situation in a much more accurate manner.
My $.02
putting the 'B' in LGBTQ+
I can't mod you up because I don't have any mod points, but I agree that I preferred the old one.
To keep this at least somewhat on topic, the table with the book information seems to have 0 margins/padding, making it a little ugly/difficult to read.
Perhaps that's what's wrong with database development these days (just check out The Daily WTF, as it seems they have a SQL example every other day). When a single year of experience is considered "significant" and "experienced", it's no wonder there are so many crap DBAs out there. We look for people with 5+ years of C# experience (ha! Good luck finding someone with more than 5 years experience ...) for intermediate-level developer positions. There's no way someone with only a year of SQL experience would qualify for an intermediate-level DBA position.
Just as background, I've been doing development on SQL Server for 6 years now (from SQL 7 to SQL 2005). I'm still learning, still finding ways to improve my code's cleanliness and performance, still finding new things I can do in SQL. For example, SQL 2005 finally has CTEs, making it only the second database to implement that ANSI SQL99 standard. CTEs make it very easy to do things that were painfully hard before, like walking a tree or implementing a recursive algorithm over sets of data.
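To make the tree-walking case concrete, here is a minimal sketch of a recursive CTE, run through Python's sqlite3 module rather than SQL Server. The `employee` table, its rows, and the `subtree` helper are hypothetical; dialects differ in details such as whether the RECURSIVE keyword is written, so treat this as one possible shape rather than portable code.

```python
import sqlite3

# Hypothetical employee table with a self-referencing manager column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " manager_id INTEGER REFERENCES employee(id))"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Ben", 1), (3, "Cy", 1), (4, "Dee", 2), (5, "Eli", 4)],
)

# The anchor member selects the root; the recursive member joins each
# employee to a row already produced, one level further down.
WALK_TREE = """
WITH RECURSIVE reports(id, name, depth) AS (
    SELECT id, name, 0 FROM employee WHERE id = ?
    UNION ALL
    SELECT e.id, e.name, r.depth + 1
    FROM employee AS e
    JOIN reports AS r ON e.manager_id = r.id
)
SELECT name, depth FROM reports ORDER BY depth, name
"""


def subtree(connection, root_id):
    """Return (name, depth) for root_id and everyone below it."""
    return connection.execute(WALK_TREE, (root_id,)).fetchall()


tree = subtree(conn, 2)
print(tree)
```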
After my fourth year of working with SQL, I'd have been willing to say I had "significant" experience with SQL. Four years is arbitrary -- it really depends on how much you work with it day to day. Someone may have "significant" experience after only two years, while someone else may not be significantly experienced until he's worked with SQL for eight years. If you had to put a number of years on what would constitute significant experience, I'd err on the safe side and go with three or four years. Certainly not just one year.
I spend much of my time explaining why a 5 page SQL statement "that takes a long time" is NOT A DATABASE PROBLEM!
/rant
Many so-called book reviews on Slashdot fail to review the book. Instead, they simply state what each chapter covers. This review is actually useful. It describes the book's target audience, gives a sense of what the book does and doesn't contain, and helps me understand whether the book would be useful to me. Thanks!
There are differences on the different platforms, but there is a standard and standard syntax ought to work in any rdbms. When it doesn't (access is the first example that comes to mind) that is a sign that what you are working with is not as good a system as it should be. One of the things I really like about postgres is that it is very standards compliant.
There is a transact sql book that I use frequently on multiple database systems. A small amount doesn't carry over, due to syntax differences. But the ideas on how to deal with sets of information in sql carry over. It appears that this book does that intentionally. And it should be useful in a very practical way if it is at all like the description.
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
there's a bigger gap between what the code says and what the code does
That's stated incorrectly. With SQL, the code says what to do, but it does not say how to do it. That's the difference between "normal" procedural code and languages like SQL.
Software sucks. Open Source sucks less.
Ahhh - but the best scientists are artists as well. (In fact, scientists and mathematicians often have more in common with artists than engineers).
Sure, the mechanics of programming is rather dull and boring, but large scale system design often requires considerable creativity that is much better done by people not constrained by artificially perceived IT limitations.
Coding J2EE isn't an art, but designing/building a massive neural net or complex, distributed game/simulation is. MySpace, Google, eBay, etc. weren't conceived by 'classic' engineers, but, rather, by creative people who understood how technology can enable new paradigms.
Pretty much every book on SQL I've seen only gives you obvious examples and covers the most simple uses. Every project I've worked on (for about 10 years) where there is pre-existing SQL written, almost all of it is written inefficiently. I'm not sure this book explains this kind of thing. But I've found 99%+ of the time you don't need to use a cursor, and it's almost always slower.
SQL can do a lot more than most programmers ever try to do with it. There are a lot of clever tricks you can use exploiting its set-based nature. The only place I've seen clever solutions beyond simple insert/delete/update statements is some of the trade magazines; the one for MS SQL Server sometimes has some very neat examples. These trade magazines have examples and ideas presented using the SQL language of a particular database, but it's almost always portable without much work. I consider myself pretty good at SQL and even I find it's hard to learn more to get to the point where I can design clever SQL more frequently. Anyone else find that too?
Another thing I've noticed is on some open source projects (and perhaps some closed source ones), particularly web-based ones, there is displayed at the bottom the number of database queries used to generate the page. They are often 10 or more, which almost always seems ridiculous. I think there just aren't all that many people out there who understand what SQL can do, how it's different from procedural languages and how to use it beyond a simplistic, straightforward approach. Hopefully this book helps explain that - I'll probably browse a bit the next time I'm in a book store.
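One common way a page ends up issuing many queries is a lookup inside a loop. The sketch below, using Python's sqlite3 module, contrasts that with a single join. The `author` and `post` tables and both helper functions are hypothetical and are not taken from any particular project.

```python
import sqlite3

# Hypothetical tables for a page that lists posts with their authors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
INSERT INTO post VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")


def page_with_loop(connection):
    """One query for the posts, then one extra query per post."""
    queries = 1
    result = []
    posts = connection.execute(
        "SELECT title, author_id FROM post ORDER BY id"
    ).fetchall()
    for title, author_id in posts:
        name = connection.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def page_with_join(connection):
    """The same rows from a single statement."""
    rows = connection.execute(
        "SELECT p.title, a.name FROM post AS p"
        " JOIN author AS a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return rows, 1


loop_rows, loop_queries = page_with_loop(conn)
join_rows, join_queries = page_with_join(conn)
print(loop_queries, join_queries)
```

With three posts the loop issues four queries and the join issues one; the gap grows with the number of rows on the page.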
One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does.
What? My SQL code tends to do exactly what the code says it will, are you trying to say that it's a high level language or am I missing something here?
... trying to read the sample chapter screws up the tab holding that page in firefox (1.5.0.4) - scrolling no longer works for that tab until you either close it or go to another url, etc. It's too bad, it sounds like a good read.
And yet, if you get out and talk to some of the real-world database consultants who get called in to clean up other people's messes, one of the complaints you hear again and again is that too many so-called DBAs learned their trade on a specific product, rather than understanding why databases work the way they do.
Optimizations that you introduce into your applications to cater to specific products' features (or work around their shortcomings) may be a fact of life, but they make for poor design choices. You should know what you're doing first -- which means a good understanding of database theory -- and layer all that syntactic hot-rod stuff on later.
Breakfast served all day!
I'm slowly working my way through it; it's a great book on a number of levels. The writing itself is very nice, with a real personality showing through and not just the usual dry technical flavor. The illustrations are done in a nifty "drawing" style that looks good and portrays the data well. The technical insights are very helpful; after reading what I've moved through so far I've rewritten some of my Rails code to be more efficient.
I highly recommend this book; the $40 you'll spend on it will be repaid the first time you delete a swath of Java looping code and replace it with an additional subquery. If I can do half as well on my next book I'll consider it a job well done.
The Army reading list
. . .the review leaves me wondering who would be a worthwhile reader.
Software engineers and Database Administrators.
An intuitive "hackers" understanding of physics is perfectly sufficient to construct a gocart out of 2x4s and baby coach wheels, but automotive engineers find that a knowledge of "theory" is rather useful in getting practical work done.
In fact if your software does not have a solid grounding in theory it may well be worse than useless, as software is nothing more than applied science. The computer is a mathematics engine. Nothing less, nothing more.
If you do not understand the underlying structure of your high-level language and the low-level mathematical theory below that, you are liable to make grievous mistakes, first in selecting your high-level tools, then in the specific models that you implement with your code, and then in your code itself.
And be utterly clueless that you have done so.
KFG
bookpool has it cheaper than amazon.
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
+---------+
| You |
+---------+
| Fail It |
+---------+
1 row in set (0.08 sec)
I like book reviews.
Homepage preferences are your friends.
Breakfast served all day!
I have found (and who can disagree (just trolling)) that at least half of the production databases that I have come across are not normalized. Go figure.
Anyway, this being the case, I have found that SQL is poor at handling a non-normalized table/database. (Can't really call a non-normalized table a database, can we? (nuther troll))
For example. We keep a complete record for each person for each pay period. Even inactives.
I am asked to give a list of all active employees for a date range, and a lot of payroll detail, personal detail, etc. Guess what? Simple SQL gives a lot of duplicate names. I wish that there was a simple way to filter. (Yes, I can do this in SQL, but my point is that it is not handled natively in SQL. I would like a simple command - give me all names and all their data for the latest pay period - something like that.)
All procedural languages will handle this problem nicely.
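For reference, one way the "latest pay period per employee" request can be written in SQL is with a correlated subquery. The sketch below runs it through Python's sqlite3 module. The `pay_record` table and its columns are hypothetical, and it assumes that "active" means the employee's most recent record within the date range is marked active.

```python
import sqlite3

# Hypothetical non-normalized table: one full row per employee per period.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pay_record (employee TEXT, period TEXT, status TEXT, gross REAL);
INSERT INTO pay_record VALUES
    ('Kim', '2006-04', 'active',   3000),
    ('Kim', '2006-05', 'active',   3100),
    ('Lee', '2006-04', 'active',   2800),
    ('Lee', '2006-05', 'inactive', 0),
    ('Max', '2006-05', 'active',   2500);
""")

# For each employee keep only the row whose period is the latest one
# inside the requested range, then keep it only if it is active.
LATEST_ACTIVE = """
SELECT p.employee, p.period, p.gross
FROM pay_record AS p
WHERE p.period = (
        SELECT MAX(q.period)
        FROM pay_record AS q
        WHERE q.employee = p.employee
          AND q.period BETWEEN :first AND :last
      )
  AND p.status = 'active'
ORDER BY p.employee
"""


def latest_active(connection, first, last):
    """Return one row per employee whose latest record in range is active."""
    return connection.execute(
        LATEST_ACTIVE, {"first": first, "last": last}
    ).fetchall()


latest = latest_active(conn, "2006-04", "2006-05")
print(latest)
```

Each employee appears at most once, which removes the duplicate names without any procedural post-processing.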
metaphors be with you
Not every programmer needs to be a computer scientist, but they do need to learn a little theory now and then. That's especially true when you're working with relational databases, which are full of weird abstractions and subtle performance issues. Not having looked at this particular book, I can't say whether it's overkill for what most SQL people do. I can say that most database hackers don't seem to know as much theory as they should.
Wouldn't that rather be "SELECT TOP 1 FROM Posts WHERE sid=#06/06/07/194246#" ?
I think we can keep recursing like this until someone returns 1
You can get "Art of SQL" cheapest at Buy.com, see:
Lowest Prices for 'Art of SQL'
hmmmm...let's test that:
SELECT Creativity.Passion, Creativity.Insightfulness, Ability.Palette, Ability.Colorscheme
FROM Creativity
INNER JOIN Ability ON Creativity.AbilityID = Ability.AbilityID
WHERE Creativity.Passion = 'Mediocre';
Result Set:
Creativity.Passion | Creativity.Insightfulness | Ability.Palette | Ability.Colorscheme
Mediocre | Dreamer | Basic | Shit Brown
I have way too much time on my hands.
In fact, if you have access to a local, independently-owned bookseller in your area, you should be buying your books there instead of online.
Stacey's Books in San Francisco doesn't give me Amazon's 34 percent discount -- in fact, it gives me 10 percent -- but it is a wonderful resource and not one I'd like to see disappear.
That's not hyperbole either. This year we've seen two classic, quality Bay Area bookstores close their doors: Cody's on Telegraph Avenue in Berkeley and A Clean, Well-Lighted Place for Books on Van Ness in San Francisco. These were not holes in the wall; they were spacious, carried a lot of stock and had served their communities well for years. (And believe me, the Bay Area in general buys a lot of books.)
The reality is that the book market is changing. Superstores like Borders and Barnes and Noble have a lot to do with it, and so does Amazon. Another factor is the overall decline in book sales to the American public. People walk into Borders to buy DVDs of Friends and they pick up a paperback of Harry Potter at the same time. That's not the model I want my booksellers to be based around; I want to support local businesses that understand their communities and are dedicated to selling books.
This is not to knock Amazon, or Borders or B&N for that matter; in communities where those are the only option, it's better to have someplace to buy books than no place at all. I still buy plenty of stuff at Amazon. But for books, I vote with my wallet.
Breakfast served all day!
umm... dude? SQL has been around since the mid 50's. A guy at IBM developed it. Now, it was made before high level languages, and brother, that's why SQL is anachronistic and irreparably flawed.
Yep, i said it.
I guess you skipped all the threads about computer science vs. programming and uni degrees vs a tech certificate.
Specifically regarding 'the best way to do x,' that may depend to a certain extent on the specifics of the platform at hand, but why do x? What do you hope to achieve? What are the desired results? Why not do y? If your thinking hasn't progressed past "basic syntax" you're not a hacker, you're a button pusher. Bang on your keyboard, you might as well be pounding rocks into gravel.
This book might be good for THEORY, but for actually getting useful and applicable information...
What do you think "useful and applicable information" is?? Think about driving a car. (*ducks* Yeah, the car analogies are played out.) The specifics of each make and model--the dashboard layouts, placement of controls--are your "basic syntax." These details are not the things you really need to know in order to learn how to drive. The THEORY of driving--concepts of acceleration, braking, steering--are the things you need to know BEFORE you can make proper use of the "basic syntax."
The reader for whom this book would be a worthwhile read is the person with an understanding that theory IS useful and applicable information.
"One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does."
Well, certainly one difference between SQL and a conventional procedural programming language is that SQL isn't procedural, it's declarative. One describes the data a query should produce, rather than stating a set of steps necessary to achieve a desired result.
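A small sketch of that distinction, using Python's sqlite3 module: the first version describes the totals it wants and leaves the steps to the database, while the second spells the steps out in a loop. The `sale` table and its rows are made up for the example.

```python
import sqlite3

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (region TEXT, amount INTEGER);
INSERT INTO sale VALUES ('north', 10), ('south', 5), ('north', 7), ('east', 3);
""")

# Declarative: state the result wanted; the database decides how to get it.
declarative = dict(
    conn.execute("SELECT region, SUM(amount) FROM sale GROUP BY region")
)

# Procedural: fetch the raw rows and accumulate the totals step by step.
procedural = {}
for region, amount in conn.execute("SELECT region, amount FROM sale"):
    procedural[region] = procedural.get(region, 0) + amount

print(declarative == procedural)
```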
jbgreer
The Norton Anthology of English Literature, 4th Ed., Vol 2
The reviewer managed to miss something that's pretty important: the authors are totally infatuated with Sun Tzu's The Art of War, even to the point of copying that book's chapter titles. Which is evidence either that they're educated people building on age-old wisdom — or they're half-educated dweebs copying a book that's faddish right now.
One difference between SQL and a conventional procedural programming language is that for SQL there's a bigger gap between what the code says and what the code does
Or more importantly SQL is not a procedural programming language at all. Please don't try to compare the two together at all, it just leads to misconceptions about what SQL is and how it works.
I like the look of this book quite a bit, judging solely from the sample chapter. It talks in a straightforward manner about the factors that determine how a database goes about its job and how you can make that job easier or harder. If the rest of the book plays out similarly then a thorough understanding of this book as well as Tom Kyte's would make for a programmer I'd love to hire.
References to Knuth's volumes titled "The Art of Computer Programming" http://www-cs-faculty.stanford.edu/~uno/taocp.html
are sprinkled liberally throughout many, many papers in computer science, especially wrt algorithms. It's more of an abstract art, as opposed to the "physical" arts like paintings and sculptures. You can't ignore the engineering aspect of it, yes, but if you manage to engineer a system well AND do so with simplicity, elegance, and creativity ... well, that distinguishes the true professional.
EVERYTHING is transferable. That is, everything you've actually learned, everything you understand. If you're just mashing buttons, yeah, you might be a little lost when the buttons change. When telephones changed from rotary dial to push buttons, some people were still able to make calls. Of course the basic syntax changed, and knuckle-draggers like the folks who modded the parent comment Insightful were SOL. But most folks who had some ideas about the THEORY of the telephone--that the little spinning disk on the phone didn't make the actual call but rather transferred information, and the buttons were just a new way of transferring the same information--adapted and moved on.
The fact that a computer even let such a concept be typed and communicated gives me hope for the day when machines rule the Earth, that they just might have enough of a sense of humor, or pity, to allow us humans to remain in their midst.
I disagree, but only to a small extent. I have extensive experience with MS SQL, Oracle, and MySQL. The basics of retrieving information are the same across all, but change very much when working on large systems. Select queries have to be written very differently on each system when tables get huge. For example, Oracle scripts with cursors are often much faster than regular joins if you know your data well. Yet on MS SQL cursors are the slowest way to go. On MySQL, using temp tables in memory often outperforms outer joins, but not in the same cases as MS SQL.
When working in the extremes the strengths and weaknesses of each system have to be considered.
You confuse syntax vs. execution. Your statement is equivalent to saying that since C++ and PHP have different syntaxes, there is no point in studying algorithms or design patterns. Would you agree with this statement as well?
All relational databases rely on predicate calculus at the end of the day. Understanding how relations work is fundamental to understanding what happens when you write something like "select A.x, B.y from A, B where A.z=B.z". Similarly, understanding things like b-trees and hashing functions will aid you in both schema design and query optimization. Understanding the theory helps you make the right kind of design. Your design may be implemented differently on different DBs, but simply having knowledge of a particular DB's syntax will not help you make the right design choices.
If you're worried about syntax variations across databases, then this is clearly not the book for you. However, once you're past syntax you need a book like this -- and I haven't seen another like it. The author is talking about how SQL works. What's the implication of using a correlated-IN clause vs. a correlated-EXISTS clause? Regardless of the syntax of a particular SQL dialect it is crucial that you understand these sorts of things unless you want to stare at the db blindly and wonder why it's slow.
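To make the IN vs. EXISTS question concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables and their contents are hypothetical, invented purely for illustration. It only shows that the two correlated forms can return the same rows; how each one is executed, and which is faster, depends on the engine and its optimizer, which is exactly the kind of thing the parent says you need to understand.

```python
import sqlite3

# Hypothetical schema and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 3, 20.0);
""")

# Correlated IN: the subquery refers to the outer row through c.id.
in_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE c.id IN (SELECT o.customer_id
                   FROM orders o
                   WHERE o.customer_id = c.id AND o.total > 30)
    ORDER BY c.name
""").fetchall()

# Correlated EXISTS: same question, asked as "is there at least one such row?"
exists_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1
                  FROM orders o
                  WHERE o.customer_id = c.id AND o.total > 30)
    ORDER BY c.name
""").fetchall()

print(in_rows, exists_rows)
```

Both queries ask for customers with at least one order over 30. Comparing the execution plans of the two forms on your own engine is the only reliable way to see whether it treats them differently.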
I have to recommend a good book for under US$20 to go with it.
http://www.powells.com/biblio?isbn=0071359532
Having to work for a living is the root of all evil.
that makes sense. and i think there is a lot of value in becoming an expert on an rdbms. but think of the vast majority of databases, and the people working on them. i haven't run into many that weren't junk. in fact i'd be willing to wager, if there were ever a way to prove or disprove the assertion, that the vast majority of databases in existence are access databases made by people with no education in relational theory and the tables look just like spreadsheets.
i just started a new job this week and i'll be working as a dba on a very large system running oracle. i'm really looking forward to it. it is a huge step up for me and i have a lot to learn. but i have no doubt that i'll also be putting in time on the side on much smaller projects and doing my best to explain to people why i have these 'crazy' rules about how to design a database or how best to get information out of it. many of these people will be developers-- i have no doubt of that.
It's hard to believe that's how Micronians are made. Why don't we see it right now by having you both kiss one another?
SQL was created in the 1970's. So it is possible to have many years experience with SQL. Heck, it was adopted as a standard in 1986. That is plenty of time to accumulate much experience with SQL.
For he today that sheds his blood with me shall be my brother.
For theory books I think you may do better with a book that has little to do with SQL. I find Database in Depth by C.J. Date a nice theory book.
Bills of Materials lend themselves perfectly well to tree structures.
You better watch out, there may be dogs about . .
You don't need a whole book for this - you need a magazine article. A whole book on theory is much less useful than maybe a book with a chapter on theory and a whole lot of chapters on practical applications specific to a given engine.
The problem is that the differences in the engines reach SO DEEP and affect so much that you actually could have a book on theory - for each one of them.
Actually, they don't, in the sense that there is no one single built-in command to handle this case. Just like in SQL, you (or someone else) have to write the function that performs a 'select distinct' equivalent.
Unless you're using a targeted-product (one built specifically for your data needs), nothing you do will be handled natively in any language. You can build this functionality by using correct SQL or writing the appropriate functions in a procedural language.
And why don't you consider 'select distinct' a built-in function?
Storing things as adjacency lists (which, obviously, is an M2M table where the node properties live in their own normalized table) tends to be faster in the long run for all but the largest and most active trees.
;-)
Nested sets are cool, and I've implemented them (in MySQL 4.1 no less), but at the end of the day, traversing a graph happens far more often and more usefully.
This seems to be where the CS majors separate from the rest of the crowd. Point out that they ought to know how to do this unless they failed 2nd year
Remember that what's inside of you doesn't matter because nobody can see it.
Are we talking parent-child hierarchy tables? If so, Oracle's had statements to take care of that for a long time, since 1998 or so. Perhaps not ANSI standard, but they get the job done.
You better watch out, there may be dogs about . .
Grevious? As in General Grevious? You were thinking of the lightsaber post above, weren't you?
Were you thinking of grievous perhaps?
Exactly. In theory, you can write good SQL and each system will do the right thing. In practice, that ain't the case. For instance, I had a query which joined a bunch of tables together to come up with a small result set containing duplicate rows. Ran fine, but I wanted to get rid of the dupes. Add "DISTINCT". Whoops, all of a sudden the query takes forever. "Explain Plan" tells me it's now for some reason ignoring indexing and doing full table scans and that sort of thing. The "solution" was an ugly hack like
SELECT DISTINCT foo, bar, baz from (SELECT foo, bar, baz from
...))
explicitly telling it to create the result set and THEN eliminate duplicates
Contrariwise, the original query performed fine with an ORDER BY clause -- and if I added a "DISTINCT" to that one, it continued to perform fine.
Tell me that sort of thing is applicable across databases. Actually, please don't. I don't do queries any more and I'm glad of it.
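For readers who want to see the shape of that workaround, here is a minimal sketch using Python's sqlite3 module. The tables a and b, the columns, and the data are hypothetical stand-ins for the elided query above. SQLite is not Oracle, so this does not reproduce the performance problem; it only shows that wrapping the join in a derived table and applying DISTINCT outside is logically equivalent to putting DISTINCT on the join itself.

```python
import sqlite3

# Hypothetical tables standing in for the elided join in the comment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, foo TEXT);
CREATE TABLE b (a_id INTEGER, bar TEXT);
INSERT INTO a VALUES (1, 'x'), (2, 'y');
INSERT INTO b VALUES (1, 'p'), (1, 'p'), (2, 'q');
""")

join_sql = "SELECT a.foo AS foo, b.bar AS bar FROM a JOIN b ON b.a_id = a.id"

# The plain join returns a duplicate row.
all_rows = conn.execute(join_sql).fetchall()

# DISTINCT applied directly to the join.
direct = conn.execute(
    "SELECT DISTINCT a.foo, b.bar FROM a JOIN b ON b.a_id = a.id ORDER BY 1, 2"
).fetchall()

# The workaround: build the result set in a derived table, then de-duplicate.
wrapped = conn.execute(
    f"SELECT DISTINCT foo, bar FROM ({join_sql}) ORDER BY foo, bar"
).fetchall()

print(len(all_rows), direct, wrapped)
```

Whether an optimizer plans the two forms differently is engine-specific, and many optimizers will flatten the derived table back into the outer query, so treat this as an illustration of the rewrite rather than a tuning recipe.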
. . .but I didn't NEED to go to school to learn this . . .
"...you actually could have a book on theory - for each one of them."
Who said anything about going to school? I am a vociferous advocate of the library fine model of education. Most of what you learn in school these days, even at the tertiary level, is just plain wrong. At least in math, physics and chemistry we still require that you test and verify what's in the text book at the lowest levels.
And here it is, written from the point of view of the practitioner:
Practical Issues in Database Management
Perhaps if you read it you will gain a better understanding of the very concept of "theory." Your comment reveals you to be a bit weak on this point.
KFG
If you are going to go with the cooking/food reference then I do not think SQL is like cooking. SQL is like ordering at a restaurant, where the restaurant is your DBMS. It's like programming in Prolog. You don't tell Prolog what to do; you tell it what you want. Just a thought.
I am a certified dysxleci and dysgraiphc. The testing standards are very strinjint.
KFG
Don't think I completely agree. True, writing SQL to second-guess the optimizer in detail is deadly and pointless with modern RDBMSs anyway (but Oracle 5, where you really had to, isn't that many years ago). Nevertheless, having a feel for how optimizers work is good: for instance, setting up your joins on indexed fields, or being aware of where the optimizer will use a full table scan and when that is a problem. One of my favourite tricks, for example, is to use an index to avoid a table access, which can pay mega dividends on large datasets. For example, suppose we have a table which contains employee data and is indexed on an ID. I know that I regularly require a further field from this table, say insurance number. By setting an index on ID and Insurance Number, the optimizer saves a record access for each instance when Insurance Number must be retrieved. That's a simple example, but the theme can be extended quite significantly.
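The index trick described above is usually called a covering index. Here is a minimal sketch with Python's sqlite3 module. The employee table, its column names, and the index names are hypothetical, and SQLite stands in for whatever engine the parent was using. The sketch asks SQLite for its query plan before and after the index is widened to include the insurance number.

```python
import sqlite3

# Hypothetical employee table, indexed on id only at first.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER, name TEXT, insurance_no TEXT);
CREATE INDEX idx_emp_id ON employee (id);
""")

query = "SELECT insurance_no FROM employee WHERE id = ?"

def plan(sql):
    """Return SQLite's plan description for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)

# With an index on id alone, the engine must still visit the table row
# to fetch insurance_no.
plan_before = plan(query)

# Replace it with an index on (id, insurance_no): the query can now be
# answered from the index without touching the table.
conn.executescript("""
DROP INDEX idx_emp_id;
CREATE INDEX idx_emp_id_ins ON employee (id, insurance_no);
""")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

In SQLite the second plan reports a COVERING INDEX. Other engines describe the same idea differently (an "index only" access path, for example), and the wider index costs extra space and slower writes, so it pays off mainly for frequently read columns.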
What did Aristotle say about engineering disciplines?
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
SELECT FIELD FROM TABLE GROUP BY FIELD HAVING COUNT(FIELD) > 1
Works on MySQL, MSSQL, Oracle, PostgreSQL, DB2, SQL Anywhere, MaxDB etc etc etc.... dupes are easy to find using ANSI SQL, and most other stuff is as well.
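A runnable version of that duplicate finder, as a minimal sketch with Python's sqlite3 module. The table t and its contents are hypothetical; the query is the same GROUP BY / HAVING pattern, with the count added to the select list so you can see how many copies exist.

```python
import sqlite3

# Hypothetical table with some repeated values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (field TEXT);
INSERT INTO t VALUES ('a'), ('b'), ('a'), ('c'), ('b'), ('a');
""")

# Group equal values together and keep only groups with more than one row.
dupes = conn.execute("""
    SELECT field, COUNT(*) AS copies
    FROM t
    GROUP BY field
    HAVING COUNT(field) > 1
    ORDER BY field
""").fetchall()

print(dupes)
```

One caveat: COUNT(field) ignores NULLs, so repeated NULL values will not be reported by this form; use COUNT(*) in the HAVING clause if those matter.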
My mother never saw the irony in calling me a son-of-a-bitch.
That's funny, because I was just thinking it's odd that this book has no theory in it at all. At least in the review I saw no mention of the definition of ACID, the compromises at different transaction isolation levels, Codd's 12 rules for relational databases, Codd's original notation for relational algebra and relational calculus (of which SQL is an approximation), or normal forms.
And it turns out that this theory is useful and applicable. If you haven't caught on yet, I'm disappointed by this omission. A lot of people write horrible systems because they do not understand transactions, how to normalize a database schema, or why constraints are so important.
If you need specifics on RDBMS implementations, look at this comparison website. It's not that long, and it basically fills in the gaps left by this book.
You can usually write standard SQL statements and run them on PostgreSQL, MS SQL Server, Oracle, and DB2. You can certainly come up with Oracle statements that don't run on PostgreSQL - e.g., by using their alternate syntax for left joins that predates standardization - but presumably this book teaches you the standard stuff. That's all you need in most situations, and it's all they can give you without having to update the book every six months.
Microsoft Jet SQL (of Access fame) has a few cosmetic differences in syntax. (IIRC, quoting is different.) If that's enough to seriously set you back, you'd be in trouble even if the book did duplicate all the examples for you.
MySQL is the only real oddball, and even they are starting to learn that this SQL thing is useful after all. If you want to work with older MySQL installations, get a book on MySQL, throw out any knowledge you have of how to do things properly, and give up on portability altogether. Peculiarities in its performance characteristics made projects like phpBB do bizarre things like maintain parallel table structures for each forum in a messageboard. That's totally against the relational model, and there are lots of consequences...
Regarding > "there's a bigger gap between what the code says and what the code does." I think that's a typo. It should read...
there's bigger CRAP between what the code says and what the code does.
There's a lot of code in the RDBMS and normally you shouldn't have to delve into the RDBMS' source... But you should know what it does and how to use it.
I was once on a project where a DUHveloper needed to perform an unnatural sort on a key column. He needed to display the query results so that certain rows were always sorted to the end of the result set, but there were no column values to meet the criteria. He had this HUGE amount of nested if statements that he had been working on for days. After I inquired as to what he was wasting all of his time on, I showed him how to create a non-displayed sort column where you derive a value based on a CASE statement. I accomplished in 5 minutes what he had struggled with for days, just because he didn't really know much about SQL.
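The CASE trick can be sketched like this, using Python's sqlite3 module. The items table, the status values, and the "archived goes last" rule are hypothetical, since the original criteria are not given. The derived sort key lives only in the ORDER BY clause, so it never appears in the output.

```python
import sqlite3

# Hypothetical table; the rule "archived rows go last" is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (name TEXT, status TEXT);
INSERT INTO items VALUES
    ('zeta', 'open'), ('alpha', 'archived'),
    ('mike', 'open'), ('beta', 'archived');
""")

# The CASE expression derives a sort key: 0 for normal rows, 1 for rows
# that must sink to the end. Ties are broken by name.
rows = conn.execute("""
    SELECT name
    FROM items
    ORDER BY CASE WHEN status = 'archived' THEN 1 ELSE 0 END, name
""").fetchall()

names = [r[0] for r in rows]
print(names)
```

If the key is needed elsewhere, the same CASE expression can be selected as an extra column and simply not displayed by the application.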
I've corresponded with Mr. Faroult on several occasions, I've used many of his scripts, and I've received a lot of email help from him... So based on my experience I'm betting his book is pretty good.
In my experience there's a point in all rdbms systems for table size which, once crossed, causes significant differences in table handling and requires serious DBA skills to optimize the system correctly. Generally this seems to be in the region of 200,000 to 1,000,000 records. Below that, as long as you are paying reasonable attention to indexes, you are fine; above that, it's important that the DBA understands quite deeply the table structure, data, indexes and access methods, and optimizes correctly.
There's a multiplicity of reasons why this might be so, but it's quite noticeable sometimes how a relatively small increase in table size can move a table over the edge from being trivial to handle to requiring some insight.
Early in my career (before I became a Microsoft shill), I decided to forgo learning SQL as a language and simply rely on MS Access instead.
I know that Transact SQL book you're talking about, and thought of it when I read this review. That's the best book on SQL I've ever read (by far) and I learned more about Oracle SQL from reading it than I have from reading at least a dozen more specifically Oracle-centric books. Fantastic book, I think it's called Advanced Transact SQL or some such. There's some proprietary stuff in there that's specific to MS SQL, but mostly it's a great overview of various ways to get a specific result set, and various ways to deal with problems that arise-- duplicates, nulls, poor performance, strange data relationships, etc., and it has a good discussion of functions and stored procedures as well.
Good luck with your new Oracle career. I wish our DBA knew SQL!
Everything I've ever learned the hard way was based on a statistically invalid sample.
So it's like Alton Brown's "I'm Just Here for the Data"?
Yeah, I can understand the nostalgia for a bookseller who "knows what I want" and "understands the community". But, reality check, Amazon recommendations churned out from a little AI magic and some freakishly large database just utterly destroy anything a minimum-wage retail staffer with no knowledge of my tastes can hope to accomplish. "Well what do you like?" "Hmm, I don't know, a bit of everything. I read a lot of fantasy or Tom Clancy.". "Well, in that case, can I recommend you go to our fantasy or Tom Clancy shelves?" (Gee thanks, why didn't I think of that.) Amazon *knows* that there is a British alternate history of the Napoleonic Wars fought with dragons that is right up my alley (incidentally: it starts with His Majesty's Dragon, and I'm *loving* it).
Community involvement... well... I'm all for teaching kids to read and exposing them to the classics. And I support libraries, churches, and schools that do. But bookstores are, well, poorly suited to the task. I also, how do I put this gently, sort of fail to be a member of the community the bookstore is representing on a regular basis? I live in Japan at the moment -- the most community-oriented mom&pop bookstore around here can't possibly be community-oriented and still be Patio11-oriented. If I got a job in San Francisco, I'd be pretty bloody out of place as the Catholic Republican and some of my reading selections might not make a San Francisco bookseller abundantly happy. Then there's just the practical limits of bookseller expertise and shelf-space: excuse me, Mr. Mom&Pop Bookseller, can you recommend me a good book for a twenty-something set in modern China written in English which is *nothing like* Shanghai Baby? That's what I was looking for the last time I bought a birthday gift from Amazon, and you can browse yourself a dozen choices in less time than it takes to get to the head of the line to speak to the clerk.
Help poke pirates in the eyepatch, arr.
It is hard to know what is the right thing to do. Although, I want to support my local stores and community, I also don't want to use up trees. So is buying online and reading online better than buying a mass published book from a local store?
"umm... dude? SQL has been around since the mid 50's." ?
:-)
:-)
Why did the parent post rate informative?
"A guy at ibm," you say
The theory came out in 1970, so you're off by 15 years.
Off 20 years for an actual implementation (mid 70's).
Off 25 years for commercial product (79).
http://en.wikipedia.org/wiki/SQL#History
----
"anchronistic and irreperably flawed." ?
"Duuuude!"
What would you replace it with?
By ANY measure, relational databases are a RESOUNDING
success. They're as much a part of the computer world
as operating systems, networking, and compilers.
...The Art of SQL Injection. tough choice.
If you think
... 95% of developers see relational databases simply as a means for a persistent data store, but that's not what it was designed to do.
This developer doesn't. I prefer flat files, especially for storing large amounts of raw binary data.
But nearly every time I have a review for a design that uses flat files for persistent storage, the DB wonks have conniption fits and insist that I use a DB.
I think it's the DB enthusiasts that have the problem.
In the course of every project, it will become necessary to shoot the scientists and begin production.
Kinda snooty there, aren't we?
Snooty? No, not really. Elite? Absolutely.
One of the reasons I am not actually snooty is because I do not seek to look down from my elite platform, I seek to help people up to it. I am elite, but not elite-ist.
If only because it would make my own life a damned sight easier.
The essential problem, and where the appearance of snootiness can come from, is that I first have to make people understand there is something above them. Sometimes, for some people, it takes a whack upside the head. If you do not understand that you do not know you will never seek to learn. To learn you must accept that someone else has superior knowledge. That you are in that limited respect their inferior.
And for some people joining the elite simply isn't within their capabilities. That isn't snootiness. That's reality. I'll never run hurdles like Edwin Moses either, no matter how hard I train; and that's the way it is.
But if I chose to run hurdles you can be damned sure I'd supplicate Edwin to give me some pointers, unabashed in acknowledging him as my superior.
KFG
You have it backwards. Traversing a graph is an iterative operation. RDBMSs are designed for set operations. Nested sets are much faster for reading than an adjacency list but are also much slower for updating. In a large, highly active tree, nested sets may not meet performance constraints.
To put things in perspective, I have a nested set tree with a ragged hierarchy with 50k nodes. Node level security is in place. A worst case insert currently requires 1.5 seconds. Selecting the descendants of a node and filtering against the ACLs requires .5 seconds for any node in the database. Selecting the descendants iteratively is faster only in the special case of few descendants and takes anywhere from .25 seconds to 20 seconds.
The tree is read more often than it is updated so the nested set meets my performance goals in all current cases while adjacency lists do not meet my goals in some cases.
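For anyone following the nested set vs. adjacency list argument, here is a minimal sketch of both descendant queries using Python's sqlite3 module. The five-node tree, the node table, and its column names (parent_id, lft, rgt) are hypothetical. The adjacency-list version uses a recursive common table expression, which is my substitution for the "iterative" selection mentioned above; the poster may well have looped in application code instead. It says nothing about the timings quoted, which depend on tree size, security filtering, and engine.

```python
import sqlite3

# Hypothetical tree stored both ways in one table:
#   root -> A, B ; A -> C, D
# parent_id is the adjacency list; lft/rgt are the nested set bounds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT,
                   parent_id INTEGER, lft INTEGER, rgt INTEGER);
INSERT INTO node VALUES
    (1, 'root', NULL, 1, 10),
    (2, 'A',    1,    2, 7),
    (3, 'B',    1,    8, 9),
    (4, 'C',    2,    3, 4),
    (5, 'D',    2,    5, 6);
""")

# Nested set: descendants are the rows whose bounds fall strictly inside
# the parent's bounds. One set-based query, no iteration.
nested = conn.execute("""
    SELECT d.name
    FROM node p
    JOIN node d ON d.lft > p.lft AND d.rgt < p.rgt
    WHERE p.name = 'A'
    ORDER BY d.lft
""").fetchall()

# Adjacency list: walk parent_id links level by level with a recursive CTE.
adjacency = conn.execute("""
    WITH RECURSIVE sub(id, name) AS (
        SELECT id, name FROM node
        WHERE parent_id = (SELECT id FROM node WHERE name = 'A')
        UNION ALL
        SELECT n.id, n.name FROM node n JOIN sub s ON n.parent_id = s.id
    )
    SELECT name FROM sub ORDER BY name
""").fetchall()

print(nested, adjacency)
```

The trade-off the thread describes shows up in the data model: inserting a node under the adjacency scheme touches one row, while the nested set scheme has to renumber lft and rgt for every node to the right of the insertion point.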
No, YOU'RE a towel!
I'd always know where I am if I were you.
KFG
That gives you the dupes; I was trying to eliminate them. This is perfectly easy to do in portable SQL --
SELECT DISTINCT fields FROM tables WHERE hairy join clauses and conditions
The problem wasn't that it didn't work; the problem is that the optimizer did something stupid and made it take forever. Because the result set was far smaller than the tables, the fastest way to do the DISTINCT query was to do the ALL query and then filter the results. But for some reason Oracle did something completely different, involving lots of full table scans. I had to write a hack to trick it into doing the right thing.