MySQL Database Design and Optimization
This book focuses on MySQL 4.0/4.1 but also gives consideration to v.3.23 users as well as a nod toward v.5. The layout of each chapter gives a description of the topic of the chapter, followed by the meat of the chapter, a summary and what's next (how the context of this chapter ties into the subject of the next). There are numerous "notes", cautionary flags, tips, screen shots, code examples as well as thoughts from each author that provide explanatory asides to the content. The authors also provide references to other volumes, as needed.
A glance through the table of contents will give the reader a precise overview of what to expect in this book: Review of MySQL Basics; MySQL Column and Table Types; Keys, Indexes and Normalization; Optimizing Queries With Operators, Branching and Functions; Joins, Temporary Tables and Transactions; Finding the Bottlenecks, MySQL Programming; and Looking Ahead.
Chapter 1: Review of MySQL Basics gives a very quick (under 50 pages) summary of how to connect to the MySQL server; MySQL's identifiers and naming conventions for databases, tables and columns; a review of MySQL's syntax, writing basic queries and using basic commands (create, drop, select, insert, update, delete); and a discussion of the use of table, column and expression aliases. This section, while adequate, is clearly intended as an analysis of core information necessary to proceed to further chapters.
Chapter 2 follows with MySQL Column and Table Types, which deal with datatypes and structures used to store the data. The goal here is to help the reader design effective tables (and therefore create a well-designed and efficient database) suited to the particular type of data at hand. Numeric types are covered in depth; strings, the null value, ENUM and SET are also addressed as well as common "gotchas" and developer errors.
Keys, Indexes and Normalization come naturally in Chapter 3, with optimal data handling the goal: the chapter addresses getting data in efficiently and getting the results out efficiently, eliminating redundant data, appropriate uses of indexes and common index creation errors.
The core of the book is clearly Chapter 4, "Optimizing Queries with Operators, Branching, and Functions." Here, optimization skills are honed; manipulation and filtering of data is one of MySQL's strengths and this chapter shows the reader how to replace less-than-ideal program logic with SQL constructs to precisely adjust query performance. There's a good demonstration here of outputting a list of member data to a web page. The ultimate goal in this chapter is to provide the reader good skills that translate into better efficiency and faster database interaction. As the authors point out, one obvious logical consequence of this is easier migration between platforms and programming languages.
The next reasonable step is to look at additional features that MySQL has up its sleeve that will save the developer time and effort in the overall scheme of application development. Chapter 5, "Joins, Temporary Tables, and Transactions" discusses three of these additional features. The authors carefully point out that each of these eliminate excess queries needed to pull data, decrease code overhead, minimize the need to store data as application logic, decrease the number of bugs that appear in code and help guarantee data integrity (an aspect of database design that unfortunately often takes a back seat to other priorities as developers are often not concerned with the validity of data in a real world sense; i.e. from the user's perspective).
Chapter 6, "Finding the Bottlenecks," addresses modifying system configuration variables outside of the default and how these can dramatically affect performance. The authors look at some available free tools that help monitor server performance and enable configuration changes including mytop, WinMySqlAdmin, phpMyAdmin and the new MySQL Administrator (available from MySQL AB). MySQL caching capabilities and the ability to decrease repetitious read/writes to disk (good table, key and query caching within MySQL) are discussed. Finally, database interoperability and abstraction layers are mentioned in terms of performance penalties vs. making life easier for the programmer.
MySQL Programming is the topic of Chapter 7, where a very good discussion of the MySQL API is provided. There are a lot of useful examples in this chapter covering many of the common MySQL APIs available (PHP's MySQL and MySQLi, Pythons's MySQLdb, ODBC, Perl's DBI), along with feature discussions and examples.
The final chapter, "Looking Ahead," examines MySQL v.4.1, 5.0 and 5.1 and some eagerly awaited new features, including stored procedures, stored functions, views and triggers.
This is a well-rounded volume on MySQL design. There are excellent examples and the flow of the text is conversational without being rambling and unstructured. The authors have obviously taken great pains to minimize tangents and extraneous information; pithy, but with sufficient detail in mind. The reader is left with neither the sense of being overwhelmed nor longing for an explanation for a glossed-over topic. This book is pretty much a "must have" for a MySQL programmer looking to bridge the gap between novice and professional.
You can purchase Beginning MySQL Database Design and Optimization from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
...SQL Performance Tuning is an excellent book. It has a lot of good discussion on when to use certain SQL contructs and how to check your database to ensure you're actually getting improvements.
PLUG: Which SQL queries are taking the most time? PQA home page, download.
The Army reading list
right here
___
If you think big enough, you'll never have to do it.
3) It's probably already available at your hosting provider, whereas Postgresql probably isn't (vicious cycle with #1)
Wow! So I guess other databases don't get Slashdotted? That's impressive! (note sarcasm).
MySQL is popular because its easy
You say that like it's a bad thing... is something better if it's difficult to use?
And as an aside, I worked at a biotech. We had an 80 gig database with a quarter billion entries. We were running platform neutral SQL and used MySql, Oracle and MS-Sql server.
We ditched the "commercial grade" databases because they were to slow! MySql did a great job under enormous load.
Agile Artisans
MySQL is not easy. Need a subquery? You're out of luck, pal. A FULL JOIN? Sorry. Check constraints? No dice. Throw an error instead of taking a bad guess when you've fatfingered an input? Oops. And God help you if you need some trigger logic for auditing purposes, because MySQL AB certainly won't.
What part of "A well regulated militia" do you not understand?
mySQL may not be appropriate in a mission-critical situation, but that does not make it bad for all situations; if you need speed in prescedence of everything else then mySQL is probably the right tool for the job. If you need data integrity, ACID compliance etc then PostgrSQL, Oracle etc are the right tool for the job.
Outright saying one or the other is a POS only makes you look stupud.
There's mischief and malarkies but no queers or yids or darkies within this bastard's carnival, this vicious cabaret.
OPTIMIZE TABLE `tablename`
h tml
or you can look here
http://dev.mysql.com/doc/mysql/en/OPTIMIZE_TABLE.
The RFDs (Request for Discussions) for both PostgreSQL and MySQL are on news.group. In about one month, both groups will be voted on, if it passes, the groups will be found under comp.databases.*.
& Su rveyID=36
If you want more information, visit news.groups with your usenet server.
Right now, there aren't ANY postgresql or mysql groups under the big 8 comp. domain.
Remember to stay tuned for the CFV so they get voted into the domain! Here is a nice web poll you can take to voice your support of the groups getting into the big 8 usenet hierarchy:
http://scripts.postgresql.org/survey.php?View=1
Vote yes, so they know there is support for a big 8 comp.databases.postgresql newsgroup as one does not exist yet!
Then why not use PostgreSQL? For companies it's cheaper (MySQL licence for commercial use is almost as much as the basic Windows 2003 Server Licence), and for open source lovers it's even more free.
And that of course ignores the fact that it's also technically superior - faster, proper data integrity, and real SQL (sorry, but for an SQL programmer, the kiddie pool syntax available in MySQL is The show stopper)
MySQL is the very thing open source is supposed to be against - software being used just because everyone else is using it, without regard for if it actually has any merit. MySQL is the Windows of the database world.
MySQL is popular because it is known and because every $20 a month web hosting service supports it. Period. It's not any easier than other databases... the fact that MySQL supports a limited subset of SQL and only allows you to use the basics doesn't mean that more powerful databases don't also let you use the basics.
At least they finally added support for subqueries. That was always the nail in their coffin as far as I was concerned.
-1 Uncomfortable Truth
If you are in the UK you can get the book here for 10% off and free delivery up to Dec 25th :)
A few are floating around for £20 as well.
Insert "MySQL isn't a real database so this book makes no sense" comment here.
INSERT INTO mysql_sucks (comment) VALUES ('MySQL isn't a real database so this book makes no sense');
I'm sure "SlashdotMedia" will improve on all the wonders that Dice Holdings blessed us all with
escaping is good.
Need a subquery
Not everyone does
FULL JOIN
Again, not everyone does
Check constraints
Still don't, or the application can
trigger logic
You know what I'm seeing a pattern here. I guess it seems that while MySQL doesn't do everything you want, it does enough of what other people want so it would seem that at least to other people, MySQL does not suck.
As far as not throwing errors, we've either used different versions, or your talking about something I haven't come across, since I've seen MySQL throw plenty of errors.
"I use a Mac because I'm just better than you are."
Don't forget stored procedures. MySQL doesn't support them (I think the new version is going to, though). On PostgreSQL you have their own SP language or you can write the procedures in C... also, I think transaction support is not as good in MySQL as in other databases, but I don't know for sure.
Oh, and sequences. MySQL has the auto increment property for a field, but doesn't have sequences that can be independent of a table.
Go hug some trees.
Don't you want to start with just database design and SQL before you would want to move to a book about a specific RDBC implementation? If it is just about database design then the title of the book is wrong.
Then again, if you wish to explain about setting up the database itself, access rights and so on, then the book might be for beginners. Once again, the title would not fit the book.
As anyone should know, the steps in software development are: get it working, get it right, get it optimized. Let's hope that the book does not go to deep into the optimized part in a too early stage.
How is this a rant? Are you saying the problems they list don't exist?
I like that site cause it contains no spin: it just lists the facts and provides references to the documentation. Is it the facts that bother you?
___
If you think big enough, you'll never have to do it.
1) I don't think you're required to buy a MySQL licence unless you need other licence requirements or support from the company.
2) You have any proof that Postgres is faster than mysql? Everything I've seen indicates that mysql is still *slightly* faster than postgres. Most of the systems I've worked with I haven't seen much of a difference. Personally I only use Postgresql for everything, but on other systems mysql seems to be about the same.
I have started a company and am in the process of putting my precious mission critical data in a MySQL database. RAID 1 is used to prevent loss of data, and apart from that DVDs will be burnt using the superdrive.
My database app is low volume data traffic to/from the db.
Can anyone tell me where MySQL fails when it comes to mission critical stuff? I'm very eager to find out.
Bert
How true. For instance, emacs is a much better piece of software than vi.
taken! (by Davidleeroth) Thanks Bingo Foo!
A reply to #1 (because I have no idea what #2 is really like, I've heard stories both ways)
MySQL with the release of version 4.0 changed the licence of it's libraries from LGPL (commercial code can link to these libraries) to GPL (commercial code cannot link to these libraries, anything that does becomes GPL). Any program you write that uses the mysql libraries is considered a derivative work by the GPL. So unless you write your own mysql libraries (to the undocumented wire-protocol of mysql 4.0), you have to buy a license for anything not open source.
This was the biggest reason for my move to Postgres. I make my living building web applications for people, my clients get the source code so that they can make modifications, but not the right to redistribute the source code (unless I use anything GPL, then they get the full GPL licence). So it's nice to be able to use a proper database server, and support the developers of that database server without having to open the source of everything I create. (Most things I create however, would just be another useless entry on freshmeat, so I don't feel too bad about not contributing these things back)
First of all, I like and use MySQL. I have it running on two public web sites and I use at home for my intranet. It is fast, easy to use, and works well for my purposes. It may not be perfect, but I think it is good. Anyway, on with the reply:
Constraints, triggers, and stored procedures are important to have in a database. While I can make up for MySQL's shortcomings in application code, this is not a good idea. I should not have to validate data in the application. If I could put that code in the database, it will be more robust. Not only would that provide a single point to validate from the application, it would also provide validation for scripts and console access.
Full joins and subqueries help, but in a different way. I should not have to execute two queries to get one result set (minus intermediate data). Queries should be atomic -- one piece of code that executes together. For example, let's say I have two queries that could be combined with a sub select. In between the queries running, something changes the data, invalidating my intermediate values and second query. While the underlying data is fine, it appears to the user that there is data corruption. The second query could return incorrect data or just error out for no valid reason. Sure, most nested selects are not necessary, but sometimes they are. Full joins are not usually necessary, but in about 0.0001% of the queries they are.
Maybe for the typical blog site these features are not all that critical, but for enterprise-class databases they are. The database application I use at work is bad enough in Oracle (poor DB design), it would fall apart at the seams in MySQL. Constraints, triggers, etc. help keep it working until we can reengineer it.
24 beers in a case, 24 hours in a day. Coincidence? I think not!
> > Need a subquery
> Not everyone does
but when you do, you are Sh^Hadly Out of Luck.
> > FULL JOIN
> Again, not everyone does
see above.
> > Check constraints
> Still don't, or the application can
I hope you'll pardon my putting this so bluntly, but this is an absolutely classical n00b mistake. In my experience, a useful data store doesn't have "the" application or "the" interface. It has several to start with and eventually grows many of them. That is why the data store itself has to maintain any data integrity/business rules inviolably. The alternative is to keep n (for some large n) code bases perfectly in synch and functioning exactly the same way.
> > trigger logic
> You know what I'm seeing a pattern here. I guess
> it seems that while MySQL doesn't do everything
> you want, it does enough of what other people
> want so it would seem that at least to other
> people, MySQL does not suck.
It doesn't do enough of what people need when they're doing things that involve money, for example. There just isn't any way to use MySQL for an application that requires Generally Accepted Accounting Practices (GAAP) short of building a true relational database as middleware.
> As far as not throwing errors, we've either
> used different versions, or your talking about
> something I haven't come across, since I've
> seen MySQL throw plenty of errors.
See the "gotchas" page, so often quoted here, re: the failure to throw errors when it's supposed to.
What part of "A well regulated militia" do you not understand?
I assume you only code in assembler?
Agile Artisans
Why do people keep saying MySQL doesn't have transactions or subqueries? For heaven's sake, please update your 4-year-old facts!
Oh, and PostgreSQL is actually better for ANSI SQL compliance. It's one of the most SQL-standard compliant database I've ever seen (certainly more standard than Firebird). Call me when Firebird has TIMESTAMP WITH TIME ZONE datatype.
Database design should be a generic RDBMS book for the most part. It does not make much sense to repeat table design techniques and philosophy for each RDBMS product. (However, giving vendor-specific tips and limits is understandable.) It might be cheaper to purchase and write a generic book about table design because it can be written and printed for multiple products. Then again, many publishers simply copy-and-paste semi-generic topics with slight custom tuning.
Table-ized A.I.
Views, synonyms and referential integrity (foreign key constraints) would be very nice too.
When I find out why VHS became more popular than technically superior Betamax, I'll figure out why Mysql is more popular than Postgres.
Aside from taking advice from an AC who says something "sux" (real technical term there. Care to back it up?) There are no "auto incrementing" datatypes in postgresql, only default nextval('somesequence'). Note the "default" there.
Before calling this "broken" ask yourself how you'd insert a row into a database with two columns having the same number, without having an external object to get that number from. I use this all the time in billing systems that deal with families. One field for the ID, one field for the ID of who gets the bill, usually the same. In postgresql I can use nextval('sequence') for the first field and lastval('sequence') for the second, in a single command. In MySQL InnoDB only appears to support a single AUTO_INCREMENT column, so I'd have to insert the record, locate the record and read its id, then update the record with that id in the second field.
Not only that, MySQL can only count by one, unlike postgres which can use whatever number is next in the sequence: positive, negative, or even zero.
The lack of referential integrity means you cannot guarantee the state of the data in the database, unless you run a whole bunch of queries to check.
If there is only one way to get data in, through one application that can enforce the integrity on entry, and check it on retrieval, you may be OK. As long as you know nobody will ever be updating it directly in the database.
Hi, I'm one of the authors of the book.
:)
1. Thanks to Mary for the positive review.
2. Thanks to Mike Hillyer for his invaluable help with the book. Say what you like about Visual Basic (I happen to loathe it, myself), Mike's an excellent programmer, and his knowledge of MySQL is superb. In fact, part of the way through the process of writing this book, he was hired by MySQL AB to work with the teams developing the Connectors and the new GUI tools. His site VBMySQL.com provides a valuable and unique resource for VB and other Windows developers wanting to build DB applications who'd like to use an actual database instead of Access and don't feel like condemning themselves or their users to paying for SQL Server. Rather than flame him for his language and platform choices, you should commend him for introducing many Windows programmers to an Open Source technology. (BTW, you might be interested in knowing that he also uses Linux and programs in C++ as well.) It was a privilege to have him work on the book with us, and it's a privilege to work with him now at MySQL AB. And he's a damn good writer.
3. We wrote the book because there's a lot of MySQL installations out there, and a lot of very badly done MySQL databases. Granted, there are some things that MySQl isn't (yet). But it is fast and stable -- or can be. And it's certainly possible to throw those advantages away through poor DB and application design by people who don't know the difference between a database and a spreadsheet or who don't know how to leverage SQL to do their heavy lifting for them. We chose not to spend a great deal of time with enforcing foreign keys because a great many administrators are still running MySQL 3.23 and don't bother to make InnoDB available. Besides, if you expect people to understand key constraints, you have to get them to normalise first, and many devs don't even do that.
4. We wanted to encourage PHP developers to make the transition to ext/mysqli as soon as possible.
5. I don't know what other people may have experienced with Apress, but they've been damned nice to me, and I can tell you that Gary Cornell does answer his email, even when it comes from a lowly writer who's not yet even signed a contract. Speaking of which -- their contracts are much better than Wrox' or Wiley's. And since I've been associated with them, they've dumped at least one bad editor and another one that I'd heard some not-so-favourable things about.
6. While we didn't cover this in the book, fans of Postgres might wish to take note: We already have a working Cluster implementation, and we're anxious to see what yours will look like.
Il n'y a pas de Planet B.