High Performance MySQL

← Back to Stories (view on slashdot.org)

Posted by timothy on Wednesday October 20, 2004 @06:30AM from the turning-the-tables dept.

swsuehr (Steve Suehring) writes "Knowing that Jeremy Zawodny works with MySQL in a highly demanding environment at Yahoo! Finance, I have to admit that I was somewhat anxious to crack open a copy of the book High Performance MySQL published by O'Reilly and written by Jeremy and Derek J. Balling. With such high expectations, it would be easy to be disappointed with the content or level of the material in the book. Far from disappointing, High Performance MySQL was exactly what I had hoped it would be and more." Read on for the rest of Suehring's review. High Performance MySQL author Zawodny & Balling pages 294 publisher O'Reilly rating Excellent reviewer Steve Suehring ISBN 0596003064 summary Optimization, Backups, Replication, Load Balancing & More

With a title like High Performance MySQL, the reader should not expect assistance with tasks such as installation or the basics of SQL. The authors acknowledge that the book isn't meant for the MySQL beginner. Even so, the book does spend a little time going over a refresher of MySQL. If the reader is looking for a book on the basics of MySQL, though, I'd recommend picking up a different title. I might be so brazen as to suggest looking at the book on MySQL that I wrote a couple years ago, but I'll leave it for the reader to find an appropriate title for their level of knowledge with MySQL.

The book is broken out into ten chapters, beginning with a chapter on the basics of MySQL as seen from someone with some level of MySQL experience. (And since I am someone with some level of MySQL knowledge, this chapter didn't seem to be too painful of a read.) That statement begs for a further explanation. I find it painful to read books that claim to have advanced topics but find it necessary to spend (or waste) time by giving yet another explanation of how to type ./configure; make; make install in the beginning chapter. In this regard, I find High Performance MySQL to be quite good at providing the appropriate level of explanation for the audience.

The first pages alone are a great starting point for those who are wondering about the performance as it relates to the different versions of MySQL. A discussion of the binary versions of MySQL versus the source code versions gives the reader a good understanding of the implications of this important decision. Choosing correctly between the versions of MySQL can help to maximize performance.

Though only the first chapter is titled "Back to Basics," most chapters include some basic or background information pertinent to that chapter. For example, both the query and index performance-related chapters include sections with the word 'Basic' in their titles. These two chapters, "Index Performance" and "Query Performance," cover topics often misunderstood when considering performance on a database. Naturally, any database administrator worth their pay (insert your own joke here) will tell you that an index can help to improve performance. But High Performance MySQL takes it a step further by diving into why you might use a particular index given your database needs. This type of practical information is common throughout the book. Rather than just give the reader a listing of the indexes (which are mostly common among RDBMS anyway), the authors choose to tell you why you might choose one method over another.

Other chapters within High Performance MySQL include chapters on server performance tuning, replication, load balancing, and high-availability, backup and recovery, and security. I've followed MySQL forums and mailing lists over the years and some of the more common questions relate to MySQL security. The chapter on security gives the reader a great explanation of the MySQL GRANT tables and includes a sidebar on the significance of 'localhost' in MySQL which nearly always seems to be a source of confusion, though maybe not for the reader of this particular book. Like other chapters, the security chapter includes tips for the underlying environment upon which MySQL operates. For example, regardless of how good the MySQL database permissions are configured, if another piece of software on the server isn't kept up to date then the entire server, including the MySQL data, could be compromised.

Looking back at the book, I might have laid the chapters out in a different order or provided more structure through sections of the book. However, this can be dismissed as personal opinion and is non-substantive anyway. I believe High Performance MySQL is an excellent choice for the MySQL administrator or developer not only looking to improve their database performance but also to improve his knowledge of MySQL.

You can purchase High Performance MySQL from bn.com. (And the book that Suehring alludes to is the MySQL Bible . Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

10 of 307 comments (clear)

Min score:

Reason:

Sort:

Check your SQL.... by tcopeland · 2004-10-20 06:43 · Score: 4, Informative

...before blaming MySQL! Ha ha!

Er, anyhow, as an apology, here's an open source SQL analyzer.

--
The Army reading list
1. Re:Check your SQL.... by tcopeland · 2004-10-20 07:05 · Score: 4, Informative
  
  > People can write some scary code and then
  > just blame it on the engine.
  
  Yup. There's a really good article by Stephane Faroult on OnLamp about writing better SQL... it's right here.
  
  After reading that article, I went through some code I'd written and found some places where I was using DISTINCT incorrectly in exactly the way he described.
  
  --
  The Army reading list
I bought it 3 weeks ago... by ttys00 · 2004-10-20 06:57 · Score: 4, Informative

...and was surprised at a two things:

1) The number of typos in the book. Didn't anyone proofread it before it went to print?
2) How many features MySQL lacks. Don't get me wrong, I like it as its fast and easy to work with for web stuff, but I was surprised at how often I read "MySQL doesn't do x yet, but will in version y".

The chapter on replication was worth the cost of the book for me, but I suggest people considering buying the book should wait until the 2nd edition.
Re:How does Yahoo! Finance use MySQL? by Anonymous Coward · 2004-10-20 06:58 · Score: 5, Informative

Why not? Yahoo finance has been using MySQL for the last four years.

"Yahoo! uses the MySQL database to power many of the services on Yahoo! Finance (finance.yahoo.com), a popular web site that provides the full spectrum of financial services and information. MySQL is behind Yahoo! Finance's high-traffic area of news headlines, which is aggregated from over 100 sources worldwide, as well as its stock charts, insider trading, SEC filings, conference calls and earnings reports, among others. "

"Prior to its switch to MySQL in 2000, Yahoo! Finance managed its database needs through homegrown flat files and Berkeley DB databases. But those solutions proved to be inflexible and not scalable enough for their needs, with the volume of data nearly doubling every few months. "Our needs vary considerably, and that is one reason that MySQL works well with us. It handles the high-volume, 260 million record tables just as well as the small, low-volume ones. One of our servers handled over a quarter of a billion queries in a month-and-a-half, and it still has capacity to spare.""

"Since Yahoo! Finance started using MySQL, the database's popularity has grown within the company, and many of Yahoo's other world-wide properties are now using MySQL too. "We have used MySQL far more than anyone expected," stated Zawodny. "We went from experimental to mission-critical in a couple of months. Once others saw it, they jumped on board.""
---

I wonder why many people here hate MySQL even when it has been proven to be a very competitive database. Is it because MySQL is a foreign product?
Re:It Sounds Pretty Basic by shirai · 2004-10-20 06:59 · Score: 4, Informative

The answer is yes it does show you how to setup massive high transaction systems.

Up until I picked up this book, I was not comfortable with using MySQL for the extremely high performance apps I have to write. Yes, they are web-based (meaning they fit into the MySQL appropriateness factor) but things like backups and scalability really scared me off of it.

This book talks about indexes but doesn't focus on them as the magic bullet. Rather it is very specific with lots of need to know items of importance.

Some things that it covers are:

1. How to setup replication with different architectures including master/slave, master/slave/slave (another level), replication rings, dual master replication with slaves and more.

2. RAID (software vs. hardware), RAID types, IDE vs. SCSI. I know. I'm sure you've thought of these things already but most people will find something new here because there is a lot on it. It is quite dense.

3. Selection of an OS. Seriously. It actually compares threading models on OSes compared to performance with MySQL. This is great stuff.

4. Selection of a filesystem on an OS. Yes, nitty gritty. Includes talk about journaling vs non-journaling benefits and warnings.

5. Load balancing mySQL. How to do this through a load balancer. Why and how it's different than load balancing web servers.

6. How to backup live using replication.

7. I love this: Detailed information on how the replication system actually works. I love understanding the inner workings because then I know how/why things went wrong.

8. The benefits of myISAM/InnoDB table types including how they store/retrieve data, the features available, etc.

I've been optimizing databases for web use for some time and this isn't a book you should be tired of because it claims "high performance." Okay, maybe I'm a geek, but this book kept me up at night figuring out how I should be architecting a system.

--
Sunny
Be my Friend
Re:mysql backup by fingon · 2004-10-20 07:12 · Score: 4, Informative

Check out LVM2 and snapshots.

In principle, using LVM 2 system it's easy. Get database to consistent state on disk, take LVM snapshot of the logical volume involved, resume database, run dump on the snapshot, take out the snapshot when no longer needed.

Beauty is that the size of snapshot is size of delta that is caused during the backup; therefore, this is practical for almost any sized database.

--
-- pending
Re:mysql backup by shirai · 2004-10-20 07:14 · Score: 5, Informative

Actually, lack of live backups was one of my biggest fears with using MySQL as well and the book actually does address this.

The caveat: You need to set up replication. This is a good idea anyways for a high performance system in case the master crashes and you need another system available quickly.

It works like this:

1. Set up replication.
2. Do a locked table backup on the slave.

The nice things are the details like how the new version of MySQL uses two threads on the slave. One to copy the transactions to the local drive and the other to apply the transactions. Because of this, your local transaction log is up to date even if the transactions haven't been applied which subsequently means you don't have missing transactions if there is a crash on the master during the backup.

This is seriously a good book. Okay, you might want to use PostgreSQL for its feature set but if you are doing write few read mostly with the web as the application, this book seriously gives you the confidence to use MySQL. Up until I read this book, I didn't know enough about MySQL performance implications to trust it for scalability. Now I know what I need to do to get it.

--
Sunny
Be my Friend
Re:It Sounds Pretty Basic by ca1v1n · 2004-10-20 07:24 · Score: 4, Informative

Sounds like you've been using PostgreSQL for quite a while, since your criticisms generally apply to rather old versions of MySQL. I hate all databases with equal passion, so I figured I ought to set the record straight.

No data integrity

MySQL has a modular design, allowing you to use any of several database backends, each with different design priorities. If data integrity is a priority for you, RTFM and pick the right backend.

Completely non-standard SQL

This is true of pretty much all DBMS's. MySQL's development cycle is quite delightfully quick, meaning that if people want certain features added, it doesn't take long to get them added.

No extensibility in the engine (functions, stored procedures, etc.)

Ummm... MySQL has those.

No subselects

Yes it does, as of version 4.1. It's not their fault if you're more than a full release behind.

Weird handling of '0' vs. null

The special handling of null forces you to write slightly less braindead code. This kind of handholding is becoming popular in the design of modern programming languages like Java and C#, with the recognition that the later in your development cycle you find a bug, the more it costs to fix it. Of course, there are still times when the programmer needs very explicit control, like when writing an OS kernel or a device driver. If you're writing an OS kernel or a device driver in MySQL, you've got bigger problems.

--
WARNING: there is a trojan on your
Re:Apples and oranges by SpamapS · 2004-10-20 08:30 · Score: 4, Informative

You PgSQL nazis are just being ignorant. Do you think because MyISAM isn't transactional that the whole database isn't?

MySQL's InnoDB offers the same level of data integrity as PgSQL does, and about the same performance. Where it wins is in the flexibility to choose table-by-table whether you want InnoDB, MyISAM, or the new NDBCLUSTER. There's also the universal availability and familiarity when it comes to the mysql protocol and libraries.

As far as performance.. you might want to look at numbers on well designed threading systems such as Solaris or Linux 2.6. Most of PostgreSQL's gains are erased by MySQL 4.1's excellent code when it is combined with those OS's proper threading.

The only place I see PgSQL winning the performance war is the query optimizer.

PgSQL will be doomed to a BSDesque life. Touted by too many people with no social skills, nobody will ever be able to see how good it is, because they just get shouted down when asking questions.

--
SpamapS -- Undernet #Linuxhelp
Re:Apples and oranges by boneshintai · 2004-10-20 11:25 · Score: 4, Informative

I'm going to forgo the opportunity to mod this fascinating debate to discuss this. You mentionned that PostgreSQL was "an order of magnitude" slower on imports -- my guess is that your import tool simply connected and started running INSERTs? It's a common-ish trap for new users. If there is no transaction running, each statement is its own transaction, with all the BEGIN and COMMIT overhead that goes with it. Wrapping your import in a BEGIN; ... COMMIT; pair would probably have sped it up considerably.

There is an option to change the transactionality of standalone commands, too, so that they implicitly start a transaction that doesn't end until you COMMIT it.

VACUUM is something of a contentious point among PostgreSQL users, certainly. On the one hand, I can't think of an algorithmic way to automatically schedule statistic updates, and apparently tracking them in real time causes more of a performance hit than is needed. On the other, having to set up an external task (either a cron task or vacuumd) rather than being able to configure postmaster to vacuum itself on a schedule or other parameters is kind of a pain.

PostgreSQL's default memory cache size is exceedingly small -- this is an issue with the default configuration that can lead to excessive disk reads and writes.

In short, pgsql does require somewhat more effort to initially configure and maintain. On the other hand, for a lot of people it's very much worth that effort.