Slashdot Mirror


PostgreSQL vs. MySQL comparison

prostoalex writes "Ever find yourself wondering which open source database is the best tool for the job? Well, wonder no more, and let your tax dollars do the work in the form of Fermi National Accelerator Laboratory publishing this unbiased review of MySQL vs. PostgreSQL. After reading it, however, it seems that MySQL ranks the same or better on most of the accounts." My poor sleepy eyes misread the date of posting on here; caveat that this is more then 15 months old.

5 of 390 comments (clear)

  1. Why we moved from MySQL to PG by punker · · Score: 5, Interesting

    This almost seems like the same comparisons we've been hearing for years.
    1) Postgresql is more full featured than MySQL
    2) MySQL is faster in a read-mostly environment
    That's pretty much the same as the anecdotal arguments have been for years.

              In my job, we moved from mysql to postgres several years ago (around PG 7.0). At the time, we needed to make the move for performance reasons. We are in a read-write system, and MySQL's locking was killing us (this was before InnoDB was well established). The features are better too, as our developers were used to having data integrity features, server side programming, and all of the SQL92 constructs available. We also learned a bit about PG performance, which I'll share.

    1) Run EXPLAIN ANALYZE on everything. Postgresql is touchier about query performance than MySQL was. This just needs to be a habit if you're using PG. (You really should do performance analysis no matter your DB. It's just a good practice). The biggest gain will be making sure you're using index scans rather than sequential scans.

    2) Use persistent connections. Everyone likes to point out the forking issue with PG vs. MySQL's threaded. PG's connection handling is slow, there's no doubt about it. But there's an easy answer. Just limit how often you connect. If you can keep a connection pool, and just reuse those connections, you'll save this big hit.

    3) Full vacuum and reindex regularly. We've found the docs to be a bit off on this. It indicates that you should run these occasionally. If you're in a read-write system, a full vacuum on a regular basis is very important. It really doesn't take that long if you do it regularly. Also, we've had trouble with indexes getting unbalanced (we see 50->90% tuple turnover daily). This has gotten better, but it doesn't hurt to let your maintenance scripts make things ideal for you. So, we run a full vacuum and reindex of our tables nightly through cron.

    4) Get your shared memory right. PG's shared buffers is probably the most important config attribute. It controls how much of your DB is memory resident vs disk resident. Avoiding disk hits is a big deal for any DB, so get this right. If you can fit your whole DB in memory, then do it. If not, make sure your primary tables will fit. The more you use the shared memory, and the less you have to page data in/out, the better your overall performance will be.

    Most DB systems seem to be read-mostly, so I can understand the performance comparisons focusing on that. In our read-write system though, the locking was the biggest issue and it tilted the performance comparison toward PG.

  2. more recent benchmarks by darekana · · Score: 4, Interesting

    http://tweakers.net/reviews/657

    They compare PostgreSQL 8.2 vs MySQL 4.1.20 and MySQL 5.1.20a.

  3. Postgres For Larger Datasets by Hornsby · · Score: 5, Interesting

    I had to make a decision recently between Mysql and Postgresql for a database composed of many tables with greater than 50,000,000 rows. While going through the decision making process, I loaded a sample table with 50,000,000 rows to do some benchmarks. The first thing I had to do to run my tests was index the table. I started with Mysql using a InnoDB table type. I had both database servers relatively tuned to the hardware they were running on. I ran the create index with MySQL and detached my screen session. I came back several hours later to find MySQL was doing something along the lines of INDEX via REPAIR SORT. After some reading, I learned that this takes an order of magnitude longer than building an index the "normal" way and is caused by the index becoming corrupted during the creation. Okay... so, I restarted this process several times and encountered the same problem. This is clean data mind you that has already been exported from an existing SQL server. I duplicated my install on a second server and had the same problems. Very annoyed with MySQL, I gave Postgres a try. It worked on the first time in less than 25 minutes without issue. Since then I've been using it on 250,000,000 row datasets without issue. It's always reliable, and as long as you remember to use CURSORS for huge SELECT statement, it's painless to work with.

    --
    A musician without the RIAA, is like a fish without a bicycle.
  4. Re:Foreign Keys by anothy · · Score: 3, Interesting

    good clarification. but still: does this really hold water? i mean, unless they're using a modified GPL or whatever (is the GPL open-source?), they don't really get to dictate the terms. "derivative work", for example, isn't something they get to define. if i'm shipping a closed-source app that simply requires MySQL be also installed, and not something that's statically linked to their binaries, where's the "derivative work" come in?

    --

    i speak for myself and those who like what i say.
  5. Foreign keys are an enterprise feature by alienmole · · Score: 4, Interesting

    I suspect you're thinking of relatively simple cases where you have a single application working against a database. Which of course is the standard MySQL scenario.

    Where foreign keys and the other referential integrity features really shine is in true enterprise scenarios, when you may have hundreds or thousands of applications, written in multiple languages, working against the same shared database(s).

    In that scenario, the only viable way to duplicate the functionality of foreign keys at the application level is to have a middle layer which all other applications are required to go through. Realistically, that middle layer has to be implemented as a server, serving requests for object/record creation, update and delete over the network. Implementing it as a library to be linked into applications doesn't work well, because there are multiple applications accessing the database, and integrity enforcement needs to be centrally coordinated.

    Implementing a middleware data server for an application isn't all that difficult, but integrating it into applications can be. Most application development environments know how to talk to databases, but don't automatically know how to talk to your application-specific, language-independent, data server. So now you're writing a client library for each app dev platform used in the enterprise, and dealing with things like integrating your custom interface with data-bound controls in the user interface. BTW, this is where people start resorting to e.g. SOAP, and projects start going off the rails (no pun intended, Ruby fans).

    Luckily, as it turns out, there are already standardized, widely-available, well-supported systems that implement a centralized data serving service which enforces referential integrity. They're called databases. And foreign keys are an essential part of the service they provide.