Slashdot Mirror


Web And Database Synchronization?

crazney asks: "I am developing a medium sized Web site with a friend, and we have stumbled across a small problem. Our hosting will consist of several Web servers load balanced, and possible two or three database servers. The extra database servers may be used for load balancing or for redundancy. My question is, what can we use to keep these up to date? We would like to ensure that all the Web servers always contain an exact same copy of the Web page, and any changes to the database affect the others. These servers may be geographically separated, so something low bandwidth would be helpful. For the Web servers, a quick and dirty way could be CVS, but I'm stumped for the database servers (which will probably be running PostgreSQL)."

3 of 11 comments (clear)

  1. erServer! by nconway · · Score: 3
    There's a project to add replication to PostgreSQL. From what I've heard, a public beta should be out fairly soon. It's at www.erserver.com.

    If you'd rather do it the 'old fashioned' way, what is the query load like? If the data is mostly static, you could send all the writes to 1 database and use all the DBs for reads. Every 2 or so hours, update the read-only databases. Something like rsync would be very good for that. Until erServer is released, this may be your best choice.

    There's been some discussion about this on the PostgreSQL mailing lists - check the pgsql-general archives @ postgresql.org

  2. do NOT use rsync by woggo · · Score: 3
    to transfer database files. Basically, a file corresponding to a database table, index, etc. does not necessarily correspond to a consistent database at any given time. The file you transfer might have uncommitted data, data from a transaction which has aborted, index entries corresponding to deleted tuples, or worse. (You might not even get all of the committed data; much of it remains in the buffer pool for some time.)

    If you want cheap replication and a consistent db, your best bet (for now) is to use pg_dumpall to make periodic backups of the db, transfer them (perhaps using scp and host-based authentication), and then use psql to restore them. To see how the dump/restore process works, look at the man page for pg_dumpall. You can likely do this all from a cronjob.

    If you do it with stock pg_dumpall, you will probably need to take the site down for a few minutes or else risk odd service interuptions. However, "pg_dumpall -d" will dump tuples as SQL insert statements; this will allow you to put the data in a running db. Be aware that the dump/restore process can be quite taxing (especially on a production db where you likely don't want to turn off fsync() calls for the bulk copies). I don't know the requirements of your application, but you'll likely want to strike some sort of compromise between speed and up-to-date correctness.


    HTH.
    ~wog

    1. Re:do NOT use rsync by nconway · · Score: 2
      I was thinking more like rsyncing pg_dump files. Considering that the vast majority of the output will be the same between synchronizations, this would be much more efficient than using scp - if you need security, you can run rsync over ssh.

      You're right, of course. rsyncing the actual database files while the DB serving is running is incredibly dumb.