Web And Database Synchronization?
crazney asks: "I am developing a medium sized Web site with a friend, and we have stumbled across a small problem. Our hosting will consist of several Web servers load balanced, and possible two or three database servers. The extra database servers may be used for load balancing or for redundancy. My question is, what can we use to keep these up to date? We would like to ensure that all the Web servers always contain an exact same copy of the Web page, and any changes to the database affect the others. These servers may be geographically separated, so something low bandwidth would be helpful. For the Web servers, a quick and dirty way could be CVS, but I'm stumped for the database servers (which will probably be running PostgreSQL)."
How many transactions will be enacted on the database each [ minute | hour | day | week ] whose need to be posted to the other servers will be critical?
If this number is small or non existent then would it be possible to keep a log of transactions (ie. store the SQL query) and run them against the other DB servers??
"I can't buy want I want because it's free. Can't be what they want because I'm me." -Corduroy, Pearl Jam
If you'd rather do it the 'old fashioned' way, what is the query load like? If the data is mostly static, you could send all the writes to 1 database and use all the DBs for reads. Every 2 or so hours, update the read-only databases. Something like rsync would be very good for that. Until erServer is released, this may be your best choice.
There's been some discussion about this on the PostgreSQL mailing lists - check the pgsql-general archives @ postgresql.org
Not sure if this has anything to do with what you're doing as MySQL is an entirely different product than PostgreSQL, and you may be trying to avoid it based on its own well documented pitfalls/features/etc.
The URL for the story on setting up MySQL replication is: http://www.phpbuilder.com/co lum ns/tanoviceanu20000912.php3.
Hope it helps.
If you want cheap replication and a consistent db, your best bet (for now) is to use pg_dumpall to make periodic backups of the db, transfer them (perhaps using scp and host-based authentication), and then use psql to restore them. To see how the dump/restore process works, look at the man page for pg_dumpall. You can likely do this all from a cronjob.
If you do it with stock pg_dumpall, you will probably need to take the site down for a few minutes or else risk odd service interuptions. However, "pg_dumpall -d" will dump tuples as SQL insert statements; this will allow you to put the data in a running db. Be aware that the dump/restore process can be quite taxing (especially on a production db where you likely don't want to turn off fsync() calls for the bulk copies). I don't know the requirements of your application, but you'll likely want to strike some sort of compromise between speed and up-to-date correctness.
HTH.
~wog
Just a note: I only refer to MySQL since I've never dealt with PostreSQL.
One of MySQL's solutions to replication (before it added replication features) was an option to keep update logs which contained all INSERT commands it ran on a DB. To update a slave database you would simply feed the mysql client the updatelog.
A practical solution could be cycling the update log every hour, then having the slave databases fetch the hour update, etc...
Again, I have no idea if PostreSQL supports this, but just for reference here's the MySQL manual links:
21.1 Database replication with update log
21.3 The update log