Web And Database Synchronization?

← Back to Stories (view on slashdot.org)

Web And Database Synchronization?

Posted by Cliff on Friday November 17, 2000 @11:11PM from the keeping-it-all-together dept.

crazney asks: "I am developing a medium sized Web site with a friend, and we have stumbled across a small problem. Our hosting will consist of several Web servers load balanced, and possible two or three database servers. The extra database servers may be used for load balancing or for redundancy. My question is, what can we use to keep these up to date? We would like to ensure that all the Web servers always contain an exact same copy of the Web page, and any changes to the database affect the others. These servers may be geographically separated, so something low bandwidth would be helpful. For the Web servers, a quick and dirty way could be CVS, but I'm stumped for the database servers (which will probably be running PostgreSQL)."

11 comments

Min score:

Reason:

Sort:

More Questions... by supine · 2000-11-17 18:26 · Score: 1

How many transactions will be enacted on the database each [ minute | hour | day | week ] whose need to be posted to the other servers will be critical?

If this number is small or non existent then would it be possible to keep a log of transactions (ie. store the SQL query) and run them against the other DB servers??

--
"I can't buy want I want because it's free. Can't be what they want because I'm me." -Corduroy, Pearl Jam
1. Re:More Questions... by crazney · 2000-11-18 03:31 · Score: 1
  
  Thanks for your reply..
  The server database will probably be written to several times per minute, which are crytical.. Henc that idea probably doesnt work..
  Thanks, crazney.
  
  "Who is General Failure and why is he reading my hard disk ?"
  
  --
  stuff
erServer! by nconway · 2000-11-17 21:47 · Score: 3

There's a project to add replication to PostgreSQL. From what I've heard, a public beta should be out fairly soon. It's at www.erserver.com.
If you'd rather do it the 'old fashioned' way, what is the query load like? If the data is mostly static, you could send all the writes to 1 database and use all the DBs for reads. Every 2 or so hours, update the read-only databases. Something like rsync would be very good for that. Until erServer is released, this may be your best choice.
There's been some discussion about this on the PostgreSQL mailing lists - check the pgsql-general archives @ postgresql.org
1. Re:erServer! by wezell · 2000-11-18 20:25 · Score: 1
  
  Taking a quick look at the erserver.com site, it doesn't look like this is going to be an open sourced project. Depending on your resources and/or your politics, this might suck. In fact, I think it does. Does anyone have any information as to whether erserver is going to be totally opensourced?
2. Re:erServer! by nconway · 2000-11-18 22:47 · Score: 1
  
  Well, it's being developed by the same company that pays for the time of most of the Postgres core team. From what I understand, most of the code will be initially open sourced, but some of it (the really enterprise level stuff I guess) will need to be purchased. Once that has been out for a while, that is also going to be open sourced, and included with the main Postgres release.
  I'm happy we're getting replication at all. The companies that really need it won't mind paying for it, and the money goes into further PgSQL development. And (AFAIK) it's all going to be open source eventually.
MySQL has added replication ... by trexl · 2000-11-17 23:15 · Score: 1

And it's pretty/really easy to set up. Just a couple of changes in the configuration file. I've never done the replication of the database across anything with less bandwidth than all machines on the same local switch, but it appears to be rather reliable. Some of the slaves that I've played with have been down for days, and when they database is restarted ... they pick up replicating where it left off.
Not sure if this has anything to do with what you're doing as MySQL is an entirely different product than PostgreSQL, and you may be trying to avoid it based on its own well documented pitfalls/features/etc.
The URL for the story on setting up MySQL replication is: http://www.phpbuilder.com/co lum ns/tanoviceanu20000912.php3.
Hope it helps.
do NOT use rsync by woggo · 2000-11-18 00:35 · Score: 3

to transfer database files. Basically, a file corresponding to a database table, index, etc. does not necessarily correspond to a consistent database at any given time. The file you transfer might have uncommitted data, data from a transaction which has aborted, index entries corresponding to deleted tuples, or worse. (You might not even get all of the committed data; much of it remains in the buffer pool for some time.)
If you want cheap replication and a consistent db, your best bet (for now) is to use pg_dumpall to make periodic backups of the db, transfer them (perhaps using scp and host-based authentication), and then use psql to restore them. To see how the dump/restore process works, look at the man page for pg_dumpall. You can likely do this all from a cronjob.
If you do it with stock pg_dumpall, you will probably need to take the site down for a few minutes or else risk odd service interuptions. However, "pg_dumpall -d" will dump tuples as SQL insert statements; this will allow you to put the data in a running db. Be aware that the dump/restore process can be quite taxing (especially on a production db where you likely don't want to turn off fsync() calls for the bulk copies). I don't know the requirements of your application, but you'll likely want to strike some sort of compromise between speed and up-to-date correctness.

HTH.
~wog
1. Re:do NOT use rsync by nconway · 2000-11-18 01:27 · Score: 2
  
  I was thinking more like rsyncing pg_dump files. Considering that the vast majority of the output will be the same between synchronizations, this would be much more efficient than using scp - if you need security, you can run rsync over ssh.
  You're right, of course. rsyncing the actual database files while the DB serving is running is incredibly dumb.
2. Re:do NOT use rsync by nneul · 2000-11-19 05:37 · Score: 1
  
  Nothing wrong with using rsync on the database files, you just have to do them at the right time.
  
  For example, on oracle, you can do this by putting the databases in backup mode, which temporarily freezes the contents of the files, and writes all updates to redo logs. Using rsync on the db files while in backup mode would be fine. Of course, I've found that when dealing with huge (multple gigabytes) amounts of data, rsync doesn't help much.
Replication via Update Logs by o+mandarin · 2000-11-18 18:58 · Score: 1

Just a note: I only refer to MySQL since I've never dealt with PostreSQL.

One of MySQL's solutions to replication (before it added replication features) was an option to keep update logs which contained all INSERT commands it ran on a DB. To update a slave database you would simply feed the mysql client the updatelog.

A practical solution could be cycling the update log every hour, then having the slave databases fetch the hour update, etc...

Again, I have no idea if PostreSQL supports this, but just for reference here's the MySQL manual links:

21.1 Database replication with update log
21.3 The update log
1. Re:Replication via Update Logs by duffbeer703 · 2000-11-19 08:41 · Score: 1
  
  That's a very hackish solution, with far too much room for failure. What this guy should be doing is purchasing a real database engine (ie Oracle, Informix, DB2, etc. While this is very expensive, so is having data corrupted or destroyed using some el cheapo solution.
  
  --
  Conformity is the jailer of freedom and enemy of growth. -JFK