Slashdot Mirror


Linux Directory Replication?

okie_rhce asks: "With cheap Unix-based server clusters becoming all the rage, what are people using to replicate content to server farms? Networked File Systems can be a single point of failure and tools like rsync are not real time."

1 of 8 comments

  1. Volume replication and clustering by Cato · Score: 4

    I recently investigated this for Solaris, and I'm sure some of the products will be on Linux as well. Also, check the Linux-HA page at http://www.linux-ha.org/.

    The basic idea is clustering: two machines share a single (virtual) IP address, in addition to their real IP addresses. Clustering software on each machine detects a failure of the other machine, or of an application, and fails over to the surviving machine: it starts the applications that failed and sends a gratuitous ARP to bind the virtual IP address to the surviving machine's MAC address.
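
    As a rough illustration of the two pieces above (noticing that the peer is gone, then claiming the virtual IP), here is a minimal Python sketch. The function names, the 3-second timeout, and the sample addresses are my own assumptions and not taken from any particular clustering product. It only builds the gratuitous ARP frame; actually sending it needs a raw socket and root privileges, which I have left out.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(mac: str, virtual_ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    Sender IP and target IP are both the virtual IP, so hosts on the
    segment update their ARP caches to map that IP to `mac`.
    """
    src = mac_to_bytes(mac)
    ip = socket.inet_aton(virtual_ip)
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP)
    ethernet = BROADCAST_MAC + src + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # MAC address length, IPv4 address length
        1,            # opcode: request
        src, ip,      # sender MAC, sender IP (the virtual IP)
        b"\x00" * 6,  # target MAC: unknown
        ip,           # target IP: same as sender IP
    )
    return ethernet + arp


def peer_is_dead(last_heartbeat: float, now: float, timeout: float = 3.0) -> bool:
    """Hypothetical failure test: no heartbeat seen for `timeout` seconds."""
    return (now - last_heartbeat) > timeout


# Example values only: a made-up MAC and a private virtual IP.
frame = build_gratuitous_arp("02:00:00:aa:bb:cc", "192.168.1.100")
```

    Real cluster software does a lot more than this (multiple heartbeat paths, avoiding both nodes grabbing the address at once), so treat it purely as a picture of the mechanism.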

    Client apps must be able to re-connect to the server, since their TCP sessions are dropped when the primary server crashes. TCP state failover as in some firewalls (e.g. FW-1 from Check Point) would be very handy, but I don't know of any OSs that do it.
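
    On the client side, the re-connect logic can be as simple as a retry loop. The sketch below is a hypothetical helper (the name, the attempt count and the backoff values are my assumptions); `connect` and `request` are whatever your application uses to open a session and issue one operation. Note that blindly re-sending a request is only safe if the operation is idempotent.

```python
import time


def call_with_reconnect(connect, request, attempts=5, base_delay=0.5,
                        sleep=time.sleep):
    """Run request(connect()), reconnecting with exponential backoff.

    OSError covers dropped TCP sessions, refused connections and socket
    timeouts, which is what a client sees while failover is in progress.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()
            return request(conn)
        except OSError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

    The `sleep` parameter is only there so the delay can be replaced in tests.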

    Clustering like this requires a shared disk subsystem - initially this is usually SCSI, with two controllers and software that can handle two hosts on one bus. As systems grow, they tend to migrate to SANs (storage area networks), usually based on Fibre Channel - this is very fast, as you might expect, and can be built using FC switches, so your SAN can be redundant as well as your servers. You would of course still need RAID 1 or 5 in your disk subsystem, so that a single disk failure doesn't take out the shared storage.

    The next step is volume replication, which can be nearly instantaneous. You have a choice of synchronous replication, which slows down every update transaction on the main server, or asynchronous replication, which is a little less safe because the most recent updates can be lost if the primary fails. The trick here is to make sure that the volume replication software can buffer updates during times when the secondary server/disk is not available - otherwise a single failure stops all transactions...
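
    To make the sync/async distinction and the buffering point concrete, here is a toy model in Python. It is not how any real volume replicator is implemented: plain dicts stand in for the two disks, and the class and attribute names are invented for the example.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of volume replication; dicts stand in for block devices."""

    def __init__(self, synchronous=True):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.secondary_up = True
        self.pending = deque()  # updates not yet applied to the secondary

    def write(self, block, data):
        """Write locally, then queue the update for the secondary."""
        self.primary[block] = data
        self.pending.append((block, data))
        if self.synchronous:
            # Synchronous mode: try to ship the update before returning.
            # If the secondary is down, the update stays buffered instead
            # of blocking the write.
            self.flush()

    def flush(self):
        """Apply buffered updates in order; return how many were shipped."""
        shipped = 0
        while self.secondary_up and self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data
            shipped += 1
        return shipped
```

    In asynchronous mode something else (a timer, a background thread) would call `flush()` periodically; anything still in `pending` when the primary dies is the data you lose. Applying the buffered updates in their original order matters, otherwise the secondary can end up in a state the primary never had.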

    Finally, global cluster management involves failing over between geographically separated systems - this would require the client apps to know how to switch to a different IP address, though you might be able to rig something up with load balancing technology.
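
    If you control the client, the simplest way for it to "know how to switch" is an ordered list of site addresses that it walks through until one answers. A minimal sketch, assuming plain TCP; the function name and the example hostnames are made up, and the `connector` parameter exists only so the network call can be swapped out.

```python
import socket


def connect_first_available(endpoints, timeout=2.0,
                            connector=socket.create_connection):
    """Try each (host, port) in order; return (endpoint, connection)."""
    errors = []
    for endpoint in endpoints:
        try:
            return endpoint, connector(endpoint, timeout=timeout)
        except OSError as exc:
            # Site unreachable or refusing connections: try the next one.
            errors.append((endpoint, exc))
    raise ConnectionError(f"no site reachable: {errors}")


# Hypothetical site list, preferred site first.
SITES = [("primary.example.com", 8080), ("backup.example.com", 8080)]
```

    This only covers finding a live site; it says nothing about whether the data at the backup site is current, which is the replication problem again.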

    This is a horribly complex area, as I discovered, and it's not simple to get it right. There are many techniques I have not covered - good sites to read up on are veritas.com, logitech.com, sun.com (search for NDR and Sun Cluster), technet.oracle.com (Oracle focused but covers many options), and of course linux-ha.org.