Slashdot Mirror


Database Replication?

xcaster asks: "I've been working with Lotus Notes for several years. Although Lotus Notes is horrible software, it wins in one feature: replication. I've been researching all the major RDBMSes for a while, and found that even though they offer replication support, none provides easy-to-use, flexible support for disconnected-user-oriented replication. Do real solutions for replication for disconnected users exist out there?" It would be interesting to see how well the major free RDBMS engines stack up when it comes to replication. Which RDBMS software can support replication out of the box, and for the ones that do not, what changes would they need to make to support it?

1 of 7 comments

  1. Replication: don't do it (and why) by woggo · · Score: 4
    That's a little extreme, of course, but replication is a very nasty problem, and there are very good reasons why people don't do it. See "The Dangers of Replication and a Solution" (also published in the 1996 SIGMOD proceedings) by Jim Gray and others for more information.

    Basically, there are two ways to perform replication: "lazy replication" and "eager replication". "Eager replication" means that all updates are atomic across all nodes and that transactions are serializable; "lazy replication" commits an update at one node and propagates it to the others afterwards. The problem with "eager replication" is that as you increase the number of nodes n, the deadlock rate increases on the order of n^3 (and with the fifth power of transaction size). The "solution", such as it is, is to give up the expectation of serializability: use timestamps for concurrency control, allow only commutative transactions on your data, and use two-tier replication. This works for banks and others whose database applications consist mainly of commutative transactions, but it won't for many others: YMMV. (Gray's paper also details the differences between having a single "master" node that "owns" all db objects and having each node own several objects.)
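
    To see why commutative transactions matter, here is a minimal Python sketch (the names deposit, set_balance and final_states are made up for illustration; this is not code from Gray's paper). It applies the same updates in every possible order, the way replicas might receive them: deposits and withdrawals always converge on one balance, while blind overwrites leave the result dependent on arrival order.

```python
from itertools import permutations


def deposit(amount):
    """Commutative update: add a (possibly negative) amount to the balance."""
    return lambda balance: balance + amount


def set_balance(value):
    """Non-commutative update: overwrite the balance, ignoring its old value."""
    return lambda balance: value


def apply_all(balance, ops):
    """Apply a sequence of updates to a starting balance, in the given order."""
    for op in ops:
        balance = op(balance)
    return balance


def final_states(ops, start=100):
    """Collect every final balance reachable by reordering the updates."""
    return {apply_all(start, order) for order in permutations(ops)}


# Three replicas may see these updates in any order.
commutative_results = final_states([deposit(50), deposit(-20), deposit(5)])
overwrite_results = final_states([set_balance(100), set_balance(70)])

print(commutative_results)  # a single balance: every ordering agrees
print(overwrite_results)    # two balances: the last writer wins
```

    With only deposits, every node ends up with the same balance no matter when the updates arrive, so nothing needs to be locked across nodes. With overwrites, nodes that see the updates in different orders disagree, and something (timestamps, a master copy) has to pick a winner.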

    IIRC, the way Notes does it is by queuing updates at the local node and using an optimistic concurrency control mechanism when the local node connects to replicate. This is great for the application domain that Notes caters to: I "own" my own calendar, and if I'm out of the office (and have my node -- notebook -- with me), you can't schedule me for an appointment until I come back. However, for many application domains, this won't work.
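
    Roughly, the queue-then-reconcile idea looks like the Python sketch below. This is an illustration of optimistic concurrency control with version numbers in general, not Notes' actual mechanism; Record, Server, Laptop and the key names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    value: str
    version: int = 0


@dataclass
class Server:
    records: dict = field(default_factory=dict)

    def try_apply(self, key, base_version, new_value):
        """Accept the update only if nobody changed the record since base_version."""
        current = self.records.setdefault(key, Record(value=""))
        if current.version != base_version:
            return False  # optimistic check failed: someone else got there first
        current.value = new_value
        current.version += 1
        return True


@dataclass
class Laptop:
    queue: list = field(default_factory=list)

    def edit_offline(self, key, base_version, new_value):
        """Queue an update locally, remembering the version it was based on."""
        self.queue.append((key, base_version, new_value))

    def replicate(self, server):
        """Push queued updates on reconnect; return the ones that conflicted."""
        conflicts = [u for u in self.queue if not server.try_apply(*u)]
        self.queue.clear()
        return conflicts


server = Server({"calendar/wog": Record("free", 3), "shared/budget": Record("10k", 7)})
laptop = Laptop()

# Offline edits: one to a record only I touch, one to a record everyone touches.
laptop.edit_offline("calendar/wog", 3, "dentist on Tuesday")
laptop.edit_offline("shared/budget", 7, "12k")

# Meanwhile, back at the office, somebody else updates the shared record.
server.try_apply("shared/budget", 7, "9k")

conflicts = laptop.replicate(server)
print(conflicts)
```

    The calendar update goes through because nobody else writes to my calendar while I'm away. The shared record comes back as a conflict that a human (or some application-specific rule) has to resolve, which is exactly the part that doesn't generalize.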

    In any case, that's why Notes does it -- because it can, thanks to the nature of its data domain -- and why most people don't -- because it's hard/impossible for the general case.


    ~wog
    PS -- If you can't get into the ACM Digital Library, check out these lecture notes from Stonebraker's anthology at Berzerkeley.