Slashdot Mirror


Distributed Databases?

yamla asks: "I am interested in learning about distributed, fault-tolerant databases. That is, a database (not necessarily SQL) where the data is spread out (not replicated) amongst a large number of computers and furthermore, any reasonable number of those computers could disconnect or reconnect at any time without making it impossible to retrieve the stored data. I think this is a far more interesting problem than peer-to-peer because, provided such a solution scales, it would seem to solve the decentralised peer-to-peer problems. It would also seem to open up all kinds of new applications which we have hardly begun to think about yet. So I'm interested in good places to go to read up on (potential) solutions."

2 of 12 comments (clear)

  1. Slightly simplier problems by coyote-san · · Score: 3

    A slightly simplier problem can give you insights into possible solutions. How do you manage a distributed file system? That is, something that looks like a single file system but can operate and recover from partitioning?

    There are a couple solutions, but all (iirc) in turn ultimately reduce to an even simplier question: how do you manage distributed messaging? A classic example of this is "buy" and "sell" orders in a distributed stock exchange - there needs to be some way of ensuring that all parties can agree on <b>the</b> ordering of all messages. Disagreement on ordering can have major ramifications since it can affect the price paid, possibly even whether the stock was obtained at all. Likewise, ordering of file systems reads and writes can determine what gets written to disk and/or what gets fed to running applications.

    Once you have that, you can start looking at recovery issues in filesystems. IIRC, all come down to a question of how many systems you write data to, and how many systems you read data from. When you read data you'll often get multiple versions of the same information (because of update latency) and you need to know how to determine which is the most current.

    The two extremes are "everyone has everything" (total replication) and "only one server has each item" (multiple independent and disjoint servers). Depending on expected loads (esp. the ratio of reads to writes) you might see a policy of reading from a third of all systems, writing to 2/3 of them. No system will have all data, but all will have most.

    All of this points to an unstated assumption in your question. "Distributed" means more than one thing - to someone who has studied algorithms they usually refer to designs that maximize availability despite network partitioning (e.g., line cuts or court injuctions against some servers). These algorithms require substantial, if not complete, data replication.

    To many people, "distributed" also means what we would call "partitioned" algorithms where multiple sites work on a small part of the problem and the results are combined later. Examples are factorization efforts and SETI-at-home. These algorithms don't require replication, but they are highly vulnerable to partitioning.

    What problem are you trying to solve with this distributed database?

    --
    For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
  2. ummm... by nomadic · · Score: 3

    Wait, if they're not replicated, how could you get data from a machine that's been brought offline?
    --