High Availability Solutions for Databases?

← Back to Stories (view on slashdot.org)

High Availability Solutions for Databases?

Posted by Cliff on Monday November 14, 2005 @02:57PM from the in-search-of-redundancy dept.

An anonymous reader asks: "What would be the best high availability solution for databases? I don't have enough money to afford Oracle RAC or any architecture that require an expensive SAN. What about open source solutions? MySQL cluster seems to be more master/slave and you can lose data when the master dies. What about this Sequoia project that seems good for PostgreSQL and other databases? Has anyone tried it? What HA solution do you use for your database?"

2 of 83 comments (clear)

Min score:

Reason:

Sort:

Re:MySQL Cluster != master/slave by afabbro · 2005-11-14 15:31 · Score: 4, Interesting

The only real downside to the architecture required for CLuster is that all of the data is stored in RAM based tables
...for an "only real downside" that's a pretty big one. I mean really, what sort of database is this - 256MB? 500MB? 1GB? Fine for small websites, not fine for large apps. I don't mean to be a "big shop" snob, but this is a ridiculous limitation.
Unfortunately, open source hasn't caught up to the big guys yet in the area of replication.

--
Advice: on VPS providers
Nice "question" by photon317 · 2005-11-14 16:06 · Score: 4, Interesting

Hello, anonymous Sequoia promoter seeking free advertising. (BTW, You might try picking a product name that normal people can spell without thinking about it).

Your solution is not database clustering, and should not be advertised as such. It's more a long the lines of a database connection proxy which supports multiple simultaneous backends and operating on them in parallel, with some added features to make HA-like solutions relatively easy.

The downside of this style of approach, as opposed to an architecture of the likes of Oracle "RAC", is that it doesn't scale up as you add backend nodes (at least not for writes, but in any case for read-only scaling there are simpler solutions for all of the vendors, even the free ones), and it must have limits on how many transactions it can backlog and replay to a temporarily-unreachable/down server before that server has to be re-synced from scratch in order to catch back up (and I have to wonder if there's really any real-world scenario under real transaction load in which the practical net effect wouldn't be a complete resync of a backend server anytime something goes wrong with it, in which case one could throw out any attempt to backlog transactions for a single failed server and just keep things simple - you fall out of sync, you resync).

The open source world really needs a RAC-like solution for PostgreSQL and MySQL (I'm a fan of friendly open-source competition, so while my personal preference is PostgreSQL, I hope both projects stay current and popular for many years to come). Unfortunately there is unlikely to be a generic way to do this, it will probably have to be re-invented for each database project.

I took a brief look around PostgreSQL's guts a while back, and it actually seemed like the architecture they use isn't far off from something RAC-capable to begin with, just nobody's quite buttoned together a few peices here and there to make it happen. Basically on SMP multiple co-operating backends already serve parallel requests and synchronized on a shared memory cache. There's patches out there for the linux kernel to support network-synchronized distributed shared memory. Put two and two together, and what do you get? Something not far off from a first-pass hack at a RAC-like network-distributed database caching system. Most of the other details are easy to solve (start/shutdown, join/leave cluster, tracking of processes across the cluster, etc), or belong in another problem domain (implementing shared storage filesystems (hey, we have GFS, Oracle OCFS, etc available...)). One of the biggest issues would be multiple nodes all having pg "Writer" processes. The first step would probably be to put the writer on one node and failover the writer functionality when that node dies, to be quickly replaced by a scheme whereby multiple writers can work by synchronizing through a distributed lock manager (there are already dlm modules available for linux). Then there's the issue of making the current distributed shared-memory patches do the right thing performance-wise for this kind of usage, and so on. It's not easy, but it's not outside the realm of possibility.

--
11*43+456^2