Open Source Database Clusters?
grugruto asks: "A lot of open source solutions are available to scale web sites with clusters but what about databases? I can't afford an Oracle RAC license but can I have something more reliable and fault tolerant than my single Postgres box? I have seen this recent article that looks promising for open source solutions. Does anyone have experience with clusters of MySQL, Postgres-R, C-JDBC or other solutions? How do they compare to commercial products?"
For what it's worth, the commercial solutions are hard to set up, unstable and terribly difficult to maintain, and this is after a small fortune has been invested in making them work. Not to knock the open source solutions, but it's hard to believe that something that is infrequently used and difficult to understand will be truly production quality if you want to use it for money.
MySQL has very nice replication functionality, and, in certain circumstances, you can even set up replication rings. It is somewhat flexible about the topology you choose to use, so pick the one best for your application. Load balance ala DNS and you're in business.
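For what it's worth, the application side of that setup can be sketched in a few lines. This is only a rough illustration in Python, not anything MySQL ships: the class name ReplicatedPool and the host names are made up, and it assumes one master taking all writes with slaves serving reads in round-robin order (roughly what DNS rotation gives you).

```python
import itertools


class ReplicatedPool:
    """Hypothetical helper: send writes to the master, spread reads over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads simply fall back to the master.
        self._readers = itertools.cycle(slaves or [master])

    def host_for(self, sql):
        """Return the host name that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in ("SELECT", "SHOW"):
            return next(self._readers)  # round-robin over the read hosts
        return self.master              # anything that may modify data


pool = ReplicatedPool("db-master", ["db-slave1", "db-slave2"])
print(pool.host_for("SELECT name FROM users"))
print(pool.host_for("UPDATE users SET name = 'x' WHERE id = 1"))
```

Keep in mind that MySQL replication is asynchronous, so a read sent to a slave can lag behind a write that just went to the master. If a page must see its own write, send that read to the master too.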
Does anyone have experience with clusters of MySQL, Postgres-R, C-JDBC or other solutions? How do they compare to commercial products?
They don't compare to commercial products. I know it isn't what you want to hear, and there are hundreds of kids here to tell you different, but they just don't compare. Those kids' database experience doesn't extend past an address book.
Even if you manage to get them to technically keep up, transaction-wise, with Oracle or SQL Server, the ACID enforcement isn't there and the syntaxes are kludgy. Gack.
My company ships products with SQL Server or Oracle as the back end. I've tried to put together an OSS solution so I could impress the big boss with millions of bucks of saved license fees. They just aren't anywhere close IMO.
Run a SQL Server farm on the back end if you can't afford an Oracle license. Don't be an OSS ideologue in the business world; you end up unemployed.
I don't need no instructions to know how to rock!!!!
Open-source or not...
I would say just get a bigger box for your PostgreSQL solution and do semi-realtime remote replication on the tables you don't want to lose.
I browse at +5 Flamebait- moderation for all or moderation for none.
HA is always a crapshoot/tradeoff between cost and risk. Throw enough $ at the problem and you'll approach 100% availability.
I know that 'more robust' is a nice thing to want, but you really need to think about what you really need. If it takes 15 minutes to switch over to a backup copy (using some magic RAID disk mirroring maybe?) and 15 minutes to restart the app and let it checkpoint its way up to a decent operational speed again, is that good enough?
If it takes an hour, how about that?
How much time/heartache or money is it worth for you to have system downtime, and how much are you willing to expend to reduce it by 5, 15, 30 minutes?
So, there's really a continuum of availability and you have to pick your point on it. At the low end, you have no backups and recreate everything from scratch. At the high end you use Vendor X's real clustering solution and 24x7 monitoring, then have zero downtime even in a disaster. Somewhere in the middle is you.
Now I realise this is an overtly commercial view of things, but if needs be replace money with effort and season to taste.
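To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the outage count, the cost per minute and the price of the fancier setup are invented figures, and the function names are mine. Only the 15 + 15 minute recovery comes from the example above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(outages_per_year, minutes_per_outage):
    """Fraction of the year the system is up."""
    downtime = outages_per_year * minutes_per_outage
    return 1 - downtime / MINUTES_PER_YEAR


def worth_buying(outages_per_year, minutes_saved, cost_per_minute, yearly_price):
    """True if the downtime you avoid costs more than the solution does."""
    return outages_per_year * minutes_saved * cost_per_minute > yearly_price


# Assumed: 4 failures a year, 15 min switchover + 15 min warm-up each.
print(f"{availability(4, 15 + 15):.3%} available")

# Assumed: downtime costs 50 per minute; a cluster priced at 20000 a year
# would cut each outage from 30 minutes to 5.
print(worth_buying(4, 30 - 5, 50, 20000))
```

Plug in your own figures. If the last line comes out False, the cheaper option with the longer recovery is probably your point on the continuum.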
/* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }
If you're working with enough data that would require a CLUSTER, then I would suggest a commercial product.
But if you need that SPEED, but not a lot of data storage, I'd say a decent sized MySQL cluster would cover you, depending on what your needs are.
If you are in the position to actually need a cluster to do that much work, you should be able to get something commercial and more large-scale oriented.
Error 407 - No creative sig found
Two options I haven't seen anyone mention yet are PostgreSQL eRServer 1.0+ (see PostgreSQL news item "PostgreSQL now has working, tested, scalable replication!" from August 28, 2003 or a lengthier press posting "PostgreSQL, Inc. Releases Open Source Replication Version") and Backplane.
eRServer has been in development for over two years, is used in production settings and is released under a BSD license (as with PostgreSQL). It uses a single master/multiple slave asynchronous replication scheme. There are cautions in the release that replication may be difficult to set up.
Backplane seems to be particularly well-suited to clustering data quickly across a WAN. A quote may explain it better:
I haven't used either yet, but you may wish to give them a look.
check out prevayler
Seems like an excellent alternative to the traditional database route (though I myself have not yet used it in an application)
Here is a developerWorks article about Object Prevalence (of which prevayler is an implementation): An introduction to object prevalence
Oracle is giving away a cluster filesystem (so they can sell RAC on Linux); there is OpenGFS as well for filesystem usage.
I was saying "Wow! Oracle has released a clustered filesystem!", until I discovered it only works with shared storage. That means it won't create a filesystem image across a cluster network with the data distributed among the nodes. Rather, the cluster filesystem is stored in a centralized location, but can be accessed by multiple members of the cluster at the same time for both read and write.
At any rate, many "big kids" are using the most unfairly bullied product, slandered most likely because it is a software boy-named-sue, MySQL. Why not have a read before taking childish pot-shots:
http://www.mysql.com/press/MySQL_userlist.pdf
In the end this silly "I'm a big boy because I use oracle and you're a little gurly kiddie because you don't" bullshit is just empty bravado. Businesses generally attempt to find the most cost effective means to meet a need and often Oracle ends up being like buying a stealth fighter to deliver a pizza. It often just doesn't make sense even for a big kid with billions of dollars, which might be why the $30B+ multinational BASF uses PostgreSQL.
Frankly, after the named-user license Oracle sold the State of California, no matter how idiotic the clearly comatose contract negotiators were, one would be remiss to not consider other companies with slightly less egregious behavior on record.
Are your slaves using InnoDB or MyISAM? If the latter, have you checked whether the 30+ second delays are caused by the replication thread being blocked by a read lock? The coarse-grained (table level) locking of MyISAM makes it impossible to get consistent behavior under load in a read/write environment (even if the only writer is the replication thread). Use InnoDB instead, it is amazing.
It got to the point where the slave servers (P4, 2GB RAM, Hardware RAID) could not keep up with the Master replication _and_ service SELECT queries. The data was too big for RAM (filesort) and the drives were not fast enough (2 drives mirrored). The Master is dual PIII 2GHz, 2GB RAM, and fast RAID 5 hardware.
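A toy model of why that happens, in Python. This is a deliberately simplified sketch and not how either storage engine is implemented: the function name and the timings are made up, and it only assumes that under table-level locking a write has to wait for every SELECT already running on the table, while ordinary InnoDB SELECTs are non-locking reads that do not hold up the writer.

```python
def replication_wait(running_selects, write_at, table_level_locks):
    """Seconds the replication thread waits before it can apply one write.

    running_selects: (start, end) times of SELECTs on the same table.
    write_at: time the replicated write arrives on the slave.
    """
    if not table_level_locks:
        return 0.0  # plain SELECTs do not block the writer in this model
    # The write must wait for every read that is in progress when it arrives.
    blockers = [end for start, end in running_selects if start <= write_at < end]
    return max(blockers, default=write_at) - write_at


# Hypothetical load: one slow 35-second report plus a couple of quick lookups.
selects = [(0, 35), (1, 1.2), (3, 3.1)]
print(replication_wait(selects, write_at=2, table_level_locks=True))
print(replication_wait(selects, write_at=2, table_level_locks=False))
```

One slow report query is enough to stall the replication thread for as long as the query has left to run, which fits the kind of 30+ second delays described.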
I ended up solving the problem with a hardware upgrade. I replaced all 4 servers with 1 Quad-Opteron 1.8GHz, 16 GB RAM, and _VERY_ Fast RAID 10 across 9 fast drives.
Please feel free to check it out. For the first time in a long time, I'm not afraid that the MySQL server will be the bottleneck in this very dynamic web site.
We use Linux, Apache 1.3.x, MySQL 4.0.x, and PHP 4.x to build the pages and generate XML to our Flash MX applications.
Superdudes.Net
Flash heavy site. Free registration required to access the coolest features (those which beat up the MySQL server).
If you have a widely replicated, multi-master database and the replication network becomes partitioned, it's impossible in general to fully resolve the inconsistencies when connectivity is restored.
There are particular instances of applications where a specific replication solution might be adequate. But it really does depend on the application requirements.
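A small Python sketch of the partition problem, under made-up assumptions: two masters that each hold a full copy as a plain dict, a hypothetical seat-booking table, and a function name (diverged_keys) of my own. It only detects rows that both sides changed differently; it does not resolve them, which is the point.

```python
def diverged_keys(base, site_a, site_b):
    """Rows changed on both sides of the partition to different values."""
    conflicts = {}
    for key in set(site_a) | set(site_b):
        orig, a, b = base.get(key), site_a.get(key), site_b.get(key)
        if a != orig and b != orig and a != b:
            conflicts[key] = (a, b)
    return conflicts


# State when the link between the two masters went down.
base = {"seat_12A": "free", "seat_12B": "free"}

# Each master keeps accepting writes while partitioned.
site_a = dict(base, seat_12A="alice", seat_12B="carol")
site_b = dict(base, seat_12A="bob")

print(diverged_keys(base, site_a, site_b))
```

seat_12B merges cleanly because only one side touched it. seat_12A was sold twice, and nothing in the database can say whether alice or bob should keep it; that decision belongs to the application, which is why the answer depends on its requirements.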