Open Source Database Clusters?
grugruto asks: "A lot of open source solutions are available to scale web sites with clusters but what about databases? I can't afford an Oracle RAC license but can I have something more reliable and fault tolerant than my single Postgres box? I have seen this recent article that looks promising for open source solutions. Do anyone have experiences with clusters of MySQL , Postgres-R, C-JDBC or other solutions? How does it compare to commercial products?"
We've been evaluating the Emic application cluster for MySQL and have had pretty good results. It's a new product (so YMMV), but it looks promising.
Emic Networks
If you're using Java, you might want to check out the Clustered JDBC project
-D
Cube On! (http://stores.ebay.com/PuzzleProz)
And most websites don't. Unless you need true transaction support then this solution will work. If you need transactions and clustering then you probably need to find a way to pay for an Oracle setup.
You can make a High Availability cluster out of most any software if you have some kind of shared storage.
People have used firewire drives connected to two different computers to accomplish this cheaply. Oracle is giving away a cluster filesystem (so they can sell RAC on linux) there is OpenGFS as well for filesystem usage.
Just write some basic monitoring scripts that will bring up your postgress database on the second server should the first one fail. Just make sure those scripts completely take down the old database on the first server in the case of a partial failure. Having two databases try to open the same data would be a really bad thing.
Here are some links to articles that should help:
Overview
Howto
Cluster Filesystem
These are mainly geared for Oracle/RAC, all you need is the firewire shared storage and cluster filesystem. You're on your own to write the monitoring and failover scripts. Hope this helps. --Chris
Well, the .ORG domain runs on PostgreSQL + eRServer, so that's one scalable solution ...
Off-topic, but it's Baling wire. As in the wire used to hold together bales of hay and whatnot.
Using one server as a master and n servers as slaves. Just make sure to write everything to the master. Replication to the slaves generally takes about a second or maybe two depending on load.
OK, not quite the same thing but this works quite well for ready heavy applications, and is very reliable unless you get a slave out of sync.
This was on v3.n.n - the good folks at MySQL have made many improvements to the replication facilities in the 4.n series I believe.
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety" - BF
there are basically three type of clusters:
1) shared nothing: in this, each computer is only connected to each other via simple IP network. no disks are shared. each machine serves part of data. these cluster doesn't work reliably when you have to aggregations. e.g. if one of the machine fails and you try to to "avg()" and if the data is spread across machines, the query would fail, since one of the machine is not available. most enterprise apps cannot work in this config without degradation. e.g. IBM study showed that 2 node cluster is slower and less reliable than 1 node system when running SAP.
IBM on windows and unix and MS uses this type of clustering (also called federated database approach or shared nothing approach).
2) shared disk between two computers: in this case, there are multiple machines and multiple disks. each disk is atleast connected to two computers. if one of the computer fails, other takes over. no mainstream database uses this mode, but it is used by hp-nonstop. still, each machine serves up part of the data and hence standard enterprise apps like SAP etc cannot take clustering advantage without lot of modification.
3) shared everything: in this, each disk is connected to all the machines in the cluster. any number of machines can fail and yet the system would keep running as long as atleast one machine is up. this is used by Oracle. all the machine sees all the data. standard apps like SAP etc can be run in this kind of configs with minor modification or no modification at all. this method is also used by IBM in their mainframe database (which outsells their windows and unix database by huge margine). most enterprise apps are deployed in this type of cluster configuration.
the approach one is simpler from hardware point of view. also, for database kernel writers, this is the easiest to implement. however, the user would need to break up data judiciously and spread acros s machines. also adding a node and removing a node will require re-partitioning of data. mostly only custom apps which are fully aware of your partitioning etc will be able to take advantage.
it is also easy to make it scale for simple custom app and so most of TPC-C benchmarks are published in this configuration.
approach 3 requires special shared disk system. the database implementation is very complex. the kernel writers have to worry about two computers simultaneously accessing disks or overwriting each others data etc. this is the thing that Oracle is pushing across all platforms and IBM is pushing for its mainframes.
approach 2 is similar to approach 1 except that it adds redundancy and hence is more reliable.
What about SAPDB isn't it a potential choice. I thought I read somewhere that MySQL and SAPDB were merging. Chech it out http://www.sapdb.org/
I've been running a 3-4 node MySQL 3.23.x cluster on Slowlaris 9 since January. It has survived several catastrophic power outages and numerous other insults without a hiccup. Load is fairly light (about 3,000 updates daily and a similar number of queries on each server) so YMMV.
Actually its "baling twine", they don't use wire for it as the wire would cut through the materials being bound. And twine is cheaper.
Transaction problems on a single database? That explains a lot. I bet you used MySQL. Sounds pretty typical for it.
Ever tried PostgreSQL?
Less is more !
ZEO will allow you to scale the ZODB (Zope Object Database) across multiple processors, machines, and networks. Although the ZODB is a Python object database, so it's probably not an option to port your current database. There are other limitations of the database - it's not always the fastest, it's an object database so concepts like foreign keys are not fully there, but it can give you high availability. As of new Zope 2.7 in beta though, ZEO is quite easy to set-up, and it is open source.
Another hard bit is that the Postgres replication doesn't support sequences - see the details in the aptly named "Things to Remember" section of the installation documentation.
So if your master fails, presumably you have to recreate the sequences starting at a number high enough to avoid conflicting numbers before switching over to a slave. Seems like this could be a problem.
Nonetheless, Postgres is cruising away on RubyForge; 300,000 records and counting...
The Army reading list
PostgreSQL has released their replication technology under an open source licence.
--Lawrence Lessig for Congress!
deviantart.com, IIRC, runs about 3 mysql servers behind a load-balancing cache/server, so have had to deal with a lot of the difficulties involved in that.
Use this link to the article instead:s p
Database Server Clash Revisited
http://www.eweek.com/article2/0,4149,1238712,00.a
Trusted Computing FAQ | Free Dawit Isaak!
I love PostgreSQL sequences - I think definatly a feature I would miss.
PGSQL stores it's sequences seperate from the tables. When you need a sequence, you can have PGSQL create one at table creation time, or you can link your table to an already available sequence.
The sequences are under your full controll - you can reset them, roll them back or set them to any value you want.
Clever things you can do with this:
Set one server to have high squence numbers and another to have lower ones - in the middle of the night they can swap data that they have dominion over. This poor-man replicaiton works well over flaky internet connections. Imagine a remote office with a 56K modem that swaps over in the middle of the night.
When you need to update the entire database to take advantage of another type of data abstrations - you can have your new style of tables use the old sequence numbers as well. The older tables grow and the new tables grow - but they never share a sequence number. The every night you can import the newer sytle of table with the old style data - you can let some users use the old front end, and test the new front end and the entire migration can be seamless.
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
This is just as off-topic as the grandparent, but having grown up around plenty of farms I've used and seen baling wire more than a few times. It is indeed actual wire, (galvanized steel according to one site that still sells it), and it works just fine for tying up a bale of hay. Some people still use it, but it has mostly been replaced by plastic twine, as it is indeed cheaper and easier to use. Here's a link to a fellow waxing sub-poetic on the bygone days of baling wire - Read All About It. And this is a site that sells baling wire, and has a few pictures - REC.
Anti-social? My code is just platform-specific.
Nah, our DB totals only about 6 GB. Slashdot isn't an especially big database.
Its only claim to fame is that it delivers about 30 dynamic pages a second, 12 hours a day.
Slashdot runs MySQL db on a couple of boxes. Check the FAQ and the IRC interview log. According to the FAQ, Slashdot is / was financially contributing to replication in MySQL.
Vino, gyno, and techno -Bruce Sterling
There is no 64 bit version os MS SQL
Bullshit, it's been out for months, see This article. As to the rest of your argument check out TPC-C results and say that MS SQL doesn't scale, it's the second highest scorer and has 6 of the top 10 results. This is a real world load testing benchmark that many companies base purchasing decisions on. (ok the MS solutions are a little unusual in that they are shared-nothing but the other competitiors are free to do likewise).
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Don't know how DB2 ICE would do compared to Open Source soloutions but take a look at the interesting results of the recent TPC-H benchmark performance testing on Clustered and non-Clustered 100GB and 300GB configurations. It appears that the IBM DB2 Integrated Cluster Environment (DB2 ICE) for Linux is heads above the rest.
Now that I have, it's pretty cool and quite stable. We've tested transparent failover a few times (once due to an instance failure) and nobody notices. Amazing.
In my opinion, it's worth the cost. We'll have to agree to disagree with open source solutions. For those that can't afford it, I suppose the alternative is the better solution.
Some of us who compare OSS databases to commercial ones have experience that extends past address books. And no, I'll pass on the DSW if you don't mind.
My main problem with PostgreSQL is the query optimiser. Oracle's query optimiser is definitely superior as Postgres occasionally comes up with some peculiar query plans. In a product I'm involved with, we hand tune our SQL from the ground up, so this is less of a problem for us. I find the two products to be pretty comparable in other aspects, though I haven't tried Postgres-R yet.
I haven't played with MySQL since back when you couldn't do sub-SELECTS, so I have no idea how much it's progressed since then.
At this stage, I'd suggest you stick with a commercial product for replication or clustering for high end work. Clustering and replication is still the bleeding edge for OSS, so use it with caution on non-critical tasks. Having said that, these are complex tasks you're talking about, and even the commercial products have their own peculiarities at times. High volume replication using Oracle materialized views over database links comes to mind.
Just because you're paranoid doesn't mean they're NOT after you.
Here you go.
The part you are probably interested in is this: Note that if you decide to "ring" your server setups, then you are not necessarily helping distribute the load, you are simply creating redundant masters in the case that your primary machine becomes unavailable. Also, you'll have to write your own monitoring scripts. MySQL says they are working on some tools for this... I'm excited to see what they come up with.
I moderate "-1, Fool"
Bah, DNS isn't load balancing.
LVS + MySQL works really well. We've got grouped clusters of databases that we can allocate more/less resources to as needed. Reporting cluster for the slower queries, faster cluster for the real-time queries and a few specific application clusters.
Replication keeps them in sync but there isn't a good HA solution available for the master database yet. Perhaps in MySQL 5.0. In the meantime, use DRBD + heartbeat for near HA.
I don't do this for karma, I do it for cash. It's much better.
Note that if you decide to "ring" your server setups, then you are not necessarily helping distribute the load, you are simply creating redundant masters in the case that your primary machine becomes unavailable.
Not necessarily. The largest part of most database access is reads - searching, retrieval, etc. This often times vastly outnumbers writes, depending on the application. Reads do not have to be replicated, giving a big performance boost.
Ouch, sounds like you should have gotten an experienced dba to set it up for you. DB2's too complex to go with simple defaults, and clustering is definitely a high-skills endeaver.
As far as insert loads go, we've seen 500 rows / second on five year old hardware without any problems. Although that's far short of what DB2 is capable of, it's fine for a sustained load. Beyond that batch loads hit 15,000 rows per second easily on the same box.
And as far as pricing goes, today you could get DB2 Express for those little dual-cpu boxes for just $500. A really fast four-way will cost you $32,000 - still way shy of $100k. You don't need to hit that kind of pricing unless you're doing inter-partition parallelism. And as I mentioned above - that's just not worth doing unless you've got the right skills to pull it off.
Here is a description of a Cluster created on MySQL with Linux boxes - similar to Google. http://www.dwreview.com/Product_Reviews/Review_Dat a.html
and http://www.dwreview.com/Data_mining/Intelligent_Da taMining.html
I maintain a site that does a fair bit of traffic (Daily avgs: files served = 1.8 Million, bandwidth = 20 Gigs)
We have 1 "master" MySQL server which gets all updates and inserts, etc. We have 2 "slave" servers which each take a signifigant portion of the select queries. All machines run the same 4.0.x version of MySQL. (Web access is PHP on Apache) All machines are dual x86s packed with RAM.
Setting up replication is pretty easy. And for the most part things are pretty nice. The load average drops a lot on each machine when we add a new slave. (Oh don't forget to enable query caching.)
We have had some problems though. Because the site gets so much traffic sometimes queries take a while to run and to propagate to the slave servers. This means if you update your data (via the master) and then do a select from one of the slaves your change may not show up yet. For most web apps this might not seem really big.
But it leads to the web users changing things and not seeing the results right away. So they figure the site is "broken" and they repeat what they just did only to have it take place twice. If you have your browser "refresh" the page first usually the data has come through but many people don't do this. The result is they don't feel their account has been credited or something. These kinds of bugs are hard to track down too.
I wrote a program to check repetatly (sleeping from 1/4 to 1/25 of second in between) and the slaves were almost always in perfect sync with the master. (as per MySQL's binary log position indicator). That was really impressive however there are times when the servers are under load that the slaves will be out of sync for 30 to 60+ seconds! (Measuring in the tens of thousands of byte offest differences in the binary log position.)
The solution we've been using is that any time there is an update to the database and the imediate page seen next by the user relies on the changed data we do the selects from the master server. This seems to work for now but I'm not sure how long we will be able to scale this way.
In summary so long as the laod on the machines stays around 1.0 or lower everything runs pretty smooth. If the loads hit 3 to 5 or higher then people notice (or rather mention) that things seem odd. (By the way those are linux load averages which IIRC is different than under Solaris.)
What I would like to see is a virtual server type system where one machine accepts all queries and hands them out to a set of replicating servers without requiring the application to know about it. This is nice for developing applications but the real reason is the master can then prevent the syncing issues discussed above.
SF
Enterprise database solutions are quite varied. Is it a data warehouse or something financial or ???
You pick the right tool for the job. I've seen massive databases on Sun Enterprise E-6500s and Oracle do a LOT if the database is properly configured. But one structure doesn't work for all applications. Do you use stored procedures? Do you index? Do you require triple replication to reindex one system have a backup and a live production system? Do you need remote fail-safe operation? These types of questions need to be answered before you settle into one solution.
Banjo - The more I know about Windoze, the more I love *nix
unless your software is gpl.
Our fast four way was under $32k, I threw in the price of the x300 storage array we bought with it and the 4 CPU licenses for DB2 EEE 7.2 we bought for $11.5K each (half off the SRP).
Our db2 does over 15,000 rows/second in BATCH mode. It was a sad day when we had to log our transactions to text files for batch processing.
We did end up hiring a good DBA to help us with our DB2. It's worth noting that we didn't have this extra cost or need with our PostgreSQL setup.
I'm curious about anothers experience with DB2 on Linux, as I assume you're running on. Tell me, what versions do you run? What kernel? What kind of reliabity do you get?
We initially ran DB2 on a Redhat 7.3 setup with a severely modified kernel, 2.4.15 I think it was. We went to RHAS 2.1 with the RedHat kernel after so many stability problems in DB2. The new version didn't fix the problems, it only threw in a slew of new problems relating to the hardware. Our new setup is Gentoo 1.4 with a 2.4.21 kernel. It runs much faster, sees all HT enabled processors and throws no APIC errors, and hasn't crashed.
So, what are your experiences with DB2 on Linux. If you're not runing on Linux, what are you running on?
My Linux Command of the Day site : LCOD
Almost right. MySQL is free to use in a commercial application, its not free to distribute or embedd in a commercial application.
MySQL is dual licenced, and one of those licences is GPL. You can use mySQL for free anywehere and in any manner that conformed to the GPL.
According to the FAQ it supports clusters/high availability of several types (towards the bottom), has Oracle 7 compatability, and has the option to upgrade to commercial support (something available for Postgres, MySQL and most others as well). It's got an install base of users used to large environments and has been reasonably proven in the field. Just a thought.
I have personally installed, setup and maintained a 5 (3 slaves, 1 master/slave, 1 slave/master) node cluster using Heartbeat and MySql replication. It works great!! My guess is that 80% of MY Mysql usage is content and needs READ-ONLY access. So I have 3 slaves that are used in a Read-Only cluster. The master is one of 2 other machines and ALL WRITES go to it. In the event of a MASTER db going down, the remaining slave promotes itself and updates the other slaves to point to itself. Been working great fo 8 months!!!
You need a license only if you choose to distribute the software. If this is an in-house application, simply obtain copies of MySQL Standard/Max (GPL) directly from MySQL mirrors for each server. Since you do not perform the distribution, you are in good shape (see MySQL License Policy - Licensing -2).
However, the folks at MySQL AB are very decent folks who offer great support and warranty for their product and who have to feed their families, and licenses are cheap. IMHO, buy at least one license for a master and one for a slave. That way you get support for the program in each role.
Check Mnesia DB from Erlang package. It's not relational, but has high-availablility replication, conflict management, etc. It's reliable and tested. By Ericsson.
Good license.
http://www.livejournal.com/community/lj_maintenanc e/60984.html
- I am made of meat.
sorry, maybe it is just me, but the whole "ARRGG IT AIN'T ACID" is a lot of hype to me. ACID boils down to transactions. plain and simple.
Perhaps you need a deeper understanding.
ACID tends to be a knee jerk reaction, and most people realyl need to be askign themselves what it ACTUALLY buys them.
It buys them a database that you they can expect to still be there, sound and consistent, after the machine blows a fuse in the middle of 200 simultaneous updates. It buys them a database that doesn't accumulate rot over time because somebody deleted a customer at the same time somebody in another city entered an invoice. It buys them queries that give the right answer, because of only ever seeing the database in a consistent state, even while other queries running at the same time are only partially completed.
Basically, it gives them a database capable of completely correct operation, not just mostly correct. Of course that may not matter to you, in that case I have a faulty pacemaker to sell you.
Have you got your LWN subscription yet?
But if you care about your data so much that you are seriously going into replicated systems, the couple of most popular free packages at least aren't there yet even in basic ACID reliability.
What are you talking about? PostgreSQL has supported ACID reliability for years.
Plus, PostgreSQL also now supports replication, the same as the one that PostgreSQL, INC. has been selling as an add-on for years (they finally opensourced it).
Another database that I would check out is SAPdb. SAP originally created it to be a competitor to Oracle, so that their customers wouldn't have to buy Oracle databases (read: pretty complicated setup, but worth it). But now they've opensourced it too, and as far as I know, it supports replication. And in the next release when MySQL takes over (Q4 2003, it'll be renamed MaxDB, and MySQL will be working on the code as well as SAP), it will have a proxy available so that you can just use MySQL database drivers to access it.
It would seem that the mySQL zealots have failed to address the real issue: clustering.
Sure, mySQL supports transactional tables but it does not support two-phase commits. Without distributed transactions, how can you build an ACID cluster? Replication is not the same as clustering!
And, I'm sorry, but mySQL does not come close to Oracle. I manage a large, distributed J2EE application running on mySQL (4.0.x). mySQL *still* does not support sub-selects, stored procedures, views, or dynamic tables (SELECT a, b, c FROM foo) AS bar(a, b, c). It does not support MINUS. It does not support constraints (e.g. column x must be greater than 15). It is not as fault tolerant as the commerical solutions. I could keep going if you would like? Yes, 4.1.x supports some of these things. Yes, PostgreSQL support some of these things too -- but it has other flaws.
There is a reason Oracle (and SQL Server, and DB2, etc.) costs a lot of money. As much as I love open source, if my project could afford it we would be running Oracle. Not because I like Oracle (it has its faults too, it is a resource hog), but because none of the open source solutions meet my needs as completely as it does.
Another thing -- did anyone had a look at SAPdb and Interbase? They are Free too and there's not much talk about them. Are they useable? Do they provide replication?
No, that does NOT conform with the GPL.
Yes it does. Where in the GPL does it say that the software cannot be used in commercial applications?
It does not! It only requires that you provide the source (which has nothing to do with whether the software is commercial or not).