Database Clusters for the Masses
grugruto writes "Cluster of databases is no more the privilege of few high-end commercial databases, open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (or C-JDBC). C-JDBC is an open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable and fault tolerant database cluster."
Just started looking at the site. I've wanted this for years. I was ecstatic with what load-balancing cheap Apache boxes did for the cost of web hosting. Unfortunately, reliability has still required hundreds of thousands of dollars of high-end equipement and software for databases. I've been hoping the open-source community would make headway on this front.
So, the question is - is anyone working on anything like this for Perl, C, or generic implmentations?
now if only MySQL or PosgreSQL can get the reputation that Oracle has mabye we will start to see Oracle DBs go away in favor of the cheaper solutions using RAIDb
I am the Alpha and the Omega-3
Hmm, interesting idea. I didn't see performance listed as a feature.
I wonder how much slower my query will be when the data is spread across several machines. I'd imagine that a few complex queries that aren't correctly optimized would bring this system to it's knees rather quickly.
Here's an example of an application: I have a database-driven Web application that allows my onsite clients to register network services for openings in the firewall. Another software component probes the registered hosts for daemon version information and records it in the database, so that we can send out alerts when security holes are discovered in particular versions. I use PostgreSQL on Debian and Solaris. Independently of my work, our networking office has a Microsoft SQL Server database of IP addresses, MAC addresses, and physical switch ports and jack numbers.
What I'd like to do is mount both my database and the networking office's database into some sort of "meta-database" -- analogous to mounting filesystems from two different hosts via NFS -- and run SQL queries that span both data sets. I wouldn't expect to be able to write to this conjoined database -- locking would be a nightmare -- but being able to SELECT across the two sets would be incredibly valuable.
Maybe I missed it but there info is pretty sparse on how they handle updates (i.e. adds/deletes/updates). Does it do two phase commit so if I'm stripping data and one of the updates fail then everything fails? If they are replicating, will they automatically update replication servers if they are down at the time of the update? If one of the databases in the RAIDb doesn't support online backups and it's backing up, what will their system do? After all, this would be the true grunt work, without these features then what they have isn't a big deal at all. Does anyone have more info?
Just curious.
How do you join one table to another when they are on two separate boxes?
Well. I know how to actually use SQL to join two tables from two separate databases. But what is actually happening inside the RDBMS at the low lever. Does one just bring over the entire other table. How does it use indexes.
Seems to me this really is doing at best, a reference implementation that may actually degrade performance.
Can someone who understands C-JDBC better than I do explain what this might mean? Sounds to me like they are replacing a feature of CMP by doing this, which is not necessarily something that would be "useful with EJB entity beans" if I understand it right (unless maybe they are referring to folks using EJB 1.0?). That is, the container already handles cache-invalidation at a fine-grained level. Perhaps there is a scenario I am not imagining where it would be useful to have this at the database level also... thoughts?
Just put a replica on a second node and you will have fault tolerance (even just for maintenance) and you will be able to handle peak loads. 2 nodes is already a cluster, don't need to have hundreds of nodes.
Another usage could be to keep a single Oracle instance and put a bunch of open-source databases to offload your main Oracle database. You could have all the write queries (orders, ...) handled by your [safe] main Oracle database and have all other open-source databases handle the read requests for browsing your web site (which is the main part of the load). What do you think of this idea of scaling Oracle with open-source databases?