Slashdot Mirror


Database Clusters for the Masses

grugruto writes "Database clustering is no longer the privilege of a few high-end commercial databases; open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (or C-JDBC). C-JDBC is open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable and fault-tolerant database cluster."

10 of 278 comments

  1. Non-Java Implementations? by the_quark · · Score: 4, Interesting

    Just started looking at the site. I've wanted this for years. I was ecstatic with what load-balancing cheap Apache boxes did for the cost of web hosting. Unfortunately, reliability has still required hundreds of thousands of dollars of high-end equipment and software for databases. I've been hoping the open-source community would make headway on this front.

    So, the question is - is anyone working on anything like this for Perl, C, or generic implementations?

    1. Re:Non-Java Implementations? by palad1 · · Score: 4, Interesting

      Please don't take my previous post as a flame, I completely agree with your point. What I was whining about was the fact that Java doesn't play nice with system libs: it is 'easy' to import other libs, but exporting Java classes to other languages is ...
      Let's say that few people feel like embedding a JVM in their C app :)

    2. Re:Non-Java Implementations? by Matt2000 · · Score: 2, Interesting

      I don't know if you're still writing applets, but Java works. And there's a reason this DB RAID was written for Java and not for the other languages you mentioned, because Java works.

      You should check out some of the Java technologies post 1999, they're entrusted with a lot of sensitive computing nowadays.

  2. hmmm by the_2nd_coming · · Score: 4, Interesting

    Now if only MySQL or PostgreSQL could get the reputation that Oracle has, maybe we would start to see Oracle DBs go away in favor of the cheaper solutions using RAIDb.

    --
    I am the Alpha and the Omega-3

  3. Performance? by deranged unix nut · · Score: 5, Interesting

    Hmm, interesting idea. I didn't see performance listed as a feature.

    I wonder how much slower my query will be when the data is spread across several machines. I'd imagine that a few complex queries that aren't correctly optimized would bring this system to its knees rather quickly.

  4. How about a meta-database adapter? by Frater 219 · · Score: 4, Interesting

    Since the article suggests the idea of applying disk-volume concepts (RAID) to databases, I thought I'd bring this up: For a while now I've been wishing there was an equivalent of NFS for databases, a way to mount a running database's tablespace into another database. This would allow one to draw together disparate databases, creating views and running joins across tables which natively reside in different databases, on different hosts.

    Here's an example of an application: I have a database-driven Web application that allows my onsite clients to register network services for openings in the firewall. Another software component probes the registered hosts for daemon version information and records it in the database, so that we can send out alerts when security holes are discovered in particular versions. I use PostgreSQL on Debian and Solaris. Independently of my work, our networking office has a Microsoft SQL Server database of IP addresses, MAC addresses, and physical switch ports and jack numbers.

    What I'd like to do is mount both my database and the networking office's database into some sort of "meta-database" -- analogous to mounting filesystems from two different hosts via NFS -- and run SQL queries that span both data sets. I wouldn't expect to be able to write to this conjoined database -- locking would be a nightmare -- but being able to SELECT across the two sets would be incredibly valuable.
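A rough sketch of the "mount two databases and SELECT across them" idea, using SQLite's `ATTACH DATABASE` as a small-scale analogue. This is only an illustration: it does not involve C-JDBC, PostgreSQL or SQL Server, the table and column names (`registered_service`, `port_map`, and so on) are made up, and nothing here enforces the read-only behaviour described above.

```python
import os
import sqlite3
import tempfile


def build_demo_databases(directory):
    """Create two separate SQLite files standing in for the two offices' databases."""
    services_path = os.path.join(directory, "services.db")
    network_path = os.path.join(directory, "network.db")

    services = sqlite3.connect(services_path)
    services.execute(
        "CREATE TABLE registered_service (ip TEXT, daemon TEXT, version TEXT)"
    )
    services.executemany(
        "INSERT INTO registered_service VALUES (?, ?, ?)",
        [("10.0.0.5", "sshd", "3.7"), ("10.0.0.9", "httpd", "2.0")],
    )
    services.commit()
    services.close()

    network = sqlite3.connect(network_path)
    network.execute("CREATE TABLE port_map (ip TEXT, mac TEXT, jack TEXT)")
    network.executemany(
        "INSERT INTO port_map VALUES (?, ?, ?)",
        [
            ("10.0.0.5", "00:11:22:33:44:55", "B-104"),
            ("10.0.0.9", "66:77:88:99:aa:bb", "C-212"),
        ],
    )
    network.commit()
    network.close()
    return services_path, network_path


def query_across(services_path, network_path):
    """Attach both files to one connection and run a single SELECT spanning them."""
    meta = sqlite3.connect(":memory:")
    meta.execute("ATTACH DATABASE ? AS svc", (services_path,))
    meta.execute("ATTACH DATABASE ? AS net", (network_path,))
    rows = meta.execute(
        "SELECT s.ip, s.daemon, s.version, n.jack "
        "FROM svc.registered_service AS s "
        "JOIN net.port_map AS n ON n.ip = s.ip "
        "ORDER BY s.ip"
    ).fetchall()
    meta.close()
    return rows


with tempfile.TemporaryDirectory() as tmp:
    service_db, network_db = build_demo_databases(tmp)
    joined_rows = query_across(service_db, network_db)
```

The join is written against the `svc` and `net` aliases, so the query itself does not care which file each table lives in; that is the part that resembles a mounted filesystem.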

  5. More info on transactions by binaryDigit · · Score: 3, Interesting

    Maybe I missed it, but the info is pretty sparse on how they handle updates (i.e. adds/deletes/updates). Does it do two-phase commit, so that if I'm striping data and one of the updates fails then everything fails? If they are replicating, will they automatically update replication servers that were down at the time of the update? If one of the databases in the RAIDb doesn't support online backups and it's backing up, what will their system do? After all, this would be the true grunt work; without these features, what they have isn't a big deal at all. Does anyone have more info?
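To make the "one update fails, so everything fails" question concrete, here is a minimal sketch of an all-or-nothing write across replicas, using in-memory SQLite databases as stand-in backends. It is not how C-JDBC handles updates (the thread does not say), and it is not a real two-phase commit: a crash between the individual commits could still leave the replicas diverged. The `replicated_write` helper and the `accounts` table are hypothetical.

```python
import sqlite3


def replicated_write(backends, sql, params=()):
    """Apply one write to every backend; commit everywhere or roll back everywhere."""
    try:
        for conn in backends:
            # sqlite3 opens an implicit transaction before INSERT/UPDATE/DELETE.
            conn.execute(sql, params)
    except sqlite3.Error:
        # One backend rejected the write: undo it on all of them.
        for conn in backends:
            conn.rollback()
        return False
    for conn in backends:
        conn.commit()
    return True


def make_backend():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.commit()
    return conn


backend_a = make_backend()
backend_b = make_backend()

# Simulate a replica that has drifted: backend_b already holds id 99.
backend_b.execute("INSERT INTO accounts VALUES (99, 'stale')")
backend_b.commit()

first_ok = replicated_write(
    [backend_a, backend_b], "INSERT INTO accounts VALUES (?, ?)", (1, "alice")
)
# Succeeds on backend_a, hits a primary-key conflict on backend_b, so both roll back.
second_ok = replicated_write(
    [backend_a, backend_b], "INSERT INTO accounts VALUES (?, ?)", (99, "bob")
)

ids_a = [row[0] for row in backend_a.execute("SELECT id FROM accounts ORDER BY id")]
ids_b = [row[0] for row in backend_b.execute("SELECT id FROM accounts ORDER BY id")]
```

After the second call, `backend_a` does not contain id 99 even though its own insert succeeded, which is the behaviour being asked about. Catching up a replica that was down during a write is a separate problem that this sketch does not address.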

  6. How does clustering improve performance? by snatchitup · · Score: 1, Interesting

    Just curious.

    How do you join one table to another when they are on two separate boxes?

    Well. I know how to actually use SQL to join two tables from two separate databases. But what is actually happening inside the RDBMS at the low level? Does one just bring over the entire other table? How does it use indexes?

    Seems to me this is, at best, a reference implementation that may actually degrade performance.

  7. Fine-grained caching question by code_nerd · · Score: 2, Interesting

    The user's guide page says this about caching:

    8.7.2. Request cache

    The query cache provides query result caching. If two exact same requests are to be executed, only one is executed and the second one waits until the completion of the first one (this is the default pendingTimeout value which is 0). To prevent the second request to wait forever, a pendingTimeout value in seconds can be defined for the waiting request. If the timeout expires the request is executed in parallel of the first one.

    A request cache element is described as follows:

    The request cache granularity defines how entries are removed from the cache. noInvalidation provides a non-coherent cache and should only be used for testing purposes. table and column provide table-based and column-based invalidations, respectively. columnUnique can optimize requests that select a unique primary key (useful with EJB entity beans).
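For readers trying to picture what table-based invalidation means, here is a minimal sketch of a result cache that drops every cached query touching a table when that table is written. It is an illustration of the general technique only, not C-JDBC's implementation: the class name `TableGranularityCache` is invented, and the caller has to say which tables a query reads because the sketch does no SQL parsing.

```python
class TableGranularityCache:
    """Query-result cache that invalidates entries per table."""

    def __init__(self):
        self._results = {}   # SQL text -> cached rows
        self._by_table = {}  # table name -> set of SQL texts that read it

    def get(self, sql):
        """Return cached rows for this exact SQL text, or None on a miss."""
        return self._results.get(sql)

    def put(self, sql, tables, rows):
        """Cache rows for a query and remember which tables it depends on."""
        self._results[sql] = rows
        for table in tables:
            self._by_table.setdefault(table, set()).add(sql)

    def invalidate_table(self, table):
        """Drop every cached query that read from the written table."""
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


cache = TableGranularityCache()
cache.put("SELECT name FROM users", ["users"], [("ann",), ("bob",)])
cache.put("SELECT total FROM orders", ["orders"], [(42,)])

# A write to `users` arrives: only queries that read `users` are evicted.
cache.invalidate_table("users")

users_hit = cache.get("SELECT name FROM users")
orders_hit = cache.get("SELECT total FROM orders")
```

A column-based variant would key the dependency map on (table, column) pairs instead, so a write to one column leaves queries on other columns of the same table cached.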

    Can someone who understands C-JDBC better than I do explain what this might mean? Sounds to me like they are replacing a feature of CMP by doing this, which is not necessarily something that would be "useful with EJB entity beans" if I understand it right (unless maybe they are referring to folks using EJB 1.0?). That is, the container already handles cache-invalidation at a fine-grained level. Perhaps there is a scenario I am not imagining where it would be useful to have this at the database level also... thoughts?

  8. Re:Why? by grugruto · · Score: 2, Interesting

    You have your web site backed by an open source database?
    Just put a replica on a second node and you will have fault tolerance (even just for maintenance) and you will be able to handle peak loads. Two nodes is already a cluster; you don't need hundreds of nodes.

    Another usage could be to keep a single Oracle instance and put a bunch of open-source databases to offload your main Oracle database. You could have all the write queries (orders, ...) handled by your [safe] main Oracle database and have all other open-source databases handle the read requests for browsing your web site (which is the main part of the load). What do you think of this idea of scaling Oracle with open-source databases?
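The read/write split described above can be sketched as a small router that sends writes to the main database and spreads reads over the others in round-robin order. This is a toy illustration, not C-JDBC's load balancer: the `ReadWriteRouter` class and the backend names are hypothetical, a statement counts as a read only if it starts with SELECT, and how the read replicas receive the data written to the main database is left out entirely.

```python
import itertools


class ReadWriteRouter:
    """Pick a backend name for each statement: writes to primary, reads to replicas."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one read replica is required")
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        words = sql.split()
        # Crude classification: anything that is not a plain SELECT goes to the primary.
        if words and words[0].upper() == "SELECT":
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("oracle-main", ["postgres-1", "mysql-1"])
routes = [
    router.route(statement)
    for statement in (
        "SELECT * FROM catalog",
        "INSERT INTO orders VALUES (1, 'widget')",
        "select price FROM catalog WHERE id = 7",
        "SELECT * FROM catalog",
    )
]
```

Sending anything ambiguous to the primary is the conservative default here, since a misrouted write would land on a replica that nothing else reads from.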