Database Clusters for the Masses
grugruto writes "Database clustering is no longer the privilege of a few high-end commercial databases; open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (or C-JDBC). C-JDBC is open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable and fault-tolerant database cluster."
So a few things come up just reading the docs on this:
1. A Controller. It looks as though a single controller is used by the clients to communicate with the various RAID'd databases. I'm sure there can be multiple controllers, since there would be little point in making the databases redundant but not the access to them. Still looking into this.
2. Also, it looks as though the default port is 1099 (RMI). If, for a web app, your EJBs and web app are local to the same container, that might not be a problem. However, I happen to have my EJB server on its own box, and this might very well cause problems. I think the docs said you could specify your own ports, but I haven't seen any examples of this in the docs yet. Also still looking.
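On the port question: plain Java RMI itself is not tied to 1099, which is only the conventional default for the registry. The sketch below is not C-JDBC configuration (the docs would have to say how the controller exposes this); it only shows the standard `java.rmi` calls for running a registry on another port. The class name and the port number 11099 are made up for illustration.

```java
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

final class RmiPortSketch {
    // Hypothetical port, chosen only because it is not the default 1099.
    static final int CUSTOM_PORT = 11099;

    /** Starts an in-process RMI registry on the given port and returns the names bound in it. */
    static String[] startAndList(int port) {
        try {
            Registry registry = LocateRegistry.createRegistry(port);
            try {
                // A remote client would reach this registry with
                // LocateRegistry.getRegistry(host, port) instead of the default port.
                return registry.list();
            } finally {
                // Unexport so the JVM is not kept alive by the registry.
                UnicastRemoteObject.unexportObject(registry, true);
            }
        } catch (RemoteException e) {
            throw new IllegalStateException("could not start registry on port " + port, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bindings: " + startAndList(CUSTOM_PORT).length);
    }
}
```

Whether a firewall between the web tier and the EJB box lets that port through is a separate matter from whether the port can be changed.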
A few other things exist as well which are in the docs as known limitations:
* XAConnections
* Blobs
* batch updates
* callable statements
These could be serious issues for some. My last project used CLOBs/BLOBs, batch updates and callable statements, so this would rule that out. Of course, all the db stuff was strictly tied to Oracle, so I think that would rule this out regardless.
All in all though, this looks like a good start. As my current project progresses, clustered dbs will become more and more of an issue. I've looked into some other projects out there for Postgres, but have found nothing really satisfactory yet. I think this is a good step in the right direction - for Java developers. It'll be interesting to watch.
Database clustering is typically used for high availability, not performance.
There are better ways to improve the performance of a database: horizontal partitioning, federated servers, etc.
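For anyone unfamiliar with horizontal partitioning, here is a minimal sketch of the idea: each row key maps to exactly one backend, so every server holds a disjoint slice of the data. The class name and the backend names are hypothetical, and hashing the key is just one simple placement rule among several (range-based placement is another).

```java
import java.util.List;

final class ShardRouterSketch {
    private final List<String> shards; // hypothetical backend names

    ShardRouterSketch(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shards = List.copyOf(shards);
    }

    /** Maps a row key to exactly one shard; the same key always lands on the same shard. */
    String shardFor(String key) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(key.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouterSketch router = new ShardRouterSketch(List.of("db-a", "db-b", "db-c"));
        for (String customer : List.of("alice", "bob", "carol")) {
            System.out.println(customer + " -> " + router.shardFor(customer));
        }
    }
}
```

The catch with this simple modulo rule is that changing the number of shards moves most keys, which is one reason partitioning schemes get more elaborate in practice.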
This would be very cool if there were a generic implementation. We build many Microsoft SQL clusters, and the hardware requirements alone for an MSCS cluster easily exceed $50k, let alone the licensing. As an MCDBA, I'd consider an open source solution if I could use it as a back end to an ASP/VB.NET application, just to save the licensing $$ for consulting! ;)
Not to argue in any way that Sun botched Java, but what I meant is, this implementation is for Java programs. It provides no functionality for programs not written in Java. Even if Sun had done Java correctly, my statement would still be true - this isn't a generic implementation, as it requires the code be written in Java. Even if Java itself were generic, this implementation wouldn't be generic, it'd be Java-specific.
When I said "generic implementation" I meant "an implementation which doesn't require your programs be written in a particular language." Which is probably a bit of a pipe dream, you'd still need some sort of glue code (ODBC, JDBC, DBD, etc). But, as was alluded to above, I was trying to beat the Beowulf comment when I asked my question.
Josh, know what you're talking about before you post. MySQL (the company which does the vast majority of development of MySQL) offers a variety of levels of support and consulting, regardless of the number of systems that you admin. For $48,000/year, you get:
Does Oracle match that for the price?
After actually reading the documentation, here's my informed take on this:
1) In its current incarnation, it's only useful for very, very simple database access. No transactions, no blobs, etc. Basically, if you're just storing some simple weblication tables and doing single statements against them for selects/updates (no big cross-table transactions), you can use it.
2) It's JDBC only. Perhaps someone could port the concept to ODBC though.
3) There's a new middle tier between the JDBC driver and the database itself, which is the bulk of their code. This tier actually re-implements some database constructs like recovery logging, query caching, etc. Of course this is necessary, as trying to do replication from the client-code side alone would be impossible (what do you do when one of 3 DB mirrors goes offline for an hour? Have every JDBC client cache the requests and replay them later, hoping those clients are even still around later?)
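To make point 3 concrete, here is a toy sketch of why the log has to live in the middle tier: the controller appends every write to a recovery log, applies it to whichever mirrors are online, and replays the missed tail when a mirror comes back. This is not C-JDBC's actual code or API; all class and method names are hypothetical, and backends are stand-ins that record statements instead of executing SQL.

```java
import java.util.ArrayList;
import java.util.List;

final class RecoveryLogSketch {
    /** Stand-in for one database mirror; a real backend would execute the SQL. */
    static final class Backend {
        final List<String> applied = new ArrayList<>();
        boolean online = true;
        int nextLogIndex = 0; // first log entry this backend has not applied yet
    }

    private final List<String> recoveryLog = new ArrayList<>();
    private final List<Backend> backends = new ArrayList<>();

    Backend addBackend() {
        Backend b = new Backend();
        backends.add(b);
        catchUp(b); // a backend added late replays the whole log
        return b;
    }

    /** Logs the write once, centrally, then applies it to every online mirror. */
    void write(String statement) {
        recoveryLog.add(statement);
        for (Backend b : backends) {
            if (b.online) {
                catchUp(b);
            }
        }
    }

    /** Called when a mirror returns; it replays whatever it missed while offline. */
    void bringOnline(Backend b) {
        b.online = true;
        catchUp(b);
    }

    private void catchUp(Backend b) {
        while (b.nextLogIndex < recoveryLog.size()) {
            b.applied.add(recoveryLog.get(b.nextLogIndex));
            b.nextLogIndex++;
        }
    }

    public static void main(String[] args) {
        RecoveryLogSketch controller = new RecoveryLogSketch();
        Backend first = controller.addBackend();
        Backend second = controller.addBackend();
        second.online = false;                 // mirror drops out
        controller.write("INSERT INTO t VALUES (1)");
        controller.bringOnline(second);        // mirror returns and replays the log
        System.out.println(first.applied.equals(second.applied));
    }
}
```

The clients never see any of this, which is the point: they can come and go while the controller keeps the one authoritative record of what each mirror still needs.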
For some applications and some companies, in its current state this is a godsend - but it's not a general solution yet. Making it ODBC (or even better, having the front of it emulate a native PostgreSQL or MySQL listener) would broaden its applicability.
Supporting transactions would be a big win too, although I'm not sure how feasible this is - I think at that point they may as well just write their own new database engine which is parallel from the start, seeing as they'll be re-implementing in their cluster tier almost everything the database server does except for actual physical storage.
Still, it's nice to see that someone did this and made it work - and for a lot of simple databases behind Java apps it's all you really need.
PostgreSQL has all the transaction support in place already, so of all the free DBs out there it would seem they have the best shot at doing their own native parallelism, if they would just get it done someday.