Database Clusters for the Masses
grugruto writes "Database clustering is no longer the privilege of a few high-end commercial databases; open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (or C-JDBC). C-JDBC is open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable and fault-tolerant database cluster."
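For anyone wondering what the replication flavour of this amounts to, here is a minimal sketch of the idea, not C-JDBC's actual API. It assumes in-memory SQLite databases as stand-ins for the MySQL/PostgreSQL boxes, and the class name MirroredCluster and its methods are made up for illustration: writes are broadcast to every backend, reads are spread round-robin.

```python
import itertools
import sqlite3


class MirroredCluster:
    """Toy full-replication front end (hypothetical, not the C-JDBC API)."""

    def __init__(self, n_backends=3):
        # In-memory SQLite databases stand in for separate database boxes.
        self.backends = [sqlite3.connect(":memory:") for _ in range(n_backends)]
        self._next_reader = itertools.cycle(self.backends)

    def write(self, sql, params=()):
        # Broadcast the statement so every replica holds the same data.
        for db in self.backends:
            db.execute(sql, params)
            db.commit()

    def read(self, sql, params=()):
        # Round-robin reads across replicas to spread the load.
        db = next(self._next_reader)
        return db.execute(sql, params).fetchall()


cluster = MirroredCluster()
cluster.write("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
cluster.write("INSERT INTO items (name) VALUES (?)", ("widget",))

# Three consecutive reads land on three different backends.
results = [cluster.read("SELECT name FROM items") for _ in range(3)]
```

Note that the write loop commits on each backend in turn, so a failure halfway through leaves the replicas out of sync. That is exactly the hard part.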
Running many databases is easy. Organizing and serializing replication is hard, even if one has distributed transactions handy, which is not the case here. But let's read their code...
This is a major threat to the big vendors. In fact I would say it is even more of a threat to Oracle than it is to MS! After all, MS can continue to go after the midrange market that is already locked into them for the OS.
But Oracle shops are dealing with expensive boxes they would love to replace, not to mention expensive Oracle licenses. Often the only reason they use Oracle (other than Oracle salesmen licking their buttholes) is because only Oracle has the horsepower to meet their requirements. Give them a cheaper alternative with the same capabilities and they will bail out faster than you can say 'Geronimo'.
Expect Larry Ellison to start talking about the dangers of using Open Source software now...
- -
Are you an SF Fan? Are you a Tru-Fan?
Exactly -- given that the RAIDb itself sits elsewhere, I can't imagine it would be that hard to take the source itself and make a Perl DBD::Module out of it.
If only I had the spare time...
Unfortunately there is no free open source hardware available :)
Seriously though, this may reduce the costs for some users but I don't think it will get wide uptake. Most people will not want to give up the deniability you can have with large corps like Oracle. Oracle is a 'safe' solution for the purchaser with their ass on the line, which is most corporate users these days.
And the more entrepreneurial users will not usually have the hardware to use this properly anyway.
Anyone who is financing this lot will want proven standards.
Just my flawed £0.02
"Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
I looked at the diagram, and it looks very nice, but they seem to be very light on the details.
Supposedly, "This new version has been successfully tested with Tomcat, JOnAS, MySQL and PostgreSQL. Excellent results have been obtained with the TPC-W and RUBiS benchmarks."
Don't get me wrong, I like the idea, and I have been wanting something like this for years, but I sure would like to _see_ the test results, even if they are preliminary.
Am I the only one a bit saddened by the fact that Sun botched Java so badly that we now exclude Java from 'generic implementations'?
Build once, run anywhere, riiiiight.
Why do masses need database clusters? Does anyone apart from mid-large sized businesses need one?
Isn't clustering supposed to be a function of the database system, not the software you use to access it?
I mean, this is neat and all, but I really don't want to have to use this interface just so that I can cluster my database. You're much better off placing clustering functions within the database itself. Then you can access the data by any method (ODBC, native libraries, hell even with the provided command line interface).
Take a look at how MS SQL Server performs clustering sometime. Everything (and I mean EVERYTHING) is performed via triggers and tsql. All the clustering setup does is set up a bunch of known working trigger scripts to propagate the data. You can even edit them to your liking afterwards if you wish. Now I'm not saying that MS's solution for clustering is the cat's ass. Personally, I think it is kind of hackish, but then again I believe that clustering should be something you simply turn on, and shouldn't be able to fuss with. Realistically, I can't think of any good reason to change the cookie cutter tsql scripts that perform the clustering, so I only see the ability to modify them as a potential way to fsck it up (that being an obviously bad thing).
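To make the trigger idea concrete, here is a rough sketch of trigger-driven propagation. It is not the T-SQL that SQL Server generates; it assumes two in-memory SQLite databases, and the changelog table, the trigger name, and the propagate() helper are all invented for the example. A trigger records every insert, and a separate step replays the log on the replica.

```python
import sqlite3

# Two in-memory databases stand in for a primary and a replica server.
primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
for db in (primary, replica):
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Hypothetical change log, filled in by a trigger on the primary.
primary.execute(
    "CREATE TABLE changelog ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, item_id INTEGER, name TEXT)"
)
primary.execute(
    """
    CREATE TRIGGER items_after_insert AFTER INSERT ON items
    BEGIN
        INSERT INTO changelog (item_id, name) VALUES (NEW.id, NEW.name);
    END
    """
)


def propagate(last_seq=0):
    """Replay logged inserts newer than last_seq onto the replica."""
    rows = primary.execute(
        "SELECT seq, item_id, name FROM changelog WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, item_id, name in rows:
        replica.execute("INSERT INTO items VALUES (?, ?)", (item_id, name))
        last_seq = seq
    replica.commit()
    return last_seq


primary.execute("INSERT INTO items (name) VALUES ('widget')")
primary.execute("INSERT INTO items (name) VALUES ('gadget')")
primary.commit()

last = propagate()
replica_rows = replica.execute("SELECT id, name FROM items ORDER BY id").fetchall()
```

Only inserts are covered here; updates and deletes would each need their own trigger, which is part of why hand-editing such scripts is an easy way to break things.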
Clustering really isn't that hard to implement. I'm pretty surprised that MySQL and Postgres don't have better support for it. Especially Postgres, since transaction support is really the one big key that makes clustering possible. Maybe no one has really had an itch to make it happen yet. Hopefully it will happen soon, since I'd love clustering to be another argument for why OSS databases can play with the big kids just as easily.
The commercial databases that have been doing this for years are DB2, Informix, and Teradata.
Know what? There are a ton of deep issues beyond just making the different partitions transparent to the application level. Think about joins across partitions for a sec...
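A small sketch of why cross-partition joins hurt, assuming two in-memory SQLite databases as the partitions (the table names and the join_across_partitions() helper are made up): neither backend can run the join on its own, so the middle tier has to pull rows from both and join them itself.

```python
import sqlite3

# Two in-memory databases stand in for two partitions on different hosts.
customers_db = sqlite3.connect(":memory:")
orders_db = sqlite3.connect(":memory:")

customers_db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
customers_db.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)

orders_db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 9.5), (11, 1, 20.0), (12, 2, 5.0)],
)


def join_across_partitions():
    # Pull one side fully into memory, then hash-join against the other side.
    names = dict(customers_db.execute("SELECT id, name FROM customers"))
    rows = orders_db.execute("SELECT customer_id, total FROM orders ORDER BY id")
    return [(names[cid], total) for cid, total in rows if cid in names]


joined = join_across_partitions()
```

This works for toy tables, but the middle tier is now shipping whole tables over the network and doing the optimizer's job by hand, with no indexes and no consistent snapshot across the two partitions.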
But seriously, do you see Oracle/DB2 etc. customers suddenly jumping on this?
My view is that it may be difficult to migrate OSes or even hardware, but it's almost darn impossible to migrate existing databases.
A database is the most fundamental and most cared-about aspect of a major business. There is a lot of time and effort and MONEY spent to incorporate it into the company.
Lots and lots of critical business applications are written using the proprietary extensions of these vendors. Is it very easy to migrate this code?
It may be interesting for a future pilot project, but if you expect businesses to change their database vendors, that's not going to happen very soon.
for the last time people, I am "frodo from middle eaRTH", not "middle eaST".
It's a good start - but not ready for prime time yet... Stored proc support is essential in a production setting.
4.4. Current limitations
The C-JDBC driver currently does not support the following features:
* XAConnections,
* updatable ResultSets,
* callable statements (stored procedures),
* Blobs,
* batch updates,
* multiple controller failover is subject to controller support for distributed virtual databases,
* JDBC 3.0 features.
Interesting point. I find that there are several views when it comes to OS databases.
One is that since most open source databases lack some feature, they will never replace any Oracle servers. Most of the people who believe this also believe that Oracle servers are always used in highly parallel transactional systems that have to be up 24/7 and never go down. While plenty of sites that need that use Oracle, the converse is not always true. Many places put Oracle online because it's what their developers know and love, not because it's the best fit for the problem.
The next view is that Open Source databases are ready to replace Oracle right now, everywhere. While there are plenty of places using Oracle that could switch to Pgsql/MySQL/Firebird right now, there are plenty more that couldn't dream of it. It's all about what you're doing with your database that defines which ones you can use.
The final view is the right-tool-for-the-job view. These folks are rare. They've actually loaded test datasets into various database engines, read up on how each db's locking mechanism works, and examined each to see where the best fit is.
People relying on the first two views are treating computer science like a religion instead of a science.
--- It is not the things we do which we regret the most, but the things which we don't do.
Too bad the licensing for those CPUs is exponential.
I am the Alpha and the Omega-3
While I commend their efforts, what they are offering is little more than a poor man's High Availability cluster.
The shared disk array (RAID, etc.) is just a part of implementing HA.
My recommendation is for the developers to take a look at how it is implemented in the enterprise DBMSs (Sybase, Oracle, MS SQL Server, DB2) first.
jason
No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
> "You might need to update a value on the PostgreSQL server, and VBA can do this by updating the remote text file."
You complete fucking moron. You are suggesting using shared text files as a database server. Hello? Anyone home? That's even worse than Jet! What about locking, integrity, and so on? Damn you are stupid.
> "Access can't import arbitrarily large files"
Don't use Access. Duh.
> "You might not need to import all of a given table"
Don't import the whole table. Duh.
> "If the database becomes corrupted"
Get your DBA to fix the database corruption. Duh. This is only a common problem for glue-sniffing Access bozos like yourself.
> "Obviously, I shouldn't be able to see financial information for fellow employees, but I need to see information for our members to do my job"
A) VBA does not solve this problem - there's no security model there at all.
B) What you need is called a "View". You can learn more by taking SQL 101 at the local community college.
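For the record, a bare-bones sketch of what a view does, using in-memory SQLite and a made-up people table (the column names and the member_directory view are assumptions, not anything from the original poster's schema). The view exposes member rows without the salary column. SQLite has no user permissions, so the access-control half, granting SELECT on the view but not on the base table, is something a server RDBMS would add on top.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, name TEXT, is_employee INTEGER, salary REAL)"
)
db.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 1, 52000.0),  # employee: should not be visible
        (2, "bob", 0, None),       # member
        (3, "carol", 0, None),     # member
    ],
)

# The view hides employee rows and never exposes the salary column.
db.execute(
    "CREATE VIEW member_directory AS "
    "SELECT id, name FROM people WHERE is_employee = 0"
)

cursor = db.execute("SELECT * FROM member_directory ORDER BY id")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```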
> "I've had to clean up after the likes of you - you know, the folks who think database design is just running a wizard. Not every database problem fits into the Microsoft database model mold"
This from an Access programmer who is stupefied by basic RDBMS concepts. Yes, please please "clean up" my stuff by introducing MS Access and updatable CSV kludges into the mix. That will really help tons. (To make you a tiny bit more informed, DTS is actually an API, although it does come with a wizard for MS Access dinks like yourself)
After you complete your elite mastery of VBA, I recommend reading up on the tools found in MS-SQL or another RDBMS.
You forgot the replication and transactional aspects of it...
What happens if a transaction fails on one member of a cluster, but not another, do you report success or failure?
That's the problem with using this kind of proxy architecture: once you "commit" a transaction on server 1, if it fails on server 2, how do you roll back server 1? You can't... it's already committed...
(I won't go into the atomicity of how you would roll back a committed, non-atomic change because another server failed, to keep them in check, nor how that might mean you might have to stop accepting transactions until the discrepancy is resolved)
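Here is a small sketch of that failure, assuming two in-memory SQLite databases as the cluster members (make_backend(), naive_write(), all_or_nothing_write() and applied() are names invented for the example). Server 2 already holds a row with the same primary key, so the insert fails there. The naive proxy has already committed on server 1 by then; the second version runs the statement everywhere first and commits only if every backend accepted it. This is not real two-phase commit: a crash between the final commits can still leave the members out of sync.

```python
import sqlite3


def make_backend(existing_ids=()):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO accounts VALUES (?, 0)", [(i,) for i in existing_ids])
    db.commit()
    return db


def naive_write(backends, sql, params):
    # Commit on each server in turn; a later failure cannot undo earlier commits.
    for db in backends:
        db.execute(sql, params)
        db.commit()


def all_or_nothing_write(backends, sql, params):
    # Run the statement everywhere first; commit only if every backend accepted it.
    try:
        for db in backends:
            db.execute(sql, params)
    except sqlite3.Error:
        for db in backends:
            db.rollback()
        return False
    for db in backends:
        db.commit()  # a crash here can still leave the members out of sync
    return True


def applied(db):
    # Number of rows carrying the new balance, i.e. whether the write took effect.
    return db.execute("SELECT COUNT(*) FROM accounts WHERE balance = 100").fetchone()[0]


insert = "INSERT INTO accounts VALUES (?, ?)"

# Server 2 already holds id 1, so the insert fails there.
s1, s2 = make_backend(), make_backend(existing_ids=[1])
try:
    naive_write([s1, s2], insert, (1, 100))
except sqlite3.IntegrityError:
    pass
naive_result = (applied(s1), applied(s2))  # server 1 committed, server 2 did not

s3, s4 = make_backend(), make_backend(existing_ids=[1])
ok = all_or_nothing_write([s3, s4], insert, (1, 100))
safe_result = (applied(s3), applied(s4))
```

With the naive version the client sees an error, but server 1 keeps the row anyway. With the second version the write is reported as failed and neither server keeps it.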
None of this is covered by LVS, which is a fine product; it just doesn't apply to the right area of the problem (there's more to database clustering than connection redundancy).