Slashdot Mirror


Database Clusters for the Masses

grugruto writes "Database clustering is no longer the privilege of a few high-end commercial databases; open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (or C-JDBC). C-JDBC is open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable and fault-tolerant database cluster."
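
As a rough illustration of the replication idea in the summary (not C-JDBC's actual code or API), the Python sketch below models a cluster where every backend holds a full copy: writes are broadcast to all backends and reads are spread round-robin. The class `ReplicatedCluster` and its methods are hypothetical names, and plain dicts stand in for the MySQL or PostgreSQL boxes.

```python
import itertools


class ReplicatedCluster:
    """Toy replication level: every backend holds a full copy of the data.

    Hypothetical illustration only; dicts stand in for real database servers.
    """

    def __init__(self, n_backends):
        self.backends = [dict() for _ in range(n_backends)]
        self._next = itertools.cycle(range(n_backends))

    def write(self, key, value):
        # Writes are broadcast to every copy.
        for backend in self.backends:
            backend[key] = value

    def read(self, key):
        # Reads are load-balanced round-robin across the copies.
        return self.backends[next(self._next)].get(key)

    def fail(self, index):
        # Drop one backend; the remaining copies keep serving reads.
        del self.backends[index]
        self._next = itertools.cycle(range(len(self.backends)))


cluster = ReplicatedCluster(3)
cluster.write("user:1", "grugruto")
reads = [cluster.read("user:1") for _ in range(3)]  # one read per backend
cluster.fail(0)
after_failure = cluster.read("user:1")
print(reads, after_failure)
```

Losing one backend leaves the data readable, which is the fault-tolerance half of the claim; keeping the copies consistent when a write fails halfway is the hard part this sketch ignores.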

21 of 278 comments

  1. If only replication was so trivial by marcink1234 · · Score: 4, Insightful

    Running many databases is easy. Organizing and serializing replication is hard. Even if one has distributed transactions handy - not present in this case. But let's read their code...

  2. This is a threat to the big vendors by Jack+William+Bell · · Score: 4, Insightful

    This is a major threat to the big vendors. In fact I would say it is even more of a threat to Oracle than it is to MS! After all, MS can continue to go after the midrange market that is already locked into them for the OS.

    But Oracle shops are dealing with expensive boxes they would love to replace, not to mention expensive Oracle licenses. Often the only reason they use Oracle (other than Oracle salesmen licking their buttholes) is because only Oracle has the horsepower to meet their requirements. Give them a cheaper alternative with the same capabilities and they will bail out faster than you can say 'Geronimo'.

    Expect Larry Ellison to start talking about the dangers of using Open Source software now...

    --
    - -
    Are you an SF Fan? Are you a Tru-Fan?
    1. Re:This is a threat to the big vendors by FortKnox · · Score: 3, Insightful

      I have to say this is a major point. This is why you don't see people using open source. If my DB goes down, I call up Oracle, and make them bring someone down here to fix the problem. If my open source DB goes down, I crap my pants and hope to keep my job.

      What does proprietary software have that Open Source doesn't? Insurance.

      The best way to knock over oracle is to start up a company that supports open source for a fee (which is cheaper than running oracle for a year).

      --
      Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    2. Re:This is a threat to the big vendors by mangu · · Score: 3, Insightful
      please prove to me that this new "threat" has all of the features of Oracle


      That's exactly the point. Who needs all the features of Oracle? Maybe the IRS or Mastercard, but the vast majority of Oracle users are getting just one feature: the Oracle reputation that their marketing has built.


      And with all those features comes the big problem of managing them: no matter how small the application is, once you choose Oracle you need a team of experienced DBAs to correctly and reliably configure the system.

    3. Re:This is a threat to the big vendors by AlecC · · Score: 2, Insightful

      The best way to knock over oracle is to start up a company that supports open source for a fee (which is cheaper than running oracle for a year).

      Which is exactly what MySQL AB does for MySQL. Their support is not particularly cheap (though I bet that it is a lot less than Oracle's), but I recommend it highly. The original designers are still leading the development/support team (is that true for many of the alternatives?) and make a living *only* because of their superior product, not because some salesman conned the management.

      (You may gather that I am a fan of MySQL).

      --
      Consciousness is an illusion caused by an excess of self consciousness.
    4. Re:This is a threat to the big vendors by valisk · · Score: 5, Insightful
      People will always buy Oracle because "No one ever got fired for choosing Oracle". If something goes wrong, you always have someone to blame. With open source, your job is more on the line because you have to take responsibility.

      Prior to Oracle taking off in a big way people used to say:

      People will always buy IBM because "No one ever got fired for choosing IBM". If something goes wrong, you always have someone to blame. With the Seven Dwarfs (the common name for IBM's competitors back then), your job is more on the line because you have to take responsibility.

      Then Larry E. shamelessly put together a cool SQL database which copied every major innovation IBM had made and added in a few more for good measure. He also cut the price by a third. IBM's database customers deserted in droves; after all, if this Oracle thing turned out to be shit, they could always get IBM to come clean up the mess. It turned out, though, that Oracle wasn't and isn't shit.

      That does not mean that Oracle is immortal and will always be top of the pile. Postgres now replicates almost all of the major features and is proven in the reliability stakes, and tools like this are only going to make it more likely that corporate data departments will dip their toes into the Free software waters; after all, if it turns out to be shit, they can always get Oracle to come clean up the mess.

      --

      Economic Left/Right: -0.62
      Social Libertarian/Authoritarian: -3.69
  3. Re:Non-Java Implementations? by Etcetera · · Score: 3, Insightful


    Exactly -- given that the RAIDb itself sits elsewhere, I can't imagine it would be that hard to take the source itself and make a Perl DBD::Module out of it.

    If only I had the spare time... :/

  4. Re:Non-Java Implementations? by akadruid · · Score: 4, Insightful

    Unfortunately there is no free open source hardware available :)
    Seriously though, this may reduce the costs for some users but I don't think it will get a wide take-up. Most people will not want to leave the deniability you can have with large corps like Oracle. Oracle is a 'safe' solution for the purchaser with their ass on the line, which is most corporate users these days.
    And the more entrepreneurial users will not usually have the hardware to use this properly anyway.
    Anyone who is financing this lot will want proven standards.
    Just my flawed £0.02

    --
    "Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
  5. Where are the benchmarks that they speak of ? by a7244270 · · Score: 5, Insightful

    I looked at the diagram, and it looks very nice, but they seem to be very light on the details.

    Supposedly, "This new version has been successfully tested with Tomcat, JOnAS, MySQL and PostgreSQL. Excellent results have been obtained with the TPC-W and RUBiS benchmarks."

    Don't get me wrong, I like the idea, and I have been wanting something like this for years, but I sure would like to _see_ the test results, even if they are preliminary.

  6. Re:Non-Java Implementations? by palad1 · · Score: 3, Insightful
    So, the question is - is anyone working on anything like this for Perl, C, or generic implementations?

    Am I the only one a bit saddened by the fact that Sun botched it with Java so badly that we now exclude Java from 'generic implementations'?

    Build once, run anywhere, riiiiight.

  7. Why? by Anonymous Coward · · Score: 2, Insightful

    Why do the masses need database clusters? Does anyone apart from mid-to-large-sized businesses need one?

  8. supposed to be at RDBMS level by Arethan · · Score: 4, Insightful

    Isn't clustering supposed to be a function of the database system, not the software you use to access it?

    I mean, this is neat and all, but I really don't want to have to use this interface just so that I can cluster my database. You're much better off placing clustering functions within the database itself. Then you can access the data by any method (ODBC, native libraries, hell even with the provided command line interface).

    Take a look at how MS SQL Server performs clustering sometime. Everything (and I mean EVERYTHING) is performed via triggers and tsql. All the clustering setup does is set up a bunch of known working trigger scripts to propagate the data. You can even edit them to your liking afterwards if you wish. Now I'm not saying that MS's solution for clustering is the cat's ass. Personally, I think it is kind of hackish, but then again I believe that clustering should be something you simply turn on, and shouldn't be able to fuss with. Realistically, I can't think of any good reason to change the cookie cutter tsql scripts that perform the clustering, so I only see the ability to modify them as a potential way to fsck it up (that being an obviously bad thing).

    Clustering really isn't that hard to implement. I'm pretty surprised that MySQL and Postgres don't have better support for it. Especially Postgres, since transaction support is really the one big key that makes clustering possible. Maybe no one has really had an itch to make it happen yet. Hopefully it will happen soon, since I'd love clustering to be another argument for why OSS databases can play with the big kids just as easily.

    1. Re:supposed to be at RDBMS level by Vihai · · Score: 3, Insightful

      You are right: clustering is not only better implemented in the DBMS itself, it actually NEEDS support from the DBMS.

      You are wrong in saying that implementing clustering isn't hard.

      If we are talking about REAL DBMSes (no, MySQL is not a real DBMS), enabling any form of clustering that maintains the ACID properties we expect from a DBMS is a major step: it means becoming a distributed application, which is one of the most complex things to implement.

      For example, suppose you have two machines in a master-to-master configuration and the network suddenly becomes partitioned: each server thinks that the other is offline, but the clients can reach both of them.

      Suppose now that the clients update the same record on the two servers in an incompatible way... you can imagine what will happen when the servers become visible to each other again...
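
      A toy model of that scenario, as a sketch only: the `Master` class, the per-key write counter and the account example are all assumptions made for illustration, and no real DBMS replicates this naively.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """One node of a hypothetical master-to-master pair."""
    name: str
    rows: dict = field(default_factory=dict)      # key -> value
    versions: dict = field(default_factory=dict)  # key -> write counter

    def write(self, key, value, peer=None):
        # Apply the write locally; replicate only if the peer is reachable.
        self.rows[key] = value
        self.versions[key] = self.versions.get(key, 0) + 1
        if peer is not None:
            peer.rows[key] = value
            peer.versions[key] = self.versions[key]


def find_conflicts(a, b):
    # Keys whose values diverged while the nodes could not see each other.
    return sorted(k for k in a.rows.keys() & b.rows.keys() if a.rows[k] != b.rows[k])


a, b = Master("A"), Master("B")
a.write("account:42", 100, peer=b)  # network healthy: both nodes agree

# Network partitions: each master thinks the other is down,
# but clients can still reach both.
a.write("account:42", 70)    # client 1 withdraws 30 via A
b.write("account:42", 150)   # client 2 deposits 50 via B

# Network heals: the two copies now disagree.
conflicts = find_conflicts(a, b)
print(conflicts, a.versions, b.versions)
```

      Both nodes end up at the same write count with different values, so a simple counter cannot say which update should win.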

  9. "Shared-Nothing Architecture" by Anonymous Coward · · Score: 2, Insightful

    The commercial databases that have been doing this for years are DB2, Informix, and Teradata.

    Know what? There are a ton of deep issues beyond just making the different partitions transparent to the application level. Think about joins across partitions for a sec...
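
    To make the join problem concrete, here is a minimal sketch with two hypothetical tables spread over two nodes. The table names, the modulo partitioning and the helper functions are assumptions for illustration, not how DB2, Informix or Teradata do it.

```python
NODES = 2


def partition(rows, key):
    """Spread rows across NODES by taking the given column modulo NODES."""
    parts = [[] for _ in range(NODES)]
    for row in rows:
        parts[row[key] % NODES].append(row)
    return parts


def join(customer_rows, order_rows):
    """Nested-loop join of orders to customers on cust_id."""
    return [
        (o["order_id"], c["name"])
        for o in order_rows
        for c in customer_rows
        if o["cust_id"] == c["cust_id"]
    ]


customers = [{"cust_id": 1, "name": "ann"}, {"cust_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 1},
    {"order_id": 13, "cust_id": 2},
]

cust_parts = partition(customers, "cust_id")   # partitioned on the join key
order_parts = partition(orders, "order_id")    # partitioned on a different key

# Naive: each node joins only the rows it happens to hold locally.
local_only = [p for n in range(NODES) for p in join(cust_parts[n], order_parts[n])]

# Count the order rows that sit on a different node than their customer.
moved = sum(
    1 for n in range(NODES) for o in order_parts[n] if o["cust_id"] % NODES != n
)

# Correct: ship orders to the node that owns the matching customer, then join.
reshuffled = partition(orders, "cust_id")
full = [p for n in range(NODES) for p in join(cust_parts[n], reshuffled[n])]
print(local_only, moved, sorted(full))
```

    With the tables partitioned on different columns, joining only what each node holds locally finds nothing here; every order row has to cross the network before the join is correct.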

  10. Slightly Offtopic.... by frodo+from+middle+ea · · Score: 3, Insightful

    But seriously, do you see Oracle/DB2 etc. customers suddenly jumping on this?
    My view is that it may be difficult to migrate OSes or even hardware, but it's almost darn impossible to migrate existing databases.
    A database is the most fundamental and most cared-about aspect of a major business. There is a lot of time and effort and MONEY spent to incorporate it into the company.
    Lots and lots of critical business applications are written using the proprietary extensions of these vendors. Is it very easy to migrate this code?
    It may be interesting for a future pilot project, but if you expect businesses to change their database vendors... that's not going to happen very soon.

    --
    for the last time people, I am "frodo from middle eaRTH", not "middle eaST".
  11. Limitations by Anonymous Coward · · Score: 1, Insightful

    It's a good start - but not ready for prime time yet... Stored proc support is essential in a production setting.

    4.4. Current limitations

    The C-JDBC driver currently does not support the following features:

    * XAConnections,
    * updatable ResultSets,
    * callable statements (stored procedures),
    * Blobs,
    * batch updates,
    * multiple controller failover is subject to controller support for distributed virtual databases,
    * JDBC 3.0 features.

  12. Re:hmmm by Sxooter · · Score: 2, Insightful

    Interesting point. I find that there are several views when it comes to OS databases.

    One is that since most open source databases lack some feature, they will never replace any Oracle servers. Most of the people who believe this also believe that Oracle servers are always used in high parallel load transactional systems that have to be up 24/7 and never go down. While plenty of sites that need that use Oracle, the converse is not always true. Many places put Oracle online because it's what their developers know and love, not because it's the best fit for the problem.

    The next view is that Open Source databases are ready to replace Oracle right now, everywhere. While there are plenty of places using Oracle that could switch to Pgsql/MySQL/Firebird right now, there are plenty more that couldn't dream of it. It's all about what you're doing with your database that defines which ones you can use.

    The final view is the right tool for the job view. These folks are rare. They've actually loaded test datasets into various database engines, read up on how each db's locking mechanism works, and examined each to see where the best fit is.

    People relying on the first two views are treating computer science like a religion instead of a science.

    --

    --- It is not the things we do which we regret the most, but the things which we don't do.
  13. Re:Clusters aren't performance? Just not true! by the_2nd_coming · · Score: 2, Insightful

    too bad the licensing for those CPUs is exponential

    --



    I am the Alpha and the Omega-3
  14. Not there yet... by jfroebe · · Score: 2, Insightful

    While I commend their efforts, what they are offering is little more than a poor man's High Availability cluster.

    The shared disk array (RAID, etc.) is just a part of implementing HA.

    My recommendation is for the developers to take a look at how it is implemented in the enterprise DBMSs (Sybase, Oracle, MS SQL Server, DB2) first.

    jason

    --
    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
  15. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 2, Insightful

    > "You might need to update a value on the PostgreSQL server, and VBA can do this by updating the remote text file."

    You complete fucking moron. You are suggesting using shared text files as a database server. Hello? Anyone home? That's even worse than Jet! What about locking, integrity, and so on? Damn you are stupid.

    > "Access can't import arbitrarily large files"
    Don't use Access. Duh.

    > "You might not need to import all of a given table"
    Don't import the whole table. Duh.

    > "If the database becomes corrupted"
    Get your DBA to fix the database corruption. Duh. This is only a common problem for glue-sniffing Access bozos like yourself.

    > "Obviously, I shouldn't be able to see financial information for fellow employees, but I need to see information for our members to do my job"

    A) VBA does not solve this problem - there's no security model there at all.
    B) What you need is called a "View". You can learn more by taking SQL 101 at the local community college.

    > "I've had to clean up after the likes of you - you know, the folks who think database design is just running a wizard. Not every database problem fits into the Microsoft database model mold"

    This from an Access programmer who is stupefied by basic RDBMS concepts. Yes, please please "clean-up" my stuff by introducing MS Access and updatable CSV kludges into the mix. That will really help tons. (To make you a tiny bit more informed, DTS is actually an API, although it does come with a wizard for MS Access dinks like yourself)

    After you complete your elite mastery of VBA, I recommend reading up on the tools found in MS-SQL or another RDBMS.

  16. Re:Non-Java Implementations? by perlchild · · Score: 2, Insightful

    You forgot the replication and transactional aspects of it...
    What happens if a transaction fails on one member of a cluster but not another: do you report success or failure?
    That's the problem with using this kind of proxy architecture: once you "commit" a transaction on server 1, if it fails on server 2, how do you roll back server 1? You can't... it's already committed...
    (I won't go into the atomicity of how you would roll back a committed, non-atomic change because another server failed, to keep them in check, nor how that might mean you might have to stop accepting transactions until the discrepancy is resolved)
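
    The usual textbook answer to this is two-phase commit: ask every server to prepare first, and commit only if all of them vote yes. The sketch below contrasts it with the commit-as-you-go proxy described above. `Backend` and both write functions are hypothetical, and nothing here implies C-JDBC works this way.

```python
class Backend:
    """Hypothetical stand-in for one database server behind the proxy."""

    def __init__(self, name, fail_on_write=False):
        self.name = name
        self.fail_on_write = fail_on_write
        self.committed = []   # rows that are durable
        self.pending = None   # row staged by prepare()

    def commit_directly(self, row):
        # One-step write: durable as soon as it succeeds.
        if self.fail_on_write:
            raise RuntimeError(f"{self.name}: write failed")
        self.committed.append(row)

    def prepare(self, row):
        # Phase 1: stage the row and vote; nothing is durable yet.
        if self.fail_on_write:
            return False
        self.pending = row
        return True

    def commit(self):
        # Phase 2: make the staged row durable.
        self.committed.append(self.pending)
        self.pending = None

    def rollback(self):
        # Discard whatever was staged.
        self.pending = None


def naive_write(backends, row):
    # Commit on each server in turn; a late failure strands earlier commits.
    for backend in backends:
        backend.commit_directly(row)


def two_phase_write(backends, row):
    # Commit everywhere only if every server voted yes in the prepare phase.
    if all(backend.prepare(row) for backend in backends):
        for backend in backends:
            backend.commit()
        return True
    for backend in backends:
        backend.rollback()
    return False


s1, s2 = Backend("server1"), Backend("server2", fail_on_write=True)
try:
    naive_write([s1, s2], "row-1")
except RuntimeError:
    pass
naive_state = (list(s1.committed), list(s2.committed))  # servers now disagree

t1, t2 = Backend("server1"), Backend("server2", fail_on_write=True)
ok = two_phase_write([t1, t2], "row-1")
print(naive_state, ok, t1.committed, t2.committed)
```

    A real implementation also has to survive a crash between the two phases (durable prepare records, coordinator recovery), which this sketch leaves out.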

    None of this is covered by LVS, which is a fine product; it just doesn't apply to the right area of the problem (there's more to database clustering than connection redundancy).