Slashdot Mirror


Open Source Database Clusters?

grugruto asks: "A lot of open source solutions are available to scale web sites with clusters but what about databases? I can't afford an Oracle RAC license but can I have something more reliable and fault tolerant than my single Postgres box? I have seen this recent article that looks promising for open source solutions. Do anyone have experiences with clusters of MySQL , Postgres-R, C-JDBC or other solutions? How does it compare to commercial products?"

26 of 350 comments (clear)

  1. transactionality is hard by solarisguy · · Score: 5, Insightful

    For what it's worth, the commercial solutions are hard to setup, unstable and terribly difficult to maintain, and this is after a small fortune has been invested in making them work. Not to knock the open source solution, but it's hard to beleive that something that is infrequently used and difficult to understand will be truly production quality if you want to use it for money.

    1. Re:transactionality is hard by sys$manager · · Score: 4, Insightful

      Oracle 9i RAC running on Veritas Foundation Suite HA+ Database Edition is a snap. Parallel server in 8 was hard to set up and unstable though.

    2. Re:transactionality is hard by Rorschach1 · · Score: 4, Insightful

      I never set up a pre-9i cluster on Windows, but I ran Parallel Server in 7.1 on OpenVMS and it worked great. Of course, OpenVMS has had real clustering for a long time - Windows still isn't anywhere near where OpenVMS was 20 years ago.

    3. Re:transactionality is hard by sys$manager · · Score: 3, Insightful

      Nothing will ever touch VMS clustering, it's kind of sad that 20 year old technology is so much more stable than modern technology. I mean VMS has only advanced two major versions in ten years.

  2. MySQL Replication by infernalC · · Score: 4, Insightful

    MySQL has very nice replication functionality, and, in certain circumstances, you can even set up replication rings. It is somewhat flexible about the topology you choose to use, so pick the one best for your application. Load balance ala DNS and you're in business.

    1. Re:MySQL Replication by sys$manager · · Score: 2, Insightful

      Yes, as long as you don't give a damn about ACID compliance.

    2. Re:MySQL Replication by linhux · · Score: 2, Insightful

      I used to say the same a few years ago. But nowadays I say that most website that actually has some kind of functionality at all above "insert-update-select" an article or similar, needs - or at least wants - transaction and isolation support. It makes it much easer to create logical database schemas without having to tuck everything into a single table just to make sure you only need one atomic INSERT. Instead you can spread things out and allow for a lot more flexibility - and then you'll want transactions to maintain consistency among the tables.

    3. Re:MySQL Replication by strobert · · Score: 3, Insightful

      sorry, maybe it is just me, but the whole "ARRGG IT AIN'T ACID" is a lot of hype to me. ACID boils down to transactions. plain and simple.

      And I have found in many applications it is easier to deal with transaction type data consistency at the app layer instead of the db one.

      knowing that a DB transaction is complete doesn't help you if for in order to move forward you have to have db ops done in mtuliple servers and/or a change happen with an external vendor.

      And generally some bad code/process will at some point munge your data in a similar way as if you had a db crash in the middle of a transaction.

      I have generally seen for most applciations you are better off just coding things to treat outside input (including data from a db) as evil until you
      have verified it and cope with the abnormalities.

      Yes there are exceptions, but ACID tends to be a knee jerk reaction, and most people realyl need to be askign themselves what it ACTUALLY buys them.

  3. -1:Troll by stratjakt · · Score: 4, Insightful

    Do anyone have experiences with clusters of MySQL , Postgres-R, C-JDBC or other solutions? How does it compare to commercial products?

    They don't compare to commercial products. I know it isn't what you want to hear, and there are hundreds of kids here to tell you different, but they just dont compare. Those kids database experience doesn't extend past an address book.

    Even if you manage to get them to technically keep up, transaction wise, to Oracle or SQL Server, the ACID enforcement isn't there, the syntaxes are kludgy. Gack.

    My company ships products with SQL Server or Oracle as the back end. I've tried to put together an OSS solution so I could impress the big boss with millions of bucks of saved license fees. They just aren't anywhere close IMO.

    Run a SQL Server farm on the back end if you cant afford an Oracle license. Don't be an OSS idealogue in the business world, you end up unemployed.

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:-1:Troll by venom600 · · Score: 5, Insightful

      ACID enforcement isn't there
      Actually ACID compliance is getting pretty darn good in databases like MySQL. Care to elaborate about what ACID compliance issues you have?

      Don't be an OSS idealogue in the business world, you end up unemployed.
      Actually, in our flailing economy 'OSS idealogues' as you call them are making a lot of head-way. OSS now has a viable alternative to *just about* any commercial enterprise software out there.

    2. Re:-1:Troll by venom600 · · Score: 4, Insightful

      I realize that stuff has improved in the year since I've seriously looked at it, but I'm doubtful it's reached the level of Oracle or SQL Server.
      You should look again. MySQL, for example, has full transaction support with InnoDB table types.....AND it's pretty damn fast.

      Watch 404 messages from websites for telling clues - mysql always fails before apache.
      I'm sorry, but that doesn't seem like a very accurate way of measuring database reliability. One of the cool (and sometimes harmful) things about open databases is that there is no entry fee...meaning anybody and their brother can set up a MySQL server. This means that the number of ill-managed MySQL servers out there probably out-numbers Oracle or SQL Server installations (which, typically, have a somewhat knowledgeable admin behind them) by 10 to 1. A MySQL database managed by somebody who knows what they are doing will go head to head with Oracle or SQL Server installations which are also managed by someone who knows what they are doing.

    3. Re:-1:Troll by Kunta+Kinte · · Score: 4, Insightful
      Run a SQL Server farm on the back end if you cant afford an Oracle license. Don't be an OSS idealogue in the business world, you end up unemployed.

      And I would fire the IT guy who causes my company to spend $10,000 for SQL Server in a situation where the free MySQL or Postgres would do.

      Just focus on the right tool for the job. If the database is a simple one. If it is regularly backed up and your company can stand a small period of downtime, why on earth would you buy Oracle or MS SQL Server?

      This is not to say that MySQL is unreliable. I have *never* seen MySQL crash, or lose any of my data. So it would be silly of me to go with Oracle, just because everyone else is doing it.

      The right tool for the job people.

      --
      Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
    4. Re:-1:Troll by Lumpy · · Score: 3, Insightful

      They don't compare to commercial products. I know it isn't what you want to hear, and there are hundreds of kids here to tell you different, but they just dont compare. Those kids database experience doesn't extend past an address book.


      yeah, nobody would ever run a high traffic website on OSS database.....

      Anyone know of a high traffic website that uses per and OSS database servers in a cluster?

      Oh yeah.... this place

      --
      Do not look at laser with remaining good eye.
    5. Re:-1:Troll by fupeg · · Score: 2, Insightful

      ACID transactions using InnoDB, and performance is quite good on MySQL 4.0. My company uses a MySQL/InnoDB server for the loading of content from our partners to our site. The loader is an extremely multi-threaded piece of code that can put quite a strain on database. The loader also writes part of the data it loads to an Oracle 9i database. The MySQL db is running on a dual processor machine (the same machine the loader is running) and the Oracle db runs on its own 4-processor machine (both run Linux, though the Oracle one uses a slightly older kernel as required by Oracle.) When we crank up the loader, it's the Oracle server that becomes unresponsive first. MySQL scales better, at least in this case. We've had no problems with ACID transactions, and the loader code involves several long, distributed transactions (across dbs) running with isolation level of read-committed. We also use a third party search engine that uses SQL Server for persistence. This is typically the bottleneck in the system. When we talk about improving scalability and performance, the MySQL db is the last thing ever mentioned.

    6. Re:-1:Troll by IANAAC · · Score: 2, Insightful
      One of the cool (and sometimes harmful) things about open databases is that there is no entry fee...meaning anybody and their brother can set up a MySQL server.

      Not to knock MySQL, but this is actually part of hte problem. You get anybody and their brother setting something up that then crashes and burns.
      I don't care what anybody says, setting up a database is nothing trivial.
      Regarding your comment that MySQL would go head to head with Oracle or SQL Server is, frankly, laughable.
      I would suggest, partucularly to someone familiar with Oracle, PostgreSQL over MySQL any day. In addition to MySQL's much-hyped transactions (which have been in PostgreSQL forever), PostgreSQL has a procedural lanaguage with which Oracle folk would feel quite at-home.
      Let's not forget that PostgreSQL's SQL implementation is much more standards compliant.
      Again, this is not to rant on MySQL, but it kind of irks me that someone can slap up a MySQL DB and claim to know all about databases.
    7. Re:-1:Troll by stanwirth · · Score: 3, Insightful

      My main problem with PostgreSQL is the query optimiser. Oracle's query optimiser is definitely superior as Postgres occasionally comes up with some peculiar query plans.

      I had the same experience. You basically have to optimse large queries combined with joins and subselects on Postgresql yourself -- and often with Oracle, as well, if its for tables with > 1-10M records. ish. You might want to check out DB2. Awesome clustering -- IMHO more sophisticated and flexible than Oracle's. YMMV depending on the application, as always. Also, if it's a development environment, you can test DB2 and Oracle on linux boxen to your heart's content for the same price as PostGreSQL -- free .

      MySQL may be able to handle subselects, but it's still struggling with triggers and stored procedures.

  4. Clustering is never fun by scosol · · Score: 2, Insightful

    Open-source or not...

    I would say just get a bigger box for your PostgreSQL solution and do semi-realtime remote replication on the tables you dont want to lose.

    --
    I browse at +5 Flamebait- moderation for all or moderation for none.
  5. High Availability by mcdrewski42 · · Score: 5, Insightful


    HA is always crapshoot/tradeoff between cost and risk. Throw enough $ at the problem and you'll approach 100% availability.

    I know that 'more robust' is a nice thing to want, but you really need to think about what you really need. If it takes 15 minutes to switch over to a backup copy (using some magic RAID disk mirroring maybe?) and 15 minutes to restart the app and let it checkpoint it's way up to a decent operational speed again, is that good enough?

    If it takes an hour, how about that?

    How much time/heartache or money is it worth for you to have system downtime, and how much are you willing to expend to reduce it by 5, 15, 30 minutes?

    So, there's really a continuum of availabilty you have to pick your point in. At the low end, you have no backups and recreate everything from scratch. At the high end you use Vendor X's real clustering solution and 24x7 monitoring, then have zero downtime even in a disaster. Somewhere in the middle is you.

    Now I realise this an overtly commercial view of things, but if needs be replace money with effort and season to taste.

    --
    /* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }
  6. Well.... by agent+dero · · Score: 2, Insightful

    If you're working with enough data that would require a CLUSTER, then I would suggest a commercial product.

    But if you need that SPEED, but not a lot of data storage, I'd say a decent sized MySQL cluster would cover you, depending on what your needs are.

    If you are in the position to actually need a cluster to do that much work, you should be able to get something commercial and more large-scaled oriented

    --
    Error 407 - No creative sig found
  7. PostgreSQL eRServer 1.0 + Backplane by pabos · · Score: 4, Insightful

    Two options I haven't seen anyone mention yet are PostgreSQL eRServer 1.0+ (see PostgreSQL news item "PostgreSQL now has working, tested, scalable replication!" from August 28, 2003 or a lengthier press posting "PostgreSQL, Inc. Releases Open Source Replication Version") and Backplane.

    eRServer has been in development for over two years, is used in production settings and is released under a BSD license (as with PostgreSQL). It uses a single master/multiple slave asynchronous replication scheme. There are cautions in the release that replication may be difficult to setup.

    Backplane seems to be particularly well-suited to clustering data quickly across a WAN. A quote may explain it better:

    The Backplane Open Source Database is a replicated, transactional, fault-tolerant relational database core. Currently supported on Linux and FreeBSD, Backplane is designed to run on a large number of small servers rather than a small number of large servers. With Backplane, it is possible to spread the database nodes widely, allowing database operations to work efficiently over WAN latencies while maintaining full transactional coherency across the entire replication group.
    Backplane's native quorum-based replication makes it easy to increase the database capacity (by simply adding a new node), re-synch a crashed server, or take down multiple nodes for maintenance (such as an entire co-location facility) - all without affecting the database availability.

    I haven't used either yet, but you may wish to give them a look.

  8. If you are developing a new app, by Anonymous Coward · · Score: 1, Insightful

    check out prevayler

    Seems like an excellent alternative to the traditional database route (though I myself have not yet used it in an application)

    Here is a developerWorks article about Object Prevalence (of which prevayler is an implementation): An introduction to object prevalence

  9. Re:Shared storage? by Kashif+Shaikh · · Score: 2, Insightful

    Oracle is giving away a cluster filesystem (so they can sell RAC on linux) there is OpenGFS as well for filesystem usage.

    I was saying "Wow! Oracle has released a clustered filesystem!", until I discovered it only works with shared-storage. Meaning it won't create a filesystem image across a cluster network, where data is distributed. But rather the cluster filesystem is stored in a centralized location, but can be accessed by multiple members of the cluster at the same time for both read and write.

  10. Here kiddie, kiddie by C10H14N2 · · Score: 2, Insightful
    Great, we had to go there. "Kiddie" and then resort to recommending MS SQL, which has proven itself to be, shall we say, not the most "reliable" on the planet and elaboration shouldn't be necessary in these parts. Oddly enough, although MS SQL was "licensed" (a-la CPM, sigh) from Sybase, people often simultaneously disparage Sybase and praise MS SQL, despite the fact the Sybase licenses are roughly half of MS SQL. So we're left, essentially, with Oracle, which the vast majority of businesses will find not the least bit cost effective, for that matter even necessary, certainly not at twice to four times that of the next commercial competition. No wonder people look for open source solutions.


    At any rate, many "big kids" are using the most unfairly bullied product, slandered most likely because it is a software boy-named-sue, MySQL. Why not have a read before taking childish pot-shots:


    http://www.mysql.com/press/MySQL_userlist.pdf

    In the end this silly "I'm a big boy because I use oracle and your a little gurly kiddie because you don't" bullshit is just empty bravado. Businesses generally attempt to find the most cost effective means to meet a need and often Oracle ends up being like buying a stealth fighter to deliver a pizza. It often just doesn't make sense even for a big kid with billions of dollars, which might be why the $30B+ multinational BASF uses PostgreSQL.


    Frankly, after the named-user license Oracle sold the State of California, no matter how idiotic the clearly comatose contract negotiators were, one would be remiss to not consider other companies with slightly less egregious behavior on record.

  11. Re:I've used MySQL and PHP on a reasonably big sit by Anonymous Coward · · Score: 1, Insightful

    Are your slaves using InnoDB or MyISAM? If the latter have you checked if the 30+ second delays are caused by the replication thread begin blocked by a read lock? The course grain (table level) locking of MyISAM make its impossible to get consistent behavior under load in a read/write environment (even if the only writer is the replication thread). Use InnoDB instead, it is amazing.

  12. Re:I've used MySQL and PHP on a reasonably big sit by DaMoose · · Score: 2, Insightful
    OMG - as I was reading your post, I starting to think you must be one of my developers - we had the identical configuration and issues with Superdudes.net.

    It got to the point where the slave servers (P4, 2GB RAM, Hardware RAID) could not keep up with the Master replication _and_ service SELECT queries. The data was too big for RAM (filesort) and the drives were not fast enough (2 drives mirrored). The Master is dual PIII 2Ghz, 2GB RAM , and fast RAID 5 hardware.

    I ended up solving the problem with a hardware upgrade. I replaced all 4 servers with 1 Quad-Opteron 1.8GHz, 16 GB RAM, and _VERY_ Fast RAID 10 across 9 fast drives.

    Please feel free to check it out. For the first time in a long time, I'm not affraid that the MySQL server will be the bottlenect in this very dynamic web site.

    We use Linux, Apache 1.3.x, MySQL 4.0.x, and PHP 4.x to build the pages and generate XML to our Flash MX applications.

    Superdudes.Net

    Flash heavy signt. Free registration required to access the coolest features (those which beat up the MySQL server).

  13. Arguably, there's no such thing as a replicated DB by Anonymous Coward · · Score: 1, Insightful

    If you have a widely replicated, multi-master, database and the replication network becomes partitioned, it's impossible in general to fully resolve the inconsistencies when connectivity is restored.

    There are particular instances of applications where a specific replication solution might be adequate. But it really does depend on the application requirements.