Slashdot Mirror


Is there An Enterprise-Level Open Source RDBMS?

Colm@TCD asks: "This is something I've run up against, and I'm sure that anyone else who's trying to bring Linux and Open Source software into 'enterprise' systems has encountered the same problem: Are there any OSS 'enterprise-quality' RDBMSs out there? Many (most) business setups require a fast, solid database engine to run back-end stuff. Fast SQL support isn't enough, though -- these systems need to be able to fail-over automatically to a backup machine in the event of the main system falling over. What are the best of the available systems? Are there ways of taking the existing systems like PostgreSQL and making them failover-able?"

7 of 177 comments (clear)

  1. Re:OSS is not a solution for every problem. by emerson · · Score: 5

    Operating System design has been a solved problem for about a quarter of a century. 18-year-old kids write OSes as freshmen projects now. There's an astounding amount of freely-available material out there, including the source to several OSes, to get you started on doing such a thing, and there has been for a long time.

    The graphical desktop environments you point out are still in the infancy stage, after 3-4 years of development. Although they're very cool, (as is Linux), both are basically aping existing art, and not very well in some cases. (Go on, moderate me down as flamebait and argue with me; this is not my primary point.)

    RDBMS design is relatively new in the long view of the art of Computer Science. The commercial vendors that pay thousands of people fulltime to develop them still don't have it down correctly. It's certainly not progressed to the point where it's textbook knowledge, where any fresh-faced CS grad can get in there and build one.

    I think the most telling evidence that this is beyond the current state of the art of Open Source, though, is the fact that it doesn't exist yet. It's not like this Ask Slashdot is the first time anyone's ever thought about free alternatives to the big boys of databases. It's a known need.

    And the "Cathedral and the Bazaar" theory of Open Source as 'developers scratching their own itch' would seem to mean that if it COULD be done, it WOULD be done. Intrepid hackers would have taken Postgres or the GPL'ed old MySQL or the like and had a project underway years ago if it were just that easy. Apparently, it's not.

    Open Source projects do not _innately_ scale well, merely from the fact of them being open. There are a lot of preconditions for a project to work. At least these three need to exist:

    1) A critical mass of developers see the need.
    2) That mass of developers (or a subset thereof) can be organized into a team to do the work.
    3) That mass of developers has the knowledge and skills to do the required work.

    As an example, there's a large set of people that would like an open-source engine to crack DES keys in real-time on commodity hardware. That takes care of 1. Getting a set of crypto hackers together to do this work would be a snap, cf OpenSSL et al. There's 2. But, the art isn't there. The general public, the mass of developers out there, simply don't have the knowledge or experience to do that. Maybe it can be done, maybe it can't, that's not the point. The point is that just saying "I'm starting an Open Source project to do [something really cool]" doesn't mean it's something you actually CAN do.

    So, back to the topic at hand, yes, RDBMS is something that's outside of the Open Source world at this point. The skills required to do such a thing are reporting for work at Redwood Shores and Emeryville and the like, and not willing or able to share their really-quite-esoteric knowledge with the free software universe.

    And pointing to a handful of other projects that are working doesn't prove that this project, or any other, will. Look around; the code's not coming, mostly because, no, there aren't many people out there who are up to the challenge. Or else they'd be doing it.

    The proof, as they say, is in the pudding.

    --

  2. OSS is not a solution for every problem. by mrsam · · Score: 5

    This is, perhaps, one of those situations where you are rather unlikely to find an OSS solution. An enterprise-level RDBMS is not exactly an overnight hack. It would take an enormous kind of time and effort to put an enterprise-level RDBMS.

    I'm not even sure that an enterprise-level OSS RDBMS makes sense. If you're talking about an enterprise-level product, you probably have the resources to afford to purchase a commercial RDBMS. Both Sybase and Oracle have decades of experience building robust, rock hard, RDBMSes that you can bet your company jewels on, and both companies have Linux versions of their RDBMSes. I'm not sure about Informix, they may have a Linux version of their database engine too.

    Right now, attempting to deliver an OSS RDBMS that can beat time-tested RDBMSes from Sybase and Oracle would be quite a challenge, to say the least. People will just have to accept the fact that OSS can't solve every problem in the world. It's certainly feasible that one day Sybase or Oracle might decide to release their respective RDBMS engines under an OSS license (which should certainly be quite a shock to many people), that's probably the most likely scenario.
    --

  3. Not at all unreasonable! by FallLine · · Score: 5

    The position that any Open Source, and most importantly "free", RDBMS could NOT hold a candle to Oracle (amongst others) where the needs of "enterprise" as concerned is reasonable. The development of Linux is (was), in many ways, the antithesis of what it takes to develop an equivalent to Oracle. Think about it:

    Firstly, kernel's are not all that complicated.

    Secondly, Linux's kernel is about as simple you can get.

    Thirdly, Linux, thus far, has had the advantage of being behind the curve in development. It has had the benefit of essentially just being able to copy features, design, and even some code. While the code may be from scratch, virtually nothing else is. In other words, Linux's code was built with 20/20 hindsight. Furthermore, it's done nothing particularly spectacular with that opportunity.

    Fourthly, the motive and feedback in developing an enterprise class RDBMS are vastly different. In developing Linux, there is something quite tangible, such that most "geeks" can get relatively instant gratification. Do geeks really personally need such a database (of course not, not even their hardware approaches this)? Contrast this with an RDBMS, if "they" make the system %.001 more reliable, by whatever measure, under heavy loads, how can they appreciate that? (never mind testing it) These things are about numbers. A company that does 1 million a day in sales with a broken database can't afford that %.001, but a geek screwing around developing a database in his spare time for his uses is not likely to experience such a problem. And even if he is, does that %.001 really mean anything to him? He can just reboot. Sure, you can say "given enough eyes all bugs become shallow", but ask yourself: Whose eyes, and how do these bugs become apparent? A minor bug in Linux, and a couple geeks have to reboot their PC (which presumably prompts them to patch it). Which company is going to volunteer to expose itself to such a bug---possibly millions of dollars in lost revenues (amongst other concerns). The scale of the two uses changes the nature greatly (likewise, empirically, Linux demonstrates such leanings--towards areas which most geeks appreciate, but not necessarily enterprise)

    Fifthly, the development process itself is vastly different. Because Unix, and particularly Linux, is quite modularized, it can be developed is a ragtag fashion. This is not true with an RDBMS. They're are very complex and interconnected.

    Sixthly, support is a major issue. Despite all that hype amongst the linux community about "support" it is absolutely undemonstrated. And when you consider issues such as proper documentation....

    Seventh, given some of the risks i've mentioned, what company is going to prefer to try to save 10k in software fees for the increased risk. The expected value in this case could very well be negative.

    ...I can think of more, but that is plenty of doubt right there. Think critically, not dogmatically.

  4. Database failover is only available commercially by Timothy+Dyck · · Score: 5

    I don't think you will find database failover in an open source product -- it's too specialized an area to have gathered interest from the necessary critical mass of developers.

    I cover databases for PC Week Labs, and in my view (and in my tests), the most sophisticated OSS relational database is PostgreSQL. In three months, though, that title will be held by InterBase. It has a more complete SQL implementation, has a (third-party) replication option and is generally more mature. However, it doesn't have any failover.

    The products that do provide failover are (non exhaustive list, but these are the main ones): Oracle Parallel Server (or Oracle with the Failsafe option if you want just 2 node failover), Sybase ASE, IBM DB2 UDB, Informix, Tandem NonStop SQL and Microsoft SQL Server. The cheapest option in terms of purchase price will be Microsoft SQL Server with Microsoft Cluster Server, but it only runs on NT and only supports 2 node failover, and does not cluster.

    Note that when you talk about failover (particularly with stateful connections), you need to also get failover-capable hardware systems. The databases mentioned above all need special, tightly spec'ed high availability hardware configurations from Sun, HP, etc. In general, a shared SCSI bus is required.

    You also need to decide what level of failover you need: (from simplest and cheapest to most complex, expensive and best):

    * cold standby (replicate to a standby server) -- no shared storage, mostly up-to-date, cheap, manual intervention required for failover

    * warm standby (shared storage or transactional replication to a standby server) -- ensures no data loss for all committed transactions, requires fault-tolerant hardware and a fault-tolerant dbms, failover is automatic but may take several minutes to roll the log forward, warm up the cache and reconnect users

    * hot standby (shared storage) -- no data loss for committed transactions, fast failover (30 seconds or less) -- this is a very specialized area, and you should have your dbms vendor work with you on setting up these kinds of HA setups. This is an ongoing area of research; Sybase 12 and Oracle8i 2.0 introduced new features specifically in this area to do things like pre-connect users and pre-warm the cache to speed failover times.

    If you want to have an OSS solution, I'd advise using PostgreSQL or InterBase and writing stored procedures in C to replicate inserts, updates and deletes to a second server, then coding reconnect and heartbeat logic into your front-end apps. This will be easiest to do with an app server since you have a single point of data access. This will not provide atomicity or durability though (unless the updates use 2PC) -- you'll need to run a consistency check on the db on failover.

    This is hairy, though. Overall, I advise paying for something that someone else has gotten working.

    Regards,
    Tim Dyck
    Senior Analyst
    PC Week Labs

  5. Interbase by JohnZed · · Score: 5

    Check out interbase.com or interbase.org. Interbase has been in use in enterprise environments for something like 15 years. It has very advanced crash recovery features and replication throught a commercial plug-in. Version 6.0 is under the MPL and in beta right now for Solaris, Windows, and Linux. No, I don't work for them, I just sound like that.
    --JRZ

  6. Interbase, Interbase, Interbase!!! by Dacta · · Score: 5

    Interbase v6 is in (free beer) beta right now, and will be released under the MPL (Mozilla Public Licence) by Mid-June

    You can download the beta now for Linux, Windows or Solaris from Interbase.com or The Interbase Developers Inititive

    Supports databases up to 32 TB (!!!) spread over 2GB files and is fast and reliable. (It's been around for 15 years.) The biggest known Interbase DB is over 200GB.

    It's is ANSI 92 SQL compliant (well.. as compliant as any DB I've ever seen - better than Oracle for instance), and the support is Amazing

    Join the email lists at www.mers.com, and you'll be able to get answers from Ann Harrison (the president of Interbase.com), or from a lot of other people - it's the best support I've even seen.

    As for enterpise features - well, apart from large DB support, it has row level locking, transactions, referential integrity, blobs, Multi-Generational commits, stored procedures... ummm... I can't think of what else to say.

    Basically, if your Databases are less than 50GB then Interbase is the number 1 choice - above that maybe Oracle would be better, but that isn't exacly open source.

  7. Oracle strategies generalized to other RDBMS by grantsucceeded · · Score: 5
    I am an Oracle DBA, working in Silicon valley, I've had a lot of experience with various flavors of failover. You can have all sorts of degrees of automation and rapid failover, but most of the really high end solutions are very complex, and the added complexity buys almost nothing in terms of uptime. The "sweet spot", IMHO, for Oracle is a "standby" databse, or simply a raid box that is physically attached to two unix boxes, but only one mounts it at a given time. Not a clustered or Parallel Server implementation. There are some places that will pay the complexity and $$ price to have parallel server, but it really only gets your downtime down from, say 30 minutes to under 5 minuts *IF* it works right, automatically. Even parallel server, where both systems are up at the same time, you'll still see half of your connections aborted when one side of the cluster goes down, and you'll have to redirect that half to the surviving node. All the while dealing with complexity of these things *every day*.

    Here's a good way to do it in oracle, that'll probably genralize to almost any RDBMS:

    The simplest way is to have two unix boxes, each physically attached to a raid unit (dont use raid 5, use mirroring for performance). Don't use clustering software, just mount the filesystems on only one machine at a time. If the primary system fails, then mount the filesystems on the secondary , startup the database, and you're on your way with minimum downtime. Script it and tie it to some ping or other fail detection strategy if you feel lucky, but in my opinion, keep a human in the loop to actually execute the switchover.

    Another great solution is a so called "standby database". You make a copy of the primary database to your standby machine, using your backup tapes. Then you start to "roll forward" the standby database by applyin all offline redo logs of the primary database, as it is generated. This method should work for any database that logs transactions including Sybase, Informix and probably MySql. The equivalent of a redo log in MySql appears to be the "update Log". but I have no experience with MySql. One big drawback of this solution though, is that even though a transaction has been committd to disk on the primary, the event may not make it to the offline redo logs before the system crash. So a standby database can only be within some delta-T of the primary database, the last fiew transactions before the crash are lost. Sometimes this is acceptable, but if you billed the customer then lost the order, maybe not. You tune the redo logs buffering parameters to trade of performance for small "lag time" of getting redo logs across to the standby database.

    The truth is that it takes 10x effort to get failover and cluster software correct, and 9 times out of 10, the automatic failover either triggers accidentally (bad news) or triggers correctly but fails to come up on the secondary. This is because you have to be incredibly scrupulous about keeping non-shared resourses in sync on the two machines and you can't test the failover (politically that is) until it actually fails. Heck, I just took a call this past saturday for this very problem: Site paid $$ for a bigger consulting firm to implement cluster, firm cant be reached on saturday when cluster trips but does not failover successfully. Customer calls me. I am to gracious to say I told you so.

    My free advice: Make two machines, not a cluster. Keep the two machines in sync, using the "standby technique", or make the raid unit accessable by both machines. Keep a human in the failover loop.