Is there An Enterprise-Level Open Source RDBMS?
Colm@TCD asks: "This is something I've run up against, and I'm sure that anyone else who's trying to bring Linux and Open Source software into 'enterprise' systems has encountered the same problem: Are there any OSS 'enterprise-quality' RDBMSs out there? Many (most) business setups require a fast, solid database engine to run back-end stuff. Fast SQL support isn't enough, though -- these systems need to be able to fail-over automatically to a backup machine in the event of the main system falling over. What are the best of the available systems? Are there ways of taking the existing systems like PostgreSQL and making them failover-able?"
With 1stPostDB running on slashdot you no longer need to worry about failover. Just post all your information to slashdot using the 1stPostDB and let the andover.net guys take care of the rest.
Actually, I've been doing a lot of work recently on trying to get linux db's to scale. The simplest solution, if you have a static db, is to just mirror the contents across several machines, running (depending on your needs) MySql or postgresql. Then, the linux virtual server works great to load balance across your new semi-parallel db. If you have changing data, such as syncing several machines, things get harder. LSDProject.sourceforge.net has one solution for syncing multiple DB's, by executing SQL statements in parallel, and logging failures, but more work is needed in this area (and if you want to help that project, it'd be appreciated). There is no solution in the class of oracle parallel server, etc, but if your needs are more on the reliability side with simple queries, the linux virtual server and mysql will outperform oracle, especially taking price into account. -Sean Ward (forgot my slashdot login ;)
Since I saw that many people are talking about transactions, I wanted to give a short view of what actually a transactions. People who are really interested should refer to the before mentioned book of Gray/Reuter "Transaction Processing" or even the original paper by Härder/Reuter (1981) which introduced ACID transactions. Since when you'r into computer science and databases and are talking about transactions you probably mean ACID transactions. ACID stands for: ATOMICITY: trannsactions either succeed (commit) or fail completely CONSISTENCY: a transaction transforms the DB from one consistent state into another (since transactions are atomic, the DB is logically always in a consistent state) ISOLATION: transactions are processed without the context of other transactions (logical one user system). Thus they are not interfering with each other DURABILITY: changes caused by transactions are durable (that's where recovery, logging and so on comes in) In order to ensure those properties a lot of research has been done and one can say that all basic things have been implemented. The hard part is of course efficiently implementing it, e.g. using B-Trees (or the new UB-Tree) for accessing and storing data efficiently. Logging and recovery mechanisms for ensuring durability. Locking mechanisms for isolation and also atomicity..... A database system actually offers almost the same depth as an OS, since it needs physical storage management, indexing, caching, query processing & optimization, parallel processing, distributed physical storage, distributed query processing, logging, fail saveness... So you might imagine that implmenting a good DBMS is probably the work of hundred of work hours. An effort which is almost comparable with creating Linux itself...
I faced a similar problem about one year ago.
I was annoyed with my proffesional collegues using Oracle 8i etc (excellent products) but not setting them up to facilitate their features. The 'use Oracle and don't worry about our databases fullstop' attitude is everywhere.
Anyway, to the point:
I used PostgreSQL with an additional layer underneath my application which done all selects on the same node as the process and all other operations on all nodes. There are a few tricks to getting this done, such as counters etc need to be setup to be multiples of, say, 100, with the cluster number added to that number. (this means if things screw up, nodes can operate independantly as much as possible then sync up when possible). I hope this makes sense, it's late. Other hacks, such as code to sync/check integrity etc databases after failing are relatively straight forward.
So, the bottom line:
Performance over n nodes only as good as one node (_slightly_ worse).... but then, hardware's cheap...
Reliability: We have had all sorts of things go wrong, but at the end of the day, when the macines are live again, we have never had a problem.
Im no expert by any stretch of the imagination, but with continuous incremental backups etc and a little common sence we are using PostgreSQL with a $$$ (8 digit) online business, and I can sleep at night, even though im personally responsible for the systems.
PS: Total expenditure on software for the system: $0 (we have/had the $ to purchase expensive software; OS solutions just came out on top)
Let me follow up to that.
Once upon a time I spent a year and a half developing applications for Illustra, the commercial follow-on to Postgres. Illustra had the same features as Postgres, plus SQL-compatibility, plus all of the standard SQL datatypes, while still including all of the ORDBMS features of Postgres. They even had support.
What Illustra didn't have, couldn't have, and what Postgres will never have is speed. The Object-Relational data model is not as efficient at doing plain Relational data work -- it can't be. Using Illustra/Postgres for normal relational databases is like trying to fight Mike Tyson with one arm tied behind your back -- you're playing the other guy's (in this case Oracle/DB2/Sybase) game, and giving him points to start out. Why is this important? Because all of the database APIs (SQL, ODBC, JDBC) only understand rows in tables with cursors, and all of the back-end reporting engines only cope with relational tools.
Illustra and Postgres are much more interesting for storing object data ... but the only way to take advantage of their capabilities is to use their special APIs. At which point you've thrown away portability and standards, and you've bought yourself a proprietary (even if it's open source) solution. It may be the perfect solution to your problem, but like a black hole, it will warp everything around it. Take replication -- replication is hard for ordinary RDBMSes to implement reliably; when you add object IDs to every row of every table in every database, you multiply the problem, since you can't just copy the rows directly from one database/table to another, for the same reason that you can't easily serialize a C structure containing memory pointers -- all of the pointers (object IDs) will be wrong.
I should really be talking about Illustra in the past tense -- the company got bought out by Informix several years ago, and Informix folded most of Illustra's technology into their Universal Server database. Dr. Stonebreaker (the creator of Ingres, Postgres, and Illustra) went on to become the VP of Technology for Informix.
We call it art because we have names for the things we understand.
Hey now, I'd never work *with* databases, but working *in* a database can be quite neat - in many ways it's somewhat like working in an operating system since you handle so many things (paging, data storage etc) at a low level.
I doubt it was K&R style code - it was probably written for the Borland C compiler and it's been on Windows for quite a while. It is certainly possible that it might have some ugly code. We can fix that.
Thanks
Bruce
Bruce Perens.
This post is, albeit to a lesser extent, in the same pointless vein as the one about buying Iridium and "open sourcing" it. Things that are boring, don't impress your friends, or don't scratch an itch typically will not exist open source. Programming RDBMS is (I imagine) highly theoretical, scientific work, and while a few CS wizards out there might get a kick out of it, such a project would never reach the critical mass needed to sustain development. It just doesn't meet any of those criteria.
At times people seem to get carried away with the Open Source phenomenon. We need to remember to take a step back and realize what an incredibly miniscule blip on the radar of innovation that OSS is, and that for a good three decades conventional development models produced some great things. Relational database code is one of them.
--
I think there is a world market for maybe five personal web logs.
MySql is neither Free nor is it (yet) an RDBMS
Actually older versions are GPLed. Newer version are free enough IMO: you can modify, redistribute and use at will, you're just not allowed to sell them for a profit (as in selling a product that would only work with it). So for me it's Free, if not in the general sense. Then, Monty in an interview hinted that he was trying to have it fully FREEed (Open Source'd).
I am designing a data-mining system at work, and I believe that this far surpasses Linux's ability to scale. The problem is, Linux only supports files up to 2GB in size(4GB with reserf), and MySQL only supports 4GB tables. I estimate that several tables will exceed 20GB in size, and breaking them up into smaller tables will be highly inefficient. We've just pushed Linux as far as it can currently go.
So A big Sun box and Oracle are going to be the back end. Even Oracle on Linux runs into the filesize limit, because of the lack of raw device access in the kernel.
Sometimes, free software isn't the way to go. You have to pick the right tool for the job.
we know it will stand up, and if it doesn't, there are people to sue.
Please re-read your licensing agreements with these respective vendors. You can't sue them, you can't even think about suing them. With oracle, I think they can actually have all of your children killed if you think about suing them and they find out.
It's amazing how long elementary FUD like this can abound, especially after it's been refuted countless times in OSS diatribes. Who do you know who has ever sued Microsoft for lost data and won? Have you ever heard of Sun getting busted up over a hardware failure? It is EXTREMELY difficult to prove that the system failed in the first place (who's to say you configured it correctly? not the experts from Oracle, that's for sure), and even if you can, you agree to a LOT of things when you put that software on your server.
I'd trust Postgres against Linux any day (and YES, I do work at the "enterprise level") simply because at least they're straightforward about there being no-one to sue. If I really needed to get hard-core (not too many machines, lots of RAM and processor, zero tolerance for failure) maybe FreeBSD would fit the bill.
Most likely though, I would be averse to anything that involved x86 hardware for real reliability. So it would probably be NetBSD.
Glyph Lefkowitz - Project leader, Twisted Matrix Labs
Writer, Programmer - Not a member of the TSU
Now way back in the glory days of that last great war, way back in 19 and forty something, there were two developments which have been entangled down to this very day, causing no end of grief while each tries to ensnare and evade the other. I am referring, of course, to the digital computer and the Office of Strategic Services (OSS), which has morphed into the current CIA and NSA.
I put it to you: is it just a coincidence that we have here a claim that Micro$oft coined the term? Or is there some deeper conspiracy that has been in progress these many years?
Or is it just another example of Micro$oft falsely claiming innovation?
--
Infuriate left and right
You're looking for InterBase. While it's not yet Open Source, IB 6.0 will be released under MPL 1.1. While they sanitize their code, you can get free (beer) beta releases of InterBase 6.0 for Linux, Windows, or Solaris from http://www.interbase.com at http:// www.interbase. com/open/downloads/60beta_downloads.html. There are RPMs and Tarballs there, and Debian packages are forthcoming.
You can also check out the community springing up around InterBase at http://www.interbase2000.org.
From my own personal experience with InterBase, I can tell you that it is most certainly an enterprise-grade database, with amazing speeds and all the features you could ask of a modern DB.
Postgres seems quite robust, but slower than MySQL. If you can live with the slight speed hit for the cool features (views, and stored procedures come to mind) it's well worth it.
Postgres works great with PHP, btw.
--
a) My usage is far from incorrect, though in more formal writings i'll frequently use other forms.
b) Slashdot is a casual discussion forum; colloquial language is generally fine, so long as the writing remains coherant.
Nonetheless, I generally agree that the bulk of slashdot writing is piss-poor, even for this forum.
...gotta run
It might be impossible to answer with absolute certainty, but nothing in this world is absolute. Would you say it's similarly impossible to know with any real certainty that a group of, say, medschool grads from Harvard are going to engineer a chip that can compete head to head against the latest offerings from Intel (et. al)? This of course, is just a random example, but you should get my point. You can accept perhaps one or two of them having the desire, but the odds of a whole community of them getting together to create something quite complex, and outside of their range of experience, for a long duration, are very slim.
That being said, I agree with your later point.
What sets Oracle or IBM DB2 apart is that they have large user communities, and that means a lot of people trained in using them, a lot of third party add-ons, a lot of "how-to" books, etc. The real selling point of the commercial enterprise level RDBMS systems is not software, it's market position and widespread use.
Whether that kind of support is something that matters to you enough to pay the steep prices that those systems cost, you have to decide for yourself.
The only system that comes close to Oracle or DB2 in terms of popularity is MySQL. If you can live with its limited feature set in your application, consider using it.
In the long run, hopefully some open source database will become as popular as the big commercial guys. The most likely path to that is no an Oracle clone, but the development of a new kind of database paradigm and popularizes it with an open source implementation.
What you want is atomicity. However, rollback sort of goes hand in hand with it. If you try to group together a select and a delete from one table with an insert into another table and your insert should fail for any reason, the dbms must roll back the select and the delete. While I agree that transactions (atomicity) would be nice the developers will need to implement at least an internal rollback mechanism in order for it to work. I'm sure it's been mentioned here already but do check out PHOTO.NET. Philip explains it all and he's funny too.
Hmm, an SQL database without GROUP BY.
bzzzzt. . . . next . . .
-------
Bill Gates Is My Evil Twin.
So, we have an enterprise-class RDBMS on a world-class operating system with RAID and failover capability, and several world-class organizations supporting it. Sounds like a Real World Solution to me.... So it doesn't scale into the terabyte range. How many folks really need that? and how many folks are employed by small businesses?
--
"We came, we saw, we kicked its ass!"
-- Bill Murray, "Ghostbusters"
> Slashdot is filled with different ppl very opinionated
Yeah, I noticed.
> Linux is not where your major enteprises turn when they need 24/7 huge scalable solutions.
Some do, though I suppose you could define "major interprises" to filter them out.
> this according to Gartner Group
O Bastion of Bullshit. And PHB Fertilizer.
You're right in that some of the commercial solutions are still better. How long that will last, anyone can tell. But what makes you think OSS can't provide a RDBMS that is at least useful, and useful for lots of people?
> It takes more than opinions to form valid statements but it takes less than an minute to spout something on Slash.
[This tart retort intentionally left blank.]
--
Sheesh, evil *and* a jerk. -- Jade
Apple's Enterprise Object Framework is the best way I've seen. There's also the free GNUStep alternative GSDB. The downside with the GNUStep version is that there aren't a lot of drivers (postgress, sybase, and MSSQL at this point). EOF/GSDB completely abstracts everything and presents a true OO view of your data with no queries, transactions, or anything to worry about, the framework handles fetching and resource management.
Excluding that, your best bet is to keep operations on your data source sperate. Thing's like Perl's DBI are just a API abstraction and not a Database abstraction. The way around this is to seperate out all your data fetching functions so that all your app does is call things like fetchCustomerList(start, end). That way to switch DB's you just create a new DB module for the new DB and your application could care less.
I gave up dB work a year or two ago. Maybe it was starting with MS Access and enjoying its compact, RAD, toy quality. Unscaleable and inapproriate for Enterprise stuff of course (and making no pretence to such garlands), but fast, small (well, the .mds were small if you compacted them) and fun.
The problem was the inevitable upgrade lead me to MS SQL Server 6.5 (I was stuck on NT at the time). SQL Server had a lot of stuff I was missing in Access - triggers, stored procedures, scaleability - but it also brought a lot of frustrations. The domain aggregate functions were poorer than those offered by Access, which was a pain as I was trying to roll my own OLAP before it all got proprietarized into a Babel of different buyouts and skill-subsets. Its big-iron feel didn't stop it having a ludicrous 255 byte limit on varchar fields. For bigger you had to futz about wastefully amalgamating BLOBs of text with READTEXT/WRITETEXT. Plus it was grotesquely high-maintenance. I didn't want to become a DBA. I just wanted to hack SQL. And wasteful. The 'devices' ('Honey, I bloated the database') were huge and couldn't be shrunk, no matter how svelte the actual data.
By the time I escaped the NT shop I was naively looking to Oracle to save me from these frustrations. Unfortunately, I'll never know the joys or horrors of that particular 'platform', because at 600 Mb for the Linux installation I just bailed out and cried 'Enough of this grotesquely bloated crap!' and pursued XML or BerkeleyDB solutions to anything remoteley persistence-flavoured thereafter. I knew that fast and small were synonyms, but the vendors were growing fat on the antonym line and there was nothing I could do about it. Even MySQL and PostgreSQL were part of the problem. They're all emacs. None of them are vi.
Recently I ran my periodic, wishful, wistful Google-grep for 'fast', 'small' and 'rdbms' and found myself, after rejecting Brian Jepson's TinySQL as ridiculously small and cute but strictly pedagogical, finally discovering The One.
Hypersonic SQL, a tiny Open Source Java database weighing in at less than 100K, supports correlated subqueries, transactions, referential integrity, indexes, stored procedures and JDBC - everything basically, but GROUP BY, cursors and triggers. I never used cursors myself. I'd rather iterate in Perl. The other two, admittedly are fondly missed, but not life-threatening. It doesn't support failover. But with such a small, developer-friendly codebase anything's possible.
Did I say 'Perl'? 'But it's a Java database', I hear you cry. How can this beast talk Perl? Well, it can't, which is why I'm working on a Perl DBI interface to it talking to a native driver over TCP/IP. If anyone wants to contribute (I wouldn't need to hack it if some brave soul wants to polish up the languishing JNI module in JPL to support embedding Java in Perl on Linux (currently it only works for Win32)), please get in touch. I'm almost certainly way out of my depth and entering a world of pain.
Oh, did I mention? It's 7 times faster than SQL Server.
You need DBD::Multiplex, which is part of Perl's
DBI. It allows you to do round-robin connections,
among other things.
I don't think you're comparing apples with apples Tim. Postgresql will have V7.0 released way before 3 months time at which point the SQL should be equal or superior in standardisation to Interbase. OTOH, who actually knows where Interbase will be in 3 months? It might be nowhere fast.
And in terms of "advanced", Interbase will probably never catch PostgreSQL. PostgreSQL is the king of features in RDBMS with its ORDBMS features. Interbase may one day be faster, but more featureful or advanced, I doubt.
Frontbase looks pretty nice - I'd never heard of it, though. Although... it does look a lot like some product the OpenLink ODBC people had. Is it related in any way?
I'm not too sure about the benefits of full ANSI 92 compliance.
For sure, standards are a good thing, but if no one supports them, what's the use?
For instance, take Sybase & Oracle & SQL Server. For simple select, insert & update queries the SQL is farily portable. Once they get a bit complicated you'll probably need to do some fixing to get it to work.
But that isn't the major problem with porting between databases. The real "problem" is the way different DBs have different strengths.
For instance, on Oracle you use cursors all the time, and they work really well. They are fast, efficent and a great way to do some thing.
Try porting them directly to SQL Server or Sybase, and your server will die really quickly - they just don't support the same number of similtanious cursors that Oracle does. Even if they were all ANSI SQL 92 compliant, it won't fix problems like that, which are much more difficult to fix than a few syntax differences.
What is the difference between cold standby & a database server with replication facilities?
I'm thinking of Interbase & Replication, or even Interbase and the database mirroring it has built in. Is there anything else needed for it to be classed as cold-standby at least?
I'd think it would be pretty close to warm standby if you used it with some kind of heartbeat monitor, wouldn't it?
I agree there is a fair way to go before Interbase supports hot standby, though.
Interbase has been available on many non-windows platforms for a long time (15 years+) now. On Win32, they use MS VC++ to compile it.
The code was never a mess like people have been saying, but it (apparently) has a fair amount of conditional defines for various platforms. That's the stuff they are fixing now, adn why it is taking so long.
This is an email Ann Harrison (president of Interbase) posted on a list back in January:
Sorry the formatting is so screwed up! I hope it clears up some misconceptions, though.
BTW, it's good to see you're not so cynical about Inprise any more, Bruce. I was their sole defender on Technocrat when you posted that story on them a couple of months back.
I've done a fair bit of stuff on SQL Server, and while it is limited in that it can only run on NT, it is so much better than Sybase (for instance) that it isn't funny.
For instance, I'm using Sybase Adaptive Server 11.92, and guess what? No row locks, non-ANSI join syntax, and you can't seem to have outer joins where you filter the joined table by another (non-join) field.
SQL Server has wonderful development tools, too - like Query Analyzer. After spending the first year and a half of my working life trying to work out Oracle query plans, MS Query Analyzer was such an amazing revalation.
You are right that there a lot of (surprisingly) big enterprises relying on SQL Server. I think version 7 was a big step forward in this area.
SQL Server still can't match Oracle on joins of more than four table with lots of data, though - of course, not much can really.
Yeah... I know what you mean.
They do seem to be very dedicated, though.. they listened to the community on the licence and everything.
I doubt there will be any problems, but it will be nice when the source is out.
Interbase is a RDBMS, and it is free (beer) now and will be Open Source (MPL) soon (June).
I'm pretty impressed! It seems a pretty nice bit of work - joins seem to work okay (I didn't try any out joins, and there is no ANSI 92 syntax, though)
I think the benchmark is crap, though - simply testing how fast you can create a table and an index, and then do an insert might be fair enough, but testing select speed by doing a select * from that table is a bit of a joke. I'd love to see how it performed with a 4-way join and lots of data.
That shouldn't be seen as a flame, though. I'm really, really impressed with this as a whole.
If we don't have an obnoxious Corporate eXecutive iMission Critic@l eAcronym (oCXiCeA) for OSS, then OSS is not suitable for obnoxious Corporate eXecutive iMission Critical eTasks (oCXiCeTs). Several Corporate eXecutive iMission Critic@l eConsultants (CXiCeCs) have demonstrated that OSS should be deployed in the eCorporation within several quarters, or iTimeUnits as they are now called. This will enhance our VC-funded eCorporation's rapid execution of oCXiCeTs.
Furthermore, numerous CXiCeCs have given me an executive summary about OSS, from which I have learned that because OSS is all capital letters, it will fail to succeed in eBusiness, E-commerce, and/or iWare deployment. One eConsultant suggested a paradigm shift to oSS, while another one suggested O-ss. But the latter was actually an E-consultant and not an eConsultant, and he hardly ever uses the IT acronym, so I called an eXecutive iMeeting where we all met in a conference room where we reached a consensus that the so called "E-consultant"'s credibility is lower.
Seriously, I agree with Bruce. "OSS" is highly obnoxious. Don't use it.
So is eAnything, E-anything, iAnything, and many more. Terms like those reek of marketing, rich presentation (read: content free tripe), and all those other things which make the web, computing, and many other things less fun. (Maybe Bruce agrees with that point too, but I wouldn't want to speak for him. He's been misrepresented enough as it is.) Whenever I see eFoo or E-Bar, I turn the other way as fast as I can. You should too.
You're a suburbanite.
There is replication code available for Postgres at ftp://ftp.sdc.com.au/pub/repl/ I've never used it myself, but they mentioned it on the general list a while ago.
according to numbers I saw on the web Postgres is about 20% slower than Oracle for TPC tests.
Not very far off, especially when considering that Oracle probably optimizes their DB specifically concerning benchmarks.
Have no link, sorry.
Oracle et al however have some edge or two regarding reliability features (and features in general).
I have a similar solution. I'm using mysql to develop a large-scale commecial website with gnujsp and apache jserv.
:)
My problem is this:
This is a database of commercial real estate data. There is really only one main table, but it has lookup fields that reference as many as 7 other tables, most with funky outer joins. Performing even simple queries on it is horrific.
My hack is this:
Every night, at 3am, do a GIGANTIC 7-way merge of all the relevant tables, and insert the results into a MAMMOTH SINGLE TABLE, the cache table. Lookups are lightning fast, and the data is at most 24 hours old (this is reasonable in my case). Building the cache actually only takes 15 seconds or so --- abominable if you're waiting for a web page to load, but no so bad at 3am
This technique may or may not be available to you, depending on your circumstances. I find it to work quite well with mine.
- jonathan.
Not all DSS environments are/should be read-only. My employer deploys one of the largest (if not the largest) read-write DSS client-server application.
;-)...
Transactions/Rollback are not critical (we have our own rollback solution over Acumate, made by Kenan) but in a read-write environment, they sure are good to have around.
OTOH, you're right; in a DSS environments reads are always way more common than writes, so a rollback layer could actually be preferrable to a built-in transactional functionality.
So, forget about transactional control, it's not even essential; are there *any* OSS OLAP apps out there? they sure would make a nice toy
engineers never lie; we just approximate the truth.
While I agree with almost everything you said (including the part about Linux being an excellent imitator and a rare innovator), I am tired of seeing the same premise being used again and again wrt OSS: you don't necessarily need a buncha 'lone hackers' hacking away at a problem.
Let's suppose that a fairly good-sized company came out with an open-sourced RDBMS/OLAP framework, that has all the groundwork to be fast and extensible. I am sure there are a few companies out there that would benefit to contribute to such a project for their own uses... Why can't OSS act as a framework and collaboration scheme, instead of just a toy for hobbyists?
engineers never lie; we just approximate the truth.
Imagine the following scenario: You, client C, initiate a service/transaction with server A. Server A crashes. Your service/transaction fails over to server B. However, server B knows nothing about what's been going on in your stateful connection, and is thus unable to help you.
The only solution that you can have within a database for failover is in the case when you have a stateless connection, and/or full state replication across the cluster of server machines. However, this alone is going to almost certainly bog down any of your transactions to the point where you don't want to continue anymore!
A company which has done this extremely well is the old Tandem, with their NonStopOS and NonStopSQL. Their mastery of failover in transactions was superb, and quite a bit of wall street is still running their stuff, even though Compaq has been quite unable to successfully market their services now (NonStopNT.... that's a complete oxymoron!).
I suppose you COULD implement the type of service you're talking about with failover stateless connections, but if you're going to do that you're much better going with an app server in the middle, which handles failover of connections in an EJB type model, and use the features of the app server to handle this.
So to sum it all up: For simple services like RPC and HTTP and FTP and whatnot, which are stateless (or very nearly so) by their nature, simple failover is fine. For extremely stateful services (like a database system), you've got to use much more complicated measures.
How about an Open-Source Object Database? I've been fooling around with Cache' on Linux for a week or so, and it seems nice. I think OODBMSes are the way to go for web applications, especially ones that make any use of XML. Any Open Source ones?
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
While they don't mention database servers specifically, "Generic service" indicates that they've figured out some way (probably similar to watchdog coupled with some form of ARP spoofing) to fail a down server over gracefully. Which leaves data replication.<P>
Would this work in conjunction with a simple mirror program? What I'm thinking of is... <P>
A small daemon that runs (and listens) on the 'master' box. When a database connection is initiated it reads the commands sent by the client and sends them on to the 'master' database server, and again to a secondary server.<P>
Both the master and secondary server have current copies of the data, and then leave the actual failover to whatever RedHat has concocted (still downloading the
geek. lawyer.
Umm, I'm an Oracle DBA, is that considered "enterprise level"? I read /. every day. Oracle is an excellent product, but I do think that it would be great if there were an open source alternative. At work, we are currently sitting around waiting for Oracle to release some bugfixes that impact our application. If it were open source, it would probably be fixed by now. Plus, imagine all the neat features you could add!!! I would definitely get involved with that project, if it ever came about.
Enterprise database applications are subject to FUD that runs far deeper than what we see in the desktop OS competition. I implement enterprise datawarehousing solutions for a living, and I see that most clients will ONLY accept an implementation that's built on the RDBMS they already use in house.
You can open source the operating system, or the desktop environment, because your test bed starts out as hobbyists (the early adopters). Good luck finding the first company that will take the plunge and bet their entire transactional system on an untested platform, when they can pay for the insurance of an Oracle or a SQLServer.
-JTB
You mean, for instance, an operating system?
Or maybe a graphical desktop environment or two?
How terribly shortsighted. I'm sure there are many people out there who are up to the challenge.
Besides, what you're basically saying is that open source projects don't scale very well, and all of the above examples disprove that.
---
How am I supposed to fit a pithy, relevant quote into 120 characters?
RDBMS is a very mature, flexable technology. SQL (& RDBMS) does have some shortcomings, but then every tool does. For many (most?) typical business applications, RDBMS is the tool of choice.
Because modern RDBMS's are so pervasive and accessable, many people forget that designing databases and writing SQL is a specialized and difficult art. A well-designed database can outperform a poorly-designed one by an order of magnitude or more. When people talk about RDBMS's having poor performance my first instinct is to question the db design and not the engine. Sadly, in my experience, most programmers treat the database almost as an afterthought. (I design databases for a living, in case you havn't guessed)
"The axiom 'An honest man has nothing to fear from the police'
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
DB2 should definitely be added to the list of "bet the company's jewels" RDBMS products. This is the database of choice for many banks and financial institutions. BTW, the company that provides data processing for roughly 40% of *ALL* mutual funds stores *your* data in DB2 (mainframe flavor) -- I used to work for them.
Erlang (www.erlang.org) is open-sourced sw that Ericsson use internally. This has a "production quality" DB called felicitously mnesia. It has failover, replication, very easy to use transactions, ... It's not relational but the programmer's interface is very easy. There is an rdbms front-end advertised in the erlang contrib area - don't know how good that is.
Erlang deserves to be better known: it is one of the biggest and best open source projects.
It works, it's not beta i've used it on a cluster of 6 machines. Course it's going to work for me since i wrote it. The mysql guys are working on replication support themselves.
Added to which, you can get excellent support (at a price) for these products, which may ultimately be worth more to you than anything else.
Replication generally does not take place on as current a basis. Replication can be done with any currently-existing RDBMS simply by adding a "last changed" timestamp to each record, and then occasionally querying out all records before a certain timestamp and sending those records to the remote server. Replication thus ensures that you have a snapshot of the data at some point in time, but not necessarily an up-to-date version of that data.
Those differences are why replication can be done by third-party software, while "real" standby requires specific support at the RDBMS level. I have done replication with technology as crude as dBase II by explicitly programming it into my applications. Standby, on the other hand, is a different issue.
Finally: For small businesses, none of these particularly matter. Small businesses don't have the money for big replication servers and such. They rely on an older version of replication: they have a paper copy of everything that goes into the computer, and make nightly backups (small businesses, unlike large businesses, don't run 24/7 and thus the RDBMS is stable at that time of night, they don't need the ability to do "hot" backups). If the database crashes and eats their data, they restore it from tape, then manually type in any transactions that weren't on the tape (using their paper copies). PostGreSQL is already "there" as far as small businesses are concerned, from a price/feature standpoint. PostGreSQL's only real issue is performance, especially performance when adding records, which is pathetically slow (using transactions helps, in that at least you don't get a sync for each "insert" but rather a single set of syncs at the "end transaction" time, but it's still less than 1/3rd the speed of MySQL even if you turned off the syncs altogether and ran it in the dangerous "async" mode). However, most small businesses don't need to add 200,000 records in a short period of time... I can add 200,000 records in little over 3 minutes with PostGreSQL on modest hardware, which is pretty pathetic, but far beyond the needs of a small business.
-E
Send mail here if you want to reach me.
By the way, "OSS" is an acronym coined by Microsoft, and one I've never liked. Can we please call it "Open Source"?
Thanks
Bruce
Bruce Perens.
Hey I'm developping a B2B exchange portal as well. I'm using MySQL. But damn, the thing sucks for everything more than simple selects. Plus selects are sometimes NOT optimised at all. Computing some basic stats on the content of the DB brings it down to its knees. I had to add another server duplicating the content in a very hackish way so that users would'nt be slowed down to a crawl every time we had to compute the stats (often). Duh.
I've done some research on this topic, and by far the most open/tested OSS RDBMS out there is PostgresSQL, which is actually based on the same tree from which a commercial RDBMS, Ingres, was spawned. However, I do not believe that Postgres is solid/fast enough for enterprise level applications yet (mind you, this is from reading a lot of Usenet and Web opinions on it, not personal experience, so YMMV).
Since I am a DSS ("Data Mining") guy though, I am more interested in an extension to your question? is there any decent OSS OLAP/DSS DB system out there that can handle multiple dimensions, transactions, partitioning, ? If not, how many people would be interested in one?
engineers never lie; we just approximate the truth.
Christopher Browne has a good reference page for this. I stumbled across it while researching possible database solutions for one of my projects, and it seems to be well maintained and updated. The page can be found at: http://www.ntlug.org/~cbbrowne/rdbms.html
-- Still waiting for the Nike endorsement
First off, databases look for connections to a port, just like any other service like httpd, smtp or ssh. Doing things so that you have failover and/or load-balancing on *that* is easy. There are a large number of solutions, from application level to library level to dedicated hardware to general purpose software (LVS).
That said, database failover is *not* easy.
The basic problem is that you don't just read from a database, you read and write to the database. Database replication is a very non-trivial thing to do. You have to deal with the possibility of data being inserted, updated or removed from any of the servers that are involved. This requires that you have things like global locks, timestamps on everything, primary keys that are generated in a way unique to each replicated version of the database, etc.
Most of the work of replication is handled very nicely by the big commercial databases (but you will find that they want you to pay for it.), but even with those you might have to make minor changes in your application (or maybe just the database structure) in order to handle replication properly...
AFAIK, there's nothing you can get for free that will be able to handle any kind of real-time n to n (or 1 to n and n to 1 simultaneous, even) replication. You can do stuff like have mysql dump a log of everything that's done that changes the database, and then import those changes into another mysql instance someplace else; but that's nowhere near what the big commercial databases can do when it comes to replication.
Given that replication is gonna cost real money, just get your free database running on a good-quality machine with good backups (that you make sure include the database) and maybe things like redundant power supplies. You can make a really robust box for much less than Oracle will charge you to license 2 instances with replication.
(Or hire a team of 10 database programmers to spend a year or 3 adding replication to postgresql. Well... maybe they could do it wrong in a month -- I'm trying to give a wild estimate for doing it vaguely close to right.)
Sorry.. I forgot to talk about failover.
There is a third-party, non-open source replication tool available fro Interbase 5 and 6 that will handle that data replication stuff you need for failover. The actual server switching I think you might have to write yourself... I suspect you might find there are a lot of people willing to help you.
http://www.freshmeat.net/appindex/2000/01/15/9479
It's a toolkit for clustering MySQL, which may or may not provide the failover facilities that you need. It looks beta, but may work... I haven't played with it yet. And of course, this is only if you're willing to use MySQL. Looks fun, though.
Otherwise, if you're doing serious enough work to need serious RDBMS backend, I hope you're making enough serious dough to afford a serious RDBMS. But before I invested big bucks into a failover setup using a commercial RDBMS, I'd cover my other bases first. Most DB failures I've seen were not a fault of the RDBMS itself, but hardware, power, diskspace, OS, human error, etc. The only time I've seen an RDBMS crap the bed at work, it was Micros~1 SQL Server and you should know better than to buy that. With our Sybase machines, it's always NT that takes down the box, not Sybase. Get these bases covered (hi-quality hardware, reliable power with a big UPS or generator, non-Micros~1 OS, and a DBA that knows what they're doing) and you most likely won't need a failover system.
Operating System design has been a solved problem for about a quarter of a century. 18-year-old kids write OSes as freshmen projects now. There's an astounding amount of freely-available material out there, including the source to several OSes, to get you started on doing such a thing, and there has been for a long time.
The graphical desktop environments you point out are still in the infancy stage, after 3-4 years of development. Although they're very cool, (as is Linux), both are basically aping existing art, and not very well in some cases. (Go on, moderate me down as flamebait and argue with me; this is not my primary point.)
RDBMS design is relatively new in the long view of the art of Computer Science. The commercial vendors that pay thousands of people fulltime to develop them still don't have it down correctly. It's certainly not progressed to the point where it's textbook knowledge, where any fresh-faced CS grad can get in there and build one.
I think the most telling evidence that this is beyond the current state of the art of Open Source, though, is the fact that it doesn't exist yet. It's not like this Ask Slashdot is the first time anyone's ever thought about free alternatives to the big boys of databases. It's a known need.
And the "Cathedral and the Bazaar" theory of Open Source as 'developers scratching their own itch' would seem to mean that if it COULD be done, it WOULD be done. Intrepid hackers would have taken Postgres or the GPL'ed old MySQL or the like and had a project underway years ago if it were just that easy. Apparently, it's not.
Open Source projects do not _innately_ scale well, merely from the fact of them being open. There are a lot of preconditions for a project to work. At least these three need to exist:
1) A critical mass of developers see the need.
2) That mass of developers (or a subset thereof) can be organized into a team to do the work.
3) That mass of developers has the knowledge and skills to do the required work.
As an example, there's a large set of people that would like an open-source engine to crack DES keys in real-time on commodity hardware. That takes care of 1. Getting a set of crypto hackers together to do this work would be a snap, cf OpenSSL et al. There's 2. But, the art isn't there. The general public, the mass of developers out there, simply don't have the knowledge or experience to do that. Maybe it can be done, maybe it can't, that's not the point. The point is that just saying "I'm starting an Open Source project to do [something really cool]" doesn't mean it's something you actually CAN do.
So, back to the topic at hand, yes, RDBMS is something that's outside of the Open Source world at this point. The skills required to do such a thing are reporting for work at Redwood Shores and Emeryville and the like, and not willing or able to share their really-quite-esoteric knowledge with the free software universe.
And pointing to a handful of other projects that are working doesn't prove that this project, or any other, will. Look around; the code's not coming, mostly because, no, there aren't many people out there who are up to the challenge. Or else they'd be doing it.
The proof, as they say, is in the pudding.
--
This is, perhaps, one of those situations where you are rather unlikely to find an OSS solution. An enterprise-level RDBMS is not exactly an overnight hack. It would take an enormous kind of time and effort to put an enterprise-level RDBMS.
I'm not even sure that an enterprise-level OSS RDBMS makes sense. If you're talking about an enterprise-level product, you probably have the resources to afford to purchase a commercial RDBMS. Both Sybase and Oracle have decades of experience building robust, rock hard, RDBMSes that you can bet your company jewels on, and both companies have Linux versions of their RDBMSes. I'm not sure about Informix, they may have a Linux version of their database engine too.
Right now, attempting to deliver an OSS RDBMS that can beat time-tested RDBMSes from Sybase and Oracle would be quite a challenge, to say the least. People will just have to accept the fact that OSS can't solve every problem in the world. It's certainly feasible that one day Sybase or Oracle might decide to release their respective RDBMS engines under an OSS license (which should certainly be quite a shock to many people), that's probably the most likely scenario.
--
The position that any Open Source, and most importantly "free", RDBMS could NOT hold a candle to Oracle (amongst others) where the needs of "enterprise" as concerned is reasonable. The development of Linux is (was), in many ways, the antithesis of what it takes to develop an equivalent to Oracle. Think about it:
Firstly, kernel's are not all that complicated.
Secondly, Linux's kernel is about as simple you can get.
Thirdly, Linux, thus far, has had the advantage of being behind the curve in development. It has had the benefit of essentially just being able to copy features, design, and even some code. While the code may be from scratch, virtually nothing else is. In other words, Linux's code was built with 20/20 hindsight. Furthermore, it's done nothing particularly spectacular with that opportunity.
Fourthly, the motive and feedback in developing an enterprise class RDBMS are vastly different. In developing Linux, there is something quite tangible, such that most "geeks" can get relatively instant gratification. Do geeks really personally need such a database (of course not, not even their hardware approaches this)? Contrast this with an RDBMS, if "they" make the system %.001 more reliable, by whatever measure, under heavy loads, how can they appreciate that? (never mind testing it) These things are about numbers. A company that does 1 million a day in sales with a broken database can't afford that %.001, but a geek screwing around developing a database in his spare time for his uses is not likely to experience such a problem. And even if he is, does that %.001 really mean anything to him? He can just reboot. Sure, you can say "given enough eyes all bugs become shallow", but ask yourself: Whose eyes, and how do these bugs become apparent? A minor bug in Linux, and a couple geeks have to reboot their PC (which presumably prompts them to patch it). Which company is going to volunteer to expose itself to such a bug---possibly millions of dollars in lost revenues (amongst other concerns). The scale of the two uses changes the nature greatly (likewise, empirically, Linux demonstrates such leanings--towards areas which most geeks appreciate, but not necessarily enterprise)
Fifthly, the development process itself is vastly different. Because Unix, and particularly Linux, is quite modularized, it can be developed is a ragtag fashion. This is not true with an RDBMS. They're are very complex and interconnected.
Sixthly, support is a major issue. Despite all that hype amongst the linux community about "support" it is absolutely undemonstrated. And when you consider issues such as proper documentation....
Seventh, given some of the risks i've mentioned, what company is going to prefer to try to save 10k in software fees for the increased risk. The expected value in this case could very well be negative.
...I can think of more, but that is plenty of doubt right there. Think critically, not dogmatically.
I don't think you will find database failover in an open source product -- it's too specialized an area to have gathered interest from the necessary critical mass of developers.
I cover databases for PC Week Labs, and in my view (and in my tests), the most sophisticated OSS relational database is PostgreSQL. In three months, though, that title will be held by InterBase. It has a more complete SQL implementation, has a (third-party) replication option and is generally more mature. However, it doesn't have any failover.
The products that do provide failover are (non exhaustive list, but these are the main ones): Oracle Parallel Server (or Oracle with the Failsafe option if you want just 2 node failover), Sybase ASE, IBM DB2 UDB, Informix, Tandem NonStop SQL and Microsoft SQL Server. The cheapest option in terms of purchase price will be Microsoft SQL Server with Microsoft Cluster Server, but it only runs on NT and only supports 2 node failover, and does not cluster.
Note that when you talk about failover (particularly with stateful connections), you need to also get failover-capable hardware systems. The databases mentioned above all need special, tightly spec'ed high availability hardware configurations from Sun, HP, etc. In general, a shared SCSI bus is required.
You also need to decide what level of failover you need: (from simplest and cheapest to most complex, expensive and best):
* cold standby (replicate to a standby server) -- no shared storage, mostly up-to-date, cheap, manual intervention required for failover
* warm standby (shared storage or transactional replication to a standby server) -- ensures no data loss for all committed transactions, requires fault-tolerant hardware and a fault-tolerant dbms, failover is automatic but may take several minutes to roll the log forward, warm up the cache and reconnect users
* hot standby (shared storage) -- no data loss for committed transactions, fast failover (30 seconds or less) -- this is a very specialized area, and you should have your dbms vendor work with you on setting up these kinds of HA setups. This is an ongoing area of research; Sybase 12 and Oracle8i 2.0 introduced new features specifically in this area to do things like pre-connect users and pre-warm the cache to speed failover times.
If you want to have an OSS solution, I'd advise using PostgreSQL or InterBase and writing stored procedures in C to replicate inserts, updates and deletes to a second server, then coding reconnect and heartbeat logic into your front-end apps. This will be easiest to do with an app server since you have a single point of data access. This will not provide atomicity or durability though (unless the updates use 2PC) -- you'll need to run a consistency check on the db on failover.
This is hairy, though. Overall, I advise paying for something that someone else has gotten working.
Regards,
Tim Dyck
Senior Analyst
PC Week Labs
Check out interbase.com or interbase.org. Interbase has been in use in enterprise environments for something like 15 years. It has very advanced crash recovery features and replication throught a commercial plug-in. Version 6.0 is under the MPL and in beta right now for Solaris, Windows, and Linux. No, I don't work for them, I just sound like that.
--JRZ
Interbase v6 is in (free beer) beta right now, and will be released under the MPL (Mozilla Public Licence) by Mid-June
You can download the beta now for Linux, Windows or Solaris from Interbase.com or The Interbase Developers Inititive
Supports databases up to 32 TB (!!!) spread over 2GB files and is fast and reliable. (It's been around for 15 years.) The biggest known Interbase DB is over 200GB.
It's is ANSI 92 SQL compliant (well.. as compliant as any DB I've ever seen - better than Oracle for instance), and the support is Amazing
Join the email lists at www.mers.com, and you'll be able to get answers from Ann Harrison (the president of Interbase.com), or from a lot of other people - it's the best support I've even seen.
As for enterpise features - well, apart from large DB support, it has row level locking, transactions, referential integrity, blobs, Multi-Generational commits, stored procedures... ummm... I can't think of what else to say.
Basically, if your Databases are less than 50GB then Interbase is the number 1 choice - above that maybe Oracle would be better, but that isn't exacly open source.
Here's a good way to do it in oracle, that'll probably genralize to almost any RDBMS:
The simplest way is to have two unix boxes, each physically attached to a raid unit (dont use raid 5, use mirroring for performance). Don't use clustering software, just mount the filesystems on only one machine at a time. If the primary system fails, then mount the filesystems on the secondary , startup the database, and you're on your way with minimum downtime. Script it and tie it to some ping or other fail detection strategy if you feel lucky, but in my opinion, keep a human in the loop to actually execute the switchover.
Another great solution is a so called "standby database". You make a copy of the primary database to your standby machine, using your backup tapes. Then you start to "roll forward" the standby database by applyin all offline redo logs of the primary database, as it is generated. This method should work for any database that logs transactions including Sybase, Informix and probably MySql. The equivalent of a redo log in MySql appears to be the "update Log". but I have no experience with MySql. One big drawback of this solution though, is that even though a transaction has been committd to disk on the primary, the event may not make it to the offline redo logs before the system crash. So a standby database can only be within some delta-T of the primary database, the last fiew transactions before the crash are lost. Sometimes this is acceptable, but if you billed the customer then lost the order, maybe not. You tune the redo logs buffering parameters to trade of performance for small "lag time" of getting redo logs across to the standby database.
The truth is that it takes 10x effort to get failover and cluster software correct, and 9 times out of 10, the automatic failover either triggers accidentally (bad news) or triggers correctly but fails to come up on the secondary. This is because you have to be incredibly scrupulous about keeping non-shared resourses in sync on the two machines and you can't test the failover (politically that is) until it actually fails. Heck, I just took a call this past saturday for this very problem: Site paid $$ for a bigger consulting firm to implement cluster, firm cant be reached on saturday when cluster trips but does not failover successfully. Customer calls me. I am to gracious to say I told you so.
My free advice: Make two machines, not a cluster. Keep the two machines in sync, using the "standby technique", or make the raid unit accessable by both machines. Keep a human in the failover loop.