Open Source Database Clusters?
grugruto asks: "A lot of open source solutions are available to scale web sites with clusters but what about databases? I can't afford an Oracle RAC license but can I have something more reliable and fault tolerant than my single Postgres box? I have seen this recent article that looks promising for open source solutions. Does anyone have experience with clusters of MySQL, Postgres-R, C-JDBC or other solutions? How do they compare to commercial products?"
Works every time.
For what it's worth, the commercial solutions are hard to set up, unstable and terribly difficult to maintain, and this is after a small fortune has been invested in making them work. Not to knock the open source solution, but it's hard to believe that something that is infrequently used and difficult to understand will be truly production quality if you want to use it for money.
We've been evaluating the Emic application cluster for MySQL and have had pretty good results. It's a new product (so YMMV), but it looks promising.
Emic Networks
If you're using Java, you might want to check out the Clustered JDBC project
-D
Cube On! (http://stores.ebay.com/PuzzleProz)
MySQL has very nice replication functionality, and, in certain circumstances, you can even set up replication rings. It is somewhat flexible about the topology you choose to use, so pick the one best for your application. Load balance ala DNS and you're in business.
Does anyone have experience with clusters of MySQL, Postgres-R, C-JDBC or other solutions? How do they compare to commercial products?
They don't compare to commercial products. I know it isn't what you want to hear, and there are hundreds of kids here to tell you different, but they just don't compare. Those kids' database experience doesn't extend past an address book.
Even if you manage to get them to technically keep up, transaction wise, to Oracle or SQL Server, the ACID enforcement isn't there, the syntaxes are kludgy. Gack.
My company ships products with SQL Server or Oracle as the back end. I've tried to put together an OSS solution so I could impress the big boss with millions of bucks of saved license fees. They just aren't anywhere close IMO.
Run a SQL Server farm on the back end if you can't afford an Oracle license. Don't be an OSS ideologue in the business world; you end up unemployed.
I don't need no instructions to know how to rock!!!!
Maybe it will be of interest...
philcrissman.com.
Open-source or not...
I would say just get a bigger box for your PostgreSQL solution and do semi-realtime remote replication on the tables you don't want to lose.
I browse at +5 Flamebait- moderation for all or moderation for none.
You can "cluster" MySQL? Does it involve "rsync" and "cron"?
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
IMHO, the biggest problem is replication: keeping them all consistent in the face of asynchronous updates. It can also reduce/eliminate the advantages of clustering if you have a significant number of updates compared to the number of queries.
I guess the best answer depends on how dynamic your data is. If it's static, there are all sorts of easy answers. If all the updates come from a central source, or on a predictable schedule, you're almost as well off. If updates come from the great unwashed but the data can be partitioned in some way (say, geographically) you can still do it. If updates come from all over but queries can be centralized, or if your database is tiny, or if latency isn't a problem, or if you have a machine that prints money, it can still be done.
If you want to do everything for everyone everywhere, right now if not sooner, for under twenty bucks, you're screwed.
So, what are your needs?
-- MarkusQ
Nuff said. Your sig, like Loverboy at Madison square, kicks ass.
Check out the new replication at postgresql.org: it's master -> multiple slave replication.
Then have your slave database query the master database - and if it no longer responds, it could promote itself to master.
The replication is the easy bit - the slave promotion is the hard and gritty bit.
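The promotion decision is the part worth sketching. Here is a minimal illustration in Python, assuming a hypothetical `PromotionMonitor` class that is fed the result of each probe of the master. How the probe is actually done, and how the database is really promoted, are left out because they depend on your setup. The only idea shown is that the slave promotes itself after several consecutive missed probes, so one dropped query does not trigger a failover:

```python
class PromotionMonitor:
    """Hypothetical helper: decides when a slave should promote itself."""

    def __init__(self, max_misses=3):
        self.max_misses = max_misses  # consecutive failed probes tolerated
        self.misses = 0
        self.promoted = False

    def record_probe(self, master_alive):
        """Feed in one probe result; returns True once promotion is due."""
        if self.promoted:
            return True  # promotion is one-way in this sketch
        if master_alive:
            self.misses = 0  # any success resets the counter
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.promoted = True
        return self.promoted


monitor = PromotionMonitor(max_misses=3)
for alive in [False, False, True, False, False, False]:
    decision = monitor.record_probe(alive)
print("promote:", decision)
```

Note that this does nothing about the gritty bit: if the master is alive but unreachable from the slave, you can end up with two masters. That case needs something outside the slave to arbitrate.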
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
What *I* would probably attempt would be to set up a replication ring, and use a BIG-IP load balancer to make them all look like the same server. Then you get your load balancing, and scalability. I have yet to try this, but I will in the (very) near future.
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???
HA is always a crapshoot/tradeoff between cost and risk. Throw enough $ at the problem and you'll approach 100% availability.
I know that 'more robust' is a nice thing to want, but you really need to think about what you really need. If it takes 15 minutes to switch over to a backup copy (using some magic RAID disk mirroring maybe?) and 15 minutes to restart the app and let it checkpoint its way up to a decent operational speed again, is that good enough?
If it takes an hour, how about that?
How much time/heartache or money is it worth for you to have system downtime, and how much are you willing to expend to reduce it by 5, 15, 30 minutes?
So, there's really a continuum of availability you have to pick your point in. At the low end, you have no backups and recreate everything from scratch. At the high end you use Vendor X's real clustering solution and 24x7 monitoring, then have zero downtime even in a disaster. Somewhere in the middle is you.
Now I realise this is an overtly commercial view of things, but if needs be replace money with effort and season to taste.
/* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }
If you're working with enough data that would require a CLUSTER, then I would suggest a commercial product.
But if you need that SPEED, but not a lot of data storage, I'd say a decent sized MySQL cluster would cover you, depending on what your needs are.
If you are in the position to actually need a cluster to do that much work, you should be able to get something commercial and more oriented toward large scale.
Error 407 - No creative sig found
You should investigate eRserver. It was originally a commercial replication product for Postgres but has been open-sourced. I haven't tried it yet but it's on my to-do list.
~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis
You can make a High Availability cluster out of most any software if you have some kind of shared storage.
People have used firewire drives connected to two different computers to accomplish this cheaply. Oracle is giving away a cluster filesystem (so they can sell RAC on Linux), and there is OpenGFS as well for filesystem usage.
Just write some basic monitoring scripts that will bring up your PostgreSQL database on the second server should the first one fail. Just make sure those scripts completely take down the old database on the first server in the case of a partial failure. Having two databases try to open the same data would be a really bad thing.
Here are some links to articles that should help:
Overview
Howto
Cluster Filesystem
These are mainly geared for Oracle/RAC, all you need is the firewire shared storage and cluster filesystem. You're on your own to write the monitoring and failover scripts. Hope this helps. --Chris
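The ordering that matters in such a failover script can be sketched as follows. This is only an illustration in Python: `fail_over` and its three callables are hypothetical names, and the real health check, shutdown and startup commands depend entirely on your servers and storage. The point is that the standby is never started unless the old server has been confirmed down:

```python
def fail_over(primary_healthy, fence_primary, start_standby):
    """Hypothetical failover step for two servers sharing one disk.

    primary_healthy: callable returning True if the primary answers.
    fence_primary:   callable that forces the primary's database down and
                     returns True only if that is confirmed.
    start_standby:   callable that starts the database on the standby.
    """
    if primary_healthy():
        return "primary-ok"
    # Partial failure is the dangerous case: the primary may still hold
    # the data files open, so take it down before touching the shared disk.
    if not fence_primary():
        return "fencing-failed"  # do NOT start the standby; page a human
    start_standby()
    return "standby-started"


events = []
outcome = fail_over(
    primary_healthy=lambda: False,
    fence_primary=lambda: events.append("fence") or True,
    start_standby=lambda: events.append("start"),
)
print(outcome, events)
```

If fencing cannot be confirmed, the sketch stops and leaves the standby off, on the view that an outage is better than two databases writing to the same files.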
Ahem, have you checked any open source DB lately?
Most of them have ACID enforcement, despite your claim to the contrary.
Don't Tread on OpenSource
- CLUSTERING IN TUNE WITH APACHE AND MYSQL (free registration might be required). Also see Emic Application Cluster for MySQL
- InnoDB Hot Backup (with point in time backup)
The rest of this comment is quoted verbatim from InnoDB News: MySQL/InnoDB-4.0.1 and Oracle 9i win the database server benchmark of PC Magazine and eWEEK. February 27, 2002 - In the benchmark eWEEK measured the performance of an e-commerce application on leading commercial databases IBM DB2, Oracle, MS SQL Server, Sybase ASE, and MySQL/InnoDB. The application server in the test was BEA WebLogic. The operating system was Windows 2000 Advanced Server running on a 4-way Hewlett-Packard Xeon server with 2 GB RAM and 24 Ultra3 SCSI hard drives.
eWEEK writes: "Of the five databases we tested, only Oracle9i and MySQL were able to run our Nile application as originally written for 8 hours without problems."
The whole story. The throughput chart.
Trusted Computing FAQ | Free Dawit Isaak!
What does Slashdot do for this? I recall way back in the day there was some information about what the Slashdot tech looks like. Anyone have info regarding their database setup?
or does this term sound kind of like a made up buzzword like ".NET powered Java schemas!" or "SOAP servlet toaster oven with X-M-L!"
bite my glorious golden ass.
Using one server as a master and n servers as slaves. Just make sure to write everything to the master. Replication to the slaves generally takes about a second or maybe two depending on load.
OK, not quite the same thing but this works quite well for read-heavy applications, and is very reliable unless you get a slave out of sync.
This was on v3.n.n - the good folks at MySQL have made many improvements to the replication facilities in the 4.n series I believe.
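The "write everything to the master" rule can be enforced in one place in the application. Below is a rough sketch in Python, not anything MySQL ships: `make_router` is a made-up helper and the server names are placeholders. It sends statements that start with SELECT to the slaves in rotation and everything else to the master:

```python
import itertools


def make_router(master, slaves):
    """Return a function mapping a SQL statement to a server name.

    Hypothetical helper: `master` and `slaves` are just labels here; in a
    real application they would be connections or connection pools.
    """
    if not slaves:
        raise ValueError("need at least one slave")
    next_slave = itertools.cycle(slaves)  # simple round-robin over slaves

    def route(sql):
        words = sql.split(None, 1)  # first keyword, ignoring leading spaces
        if not words:
            raise ValueError("empty statement")
        if words[0].upper() == "SELECT":
            return next(next_slave)
        return master  # INSERT/UPDATE/DELETE/DDL all go to the master

    return route


route = make_router("master", ["slave1", "slave2"])
for statement in ["SELECT * FROM users", "UPDATE users SET hits = hits + 1"]:
    print(route(statement), "<-", statement)
```

Looking only at the first keyword is crude. A SELECT that must see data written a moment ago still has to go to the master, since the slaves lag by a second or two.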
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety" - BF
there are basically three types of clusters:
1) shared nothing: in this, each computer is connected to the others only via a simple IP network. no disks are shared. each machine serves part of the data. these clusters don't work reliably when you have to do aggregations. e.g. if the data is spread across machines and one of the machines fails, an "avg()" query would fail, since one of the machines is not available. most enterprise apps cannot work in this config without degradation. e.g. an IBM study showed that a 2 node cluster is slower and less reliable than a 1 node system when running SAP.
IBM on windows and unix and MS use this type of clustering (also called federated database approach or shared nothing approach).
2) shared disk between two computers: in this case, there are multiple machines and multiple disks. each disk is connected to at least two computers. if one of the computers fails, the other takes over. no mainstream database uses this mode, but it is used by hp-nonstop. still, each machine serves up part of the data and hence standard enterprise apps like SAP etc cannot take advantage of clustering without a lot of modification.
3) shared everything: in this, each disk is connected to all the machines in the cluster. any number of machines can fail and yet the system would keep running as long as at least one machine is up. this is used by Oracle. all the machines see all the data. standard apps like SAP etc can be run in this kind of config with minor modification or no modification at all. this method is also used by IBM in their mainframe database (which outsells their windows and unix database by a huge margin). most enterprise apps are deployed in this type of cluster configuration.
approach one is simpler from a hardware point of view. also, for database kernel writers, this is the easiest to implement. however, the user would need to break up data judiciously and spread it across machines. also adding or removing a node will require re-partitioning of data. mostly only custom apps which are fully aware of your partitioning etc will be able to take advantage.
it is also easy to make it scale for a simple custom app and so most TPC-C benchmarks are published in this configuration.
approach 3 requires a special shared disk system. the database implementation is very complex. the kernel writers have to worry about two computers simultaneously accessing disks or overwriting each other's data etc. this is the thing that Oracle is pushing across all platforms and IBM is pushing for its mainframes.
approach 2 is similar to approach 1 except that it adds redundancy and hence is more reliable.
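To make the shared nothing failure mode concrete, here is a toy model in Python. It is purely illustrative: `Cluster`, `Shard` and `NodeDown` are invented names, rows are placed by `key % number_of_shards`, and real products are far more sophisticated. It shows why an aggregate over the whole data set cannot be answered when any single machine is down:

```python
class NodeDown(Exception):
    """Raised when a shard's machine is unavailable."""


class Shard:
    """One machine holding its own slice of the rows (nothing shared)."""

    def __init__(self):
        self.rows = {}
        self.up = True

    def sum_and_count(self):
        if not self.up:
            raise NodeDown("shard unavailable")
        return sum(self.rows.values()), len(self.rows)


class Cluster:
    def __init__(self, n_shards):
        self.shards = [Shard() for _ in range(n_shards)]

    def insert(self, key, value):
        # Placement by key modulo shard count: an assumption of this toy.
        self.shards[key % len(self.shards)].rows[key] = value

    def avg(self):
        total, count = 0.0, 0
        for shard in self.shards:  # every shard must answer
            part_sum, part_count = shard.sum_and_count()
            total += part_sum
            count += part_count
        if count == 0:
            raise ValueError("no rows")
        return total / count


cluster = Cluster(3)
for key, value in enumerate([10, 20, 30, 40, 50, 60]):
    cluster.insert(key, value)
print(cluster.avg())          # all three machines up
cluster.shards[1].up = False  # one machine fails
try:
    cluster.avg()
except NodeDown as err:
    print("avg() failed:", err)
```

The modulo placement also shows the re-partitioning problem: change the number of shards and most keys map to a different machine.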
What about SAPDB? Isn't it a potential choice? I thought I read somewhere that MySQL and SAPDB were merging. Check it out: http://www.sapdb.org/
I've been running a 3-4 node MySQL 3.23.x cluster on Slowlaris 9 since January. It has survived several catastrophic power outages and numerous other insults without a hiccup. Load is fairly light (about 3,000 updates daily and a similar number of queries on each server) so YMMV.
ZEO will allow you to scale the ZODB (Zope Object Database) across multiple processors, machines, and networks. However, the ZODB is a Python object database, so it's probably not an option to port your current database. There are other limitations of the database - it's not always the fastest, it's an object database so concepts like foreign keys are not fully there, but it can give you high availability. As of the new Zope 2.7 in beta though, ZEO is quite easy to set up, and it is open source.
All you grammar nazis correct me when I use affect as a noun, or effect as a verb, but I don't think even most grammar nazis really understand the difference.
Check some definitions. Both words can be used as nouns or transitive verbs. It's complicated.
The first definition of "affect" is "To have an influence on or effect a change in."
Anyway. I guess not that complicated. But not the stupidest mistake someone could make.
There are no trails. There are no trees out here.
Two options I haven't seen anyone mention yet are PostgreSQL eRServer 1.0+ (see PostgreSQL news item "PostgreSQL now has working, tested, scalable replication!" from August 28, 2003 or a lengthier press posting "PostgreSQL, Inc. Releases Open Source Replication Version") and Backplane.
eRServer has been in development for over two years, is used in production settings and is released under a BSD license (as with PostgreSQL). It uses a single master/multiple slave asynchronous replication scheme. There are cautions in the release that replication may be difficult to set up.
Backplane seems to be particularly well-suited to clustering data quickly across a WAN. A quote may explain it better:
I haven't used either yet, but you may wish to give them a look.
Recently announced on the PostgreSQL website is commercially developed free and open replication for PostgreSQL. erserver is available for download. It is single master, many slave replication only.
deviantart.com, IIRC, runs about 3 MySQL servers behind a load-balancing cache/server, so they have had to deal with a lot of the difficulties involved in that.
Supposedly should be out by now.
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
I'm not 100% sure about this, but from what I researched, LDAP (OpenLDAP, etc) is built for replication. Also, LDAP is very fast at reading data and it's "object-oriented" kinda, so you can do some things with it you can't do with relational databases. This is at the risk of using something entirely different (different query language, different administration, what happened to SQL, etc).
If relational is the way you need to go, I use SQL Server and it can handle replication as well as spanned filestores. Outside of that, PostgreSQL is my next favorite. It's fast, object-oriented (kinda) and is easy to set up. I just wish they would come out with an easier installation and interface so I can start using it more!
Anyway, Postgres sounds most promising. I like it better than MySQL and it is more enterprise-ready.
The press release of ER Server becoming open source is quite informative (karma?) as well.
Marc of PostgreSQL Inc's an incredible resource on the postgresql mailing lists too; and PostgreSQL Inc has a really cool policy that allowed them to donate their code to the community that way:
From their release: "DATELINE FRIDAY, DECEMBER 15, 2000 Open Source vs. Proprietary: We advocate Open Source, BSD style :) We will consider and develop short term (up to 24 month) proprietary applications and solutions where there is a strong business and intellectual property case to be made. *All* proprietary developments that we are involved in *will* become open source within two years of implementation, without exception."
Also cool, they provide hosting http://www.pgsql.com/hosting/ which donates "25% of all profit from these services ... directly back into the PostgreSQL Project."
Ron
I'm not affiliated with them in any way, just appreciative of Marc's contributions on the mailing lists and to postgresql as well.
It's always hard to get this across to my clients, but you need first of all to answer the question "how much reliability/availability do you need?"
Think of it this way: if what you need is no better than 99 percent, you need to be able to fail over fast enough to have only 864 seconds of downtime a day. Of course, that's about 14 minutes, so you can practically handle it by doing a hand cutover.
On the other hand, if you need availability of 99.999 percent ("five nines") you can only afford to have about 6 seconds a week total downtime.
If you need true five-nines, you need to look at some of those nasty commercial apps. If you can slack off from there, the PostgreSQL and MySQL replication schemes work just fine, and you can use DNS remapping to do the failover. (I've done it with MySQL, it worked; think about needing a heartbeat to detect if the master server has gone down.)
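Downtime budgets like these are plain arithmetic, and it is worth redoing them for your own target. A small Python sketch (the function name `downtime_budget_seconds` is just made up for this example) that turns an availability percentage and a period into allowed seconds of downtime:

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def downtime_budget_seconds(availability_percent, period_seconds):
    """Seconds of downtime allowed in a period at a given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_seconds * (100 - availability_percent) / 100


for percent in (99, 99.9, 99.99, 99.999):
    per_day = downtime_budget_seconds(percent, SECONDS_PER_DAY)
    per_week = downtime_budget_seconds(percent, SECONDS_PER_WEEK)
    print(f"{percent}%: {per_day:.1f} s/day, {per_week:.1f} s/week")
```

At 99 percent the daily budget is 864 seconds, roughly 14 minutes, which a person can meet by hand. At five nines the weekly budget is about 6 seconds, which only automatic failover can meet.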
check out prevayler
Seems like an excellent alternative to the traditional database route (though I myself have not yet used it in an application)
Here is a developerWorks article about Object Prevalence (of which prevayler is an implementation): An introduction to object prevalence
Availability is one of the basic issues when sizing your system. [ie, can you have it down at night for a cold backup, or does it have to be available 24x7? Can you even get a maintenance window once a month?]
As with sizing your UPS and/or generators, you need to determine what the cost to your business is for downtime.
Now, yes, you might have some issues in SLAs that spell out how much it'll cost you, if you have to refund customers' money [for service based orgs]-- or how much profit you'd lose if your customers couldn't purchase items [for sales based orgs]. But unfortunately, you have to also consider the recovery costs, the costs of damage to your reputation, etc.
If it's not worth your purchasing an Oracle or other, more expensive database, there are good odds that it's not worth the headaches of maintaining a high availability cluster with automatic failover. Instead, you can mirror the data, and keep transaction logs that you can replay.
You can have a spare system on standby, that you can keep updated on a regular basis (again, your cost of downtime, and the necessary time to recover the system will affect your choices), and when your main system should fail, you can push the most recent diffs to your standby, reconfigure the application servers to recognize the new server as the old one, and you're back in business.
It requires a bit of planning, and making sure that the necessary manual steps are well documented [so that anyone can do it, should the server outage be caused by something serious enough to take out your administrator, too], but it's easier and cheaper to build and maintain than a true cluster.
Build it, and they will come^Hplain.
Why don't you ask the guys from Slashdot (the creators)? I feel pretty sure that they are not running a single instance of MySQL to support slashdot.org.
Here is a good one to start with: CmdrTaco. They might take a while to get back to you but I am sure it will be worth your while.
It said "windows 98 or better" so I installed Linux
...When you say "database clusters" I imagine a herd of mares. It's getting sick. But it's too nice to spoil it with some psychiatrist...
I get the feeling that in a room with more than one door, it takes you all day to find your way out. :)
It goes from God, to Jerry, to me.
Clustering read-mostly data for performance reasons is relatively easy; for many applications, where a second or two of staleness on the replicated databases is acceptable, you can make do with a bunch of independent copies of the database, with all updates going to an authoritative database and getting replicated out from there asynchronously.
If your data can be partitioned cleanly -- that is, if you have groups of tables that are never joined with tables in other groups -- then you can perhaps get some benefit from putting different data on different servers, with no replication required. Obviously that's only worthwhile if the query load is comparable between groups.
If, on the other hand, you require ACID-compliant updates of all the replicants as a unit, you're entering difficult territory and you might have no choice but to go with a commercial solution depending on the specifics of your needs.
At just about all of the places where I've done database programming where this has come up, we ended up buying a much beefier database server (lots of processors and memory, good I/O bandwidth, redundant networking and power supplies) with disk mirroring, rather than get into the headaches of replication. A big Sun or HP server is certainly more expensive than some mid-range Dell or no-name PC, but it may end up being cheaper than the engineering time you'd spend getting anything nearly as robust and high-performance on less expensive hardware.
I've also found that very often when there's a database bottleneck that looks like it requires bigger hardware, the problem is the data model or the queries (unnecessary joins, no indexes where they're needed, poorly-thought-out normalization, etc.) or the physical layout of the data (indexes competing with data for access to the same disk, fragmentation in indexes/data, frequently-used tables spaced far apart on disk.)
If I'm dealing with Oracle, sometimes the solution is as simple as adding an optimizer hint to make the query do its joins in a sensible way. Sometimes denormalization is helpful, though you want to be careful with that. Sometimes a small amount of data caching in the application can mean a tremendous decrease in database load. And so on.
If you can tell us more about the specifics of your situation, there are lots of people here who can offer more specific advice.
Don't know how DB2 ICE would do compared to Open Source solutions but take a look at the interesting results of the recent TPC-H benchmark performance testing on Clustered and non-Clustered 100GB and 300GB configurations. It appears that the IBM DB2 Integrated Cluster Environment (DB2 ICE) for Linux is heads above the rest.
MS SQL requires x86 hardware - No Sparc, No POWER, No MIPS. Just crappy x86. ...Intel Itanium 2 specifically is the leading TPC-C platform (both 1st (HP-UX/Oracle) and 2nd place (with the dreaded MS-SQL 64-bit version) and Intel cpus taking 8 of the top 10)
Crappy x86
There is no 64 bit version of MS SQL.
Wrong again. See #2 above
And if you *REALLY* need to scale PostgreSQL - run it on a SUN/SGI/IBM.
Well, aside from the TPC-C Top 10, it's interesting to note that if you trust SGI, their whole next generation is based on Itanium 2 and IBM even sells Itanium systems. Sun, well, they're in their death throes...
Not a bunch of fucking Intel toys.
Looks like pretty powerful toys. Perhaps you should play with some so you'll know what you're talking about.
Once I was home visiting from University, hanging out drinking with a couple friends on a weeknight. They were town boys, whereas I had grown up in the country. I somehow talked them into the idea that it would be cool to go check out the old house where I had grown up.
It was a crisp clear early fall night. We parked a half mile down the road at the intersection of the concession roads, and walked. At this corner there was a tiny graveyard -- 4 stones -- and I set the mood by telling about how it was a family whose house burned down with them in it and how it was said you could see them some nights, running down the road in ghostly flames.
We snuck into the pine woods behind the house where I grew up and I showed them my childhood haunts in the moonlight. Rabbits started drumming, and my friends were freaked out. It sounds pretty spooky when you don't know what it is. We wandered over into the hayfields in what had been the old family farm. They'd been cut a month or so earlier, and we walked along the crick telling ghost stories.
One of my friends started telling the campfire classic about the man with the hook hand and just when he got to the punchline he started jumping and going "Ah! Ah! Ah!"
We got him calmed down, and figured out what had happened: right at the punchline of the hook man story he'd stepped on an old piece of baling wire, and it had slid up into his pant leg!
It's been the largest pain in the ass I've ever had managing servers.
MySQL spanks DB2, as does postgreSQL.
Our DB2 on Linux crashed so much we spent months before we had a production ready system. We were replacing PostgreSQL and we had to rethink everything. It couldn't handle our insert load, and we were going from 4 dual 733 intel boxes to two large quad xeon boxes with 15,000 rpm disks.
We spent $100,000 on DB2 licenses (that's with the discounted half-price DB2 EEE for Linux). We are now in the process of migrating to MySQL after some large benchmarks. With a few simple indexes MySQL inserts twice as fast as DB2 and selects in 0.00 seconds on any row, vs. DB2's
Throw in the support scam they pulled on us, and IBM is a joke of a company. If they weren't pushing Linux they'd annoy me more than Microsoft does. The support scam went like this. We purchased 8 CPU licenses for DB2 EEE in 8/02. In 3/03 we start receiving calls from salesmen to get our upgrade business since our 1 year support contract expires on 5/1/03. I call IBM with a serious chip on my shoulder and get the story that our anniversary date automatically defaults to any dates held by previous contracts, "it's easier that way". We had some AS/400's (talk about poor performing overpriced junk). So they wanted about $50K for "support" for another year. We declined their offer and considered suing. At $50k/year losing 4 months of support isn't acceptable to a small business.
So I am bitter at IBM. But not without reason. During our first 3 painful months deploying DB2 I opened 15 PMR trouble tickets. Of the 15 I resolved 14 while either on hold or waiting for a call back from them. ALL of the PMR's were opened with status "critical, production down". The last PMR IBM claimed to either be a bug in the Linux kernel or in DB2, they didn't know, but when I pressed, they did offer a patched version that we could "try out" on our production box to see if it worked. Throw in that clustering didn't work as advertised (not at all under moderate load), and DB2 is a pile of junk.
As the IT geek the fault landed squarely at my feet, so I did some thorough investigation and benchmarking. Default-config DB2 is considerably faster than both PostgreSQL and MySQL at everything but inserts. But throw in a few indexes and MySQL and PostgreSQL own DB2's sorry excuse for a database.
I AM bitter, and this probably is flamebait. But I'm past caring about IBM and their scam operation. I'm sticking with what works, and so far NOTHING from IBM has worked.
I wasted 3 months of 7 day work weeks averaging 12 hour days on DB2 and its so-called Linux support.
end
My Linux Command of the Day site : LCOD
I know it's probably irrelevant ... but I think IBM's DB2 on OS/390 rocks. You can have a couple of DB2 members on separate LPARs, which should reside on separate mainframes for a high availability (HA) setup.
PDF blows.
I hate PDF links. On Windows the experience is great, let's come to a complete halt as I watch CPU load hit 100%, wait for a splash screen, and watch the damned thing decide to show me the text at 245% zoom.
What a load of shit.
What's wrong with HTML as a virus free, pleasant to experience, documentation format?
Just say no to PDF.
Just a little bit of poking around the TPC website - look at the applications they used for testing; M$SQL server/DB2/Oracle9i. This thread is about OSS databases. Did TPC test with Postgres/MySQL? No. Why? Because IBM/Oracle/M$ are TPC partners. Gee, why do you think we don't see OSS products running circles around these bloated corp. products on these alpha-stage systems running Itanium???
BECAUSE THIS SUPPOSED "TEST" ORGANIZATION IS FUNDED BY THESE COMPANIES you moron.
Oh, and M$ SQLserver being 64-bit makes me laugh. Who is running, really running, SQLserver in 64-bit mode on an M$ wanna-be 64-bit mode OS on 64-bit hardware?
Sun/SGI/IBM all have *REAL* 64-bit OSes running on *REAL* 64-bit hardware.
Don't bring your M$-blind corp. drivel here.
Your first clue about TPC is that they run their site on Assh*le Server Pages.....
> The right tool for the job people
Right, and a myopic application of the above advice would lead to a dozen different database products in a typical department. They'd all be the right tool for some job - unless you're hoping to reuse skills, reuse backup solutions (TRM for DB2, Veritas for Oracle, etc), have any hope of reliable integration, etc.
So, yeah - get the right tool for the job. But before you right that out you need to take a big step back and get a sense of what your strategic direction is, and what are all the implications of such a decision.
I know a lot of folks converting MySQL to other solutions right now - because some junior guy figured it was the best solution. It might have been for the app - but it wasn't for the department. Which is like winning a battle but losing the war.
Ouch, sounds like you should have gotten an experienced DBA to set it up for you. DB2's too complex to go with simple defaults, and clustering is definitely a high-skills endeavor.
As far as insert loads go, we've seen 500 rows / second on five year old hardware without any problems. Although that's far short of what DB2 is capable of, it's fine for a sustained load. Beyond that batch loads hit 15,000 rows per second easily on the same box.
And as far as pricing goes, today you could get DB2 Express for those little dual-cpu boxes for just $500. A really fast four-way will cost you $32,000 - still way shy of $100k. You don't need to hit that kind of pricing unless you're doing inter-partition parallelism. And as I mentioned above - that's just not worth doing unless you've got the right skills to pull it off.
At any rate, many "big kids" are using the most unfairly bullied product, slandered most likely because it is a software boy-named-sue, MySQL. Why not have a read before taking childish pot-shots:
http://www.mysql.com/press/MySQL_userlist.pdf
In the end this silly "I'm a big boy because I use Oracle and you're a little gurly kiddie because you don't" bullshit is just empty bravado. Businesses generally attempt to find the most cost effective means to meet a need and often Oracle ends up being like buying a stealth fighter to deliver a pizza. It often just doesn't make sense even for a big kid with billions of dollars, which might be why the $30B+ multinational BASF uses PostgreSQL.
Frankly, after the named-user license Oracle sold the State of California, no matter how idiotic the clearly comatose contract negotiators were, one would be remiss to not consider other companies with slightly less egregious behavior on record.
Here is a description of a Cluster created on MySQL with Linux boxes - similar to Google. http://www.dwreview.com/Product_Reviews/Review_Data.html
and http://www.dwreview.com/Data_mining/Intelligent_DataMining.html
I want to experiment!
If you can avoid the Oracle and SQL fees, go ahead ... show us, I want to save money.
Slashdot is full of people who are quick to savage anyone who knocks OSS, yet rarely provide decent examples proving their argument!
I want an OSS database cluster, now put up, or shut up!
I maintain a site that does a fair bit of traffic (Daily avgs: files served = 1.8 Million, bandwidth = 20 Gigs)
We have 1 "master" MySQL server which gets all updates and inserts, etc. We have 2 "slave" servers which each take a significant portion of the select queries. All machines run the same 4.0.x version of MySQL. (Web access is PHP on Apache) All machines are dual x86s packed with RAM.
Setting up replication is pretty easy. And for the most part things are pretty nice. The load average drops a lot on each machine when we add a new slave. (Oh don't forget to enable query caching.)
We have had some problems though. Because the site gets so much traffic, queries sometimes take a while to run and to propagate to the slave servers. This means that if you update your data (via the master) and then do a select from one of the slaves, your change may not show up yet. For most web apps this might not seem like a big deal.
But it leads to web users changing things and not seeing the results right away. So they figure the site is "broken" and repeat what they just did, only to have it take place twice. If you "refresh" the page in your browser first, the data has usually come through, but many people don't do this. The result is they don't feel their account has been credited or something. These kinds of bugs are hard to track down too.
I wrote a program to check repeatedly (sleeping from 1/4 to 1/25 of a second in between) and the slaves were almost always in perfect sync with the master (as per MySQL's binary log position indicator). That was really impressive; however, there are times when the servers are under load that the slaves will be out of sync for 30 to 60+ seconds! (Measured as byte offset differences in the tens of thousands in the binary log position.)
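For anyone who wants to try the same kind of check, here is a minimal sketch of the comparison step only. It assumes you have already fetched a (binlog file, position) pair from the master (e.g. via SHOW MASTER STATUS) and the executed position from each slave (e.g. via SHOW SLAVE STATUS); the function name and the tuple layout are made up for illustration and are not the original poster's program.

```python
def binlog_lag(master, slave):
    """Compare (binlog_file, position) pairs from master and slave.

    Returns the byte offset difference when both are on the same binlog
    file, or None when the files differ (the log rotated, so a plain
    byte subtraction would be meaningless).
    """
    master_file, master_pos = master
    slave_file, slave_pos = slave
    if master_file != slave_file:
        return None
    # Clamp at zero in case the slave was sampled slightly after the master.
    return max(0, master_pos - slave_pos)


# Hypothetical sample values, only to show the shape of the inputs.
in_sync = binlog_lag(("binlog.042", 918273), ("binlog.042", 918273))
behind = binlog_lag(("binlog.042", 918273), ("binlog.042", 880000))
rotated = binlog_lag(("binlog.043", 120), ("binlog.042", 918273))
```

A byte offset is only a rough proxy for delay: a few large statements can make the gap look big even when the slave catches up quickly.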
The solution we've been using is that any time there is an update to the database and the immediate page seen next by the user relies on the changed data, we do the selects from the master server. This seems to work for now but I'm not sure how long we will be able to scale this way.
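A rough sketch of that workaround, written as a small routing helper. This is a variation rather than the exact scheme described above: instead of deciding page by page, it pins a session's reads to the master for a fixed window after that session writes. The class name, the 60 second default (picked to cover the worst lag mentioned above) and the server labels are all assumptions.

```python
import time


class ReadRouter:
    """Send a session's reads to the master for a while after it writes."""

    def __init__(self, slaves, pin_seconds=60.0, clock=time.monotonic):
        self.slaves = list(slaves)
        self.pin_seconds = pin_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._last_write = {}       # session id -> time of last write
        self._next = 0              # round-robin counter for slave reads

    def record_write(self, session_id):
        """Call this whenever the session sends an INSERT/UPDATE/DELETE."""
        self._last_write[session_id] = self.clock()

    def pick_server(self, session_id):
        """Return "master" or the name of a slave for the next SELECT."""
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "master"
        server = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return server
```

The trade-off is the one already noted: every pinned read lands on the master, so the busier the writers are, the less the slaves help.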
In summary, so long as the load on the machines stays around 1.0 or lower, everything runs pretty smoothly. If the loads hit 3 to 5 or higher then people notice (or rather mention) that things seem odd. (By the way, those are Linux load averages, which IIRC are different than under Solaris.)
What I would like to see is a virtual server type system where one machine accepts all queries and hands them out to a set of replicating servers without requiring the application to know about it. This is nice for developing applications but the real reason is the master can then prevent the syncing issues discussed above.
SF
Enterprise database solutions are quite varied. Is it a data warehouse or something financial or ???
You pick the right tool for the job. I've seen massive databases on Sun Enterprise E-6500s and Oracle do a LOT if the database is properly configured. But one structure doesn't work for all applications. Do you use stored procedures? Do you index? Do you require triple replication so you can reindex one system while keeping a backup and a live production system? Do you need remote fail-safe operation? These types of questions need to be answered before you settle into one solution.
Banjo - The more I know about Windoze, the more I love *nix
unless your software is gpl.
Our fast four way was under $32k, I threw in the price of the x300 storage array we bought with it and the 4 CPU licenses for DB2 EEE 7.2 we bought for $11.5K each (half off the SRP).
Our db2 does over 15,000 rows/second in BATCH mode. It was a sad day when we had to log our transactions to text files for batch processing.
We did end up hiring a good DBA to help us with our DB2. It's worth noting that we didn't have this extra cost or need with our PostgreSQL setup.
I'm curious about another's experience with DB2 on Linux, which I assume you're running on. Tell me, what versions do you run? What kernel? What kind of reliability do you get?
We initially ran DB2 on a Redhat 7.3 setup with a severely modified kernel, 2.4.15 I think it was. We went to RHAS 2.1 with the RedHat kernel after so many stability problems in DB2. The new version didn't fix the problems, it only threw in a slew of new problems relating to the hardware. Our new setup is Gentoo 1.4 with a 2.4.21 kernel. It runs much faster, sees all HT enabled processors and throws no APIC errors, and hasn't crashed.
So, what are your experiences with DB2 on Linux? If you're not running on Linux, what are you running on?
My Linux Command of the Day site : LCOD
In my experience the hardest part about setting Oracle 9i RAC (or even Oracle 8 parallel server) on Linux is the Fibre Channel arbitrated loop system that generally accompanies it. I've performed such a setup both on IBM FastT-type and Compaq/HP StorageWorks hardware, with both QLogic and Emulex HBA cards, and that's where all the dancing around happens. Driver support on Linux (especially for HP) is spotty, and not exactly straightforward to come by. However, once the Fibre setup works perfectly, everything else follows quickly enough.
And these things are godawful expensive as well. The fabric switch alone usually costs more than the entire cluster of computers which are connected to it (and these are by no means cheap machines... they're IBM xSeries or Compaq/HP DL-type enterprise servers), as does the actual disk array, even if it's as small as 200 GB. The entire shared storage setup can come up as the largest single cost in an Oracle RAC or other proprietary enterprise database installation that makes use of a similar system.
However, I imagine that none of these open source database clustering solutions can hold a candle to RAC in terms of performance. They all seem to make use of replication, which means that they will be far, far cheaper than a shared storage setup, but that is where a fibre channel disk array is difficult to beat. Fibre Channel has data transfer rates measured in GB/sec... RAC has a dedicated cluster filesystem or makes use of raw devices, with its own distributed lock manager that prevents the machines in the cluster from simultaneously writing to the storage and corrupting it. As I imagine, such a system is difficult to write and test in an open source context, as it again requires any of the horrendously expensive enterprise shared storage setups on the market (even multi-tailed SCSI disk arrays, while cheaper than fibre, are still quite pricey nevertheless).
Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
While I'm a huge advocate of virus free and pleasant experiences or alternatives and although my argument may not be all that cogent, how would you propose another alternative to those VARs out there that want to utilize the wonder of e-mail to send PDFs such as quotes to clientele? Without allowing the ability for them to be modified? Printed for whatever reason? Password Protected (without webserver htaccess ability)? I know and am fully aware that any *nix based platform, l33+ h/\x0r w/\R3Z, or a simple screenshot could theoretically bypass such security, but your average quote or proposal recipient probably isn't technical enough to do so. In contrast, with unsecured documents, they surely could click and type away! PDFs have a place, and although I mostly agree with your argument, [X]HT[X]ML (IMO) isn't the answer to all scenarios. Best tool for the best job and/or circumstances I always say.
-A
maybe not totally cost free as we spent a couple days writing some code to build a solution using erlang/otp and postgres that does the job fine -- has been running for almost a year non-stop. also a colleague used an open source tool called spread to do the same thing for free too and he told me it was really easy to set up.
You can't embed fonts and images inside of html documents.
On linux(with my blazingly fast duron 650) a 500 page pdf I made with OpenOffice takes a few seconds to load in konqueror. I had downloaded all the indiv. web pages that made up the book(wasn't avail as one file), used cat to put them together and then waited 20 minutes for open office to load it. Mozilla took about the same time to load the same file, and konqueror was a little bit less than ten minutes. God knows how long IE would have taken, if it would have loaded at all. While we're getting off topic here, Word2000 would only bring up the first web page because it ignored all the rest in the file. God only knows what IE would have done.
Basically, for big files PDF is the only option as far as I'm concerned. I am sorry that Microsoft and the creators of pdf can't provide you with a decent computing experience for such basic tasks. There's only $50 billion dollars and decades of experience between the two companies, these poor guys are doing all they can.
---------- Open Source is capitalism applied to IP.
This is from the unixtips.org panel on the main page: --------- Warning: mysql_connect(): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (61) in /usr/home/http/portico/random/ascii.php3 on line 7
Warning: mysql_select_db(): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (61) in /usr/home/http/portico/random/ascii.php3 on line 8
Warning: mysql_select_db(): A link to the server could not be established in /usr/home/http/portico/random/ascii.php3 on line 8
Ack! Please notify schvin@schvin.net that there is a serious problem on portico. Thanks.
------------
Does anyone else find this ironic? :/
I am reluctant to say now, but I am a MySQL advocate...
Can MySQL as of yet do this:
INSERT INTO TABLE XYZ
DELETE FROM TABLE QRT
ROLLBACK
and keep the state of XYZ and QRT consistent with respect to other tables?
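To make the question concrete, here is a small sketch of the behaviour being asked for. It uses Python's built-in sqlite3 module purely as a stand-in transactional engine (it says nothing about what MySQL does); the table names follow the post, and the column layout is invented.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE xyz (id INTEGER)")
cur.execute("CREATE TABLE qrt (id INTEGER)")
cur.execute("INSERT INTO qrt VALUES (1)")   # pre-existing row

cur.execute("BEGIN")
cur.execute("INSERT INTO xyz VALUES (42)")
cur.execute("DELETE FROM qrt")
cur.execute("ROLLBACK")                      # undo both statements together

xyz_rows = cur.execute("SELECT COUNT(*) FROM xyz").fetchone()[0]
qrt_rows = cur.execute("SELECT COUNT(*) FROM qrt").fetchone()[0]
# After the rollback xyz is still empty and qrt still has its one row.
```

The point of the test is that both changes vanish as a unit; an engine without transactions would leave the insert and the delete applied.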
I think I could make the argument that if you are running a database that never crashes on top of an operating system that never fails, you can be innately ACID if you do your middle tier correctly... but THAT is for another time.
This is my sig.
One of the telltale signs of a SQL Server installation is the frequent "deadlock" messages. I would say that if you are going to complain about transaction handling in MySQL, even the standard version that doesn't have it, you should probably complain about the transaction handling in SQL Server. If it deadlocks, and does not deadlock avoid, then it ain't an enterprise solution.
This is my sig.
According to the FAQ it supports clusters/high availability of several types (towards the bottom), has Oracle 7 compatibility, and has the option to upgrade to commercial support (something available for Postgres, MySQL and most others as well). It's got an install base of users used to large environments and has been reasonably proven in the field. Just a thought.
SAPDB (was made open source by SAP a year or 2 ago) definitely bears some research, but I haven't used it myself so I can't comment (anyone else?).
I can give you some quick info, though (partly to respond to a sibling post claiming that MySQL and SAP both cost money if your software isn't GPL).
From sapdb.org (various pages):
-- From Q4 2003, SAP DB will be rebranded as "MaxDB" and offered as a MySQL AB product.
-- SAP DB can be used free of charge in non-SAP environments.
And using MySQL *is* free as long as you don't distribute it and/or sell an application that requires it. From the MySQL licensing page:
-- If you use MySQL in conjunction with a webserver and develop the needed tools/applications by yourself, or you use applications licensed under the GPL or compatible OSI licenses, then you do not have to pay for a MySQL license. This is true even if you run the system on top of a commercial web server.
-- Internet Service Providers (ISPs) often host MySQL servers for their customers. With the GPL license this does not require a license.
Note: obviously, do your own research for your particular situation... PLUS it may well be worth it to pay the license fees (only a few hundred bucks per server) to get support.
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
Right, well I didn't say they weren't the right tool for the job right now.
I said the user experience, the disruptive splash screen, the disruptive way the machine grinds to a halt, the uncanny mechanism it has for displaying everything at 180% or larger, its stupid non-understanding of the page up/page down keys blows great big donkey dick.
It's a huge slow stinking pile of shit. That's acrobat. Huge. Stinking. Slow. Pile of Shit.
Okay Adobe? Googling up acrobat sucks? That's your program. It's shit. Fuck you Adobe for violating your users. Rot in hell.
Hell, I'd rather just be sent tiffs than get a pdf. But sending them in emails is different than placing them at the end of a URL. My expectation for URLs is that things that are documents can be opened silently in the background in a new tab and are not bug the shit out of me ware. PDFs ARE bug the shit out of me ware, and I resent that.
So it's just a warning and a suggestion. I publish my resume in text, html, word, and pdf. So folks can pick what they want. I don't think it's too difficult anymore to publish in multiple formats. If you want me to read what you write, respect me and don't make it hard for me.
Here's another experience I love.
I'm in mozilla, and I middle click a link. Somewhere in the background a page starts loading in a new tab. No problem. I will continue reading. Ah, background loading into new tabs.
Then.
The.machine.comes.to.a.halt.
It.takes.me.awhile.to.realize.this.
but.
I.can.not.scroll.I.can.do.nothing.
And I know what's happened. Some moron has a java applet displaying something wonderfully important like the fucking time in their little corner of hell, and if I wait about thirty more seconds, I'm going to hear a little pop, and I just know the sound of that pop is like the sound of a dick popping out of my anus, cause I know that java has just ass raped me, my browser, and my machine.
Pop! Your clock is now ready sir!
O U C H ! ! ! ! Rapist.
I have personally installed, setup and maintained a 5 (3 slaves, 1 master/slave, 1 slave/master) node cluster using Heartbeat and MySQL replication. It works great!! My guess is that 80% of MY MySQL usage is content and needs READ-ONLY access. So I have 3 slaves that are used in a Read-Only cluster. The master is one of 2 other machines and ALL WRITES go to it. In the event of a MASTER db going down, the remaining slave promotes itself and updates the other slaves to point to itself. Been working great for 8 months!!!
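The promotion step can be sketched as a pure function over a description of the topology. This only illustrates the bookkeeping (who becomes master, who the read-only slaves follow); it does not talk to Heartbeat or MySQL, and the dictionary layout and host names are hypothetical.

```python
def promote_standby(topology):
    """Return the topology after the active master is lost.

    `topology` is a dict with:
      "master"       - name of the failed active master
      "standby"      - name of the master/slave node waiting to take over
      "read_slaves"  - names of the read-only slaves
    """
    new_master = topology["standby"]
    return {
        "master": new_master,
        "standby": None,  # no spare left until the old master is rebuilt
        "read_slaves": list(topology["read_slaves"]),
        # Every read-only slave is repointed at the promoted node.
        "replicates_from": {s: new_master for s in topology["read_slaves"]},
    }


before = {
    "master": "db1",
    "standby": "db2",
    "read_slaves": ["db3", "db4", "db5"],
}
after = promote_standby(before)
```

In a real setup the repointing would also need each slave's replication coordinates on the new master, which this sketch leaves out.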
Are your slaves using InnoDB or MyISAM? If the latter, have you checked if the 30+ second delays are caused by the replication thread being blocked by a read lock? The coarse-grained (table level) locking of MyISAM makes it impossible to get consistent behavior under load in a read/write environment (even if the only writer is the replication thread). Use InnoDB instead, it is amazing.
You need a license only if you choose to distribute the software. If this is an in-house application, simply obtain copies of MySQL Standard/Max (GPL) directly from MySQL mirrors for each server. Since you do not perform the distribution, you are in good shape (see MySQL License Policy - Licensing -2).
However, the folks at MySQL AB are very decent folks who offer great support and warranty for their product and who have to feed their families, and licenses are cheap. IMHO, buy at least one license for a master and one for a slave. That way you get support for the program in each role.
I run two types of clusters, one of them is a RAC 9i on Linux. Nothing and I mean nothing has the functionality of RAC 9i. You can put a bullet through one of the nodes right in the middle of a query being returned and still get your records just like nothing ever happened. The other database I run is a postgresql on redhat advanced server and the database files are sym linked into the san (this is high availability only). If I had to do it again I would not use postgresql because it scales for shit and I cannot under any circumstances keep it up in a 24/7 configuration. The database needs to have vacuum run on it once a day and I have to do that manually because half the time it fails. Running a vacuum on the database while clients are connected basically locks everyone tight until it is finished.
If you cannot spend any money and wish a fast, scalable and highly available system my advice is first sapdb and or mysql and advanced server on some sort of shared scsi.
Now all of you big postgresql advocates flame away but it does not change the facts. I love the database but if you need heavy lifting it just does not cut the mustard.
Got Code?
I live in a mixed environment of a couple of commercial databases and a couple of popular no-cost ones.
The bad news is, you get what you pay for.
You'll have open source aficionados telling you how [insert brand] free db is k00l. Sadly, in reality, the few most popular such packages are simply way behind commercial systems. Some commercial systems are also way behind other commercial systems... (I mean, Windows as a db server platform? Get real & get sober, dude.)
I wish I could recommend a free or at least cheap db server. But if you care about your data so much that you are seriously going into replicated systems, the couple of most popular free packages at least aren't there yet even in basic ACID reliability.
I wish I could tell you different, and I wish the company where I work wouldn't have to pay the megabuck-class "maintenance fees" for our commercial dbs.
Check Mnesia DB from the Erlang package. It's not relational, but has high-availability replication, conflict management, etc. It's reliable and tested. By Ericsson.
Good license.
Fairly Well Crafted Troll. However, you're cheating by omission.
,NY
back to REALITY
IBM Research
Austin, Texas
San Jose, Ca
Westchester County, NY
Microsoft Research
Redmond
Silicon Valley (and the Newest MSR Lab)
GE Research
Niskayuna
Lots of other companies have labs worldwide, but that doesn't mean they are shutting down their labs in America and opening where labour is cheaper (not necessarily; I'm sure that's perhaps true in a few places) but rather spreading their location (and therefore business influence) worldwide. Try and troll better, as you're fairly pathetic.
http://www.livejournal.com/community/lj_maintenance/60984.html
- I am made of meat.
No, PDF's are a useful document format. _ACROBAT_ is what you are complaining about...
For me, moz silently downloads the file in the Download Manager, then spawns xpdf... No browser lockup, no 100% CPU... In windows, GSView achieves the same effect. No need to complain just because you choose to use an inferior (for your purposes) product...
Why not just switch to another PDF viewer... There are tons out there...
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco
microsoft sql server [BWEG]??? :o
this mischievous comment prompted by the redmond folks' ad i was horrified to encounter on this page.
*free* trial w/ the 'softies? that's about the most you'll end up paying for "free".
--v
It got to the point where the slave servers (P4, 2GB RAM, Hardware RAID) could not keep up with the Master replication _and_ service SELECT queries. The data was too big for RAM (filesort) and the drives were not fast enough (2 drives mirrored). The Master is dual PIII 2Ghz, 2GB RAM , and fast RAID 5 hardware.
I ended up solving the problem with a hardware upgrade. I replaced all 4 servers with 1 Quad-Opteron 1.8GHz, 16 GB RAM, and _VERY_ Fast RAID 10 across 9 fast drives.
Please feel free to check it out. For the first time in a long time, I'm not afraid that the MySQL server will be the bottleneck in this very dynamic web site.
We use Linux, Apache 1.3.x, MySQL 4.0.x, and PHP 4.x to build the pages and generate XML to our Flash MX applications.
Superdudes.Net
Flash-heavy site. Free registration required to access the coolest features (those which beat up the MySQL server).
Here's what I don't understand about multi-master database clusters:
In a multi-master situation, if you connect to either database it should give you exactly the same results. Well, it can't do that unless all masters are online with respect to the other masters.
So, for any multi-master it seems almost as if you have to trade some kind of reliability.
And let's say you have two databases and both are up, but neither one can see the other. How are you supposed to provide an answer from either one? If it's an INSERT, it certainly can't commit, no?
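One common answer to the "neither can see the other" case is a majority quorum: a node only commits while it can reach more than half of the cluster, itself included. The check below is a generic illustration of that rule, not a description of any particular product mentioned in this thread, and the function name is made up.

```python
def can_commit(reachable_nodes, cluster_size):
    """True if this node sees a strict majority of the cluster.

    `reachable_nodes` counts the node itself plus every peer it can
    currently contact.
    """
    return reachable_nodes > cluster_size // 2


# Two masters that have lost sight of each other: each sees only itself.
two_node_split = can_commit(reachable_nodes=1, cluster_size=2)

# Three masters split 2 / 1: the pair may continue, the loner may not.
majority_side = can_commit(reachable_nodes=2, cluster_size=3)
minority_side = can_commit(reachable_nodes=1, cluster_size=3)
```

Under this rule the two-database case in the question does exactly what the poster suspects: neither side may commit the INSERT until they can see each other again, so availability is traded for consistency.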
Social scientists are inspired by theories; scientists are humbled by facts.
Now about the real reasons why you would not see Postgres/MySQL results any time soon there:
The most obvious one is that running TPC is costly. If you look at top solutions prices, it is around $5-10M, about half of which is hardware. I doubt Postgres or MySQL have money to run these tests.
But it is also obvious that this is only part of the answer - if you look at who submits results, usually it is not the DB vendor, usually it is the hardware vendor. So if HP or IBM (top hardware vendors) believed they could get reasonable results with Postgres or MySQL, they would definitely do this (especially because they could save on the software part of the total solution price). The fact that they don't do this proves neither of these databases is ready for complex databases and huge TPC loads.
MSDOS: 20+ years without remote hole in the default install
While the Oracle query optimizer is probably superior to PostgreSQL's, I would like to point out that it's still not the "cream of the cream". Be aware that Oracle cannot distribute one SQL statement to multiple processors/nodes. Other database systems like DB2 or Teradata do this automatically while you have to write PL/SQL in Oracle to achieve this. I always wonder why the "measurement" for OSS databases is Oracle and never DB2.
We've been running DB2 on Linux for nearly 4 years now - it works fine on our quad Xeon box. At first I had a few problems sorting out the weird Mainframe jargon IBM uses for configuration, but once I got it running it hasn't really given any problems.
DBs aren't very big - data for dynamic web pages at 2Gb, and a warehouse at 4Gb.
I'm not a DBA, and it's so low maintenance that really I know very little about it at all, it takes care of itself.
Certainly works better than the SQL Server we have of equivalent vintage, which seems to have a problem scheduling queries - one big query brings the whole system to a crawl. On the DB2 server a big query increases latency slightly, but that's about all you notice.
We use oracle. We have several tb of data which get queried by analysts and production jobs. Oracle is falling over (in part due to weak db tuning/design) and Teradata is in the running to replace it.
Any thoughts on better, cheaper routes to take?
You can't just string Web pages together using cat, silly. There can only be one root element per HTML page (between the <html> and </html> tags). The correct, conforming behavior would have been to ignore all those zillions of other pages you tacked on. I'm surprised all the open source solutions didn't throw up an error message or complain somehow, while Word did the "right" thing. This may be due to the common OSS development value of convenience over overly restrictive correctness, but it still seems wrong in my book.
The production version of MySQL still has no subqueries, they're due in 4.1. Some types can be emulated with a left join-ish thingy (IIRC), but that's one big minus for MySQL. Disregarding that, it's a kick-ass db.
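For the curious, this is the usual shape of that emulation: a NOT IN subquery rewritten as a LEFT JOIN with an IS NULL filter. The sketch runs both forms through Python's built-in sqlite3 module so the results can be compared; sqlite3 is only a convenient stand-in here, and the customers/orders tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'ann');
    INSERT INTO customers VALUES (2, 'bob');
    INSERT INTO customers VALUES (3, 'cy');
    INSERT INTO orders VALUES (10, 1);
    INSERT INTO orders VALUES (11, 3);
""")

# Customers with no orders, written with a subquery.
with_subquery = conn.execute(
    "SELECT name FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM orders) "
    "ORDER BY name"
).fetchall()

# The same question without a subquery: keep only the customers for
# which the LEFT JOIN found no matching order row.
with_left_join = conn.execute(
    "SELECT c.name FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id "
    "WHERE o.id IS NULL "
    "ORDER BY c.name"
).fetchall()
```

The two forms only agree this neatly when the joined column has no NULLs; with NULLs in orders.customer_id the NOT IN version behaves differently.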
As for optimizing, InterBase is (or can be) one weird mofo where it comes to query plans -- if you have complex databases. I worked on a db with 7 tables and I had to include all of them in one query (the relations were like that). The plan it came up with was seriously fscked (despite the fact that AFAIK the database structure was _pretty_ carefully thought out).
The point is my naive friend, when the economy comes back, the hiring will be done over seas not in America.
Someday your job goes away and you start to cry I will laugh at you.
Linda tuples are not new. They are supported in many languages... If I'm not missing something. Is this not what tuple-space was invented for?
Has no-one slathered linda onto some mysql backend?
my $0.05
I remain amazed at the lack of awareness of what IBM has been doing with mainframes for over 20 years when it comes to database clustering. Parallel Sysplex is what IBM calls its clustering framework for mainframes. Critical core system environment applications like DB2, IMS, and CICS are written to take advantage of services from OS/390 and a special CPU known as a coupling facility that enable extremely high reliability, recovery and failover. Since most involved in open-source seem seriously mainframe clueless, I am not surprised that both the open source and closed source offerings on Linux still do not even come close to what IBM has achieved.
Of course it doesn't help that computer science arrogance continues to make people look at mainframes as if they were dinosaurs, and as a result important lessons go unlearned. If open source database developers want to get a clue, it's high time they learned about Parallel Sysplex. By the way - Linux on the mainframe doesn't even take advantage of this, but it probably doesn't help that IBM keeps the whole approach of the underlying supporting hardware proprietary. It costs a lot of money to learn how to use the coupling facility, for example.
If you have a widely replicated, multi-master, database and the replication network becomes partitioned, it's impossible in general to fully resolve the inconsistencies when connectivity is restored.
There are particular instances of applications where a specific replication solution might be adequate. But it really does depend on the application requirements.
just curious but can anyone out there actually verify that RAC performs better than other clustered dbms's? my company was "convinced" by oracle consultants that RAC was the wave of the future (and it is, i'm sure) but the cluster performance is less than optimal...we've done a ridiculous amt of trouble shooting trying to figure out where the problems are, w/little to no luck. we're about a week away from scrapping the whole thing and moving to a more conventional architecture.
It would seem that the mySQL zealots have failed to address the real issue: clustering.
Sure, mySQL supports transactional tables but it does not support two-phase commits. Without distributed transactions, how can you build an ACID cluster? Replication is not the same as clustering!
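For readers who haven't met two-phase commit, here is a toy, in-memory sketch of the protocol's control flow: a coordinator asks every participant to prepare, and commits only if all of them vote yes. The class and function names are illustrative, and real implementations also need durable logging and timeout handling, which are omitted.

```python
class Participant:
    """A stand-in for one database node taking part in the transaction."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = "idle"

    def prepare(self):
        # Phase 1: promise to commit if asked, or refuse.
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: collect a vote from every participant (no short-circuit).
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

For example, `two_phase_commit([Participant("a"), Participant("b", will_vote_yes=False)])` returns False and leaves both nodes in the "aborted" state, which is the all-or-nothing property plain replication does not give you.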
And, I'm sorry, but mySQL does not come close to Oracle. I manage a large, distributed J2EE application running on mySQL (4.0.x). mySQL *still* does not support sub-selects, stored procedures, views, or dynamic tables (SELECT a, b, c FROM foo) AS bar(a, b, c). It does not support MINUS. It does not support constraints (e.g. column x must be greater than 15). It is not as fault tolerant as the commercial solutions. I could keep going if you would like? Yes, 4.1.x supports some of these things. Yes, PostgreSQL supports some of these things too -- but it has other flaws.
There is a reason Oracle (and SQL Server, and DB2, etc.) costs a lot of money. As much as I love open source, if my project could afford it we would be running Oracle. Not because I like Oracle (it has its faults too, it is a resource hog), but because none of the open source solutions meet my needs as completely as it does.
Another thing -- has anyone had a look at SAPdb and Interbase? They are Free too and there's not much talk about them. Are they usable? Do they provide replication?
I run several MySQL clusters built on fairly low-end hardware (dual PIII700's, 1GB RAM, two 73GB SCSI-U160 disks in a software RAID1 config).
These are set with two master->master replication (write) nodes (high availability handled by HA-LVS, failover time is approx. 5 seconds) and a number of slaves (read-only). The configuration took, at most, an hour. The InnoDB table types work fine, autoincrements work fine. Referential Integrity works fine. Rarely does it take over a second for the replication across a 100Mbit network.. except where we're storing sizable LOBs or LONGs (normal transit time over network applies, very little latency induced by the database).
There's an appropriate use for Oracle, MySQL, and even SQL Server. Match the database to your requirements. If you need the more powerful features found under the hood of Oracle, then use it. If you're only worried about clustering but require referential integrity and other ACID components, then MySQL will fit the bill just fine.
-AC
I agree that the Open Source solutions don't compare. As a cheaper alternative to Oracle, MS or DB2, I hear you can cluster SQL Anywhere from iAnywhere. It runs on Linux, etc and apparently is good for over 100 GB and a couple hundred users.
http://dbd-hard.sourceforge.net/
I've been working on this project for some time now. HARD stands for High Availability and Replication Driver (although server would be a better term than driver). It is exactly what the article is asking for. The problem is that I can't do it all myself and there is a lot to do, but the alpha version I originally tested worked perfectly, just stability problems (hence the re-write).
I've been primarily using SQL Server [release of the moment] throughout my working life, with a little teeny bit of Oracle and (very recently) turned MySQL from something I play with as a hobby into part of a project my boss has me working on. Of those three, only MySQL (arguably because I haven't been working with it in a business/production environment for even a year yet, but still) has never let me down. I've had all manner of glitches, inexplicable crashes, and entire tables changing their datatypes and horking the applications I had hooked up to them from Oracle & SQL Server, which I'll admit can probably be blamed on the clueless admins (whose most common response was "uh...data type?"), but I was exceptionally clueless when I started playing with MySQL a couple of years ago and I have yet to see any signs of horkage. Heck, I'm nowhere near MySQL-certifiable, and like the parent says I see no reason that it's not a good setup within certain conditions. I wouldn't use it for everything, mind you, but I also don't eat my soup with a pitchfork.
...though from the environments I've worked in, it appears that "the right tool for the job" is Phil, that creepy guy in Marketing. Ew.
"Linux doesn't exist. Everyone knows Linux is an unlicensed version of Unix"- Kieren O'Shaughnessy
From what I've heard, Google uses open source technology for their search engine. They surely have an open source clustered database for their search terms. Or have they come up with some sort of custom dbms?
really it's not that hard to find this information. goto postgresql.com, click on support and then read:
http://www.pgsql.com/support/
postgresql, like a lot of opensource software, has a free component as well as a commercial one. statements like:
I don't doubt the capabilities of PostGres and MySql, but when your DBMS is doing memory dumps, you don't want to have to scour google for an answer. You want the vendor on the phone as fast as possible and MS, in my exeperience, is very good at phone support when it comes to SQL Server.
would indicate that very little research was done when determining how to build your cluster. this is fine if you are familiar with ms, but implying that the only way to get support for opensource solutions is through google is very misleading.
-- john
I haven't read up on mysql lately but i'd double check and make sure that isn't an issue for you before you make a jump.
IMO OSS is fine for internal low-key applications. But for production financial systems I wouldn't bet my career on an OSS solution. The biggest reason is support. While OSS databases may have more 'developers' they don't have the same level of support as say Oracle. If I have a P1 problem, I can get a technician working on it 24x7 until it's resolved. Sure, i'm sure there are companies that may offer support for OSS databases but like I said, i'm not going to bet my career on them.
There could also be financial arrangements within the level of service contracts.. I doubt you'll see anyone in the OSS community providing that.
Besides slashdot, I haven't run into many people pushing OSS software (including linux) and i've worked for a couple of large IT organizations. Unfortunately, there seems to be too many 'zealots' out there ready to push their own preference of software without actually looking at the issues objectively. I see it everyday here at work. The oracle people push Oracle, the microsoft people push SQL Server, the OSS people (haven't run into any of those yet..).
"Thanks to the remote control I have the attention span of a gerbil."
Basically the big problem is that DB servers are IO-bound, not CPU-bound. Clustering is a way to increase throughput for CPU-bound applications, up to the point where they become IO-bound.
Clustering DB servers only works in situations where you have a write-seldom, read-often application. Otherwise, the traffic from write replication starts to overwhelm the useful bandwidth.
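A back-of-the-envelope model of that point, as a rough sketch and not a benchmark. It assumes full replication (every node has to apply every write), reads spread evenly across nodes, and a fixed number of operations per second per node. The function name and the numbers are made up for illustration.

```python
def cluster_throughput(nodes: int, per_node_ops: float, read_fraction: float) -> float:
    """Max total ops/sec for a fully replicated cluster (toy model).

    Assumption: every node applies every write, reads are spread evenly,
    and a node is saturated at per_node_ops operations per second.
    """
    write_fraction = 1.0 - read_fraction
    # Load seen by one node per unit of total traffic:
    # all of the writes plus its 1/nodes share of the reads.
    load_per_op = write_fraction + read_fraction / nodes
    return per_node_ops / load_per_op


if __name__ == "__main__":
    for read_fraction in (0.99, 0.90, 0.50):
        row = [round(cluster_throughput(n, 1000, read_fraction)) for n in (1, 2, 4, 8)]
        print(f"reads={read_fraction:.0%}: nodes 1/2/4/8 -> {row}")
```

Under these assumptions, a workload that is half writes gets less than twice the throughput out of eight nodes, while a read-only workload scales linearly.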
All that said, I have seen an MS SQL Server setup that may be interesting to the Free/OSS database folks; basically you have a SAN and connect multiple database servers to it. It requires specialized hardware, but you could probably fake it with a computer with a big RAID array and gigabit networking.
The idea was that the SAN manages disk accesses such that multiple computers can read the same disks at the same time, and therefore everyone always has a fresh copy of the data.
I am disrespectful to dirt! Can you see that I am serious?!
YES .. with innodb transaction
Hi,
I wonder why nobody mentions Interbase and/or Firebird, since it is a reliable and fast database. Firebird is a fork of Interbase 6, because Borland stopped distributing it under an open-source licence when they released Interbase 7. So the Firebird guys started to work on and enhance IB 6, and they're doing a great job.
Is there any cluster-like support for Firebird at all?
cu, Daniel
eat(this);
My take has a developer taint (I'm not a DBA nor do I pretend to be one), but it is twofold. The first is that although they may be faster, triggers and stored procs for the most part only serve to confuse and frighten the developer, because you've moved events outside of the programmer's control (hmmm, that table got updated but none of my code touches it). By avoiding triggers and stored procs you also make portability easier. The first rev of a product I worked with had database-specific code; I took all of that out and made non-db specific code. This certainly made it easier when our company was acquired and I now had to support MSSQL, Oracle, Sybase and DB/2, on four different platforms nonetheless!
The second comment I want to make is that ObJectRelationalBridge (OJB) from Apache will eliminate 99% of your SQL from your Java code by letting you treat your objects as objects, so you no longer have to worry about SQL for the simple things.
Right now all of my servers are running db2 8.1.0 & 8.1.2 on aix 5.1. Reliability is great, performance is fine. These are all stand-alone 2-6 CPU boxes without extended memory, interpartition-parallelism, etc - so they have minimal OS dependencies.
I'm setting up our RH7.3 servers next week. I think we'll be fine - by avoiding certain features, and having the flexibility of designing our own application.
Good luck
back in the day i ran IT for a company that, sadly, has gone the way of many many software companies. so it goes.
anyway, we used something by these folks:
http://www.missioncriticallinux.com/
called 'convolo' but the technology is called 'kimberlite'. it worked quite well - as we were working for some big name movie houses we needed full availability. our testing was quite intense and it worked fantastic.
it's GPL - mission critical provides service, if i'm not mistaken.
you'll need to go a bit nuts with the hardware, of course...
enjoy!
I wouldn't be surprised to see OSDL submit a TPC test in the next couple years.
--- It is not the things we do which we regret the most, but the things which we don't do.
It could be me but I think your math is a little off.
2 Nines
87.6 hours per year down time
438 minutes per month down time
101 minutes per week down time
3 Nines
8.76 hours per year down time
43.8 minutes per month down time
10.1 minutes per week down time
4 Nines
52.6 minutes per year down time
4.38 minutes per month down time
1.01 minutes per week down time
5 Nines
5.26 minutes per year down time
26.3 seconds per month down time
6.06 seconds per week down time
Based on one average year having 31536000 seconds (365*24*3600), one average month having 2628000 seconds (average year / 12) and one average week having 606462 seconds (average year / 52).
Yes, I know we can and should define the exact number of seconds in each month (28, 29, 30 or 31 days), and a week should be 604800 seconds (7 * 24 * 3600), but for the sake of argument, it's close enough.
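For anyone who wants to check the figures, here is a short sketch that recomputes them from the same averages (365-day year, year / 12 for a month, year / 52 for a week). The function and constant names are just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600          # 31,536,000
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12   # 2,628,000
SECONDS_PER_WEEK = SECONDS_PER_YEAR / 52    # about 606,462


def downtime_seconds(nines: int, period_seconds: float) -> float:
    """Allowed downtime for an availability of `nines` nines, e.g. 3 -> 99.9%."""
    unavailability = 10.0 ** -nines
    return period_seconds * unavailability


if __name__ == "__main__":
    for nines in range(2, 6):
        year = downtime_seconds(nines, SECONDS_PER_YEAR)
        month = downtime_seconds(nines, SECONDS_PER_MONTH)
        week = downtime_seconds(nines, SECONDS_PER_WEEK)
        print(f"{nines} nines: {year / 3600:.2f} h/year, "
              f"{month / 60:.2f} min/month, {week / 60:.2f} min/week")
```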
If VISTA is the answer, you didn't understand the question
If it's been more than a year since you benchmarked PostgreSQL, you should really try one more time. There have been massive performance improvements in the 7.2/7.3 and 7.4 beta (soon to be rc) releases.
Note that, much like db2, you have to tune it to your load to get best performance, but you could afford to buy a nice used E10k for less than the cost of the db2 licenses you had and run PostgreSQL there.
For tuning, go here:
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
--- It is not the things we do which we regret the most, but the things which we don't do.
It may be petty, but which technology are we talking about? From my understanding, load-balancing is having whole requests sent to different machines based on each individual machine's availability, while clustering is sending one request to a group of machines that work on that request together. I think we're talking about load-balancing.
and the triple-expanding spray foam
and the vice-grips
and the spork^H^H^H^H^H
-73, de n1ywb
www.n1ywb.com
I more liek teh mikky mouse
Multi-master databases do updates on all the masters at the same time using a two-phase (or other exotic) commit mechanism. All of the masters are updated in the same transaction. Kinda ups the bandwidth requirements on the backbone between the masters. To avoid the reliability problem, down servers are noted and updated later (before they come back online) by log sniffing or something.
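A toy, in-memory sketch of that idea, not how any particular product does it. Every live master has to agree in a prepare phase before anyone commits, and writes that a down master missed are kept in a backlog and replayed before it comes back online. The `Master` class, `replicated_write` and `bring_online` are all hypothetical names.

```python
class Master:
    """In-memory stand-in for one database master (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.data = {}
        self.pending = None   # write prepared but not yet committed
        self.backlog = []     # writes missed while offline

    def prepare(self, key, value) -> bool:
        # Phase 1: vote "no" if another transaction is already prepared here.
        if self.pending is not None:
            return False
        self.pending = (key, value)
        return True

    def commit(self) -> None:
        # Phase 2: make the prepared write visible.
        key, value = self.pending
        self.data[key] = value
        self.pending = None

    def abort(self) -> None:
        self.pending = None


def replicated_write(masters, key, value) -> bool:
    """Apply the write on every live master, or on none of them."""
    live = [m for m in masters if m.online]
    if not live:
        return False
    prepared = []
    for m in live:
        if not m.prepare(key, value):
            for p in prepared:      # undo only what this transaction prepared
                p.abort()
            return False
        prepared.append(m)
    for m in live:
        m.commit()
    for m in masters:               # note the write for the down servers
        if not m.online:
            m.backlog.append((key, value))
    return True


def bring_online(master: Master) -> None:
    """Replay missed writes first, then start taking traffic again."""
    for key, value in master.backlog:
        master.data[key] = value
    master.backlog.clear()
    master.online = True
```

The two round trips per write (prepare, then commit) to every master are where the extra backbone bandwidth goes.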
This seems to be a large operation with much money involved. So it is very likely that You need professional support. There is a company behind MySQL that maintains the server and sells support contracts (at least I think so). So MySQL might be much more acceptable to Your boss than Postgres. Postgres might be better (can do more) but since You would be buying a large contract from MySQL or Oracle, just give both a call, ask Your question very specifically and then compare the answers.
You might even find a company to offer You Postgres support, but I dunno.
Btw. if You want 24/7 with low response times You have to buy the whole system somewhere. The only company that offers such kind of contracts (worldwide under 4 hours for anything) on x86 (cheaper than the other stuff, like Sun or IBM) is Red Hat with their Server for Linux afaik. You might then be able to get them to support Your MySQL system as well, because if You have two vendors, like Red Hat and Oracle then they might blame each other in case of a failure.
But those contracts at Red Hat cost six figures.
Just go for Debian GNU/Linux!!!! I love it and it runs and runs and runs....
If it's an INSERT, it certainly can't commit, no?
In those cases, one of the masters MUST be shot in the head (That is, it should shut down). The other may then journal any commits it does until the other master can contact it and play the journal back. Of course, while doing that, it must not accept any queries.
And this is based upon the fact that the third world has begun marching towards the first world. Whatever, my clown. I think you're too used to zero-sum games, and believe that their gain is our loss. Personally I think 1 billion wealthy Chinese is a pretty lucrative potential market.
$20,000
http://oraclestore.oracle.com/OA_HTML/ibeCCtpSctDspRte.jsp?section=11221
Or $400/named user.
The source was my gun. I got some good clusters on him!
Actually, with modern disks this is something you don't have much control over any more. There's a good description of disk organisation here where you can find the following explanation:
And disk geometry becomes even more of an abstract when you bring intelligent RAID controllers into the picture (like HP's HSG and EVA controllers).
though they may be faster trigger and stored procs for the most part only serve to confuse and frighten the developer, because you've moved events outside of the programmer's control
It will only "confuse and frighten" developers that don't know SQL, and who, by definition, shouldn't be programming database apps in the first place
, (hmmm that table got updated but none of my code touches it)
So dump out the metadata table where the trigger and SP code is stored. It's not exactly rocket science
By avoiding triggers and stored procs you also make portability easier.
But not more reliable. What happens when hundreds of connections are being made to a transactional database, where updates are being made that need to cascade down several tables in order to ensure referential integrity? You really want to do that without triggers? With your "portable" apps, either you'll have to lock *all* the tables (if the database does not support row locking) involved in an update until *all* of the updates associated with a single transaction are complete, or you'll just have to tolerate concurrent updates destroying your referential integrity.
I've seen programmers do the latter, and then refuse to disclose their source code on the basis that they're the big expert. What a larf.
Furthermore, you must be only acquainted with toy databases, if the order of magnitude difference in the execution speed of a stored procedure is not a *major* consideration. Databases are operating all the time, and there are some operations which must be completed before others. If your little java app falls over in the middle of taking ten times longer to complete a critical transaction with all of its tables locked when the same transaction could have been completed with a single stored procedure that fires off half a dozen triggers, and automatically rolls back if it doesn't complete -- I know which one I'd prefer. If it was my money being transferred from one account to another in a bank, for example.
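To make the "rolls back if it doesn't complete" part concrete, here is a minimal sketch of an all-or-nothing transfer. It uses SQLite through Python's `sqlite3` module only because that runs anywhere; SQLite has no stored procedures, so the transaction is driven from the client here, standing in for what a stored procedure would do server-side. The `accounts` table, the names and the CHECK constraint are invented for the example.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` between accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False
```

The credit is deliberately done first, so that a failed debit shows the rollback undoing work that had already been applied inside the transaction.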
It will only "confuse and frighten" developers that don't know SQL, and who, by definition, shouldn't be programming database apps in the first place
, (hmmm that table got updated but none of my code touches it)
So dump out the metadata table where the trigger and SP code is stored. It's not exactly rocket science
I agree, but the point I'm trying to make is that the developers should be abstracted from the DBs altogether; the only thing worse than developers acting as DBAs is DBAs acting as developers.
But not more reliable. What happens when hundreds of connections are being made to a transactional database, where updates are being made that need to cascade down several tables in order to ensure referential integrity? You really want to do that without triggers? With your "portable" apps, either you'll have to lock *all* the tables (if the database does not support row locking) involved in an update until *all* of the updates associated with a single transaction are complete, or you'll just have to tolerate concurrent updates destroying your referential integrity.
If you are developing a "real" app you should be using a database that supports row level locking. What is less reliable is when the app doesn't work as the code shows because activities are going on behind the scenes.
I've seen programmers do the latter, and then refuse to disclose their source code on the basis that they're the big expert. What a larf.
I've seen DBAs explain that it must be the code causing the problem, only to have to write a very simple app (so the DBA can understand) to prove that the problem is on the DB side. What a larf.
Furthermore, you must be only acquainted with toy databases, if the order of magnitude difference in the execution speed of a stored procedure is not a *major* consideration. Databases are operating all the time, and there are some operations which must be completed before others. If your little java app falls over in the middle of taking ten times longer to complete a critical transaction with all of its tables locked when the same transaction could have been completed with a single stored procedure that fires off half a dozen triggers, and automatically rolls back if it doesn't complete -- I know which one I'd prefer. If it was my money being transferred from one account to another in a bank, for example.
I suppose if you consider DB2 or Oracle toy databases then I guess you are correct, you are out of my league. The comment "If your little Java app falls[sic]" is typical of the ego you have to deal with when working with the typical DBA, and of the need to have developers dealing less with the DB side of things and more with the business logic. Usually the "little Java app" is a J2EE application and not the "toy" that most DBAs seem to think anything not written in PL/SQL (or the like) is.
I agree, but the point I'm trying to make is that the developers should be abstracted from the DBs altogether; the only thing worse than developers acting as DBAs is DBAs acting as developers.
In a perfect world, both would understand the requirements of the system and work together.
But not more reliable. What happens when hundreds of connections are being made to a transactional database, where updates are being made that need to cascade down several tables in order to ensure referential integrity? You really want to do that without triggers? With your "portable" apps, either you'll have to lock *all* the tables (if the database does not support row locking) involved in an update until *all* of the updates associated with a single transaction are complete, or you'll just have to tolerate concurrent updates destroying your referential integrity.
If you are developing a "real" app you should be using a database that supports row level locking. What is less reliable is when the app doesn't work as the code shows because activities are going on behind the scenes.
You mean like when several different apps are all doing transactions at once, and one app has just done a query and is getting ready to lock a row in response to it -- and another app has already gone and changed it? Because the updates were implemented with query-lock-update-query-lock-update rather than all in one go with, um, you know, those overrated things called triggers and stored procedures?
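A small sketch of that race, with assumptions stated up front: the `stock` table and the helper names are invented, SQLite stands in for a real server, and the interleaving of two apps is simulated deterministically on a single connection instead of with real concurrent clients. It is not a trigger or a stored procedure, just the simplest case of letting the database do the read and the write "all in one go".

```python
import sqlite3


def fresh_db() -> sqlite3.Connection:
    """Throwaway in-memory database holding one item with quantity 10."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    conn.execute("INSERT INTO stock VALUES ('widget', 10)")
    conn.commit()
    return conn


def read_qty(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT qty FROM stock WHERE item = 'widget'").fetchone()[0]


def query_then_update() -> int:
    """Two apps each read the row, then each writes back its own result."""
    conn = fresh_db()
    seen_by_a = read_qty(conn)
    seen_by_b = read_qty(conn)   # app B reads before app A has written
    conn.execute("UPDATE stock SET qty = ? WHERE item = 'widget'", (seen_by_a - 1,))
    conn.execute("UPDATE stock SET qty = ? WHERE item = 'widget'", (seen_by_b - 1,))
    conn.commit()
    return read_qty(conn)


def update_in_one_statement() -> int:
    """Each app lets the database read and write in a single statement."""
    conn = fresh_db()
    for _ in range(2):
        conn.execute("UPDATE stock SET qty = qty - 1 WHERE item = 'widget'")
    conn.commit()
    return read_qty(conn)
```

`query_then_update()` returns 9 because one decrement is silently lost, while `update_in_one_statement()` returns 8.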
I've seen programmers do the latter, and then refuse to disclose their source code on the basis that they're the big expert. What a larf.
I've seen DBAs explain that it must be the code causing the problem, only to have to write a very simple app (so the DBA can understand) to prove that the problem is on the DB side. What a larf.
Sounds like both sides have to stop larfing at each other and start cooperating. A J2EE update that does query-lock-update-query-lock-update, and hence gets trod on by other concurrent J2EE apps and falls over, is not the DBA's arrogance getting in the way; it's the developer's lack of database development experience. Java is a wonderful language, and does wonderful things for portability -- but certain transactions do have to be executed all in one go, and without triggers and stored procedures, the system will not scale nearly as far, so a balance needs to be struck between portability and specificity to that database, if it's going to work -- and scale.
This is a big challenge, and I wish you well. Good luck getting your DBAs with the program. Involving them in the requirements review, development cycle and conformance testing might be a big help here, as well as asking them to give you some advice on DB-specific trigger and stored procedure implementation -- particularly if you want to avoid things "going on behind the scenes" foiling your beautiful portable code. Java is more readable and, IMHO, because Java developers typically have much more intensive training in how to program, typically much better designed than some of the piles of triggers, queries, snippets and PL/SQL I've seen out there. But some of the database functionality you can get virtually for free with those big expensive RDBMS engines is just... merely necessary if you don't like it, wonderful if you need the thing to be scalable on the same hardware.