Open Source Database Clusters?

Bailing wire and duct tape by Anonymous Coward · 2003-09-11 11:46 · Score: 4, Funny

Works every time.

Re:Bailing wire and duct tape by Anonymous Coward · 2003-09-11 11:49 · Score: 2, Funny

Clusters? Usually, we just tape a bunch of cats together...
Re:Bailing wire and duct tape by Vaevictis666 · 2003-09-11 12:12 · Score: 2, Informative

Off-topic, but it's Baling wire. As in the wire used to hold together bales of hay and whatnot.
Re:Bailing wire and duct tape by Read+Icculus · 2003-09-11 12:51 · Score: 2, Informative

This is just as off-topic as the grandparent, but having grown up around plenty of farms I've used and seen baling wire more than a few times. It is indeed actual wire (galvanized steel, according to one site that still sells it), and it works just fine for tying up a bale of hay. Some people still use it, but it has mostly been replaced by plastic twine, which is cheaper and easier to use. Here's a link to a fellow waxing sub-poetic on the bygone days of baling wire - Read All About It. And this is a site that sells baling wire, and has a few pictures - REC.

--
Anti-social? My code is just platform-specific.
Re:Bailing wire and duct tape by nolife · 2003-09-11 13:12 · Score: 2, Funny

Damn, I thought it was barbed wire. No wonder I was the only one having a hard time.

--
Bad boys rape our young girls but Violet gives willingly.

transactionality is hard by solarisguy · 2003-09-11 11:48 · Score: 5, Insightful

For what it's worth, the commercial solutions are hard to set up, unstable and terribly difficult to maintain, and this is after a small fortune has been invested in making them work. Not to knock the open source solution, but it's hard to believe that something that is infrequently used and difficult to understand will be truly production quality if you want to use it for money.

Re:transactionality is hard by subk · 2003-09-11 11:50 · Score: 3, Funny

Is it just me or is that last sentence rather hard to parse?

--
Now, if you'll excuse me, I have backups to corrupt.
Re:transactionality is hard by sys$manager · 2003-09-11 11:55 · Score: 4, Insightful

Oracle 9i RAC running on Veritas Foundation Suite HA+ Database Edition is a snap. Parallel server in 8 was hard to set up and unstable though.
Re:transactionality is hard by djbckr · 2003-09-11 13:09 · Score: 2, Informative

Our shop is running 9i RAC. It's hard to set up (for a newbie at clustering).
Now that I have, it's pretty cool and quite stable. We've tested transparent failover a few times (once due to an instance failure) and nobody notices. Amazing.
In my opinion, it's worth the cost. We'll have to agree to disagree with open source solutions. For those that can't afford it, I suppose the alternative is the better solution.
Re:transactionality is hard by Rorschach1 · 2003-09-11 15:12 · Score: 4, Insightful

I never set up a pre-9i cluster on Windows, but I ran Parallel Server in 7.1 on OpenVMS and it worked great. Of course, OpenVMS has had real clustering for a long time - Windows still isn't anywhere near where OpenVMS was 20 years ago.
Re:transactionality is hard by sys$manager · 2003-09-11 17:32 · Score: 3, Insightful

Nothing will ever touch VMS clustering, it's kind of sad that 20 year old technology is so much more stable than modern technology. I mean VMS has only advanced two major versions in ten years.
Re:transactionality is hard by nettdata · 2003-09-11 23:02 · Score: 2, Funny

I mean VMS has only advanced two major versions in ten years.

Yeah... then all the maintainers went off and started Debian.

--

$0.02 (CDN)

Check out Emic Networks by venom600 · 2003-09-11 11:49 · Score: 5, Informative

We've been evaluating the Emic application cluster for MySQL and have had pretty good results. It's a new product (so YMMV), but it looks promising.
Emic Networks

Re:Check out Emic Networks by grugruto · 2003-09-11 19:36 · Score: 2, Interesting

One main issue I see with the Emic solution is that it does not support transactions. I saw their demo at the last LinuxWorld in SF and they are just using a multicast layer to broadcast the queries to all nodes (they don't parse SQL so they can't handle transactions properly).
Moreover, if you have queries like UPDATE ... WHERE date=NOW(), you will just get a different result on every node! At least solutions like C-JDBC replace macros such as NOW or RAND on the fly so that all databases are consistent.
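To make the NOW() problem concrete, here is a minimal Python sketch of the general idea: pin non-deterministic SQL functions to literal values once, before the statement is broadcast, so every node executes the same text. This is not C-JDBC's actual code. The function name `pin_macros` and the regex approach are assumptions for illustration only, and a naive regex like this would also rewrite matches inside string literals, which is one reason a real middleware layer has to parse the SQL.

```python
import random
import re
from datetime import datetime

def pin_macros(sql, now=None, rng=None):
    """Replace NOW() and RAND() with literals so all nodes see identical SQL."""
    now = now or datetime.now()
    rng = rng or random.Random()
    stamp = now.strftime("'%Y-%m-%d %H:%M:%S'")
    # Callables are used as replacements so the literal is inserted verbatim.
    sql = re.sub(r"\bNOW\s*\(\s*\)", lambda m: stamp, sql, flags=re.IGNORECASE)
    sql = re.sub(r"\bRAND\s*\(\s*\)", lambda m: repr(rng.random()), sql,
                 flags=re.IGNORECASE)
    return sql

# Hypothetical table and column names; the timestamp is fixed for the demo.
pinned = pin_macros("UPDATE t SET seen = 1 WHERE date = NOW()",
                    now=datetime(2003, 9, 11, 19, 36, 0))
print(pinned)
```

The rewritten statement, not the original, is what would be sent to each backend.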

MySQL Replication by infernalC · 2003-09-11 11:52 · Score: 4, Insightful

MySQL has very nice replication functionality, and, in certain circumstances, you can even set up replication rings. It is somewhat flexible about the topology you choose to use, so pick the one best for your application. Load balance ala DNS and you're in business.

Re:MySQL Replication by sys$manager · 2003-09-11 11:58 · Score: 2, Insightful

Yes, as long as you don't give a damn about ACID compliance.
Re:MySQL Replication by Hamstaus · 2003-09-11 13:25 · Score: 5, Informative

Rings? Are you implying *all* servers involved in the replication process could handle writes rather than a master that handles writes and a bunch of slaves that handle all the read access? If this is true, point me to some docs :) That would be too cool.

Here you go.

The part you are probably interested in is this:
You should run your slaves with the --log-bin option and without --log-slave-updates. This way the slave will be ready to become a master as soon as you issue STOP SLAVE; RESET MASTER, and CHANGE MASTER TO on the other slaves.
Note that if you decide to "ring" your server setups, then you are not necessarily helping distribute the load, you are simply creating redundant masters in the case that your primary machine becomes unavailable. Also, you'll have to write your own monitoring scripts. MySQL says they are working on some tools for this... I'm excited to see what they come up with.
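As a rough illustration of the monitoring script this implies, the Python sketch below only builds the list of statements each server would need during a promotion. It does not talk to a database. The host names are hypothetical, and wrapping CHANGE MASTER TO in STOP SLAVE / START SLAVE on the remaining slaves is an assumption on my part, not something the quoted documentation spells out.

```python
def promotion_plan(new_master, other_slaves):
    """Return {host: [SQL statements]} for promoting new_master."""
    # On the slave being promoted, per the quoted docs.
    plan = {new_master: ["STOP SLAVE", "RESET MASTER"]}
    # On every remaining slave: repoint replication at the new master.
    for host in other_slaves:
        plan[host] = [
            "STOP SLAVE",
            "CHANGE MASTER TO MASTER_HOST='%s'" % new_master,
            "START SLAVE",
        ]
    return plan

plan = promotion_plan("db2", ["db3", "db4"])
for host, statements in plan.items():
    print(host, "->", "; ".join(statements))
```

A real script would also need credentials, log coordinates and error handling, which are left out here.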

--
I moderate "-1, Fool"
Re:MySQL Replication by linhux · 2003-09-11 13:32 · Score: 2, Insightful

I used to say the same a few years ago. But nowadays I say that most websites that actually have some kind of functionality at all above "insert-update-select" of an article or similar need - or at least want - transaction and isolation support. It makes it much easier to create logical database schemas without having to tuck everything into a single table just to make sure you only need one atomic INSERT. Instead you can spread things out and allow for a lot more flexibility - and then you'll want transactions to maintain consistency among the tables.
Re:MySQL Replication by techwolf · 2003-09-11 13:58 · Score: 3, Informative

Bah, DNS isn't load balancing.

LVS + MySQL works really well. We've got grouped clusters of databases that we can allocate more/less resources to as needed. Reporting cluster for the slower queries, faster cluster for the real-time queries and a few specific application clusters.

Replication keeps them in sync but there isn't a good HA solution available for the master database yet. Perhaps in MySQL 5.0. In the meantime, use DRBD + heartbeat for near HA.

--
I don't do this for karma, I do it for cash. It's much better.
Re:MySQL Replication by csnydermvpsoft · 2003-09-11 14:08 · Score: 2, Informative

Note that if you decide to "ring" your server setups, then you are not necessarily helping distribute the load, you are simply creating redundant masters in the case that your primary machine becomes unavailable.

Not necessarily. The largest part of most database access is reads - searching, retrieval, etc. These often vastly outnumber writes, depending on the application. Reads do not have to be replicated, giving a big performance boost.
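One way to picture this is a small router in front of the servers that sends writes to the master and spreads reads across the replicas. The Python sketch below is only an illustration of that idea. The class name, the server names and the rule "SELECT and SHOW are reads" are all assumptions, and it ignores replication lag entirely.

```python
import itertools

class ReadWriteRouter:
    """Send reads to replicas round-robin and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in ("SELECT", "SHOW"):
            return next(self._replicas)
        return self.master

router = ReadWriteRouter("master", ["replica1", "replica2"])
queries = [
    "SELECT * FROM stories",
    "select * from comments",
    "INSERT INTO comments VALUES (1, 'first post')",
    "SELECT 1",
]
routes = [router.route(q) for q in queries]
print(routes)
```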
Re:MySQL Replication by fava · 2003-09-11 15:29 · Score: 4, Informative

Almost right. MySQL is free to use in a commercial application; it's not free to distribute or embed in a commercial application.

MySQL is dual-licensed, and one of those licences is the GPL. You can use MySQL for free anywhere and in any manner that conforms to the GPL.
Re:MySQL Replication by strobert · 2003-09-11 17:30 · Score: 3, Insightful

sorry, maybe it is just me, but the whole "ARRGG IT AIN'T ACID" is a lot of hype to me. ACID boils down to transactions. plain and simple.

And I have found in many applications it is easier to deal with transaction type data consistency at the app layer instead of the db one.

knowing that a DB transaction is complete doesn't help you if, in order to move forward, you have to have db ops done on multiple servers and/or a change happen with an external vendor.

And generally some bad code/process will at some point munge your data in a similar way as if you had a db crash in the middle of a transaction.

I have generally seen that for most applications you are better off just coding things to treat outside input (including data from a db) as evil until you have verified it, and coping with the abnormalities.

Yes there are exceptions, but ACID tends to be a knee jerk reaction, and most people really need to be asking themselves what it ACTUALLY buys them.
Re:MySQL Replication by Daniel+Phillips · 2003-09-11 19:01 · Score: 4, Informative

sorry, maybe it is just me, but the whole "ARRGG IT AIN'T ACID" is a lot of hype to me. ACID boils down to transactions. plain and simple.

Perhaps you need a deeper understanding.

ACID tends to be a knee jerk reaction, and most people really need to be asking themselves what it ACTUALLY buys them.

It buys them a database that they can expect to still be there, sound and consistent, after the machine blows a fuse in the middle of 200 simultaneous updates. It buys them a database that doesn't accumulate rot over time because somebody deleted a customer at the same time somebody in another city entered an invoice. It buys them queries that give the right answer, because they only ever see the database in a consistent state, even while other queries running at the same time are only partially completed.

Basically, it gives them a database capable of completely correct operation, not just mostly correct. Of course, that may not matter to you; in that case, I have a faulty pacemaker to sell you.
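The atomicity part of that argument can be shown in a few lines. The sketch below uses Python's built-in sqlite3 module purely because it is self-contained. The table, the names and the amounts are made up, and it says nothing about how MySQL or any other server behaves. A two-step transfer fails halfway, and the transaction guarantees the first step is undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? "
                         "WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? "
                         "WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

failed = transfer(conn, "alice", "bob", 150)   # would overdraw alice
after_failed = balances(conn)
ok = transfer(conn, "alice", "bob", 40)
after_ok = balances(conn)
print(failed, after_failed, ok, after_ok)
```

Without the transaction, the failed transfer would have left bob credited with money that never left alice's account.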

--
Have you got your LWN subscription yet?
Re:MySQL Replication by Brian+Blessed · 2003-09-12 11:14 · Score: 2, Informative

No, that does NOT conform with the GPL.

Yes it does. Where in the GPL does it say that the software cannot be used in commercial applications?
It does not! It only requires that you provide the source (which has nothing to do with whether the software is commercial or not).

-1:Troll by stratjakt · 2003-09-11 11:52 · Score: 4, Insightful

Does anyone have experience with clusters of MySQL, Postgres-R, C-JDBC or other solutions? How do they compare to commercial products?

They don't compare to commercial products. I know it isn't what you want to hear, and there are hundreds of kids here to tell you different, but they just don't compare. Those kids' database experience doesn't extend past an address book.

Even if you manage to get them to technically keep up, transaction-wise, with Oracle or SQL Server, the ACID enforcement isn't there, and the syntaxes are kludgy. Gack.

My company ships products with SQL Server or Oracle as the back end. I've tried to put together an OSS solution so I could impress the big boss with millions of bucks of saved license fees. They just aren't anywhere close IMO.

Run a SQL Server farm on the back end if you can't afford an Oracle license. Don't be an OSS ideologue in the business world, you end up unemployed.

--
I don't need no instructions to know how to rock!!!!

Re:-1:Troll by venom600 · 2003-09-11 11:59 · Score: 5, Insightful

ACID enforcement isn't there
Actually ACID compliance is getting pretty darn good in databases like MySQL. Care to elaborate about what ACID compliance issues you have?

Don't be an OSS ideologue in the business world, you end up unemployed.
Actually, in our flailing economy 'OSS ideologues' as you call them are making a lot of headway. OSS now has a viable alternative to *just about* any commercial enterprise software out there.
Re:-1:Troll by Elladan · 2003-09-11 12:20 · Score: 2, Interesting

Oh come on. 99% of the time, this "failure" is due to the admin having the number of concurrent mysql sessions set lower than the number of apache sessions. Since they'll never hit that situation in their half-assed testing with one browser...

And how exactly do you intend to compare the situation where MySQL saturates to the situation where apache saturates, remotely? If apache falls over, you're getting no connection at all. Perhaps the database is working great, you'll never know. :p
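The misconfiguration described above comes down to simple arithmetic: if every web server process can hold a database connection, the database has to accept at least that many. Here is a small Python sketch of that check. The function name and the numbers are illustrative assumptions, not values read from any real config. In practice you would compare Apache's MaxClients against MySQL's max_connections.

```python
def connection_headroom(apache_max_clients, mysql_max_connections,
                        web_servers=1, reserved=0):
    """Spare MySQL connections when every Apache child holds one.

    A negative result means the web tier can exhaust the database's
    connection limit before Apache itself is saturated.
    """
    worst_case = apache_max_clients * web_servers
    return mysql_max_connections - reserved - worst_case

# Illustrative numbers only.
one_box = connection_headroom(apache_max_clients=150, mysql_max_connections=100)
tuned = connection_headroom(apache_max_clients=150, mysql_max_connections=400,
                            web_servers=2, reserved=10)
print(one_box, tuned)
```

A single browser in testing never gets near the worst case, which is why the problem only shows up under real load.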
Re:-1:Troll by zulux · 2003-09-11 12:28 · Score: 2, Troll

Oracle has some amazing features, but PostgreSQL kicks the crap out of MS SQL Server -

MS SQL skips records when it queries - more info here

MS SQL crashes for no fucking reason.

MS SQL requires x86 hardware - No Sparc, No POWER, No MIPS. Just crappy x86.

There is no 64 bit version of MS SQL.

PostgreSQL has a very robust multi-version concurrency control mechanism - something MS SQL could only dream of.

And if you *REALLY* need to scale PostgreSQL - run it on a SUN/SGI/IBM.

Not a bunch of fucking Intel toys.

--
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.
Re:-1:Troll by venom600 · 2003-09-11 12:30 · Score: 4, Insightful

I realize that stuff has improved in the year since I've seriously looked at it, but I'm doubtful it's reached the level of Oracle or SQL Server.
You should look again. MySQL, for example, has full transaction support with InnoDB table types.....AND it's pretty damn fast.

Watch 404 messages from websites for telling clues - mysql always fails before apache.
I'm sorry, but that doesn't seem like a very accurate way of measuring database reliability. One of the cool (and sometimes harmful) things about open databases is that there is no entry fee...meaning anybody and their brother can set up a MySQL server. This means that the number of ill-managed MySQL servers out there probably out-numbers Oracle or SQL Server installations (which, typically, have a somewhat knowledgeable admin behind them) by 10 to 1. A MySQL database managed by somebody who knows what they are doing will go head to head with Oracle or SQL Server installations which are also managed by someone who knows what they are doing.
Re:-1:Troll by Kunta+Kinte · 2003-09-11 12:30 · Score: 4, Insightful

Run a SQL Server farm on the back end if you can't afford an Oracle license. Don't be an OSS ideologue in the business world, you end up unemployed.
And I would fire the IT guy who causes my company to spend $10,000 for SQL Server in a situation where the free MySQL or Postgres would do.
Just focus on the right tool for the job. If the database is a simple one, if it is regularly backed up, and if your company can stand a small period of downtime, why on earth would you buy Oracle or MS SQL Server?
This is not to say that MySQL is unreliable. I have *never* seen MySQL crash, or lose any of my data. So it would be silly of me to go with Oracle just because everyone else is doing it.
The right tool for the job, people.

--
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW
Re:-1:Troll by Lumpy · 2003-09-11 12:33 · Score: 3, Insightful

They don't compare to commercial products. I know it isn't what you want to hear, and there are hundreds of kids here to tell you different, but they just don't compare. Those kids' database experience doesn't extend past an address book.

yeah, nobody would ever run a high traffic website on OSS database.....

Anyone know of a high traffic website that uses Perl and OSS database servers in a cluster?

Oh yeah.... this place

--
Do not look at laser with remaining good eye.
Re:-1:Troll by jamie · 2003-09-11 12:53 · Score: 4, Informative

"Last time I read one of Rob's reports on slashdot they had 10 terabyts of data in the database.. and that was 2 years ago. no that's not "ALOT" but it's nothing to sneeze at."

Nah, our DB totals only about 6 GB. Slashdot isn't an especially big database.
Its only claim to fame is that it delivers about 30 dynamic pages a second, 12 hours a day.
Re:-1:Troll by Tmack · 2003-09-11 12:54 · Score: 4, Interesting

I would have to second this. I use MySQL at work as the main database for the NOC and the service activation and circuit delivery groups. The database (running off an old Sun Netra box) handles the load of all the scripts (mostly Perl) used by all those groups. This includes scripts that monitor circuit status (a la Netcool), test new circuits, keep track of customer installations, change requests, troubles, router configs, etc. The MySQL server has never caused data loss, and the only instances where it "crashed" were errant queries in alpha CGI script releases that caused basically an infinite loop around a search on the 20K+ circuit entries on a non-indexed field, which a simple restart of mysqld fixed. Even when the beta version was released running on a Linux P4 box we never had issues, as opposed to the Oracle-backed system used for the main corp. database, which regularly causes much frustration among co-workers (not to mention the internal conflict between 2 development teams (corp vs us) over controlling the access and data of the corp database vs the ease of developing new utilities to make customer installation and support easier).
TM
P.S. Can't wait until our Sun V280r shows up!

--
Support TBI Research: http://www.raisinhope.org
Re:-1:Troll by afidel · 2003-09-11 12:59 · Score: 2, Informative

There is no 64 bit version of MS SQL

Bullshit, it's been out for months, see This article. As to the rest of your argument, check out the TPC-C results and then say that MS SQL doesn't scale: it's the second highest scorer and has 6 of the top 10 results. This is a real world load testing benchmark that many companies base purchasing decisions on. (OK, the MS solutions are a little unusual in that they are shared-nothing, but the other competitors are free to do likewise.)

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:-1:Troll by justin.warren · 2003-09-11 13:12 · Score: 4, Informative

An appropriate subject line.
Some of us who compare OSS databases to commercial ones have experience that extends past address books. And no, I'll pass on the DSW if you don't mind.
My main problem with PostgreSQL is the query optimiser. Oracle's query optimiser is definitely superior as Postgres occasionally comes up with some peculiar query plans. In a product I'm involved with, we hand tune our SQL from the ground up, so this is less of a problem for us. I find the two products to be pretty comparable in other aspects, though I haven't tried Postgres-R yet.
I haven't played with MySQL since back when you couldn't do sub-SELECTS, so I have no idea how much it's progressed since then.
At this stage, I'd suggest you stick with a commercial product for replication or clustering for high end work. Clustering and replication is still the bleeding edge for OSS, so use it with caution on non-critical tasks. Having said that, these are complex tasks you're talking about, and even the commercial products have their own peculiarities at times. High volume replication using Oracle materialized views over database links comes to mind.

--
Just because you're paranoid doesn't mean they're NOT after you.
Re:-1:Troll by fupeg · 2003-09-11 14:48 · Score: 2, Insightful

ACID transactions using InnoDB, and performance is quite good on MySQL 4.0. My company uses a MySQL/InnoDB server for the loading of content from our partners to our site. The loader is an extremely multi-threaded piece of code that can put quite a strain on the database. The loader also writes part of the data it loads to an Oracle 9i database. The MySQL db is running on a dual processor machine (the same machine the loader is running on) and the Oracle db runs on its own 4-processor machine (both run Linux, though the Oracle one uses a slightly older kernel as required by Oracle). When we crank up the loader, it's the Oracle server that becomes unresponsive first. MySQL scales better, at least in this case. We've had no problems with ACID transactions, and the loader code involves several long, distributed transactions (across dbs) running with an isolation level of read-committed. We also use a third party search engine that uses SQL Server for persistence. This is typically the bottleneck in the system. When we talk about improving scalability and performance, the MySQL db is the last thing ever mentioned.
Re:-1:Troll by IANAAC · 2003-09-11 15:11 · Score: 2, Insightful

One of the cool (and sometimes harmful) things about open databases is that there is no entry fee...meaning anybody and their brother can set up a MySQL server.

Not to knock MySQL, but this is actually part of the problem. You get anybody and their brother setting something up that then crashes and burns.
I don't care what anybody says, setting up a database is nothing trivial.
Your comment that MySQL would go head to head with Oracle or SQL Server is, frankly, laughable.
I would suggest, particularly to someone familiar with Oracle, PostgreSQL over MySQL any day. In addition to MySQL's much-hyped transactions (which have been in PostgreSQL forever), PostgreSQL has a procedural language with which Oracle folk would feel quite at home.
Let's not forget that PostgreSQL's SQL implementation is much more standards compliant.
Again, this is not to rant on MySQL, but it kind of irks me that someone can slap up a MySQL DB and claim to know all about databases.
Re:-1:Troll by stanwirth · 2003-09-11 15:56 · Score: 3, Insightful

My main problem with PostgreSQL is the query optimiser. Oracle's query optimiser is definitely superior as Postgres occasionally comes up with some peculiar query plans.

I had the same experience. You basically have to optimise large queries combined with joins and subselects on PostgreSQL yourself -- and often with Oracle as well, if it's for tables with more than 1-10M records or so. You might want to check out DB2. Awesome clustering -- IMHO more sophisticated and flexible than Oracle's. YMMV depending on the application, as always. Also, if it's a development environment, you can test DB2 and Oracle on Linux boxen to your heart's content for the same price as PostgreSQL -- free.

MySQL may be able to handle subselects, but it's still struggling with triggers and stored procedures.
Re:-1:Troll by jamie · 2003-09-12 02:02 · Score: 2, Interesting

It's been a long time since Slashdot went down for any significant amount of time.
We do planned code upgrades once a week and have to kick each webserver, but the load balancer keeps the site up transparently. We probably lose a total of a few hundred incoming connections each time we do that (a total of maybe 5 seconds worth, once a week).
In the last year, I think there was once that we had to roll code back and were probably down for a few minutes, and I think one other time when we were down for an hour, I forget what exactly.
And then of course we've had network troubles occasionally, but that could be us or it could be you :)
None of that has been because of database failure (to get back ontopic sorta :) ... the MySQLs just all keep humming.
Re:-1:Troll by LadyLucky · 2003-09-12 07:45 · Score: 2, Interesting

Actually ACID compliance is getting pretty darn good in databases like MySQL. Care to elaborate about what ACID compliance issues you have?
Bull pucky. This is from someone whose only deployment of MySQL into a live environment went completely pear-shaped, with MySQL crashing several times per day. The damned thing doesn't report ANYTHING to the error log, except "I'm starting up again, and oohhh look at all that corrupt data, I hope I can do something about that!". I would never touch the database again, not with a 10 foot bargepole.
We're dropping that pile of crap faster than you can click the hyperlink on the MySQL website which says it may take up to two weeks to get any kind of support even in the case of an emergency.
We're now using MSDE for low powered embedded installations that the MySQL crowd had pushed prior to this. Who would have thought, use the Microsoft solution because the open source one doesn't cut it.
Sorry, it's been a long week of conference calls and VPNs in the middle of the night because MySQL decided to crash once again.
MySQL isn't there, and it's lost all trust from anyone who knows about databases.

--
dominionrd.blogspot.com - Restaurants on

Not personally, but by revividus · 2003-09-11 11:53 · Score: 5, Interesting

I've been looking into MySQL for a bit, and I saw this article recently, which is directly concerning clustered database servers running MySQL.

Maybe it will be of interest...

--

philcrissman.com.

Clustering is never fun by scosol · 2003-09-11 11:53 · Score: 2, Insightful

Open-source or not...

I would say just get a bigger box for your PostgreSQL solution and do semi-realtime remote replication on the tables you don't want to lose.

--
I browse at +5 Flamebait- moderation for all or moderation for none.

Huh? by Wakko+Warner · 2003-09-11 11:54 · Score: 4, Funny

You can "cluster" MySQL? Does it involve "rsync" and "cron"?

- A.P.

--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"

The big problem is replication by MarkusQ · 2003-09-11 11:56 · Score: 5, Interesting

IMHO, the biggest problem is replication: keeping them all consistent in the face of asynchronous updates. It can also reduce/eliminate the advantages of clustering if you have a significant number of updates compared to the number of queries.

I guess the best answer depends on how dynamic your data is. If it's static, there are all sorts of easy answers. If all the updates come from a central source, or on a predictable schedule, you're almost as well off. If updates come from the great unwashed but the data can be partitioned in some way (say, geographically) you can still do it. If updates come from all over but queries can be centralized, or if your database is tiny, or if latency isn't a problem, or if you have a machine that prints money, it can still be done.

If you want to do everything for everyone everywhere, right now if not sooner, for under twenty bucks, you're screwed.

So, what are your needs?

-- MarkusQ

Re:The big problem is replication by wfrp01 · 2003-09-11 12:34 · Score: 3, Informative

PostgreSQL has released their replication technology under an open source licence.

--

--Lawrence Lessig for Congress!
Re:The big problem is replication by caluml · 2003-09-11 20:06 · Score: 2, Funny

I swear, take your eyes off the internet for a weekend or two ...
Your request to surrender your Slashdot licence has been noted, and accepted.

--
Get your own free personal location tracker

PostgreSQL and pg_dump by zulux · 2003-09-11 11:57 · Score: 4, Interesting

Check out the new replication at postgresql.org: it's master -> multiple slave replication.

Then have your slave database query the master database - and if it no longer responds, it could promote itself to master.

The replication is the easy bit - the slave promotion is the hard and gritty bit.
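A toy version of that self-promotion logic, in Python, might look like the sketch below. The `Watchdog` class, the `probe` callable and the three-strikes threshold are all assumptions for illustration. A real setup also has to make sure the old master is truly dead before promoting, otherwise both machines may accept writes, which is part of why this is the gritty bit.

```python
class Watchdog:
    """Promote this slave after several consecutive failed master probes."""

    def __init__(self, probe, max_failures=3):
        self.probe = probe              # callable returning True if master answers
        self.max_failures = max_failures
        self.failures = 0
        self.role = "slave"

    def tick(self):
        if self.role == "master":
            return self.role
        try:
            alive = bool(self.probe())
        except Exception:
            alive = False               # a probe that blows up counts as a miss
        self.failures = 0 if alive else self.failures + 1
        if self.failures >= self.max_failures:
            self.role = "master"
        return self.role

# Simulated probe results: one blip, a recovery, then a real outage.
answers = iter([True, False, True, False, False, False])
dog = Watchdog(lambda: next(answers))
roles = [dog.tick() for _ in range(6)]
print(roles)
```

Requiring consecutive failures keeps a single dropped probe from triggering a promotion.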

--

Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.

Re:PostgreSQL and pg_dump by tcopeland · 2003-09-11 12:27 · Score: 4, Informative

Another hard bit is that the Postgres replication doesn't support sequences - see the details in the aptly named "Things to Remember" section of the installation documentation.

So if your master fails, presumably you have to recreate the sequences starting at a number high enough to avoid conflicting numbers before switching over to a slave. Seems like this could be a problem.
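A rough Python sketch of that recovery step follows: given the highest id the slave has seen per sequence, emit statements that push each sequence past it with some slack, since the slave may be behind the master. The sequence names, the margin and the helper name are hypothetical. `setval` is the standard PostgreSQL sequence function, but check its semantics against your version before relying on this.

```python
def sequence_restart_sql(max_ids, margin=1000):
    """Build setval() statements that move each sequence past known ids.

    max_ids maps sequence name -> highest id found in the replicated table.
    The margin covers rows the master inserted but never replicated.
    """
    statements = []
    for seq_name, max_id in sorted(max_ids.items()):
        statements.append("SELECT setval('%s', %d);" % (seq_name, max_id + margin))
    return statements

# Hypothetical sequence names and counts.
stmts = sequence_restart_sql({"users_id_seq": 300000, "projects_id_seq": 1200})
for s in stmts:
    print(s)
```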

Nonetheless, Postgres is cruising away on RubyForge; 300,000 records and counting...

--
The Army reading list
Re:PostgreSQL and pg_dump by zulux · 2003-09-11 12:47 · Score: 2, Informative

I love PostgreSQL sequences - definitely a feature I would miss.

PGSQL stores its sequences separate from the tables. When you need a sequence, you can have PGSQL create one at table creation time, or you can link your table to an already available sequence.

The sequences are under your full control - you can reset them, roll them back or set them to any value you want.

Clever things you can do with this:

Set one server to have high sequence numbers and another to have lower ones - in the middle of the night they can swap the data that they have dominion over. This poor-man's replication works well over flaky internet connections. Imagine a remote office with a 56K modem that swaps over in the middle of the night.

When you need to update the entire database to take advantage of another type of data abstraction - you can have your new style of tables use the old sequence numbers as well. The older tables grow and the new tables grow - but they never share a sequence number. Then every night you can import the old-style data into the newer style of table - you can let some users use the old front end, and test the new front end, and the entire migration can be seamless.
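The first trick relies on the two servers drawing ids from ranges that never overlap, so the nightly swap cannot collide. Here is a small Python simulation of that property. The ranges and row contents are made up, and plain Python iterators stand in for the database sequences.

```python
def id_allocator(start, stop):
    """Hand out ids in [start, stop), like a sequence with fixed bounds."""
    return iter(range(start, stop))

# Each office owns a disjoint range (hypothetical split).
head_office = id_allocator(1, 1_000_000)
remote_office = id_allocator(1_000_000, 2_000_000)

head_rows = {next(head_office): "invoice from HQ" for _ in range(3)}
remote_rows = {next(remote_office): "invoice from branch" for _ in range(3)}

# The nightly swap: each side imports the other's rows; no key can clash.
merged = {**head_rows, **remote_rows}
print(sorted(merged))
```

Because each row's id says which server created it, the merge is a plain union with nothing to reconcile.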

--
Moneyed corporations, non-working 'poor' and criminal prisoners are turning productive citizens into tax-slaves.

MySQL + BigIP by zarthrag · 2003-09-11 12:00 · Score: 2, Interesting

What *I* would probably attempt would be to setup a replication ring, and use a bigIP to make them all look like the same server. Then you get your load balancing, and scalability. I have yet to try this, but I will in the (very) near future.

--
Why can't all fpga/microcontroller manufacturers just release free optimizing compilers???

High Availability by mcdrewski42 · 2003-09-11 12:00 · Score: 5, Insightful

HA is always a crapshoot/tradeoff between cost and risk. Throw enough $ at the problem and you'll approach 100% availability.

I know that 'more robust' is a nice thing to want, but you really need to think about what you really need. If it takes 15 minutes to switch over to a backup copy (using some magic RAID disk mirroring maybe?) and 15 minutes to restart the app and let it checkpoint its way up to a decent operational speed again, is that good enough?

If it takes an hour, how about that?

How much time/heartache or money is it worth for you to have system downtime, and how much are you willing to expend to reduce it by 5, 15, 30 minutes?

So, there's really a continuum of availability you have to pick your point in. At the low end, you have no backups and recreate everything from scratch. At the high end you use Vendor X's real clustering solution and 24x7 monitoring, then have zero downtime even in a disaster. Somewhere in the middle is you.

Now I realise this is an overtly commercial view of things, but if needs be replace money with effort and season to taste.
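The tradeoff can be put into back-of-the-envelope arithmetic: what a solution costs per year plus what the downtime it leaves you with costs. The Python sketch below does exactly that. Every number in it is invented for illustration, so plug in your own outage rate, recovery time and cost per minute.

```python
def yearly_cost(solution_cost, outages_per_year, minutes_per_outage,
                cost_per_minute):
    """Yearly spend on the solution plus the cost of the downtime that remains."""
    return solution_cost + outages_per_year * minutes_per_outage * cost_per_minute

# Hypothetical: 2 outages a year, downtime costs 50 per minute.
options = {
    "no backups, rebuild from scratch": yearly_cost(0, 2, 2880, 50),
    "backup copy, 30 min switchover": yearly_cost(20_000, 2, 30, 50),
    "vendor cluster, 24x7 monitoring": yearly_cost(250_000, 2, 0, 50),
}
best = min(options, key=options.get)
for name, cost in options.items():
    print(name, cost)
print("cheapest overall:", best)
```

With these made-up figures the middle option wins, but a higher cost per minute of downtime would push the answer toward the expensive end.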

--
/* affect != effect */ void affect(int *thing,int effect) { *thing += effect; }

Well.... by agent+dero · 2003-09-11 12:00 · Score: 2, Insightful

If you're working with enough data that would require a CLUSTER, then I would suggest a commercial product.

But if you need that SPEED, but not a lot of data storage, I'd say a decent sized MySQL cluster would cover you, depending on what your needs are.

If you are in the position to actually need a cluster to do that much work, you should be able to get something commercial and more oriented toward large scale.

--
Error 407 - No creative sig found

eRserver by linuxwrangler · 2003-09-11 12:01 · Score: 5, Interesting

I have found PostgreSQL to be nearly bullet-proof. I routinely have connections up for months at a time (that's individual persistent connections - the server is up much longer and the connections usually get dropped when I upgrade the client software). Still, sh*t happens and replication has been a sore point for many databases both open and commercial.

You should investigate eRserver. It was originally a commercial replication product for Postgres but has been open-sourced. I haven't tried it yet but it's on my to-do list.

--

~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis

Re:eRserver by TheFuzzy · 2003-09-11 12:04 · Score: 5, Informative

Well, the .ORG domain runs on PostgreSQL + eRServer, so that's one scalable solution ...

Shared storage? by crstophr · 2003-09-11 12:01 · Score: 5, Informative

You can make a High Availability cluster out of most any software if you have some kind of shared storage.

People have used firewire drives connected to two different computers to accomplish this cheaply. Oracle is giving away a cluster filesystem (so they can sell RAC on Linux); there is OpenGFS as well for filesystem usage.

Just write some basic monitoring scripts that will bring up your Postgres database on the second server should the first one fail. Just make sure those scripts completely take down the old database on the first server in the case of a partial failure. Having two databases try to open the same data would be a really bad thing.
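
A minimal sketch of the decision logic such a script needs, in Python. The three callables are hypothetical hooks you would write yourself (a health probe, a way to shut down or power off the first server, and a way to start the database on the second); nothing here is a real Postgres or cluster API.

```python
from typing import Callable


def failover_once(
    primary_alive: Callable[[], bool],
    fence_primary: Callable[[], bool],
    start_standby: Callable[[], None],
    checks: int = 3,
) -> str:
    """Run one monitoring pass and report what was done."""
    # Require several consecutive failed checks so a single lost
    # probe does not trigger a failover. (A real script would
    # sleep between probes.)
    for _ in range(checks):
        if primary_alive():
            return "primary-ok"
    # Fence first: the standby must never start while the old
    # instance might still have the shared data files open.
    if not fence_primary():
        return "fence-failed-no-failover"
    start_standby()
    return "failed-over"
```

The important design choice is the ordering: if the old database cannot be confirmed down, the sketch refuses to start the second one, since an outage is better than two servers writing to the same data.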

Here are some links to articles that should help:

Overview

Howto

Cluster Filesystem

These are mainly geared for Oracle/RAC; all you need is the firewire shared storage and cluster filesystem. You're on your own to write the monitoring and failover scripts. Hope this helps. --Chris

Re:Shared storage? by Kashif+Shaikh · 2003-09-11 13:41 · Score: 2, Insightful

Oracle is giving away a cluster filesystem (so they can sell RAC on Linux); there is OpenGFS as well for filesystem usage.

I was saying "Wow! Oracle has released a clustered filesystem!", until I discovered it only works with shared storage. That means it won't create a filesystem image across a cluster network, where data is distributed. Rather, the cluster filesystem is stored in a centralized location, but can be accessed by multiple members of the cluster at the same time for both read and write.

Re:Clustered JDBC by Trejkaz · 2003-09-11 12:03 · Score: 2, Informative

Funny, that's what the article was about. c-JDBC is an implementation of RAIDb.

--
Karma: It's all a bunch of tree-huggin' hippy crap!

Emic, InnoDB Hot Backup by vinsci · 2003-09-11 12:06 · Score: 5, Interesting

Two MySQL products I found interesting (neither of which is open source at this time):

CLUSTERING IN TUNE WITH APACHE AND MYSQL (Free registration might be required.) Also see Emic Application Cluster for MySQL
InnoDB Hot Backup (with point in time backup)

The rest of this comment is quoted verbatim from InnoDB News

MySQL/InnoDB-4.0.1 and Oracle 9i win the database server benchmark of PC Magazine and eWEEK. February 27, 2002 - In the benchmark eWEEK measured the performance of an e-commerce application on leading commercial databases IBM DB2, Oracle, MS SQL Server, Sybase ASE, and MySQL/InnoDB. The application server in the test was BEA WebLogic. The operating system was Windows 2000 Advanced Server running on a 4-way Hewlett-Packard Xeon server with 2 GB RAM and 24 Ultra3 SCSI hard drives.

eWEEK writes: "Of the five databases we tested, only Oracle9i and MySQL were able to run our Nile application as originally written for 8 hours without problems."

The whole story. The throughput chart.

--

Trusted Computing FAQ | Free Dawit Isaak!

Re:Emic, InnoDB Hot Backup by vinsci · 2003-09-11 12:37 · Score: 2, Informative

Sigh, the folks at eWEEK have revamped their website and in the process managed to kill most old links...
Use this link to the article instead:
Database Server Clash Revisited
http://www.eweek.com/article2/0,4149,1238712,00.asp

--

Trusted Computing FAQ | Free Dawit Isaak!

What is slashdot doing? by rtnz · 2003-09-11 12:06 · Score: 5, Interesting

What does Slashdot do for this? I recall way back in the day there was some information about what the Slashdot tech looks like, anyone have info regarding their database setup?

Re:What is slashdot doing? by sbszine · 2003-09-11 12:54 · Score: 2, Informative

Slashdot runs MySQL db on a couple of boxes. Check the FAQ and the IRC interview log. According to the FAQ, Slashdot is / was financially contributing to replication in MySQL.

--
Vino, gyno, and techno -Bruce Sterling

is it just me? by gfody · 2003-09-11 12:09 · Score: 3, Funny

or does this term sound kind of like a made up buzzword like ".NET powered Java schemas!" or "SOAP servlet toaster oven with X-M-L!"

--

bite my glorious golden ass.

Replicated MySQL by Jack+Auf · 2003-09-11 12:12 · Score: 3, Informative

Using one server as a master and n servers as slaves. Just make sure to write everything to the master. Replication to the slaves generally takes about a second or maybe two depending on load.

OK, not quite the same thing but this works quite well for read-heavy applications, and is very reliable unless you get a slave out of sync.

This was on v3.n.n - the good folks at MySQL have made many improvements to the replication facilities in the 4.n series I believe.

--
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety" - BF

three types of clusters by u19925 · 2003-09-11 12:12 · Score: 4, Informative

there are basically three types of clusters:

1) shared nothing: in this, the computers are connected to each other only via a simple IP network. no disks are shared. each machine serves part of the data. these clusters don't work reliably when you have to do aggregations. e.g. if one of the machines fails and you try to do "avg()" and the data is spread across machines, the query would fail, since one of the machines is not available. most enterprise apps cannot work in this config without degradation. e.g. an IBM study showed that a 2 node cluster is slower and less reliable than a 1 node system when running SAP.

IBM on windows and unix and MS use this type of clustering (also called federated database approach or shared nothing approach).

2) shared disk between two computers: in this case, there are multiple machines and multiple disks. each disk is connected to at least two computers. if one of the computers fails, the other takes over. no mainstream database uses this mode, but it is used by hp-nonstop. still, each machine serves up part of the data and hence standard enterprise apps like SAP etc cannot take advantage of clustering without a lot of modification.

3) shared everything: in this, each disk is connected to all the machines in the cluster. any number of machines can fail and yet the system would keep running as long as at least one machine is up. this is used by Oracle. all the machines see all the data. standard apps like SAP etc can be run in this kind of config with minor modification or no modification at all. this method is also used by IBM in their mainframe database (which outsells their windows and unix database by a huge margin). most enterprise apps are deployed in this type of cluster configuration.

approach 1 is simpler from a hardware point of view. also, for database kernel writers, this is the easiest to implement. however, the user would need to break up data judiciously and spread it across machines. also adding a node or removing a node will require re-partitioning of data. mostly only custom apps which are fully aware of your partitioning etc will be able to take advantage.
it is also easy to make it scale for a simple custom app, and so most TPC-C benchmarks are published in this configuration.

approach 3 requires a special shared disk system. the database implementation is very complex. the kernel writers have to worry about two computers simultaneously accessing disks or overwriting each other's data etc. this is the thing that Oracle is pushing across all platforms and IBM is pushing for its mainframes.

approach 2 is similar to approach 1 except that it adds redundancy and hence is more reliable.
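
the avg() problem in approach 1 can be sketched in a few lines of Python. this is a toy scatter-gather, not how any real federated database is implemented; a partition set to None stands in for a node that is down, and all names are made up.

```python
from typing import Dict, List, Optional, Tuple


class NodeDown(Exception):
    """Raised when a partition's node cannot be reached."""


def partial_sum_count(rows: Optional[List[float]]) -> Tuple[float, int]:
    """What each node computes locally for its own slice of the data."""
    if rows is None:  # None stands in for an unreachable node
        raise NodeDown("partition unavailable")
    return sum(rows), len(rows)


def cluster_avg(partitions: Dict[str, Optional[List[float]]]) -> float:
    """Combine per-node sums and counts into one global average."""
    total, count = 0.0, 0
    for rows in partitions.values():
        s, c = partial_sum_count(rows)  # fails if any one node is down
        total += s
        count += c
    if count == 0:
        raise ValueError("no rows")
    return total / count
```

note that the coordinator has to combine sums and counts, not per-node averages: with [1, 2, 3] on one node and [5] on another the right answer is 2.75, while averaging the two node averages would give 3.5. and with no redundancy, one dead node means the whole query raises.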

Re:three types of clusters by Pro_Piracy_Guy · 2003-09-11 12:27 · Score: 3, Informative

approach 3 requires a special shared disk system. the database implementation is very complex. the kernel writers have to worry about two computers simultaneously accessing disks or overwriting each other's data etc. this is the thing that Oracle is pushing across all platforms and IBM is pushing for its mainframes.
I recently attended an Oracle Convention, and the one thing everyone (except Oracle) will admit about RAC is that it is very difficult to implement and very very expensive. Of the many vendors, the cheapest RAC solution I came across was in the $30,000 to $50,000 range (scalable turn-key solution - price does not include Oracle license fees). Most of the reps I spoke with said unless you are a huge enterprise with lots of cash to blow, RAC is NOT the way to go.
Just my $0.02

Das DB by Flip+Chart · 2003-09-11 12:12 · Score: 3, Informative

What about SAPDB? Isn't it a potential choice? I thought I read somewhere that MySQL and SAPDB were merging. Check it out: http://www.sapdb.org/

MySQL replication: Flawless (so far) by allankim · 2003-09-11 12:13 · Score: 5, Informative

I've been running a 3-4 node MySQL 3.23.x cluster on Slowlaris 9 since January. It has survived several catastrophic power outages and numerous other insults without a hiccup. Load is fairly light (about 3,000 updates daily and a similar number of queries on each server) so YMMV.

ZEO (Zope Enterprise Option) by Wheat · 2003-09-11 12:22 · Score: 3, Informative

ZEO will allow you to scale the ZODB (Zope Object Database) across multiple processors, machines, and networks. The ZODB is a Python object database, though, so porting your current database to it is probably not an option. There are other limitations of the database - it's not always the fastest, and it's an object database so concepts like foreign keys are not fully there - but it can give you high availability. As of the new Zope 2.7 (in beta), though, ZEO is quite easy to set up, and it is open source.

PostgreSQL eRServer 1.0 + Backplane by pabos · 2003-09-11 12:31 · Score: 4, Insightful

Two options I haven't seen anyone mention yet are PostgreSQL eRServer 1.0+ (see PostgreSQL news item "PostgreSQL now has working, tested, scalable replication!" from August 28, 2003 or a lengthier press posting "PostgreSQL, Inc. Releases Open Source Replication Version") and Backplane.

eRServer has been in development for over two years, is used in production settings and is released under a BSD license (as with PostgreSQL). It uses a single master/multiple slave asynchronous replication scheme. There are cautions in the release that replication may be difficult to set up.

Backplane seems to be particularly well-suited to clustering data quickly across a WAN. A quote may explain it better:

The Backplane Open Source Database is a replicated, transactional, fault-tolerant relational database core. Currently supported on Linux and FreeBSD, Backplane is designed to run on a large number of small servers rather than a small number of large servers. With Backplane, it is possible to spread the database nodes widely, allowing database operations to work efficiently over WAN latencies while maintaining full transactional coherency across the entire replication group.

Backplane's native quorum-based replication makes it easy to increase the database capacity (by simply adding a new node), re-synch a crashed server, or take down multiple nodes for maintenance (such as an entire co-location facility) - all without affecting the database availability.

I haven't used either yet, but you may wish to give them a look.
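
For what "quorum-based" usually means, here is a minimal Python sketch of the generic majority rule. This is an assumption about the general technique, not Backplane's actual protocol, which may differ in its details; the function names are made up.

```python
def quorum_size(n_nodes: int) -> int:
    """Smallest group that is a strict majority of the replication group."""
    return n_nodes // 2 + 1


def can_commit(acks: int, n_nodes: int) -> bool:
    """A write is durable once a majority of nodes has acknowledged it."""
    return acks >= quorum_size(n_nodes)


def max_nodes_down(n_nodes: int) -> int:
    """How many nodes can be offline while commits are still possible."""
    return n_nodes - quorum_size(n_nodes)
```

With five nodes the quorum is three, so two can be down for maintenance. Because any two majorities share at least one node, a node coming back from a crash can always find a peer that saw the latest committed write.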

Talk to the folks at deviantart.com by cubal · 2003-09-11 12:36 · Score: 3, Informative

deviantart.com, IIRC, runs about 3 mysql servers behind a load-balancing cache/server, so they have had to deal with a lot of the difficulties involved in that.

interesting press release by Kunta+Kinte · 2003-09-11 12:37 · Score: 2, Interesting

MySQL Teams With Veritas, SGI on Clusters - http://www.eweek.com/article2/0,4149,1208538,00.asp and http://www.mysql.com/press/release_2003_23.html

Supposedly should be out by now.

--
Based on upvotes, Ageism is the only "-ism" Slashdotters care about and think isn't SJW

Re:eRserver, more info. by ron_ivi · 2003-09-11 12:49 · Score: 3, Interesting

.org and .info are both using it.

The press release of ER Server becoming open source is quite informative (karma?) as well.

Marc of PostgreSQL Inc is an incredible resource on the PostgreSQL mailing lists too; and PostgreSQL Inc has a really cool policy that allowed them to donate their code to the community that way:

From their release: "DATELINE FRIDAY, DECEMBER 15, 2000 Open Source vs. Proprietary: We advocate Open Source, BSD style :) We will consider and develop short term (up to 24 month) proprietary applications and solutions where there is a strong business and intellectual property case to be made. *All* proprietary developments that we are involved in *will* become open source within two years of implementation, without exception."

Also cool, they provide hosting (http://www.pgsql.com/hosting/), which donates "25% of all profit from these services ... directly back into the PostgreSQL Project."

Ron

I'm not affiliated with them in any way, just appreciative of Marc's contributions on the mailing lists and to PostgreSQL as well.

Agreed. by oneiros27 · 2003-09-11 12:53 · Score: 4, Interesting

Availability is one of the basic issues when sizing your system. [ie, can you have it down at night for a cold backup, or does it have to be available 24x7? Can you even get a maintenance window once a month?]

As with sizing your UPS and/or generators, you need to determine what the cost to your business is for downtime.

Now, yes, you might have some issues in SLAs that spell out how much it'll cost you, if you have to refund customers' money [for service based orgs]-- or how much profit you'd lose if your customers couldn't purchase items [for sales based orgs]. But unfortunately, you have to also consider the recovery costs, the costs of damage to your reputation, etc.

If it's not worth your purchasing an Oracle or other, more expensive database, odds are good that it's not worth the headaches of maintaining a high availability cluster with automatic failover. Instead, you can mirror the data, and keep transaction logs that you can replay.

You can have a spare system on standby, that you can keep updated on a regular basis (again, your cost of downtime, and the necessary time to recover the system will affect your choices), and when your main system should fail, you can push the most recent diffs to your standby, reconfigure the application servers to recognize the new server as the old one, and you're back in business.

It requires a bit of planning, and making sure that the necessary manual steps are well documented [so that anyone can do it, should the server outage be caused by something serious enough to take out your administrator, too], but it's easier and cheaper to build and maintain than a true cluster.
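
The "push the most recent diffs to your standby" step boils down to replaying a log from a known high-water mark. A toy Python sketch, with a dict standing in for the standby database and a made-up log format (sequence number, operation, key, value); a real database's transaction log looks nothing like this, but the bookkeeping is the same idea.

```python
from typing import Dict, List, Tuple

# (sequence number, operation, key, value) - an invented format
LogEntry = Tuple[int, str, str, object]


def apply_log(standby: Dict[str, object], applied_upto: int,
              log: List[LogEntry]) -> int:
    """Replay entries newer than applied_upto; return the new high-water mark."""
    for seq, op, key, value in sorted(log, key=lambda entry: entry[0]):
        if seq <= applied_upto:
            continue  # already applied during an earlier sync
        if op == "set":
            standby[key] = value
        elif op == "delete":
            standby.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        applied_upto = seq
    return applied_upto
```

Tracking the high-water mark makes the replay safe to run twice, which matters when the person doing the manual failover is following a checklist at 3am and is not sure whether the last sync finished.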

--
Build it, and they will come^Hplain.

Re:What the author really wants by mcpkaaos · 2003-09-11 13:00 · Score: 2, Funny

I get the feeling that in a room with more than one door, it takes you all day to find your way out. :)

--
It goes from God, to Jerry, to me.

Need to define the problem better by koreth · 2003-09-11 13:02 · Score: 4, Interesting

Why do you want clustering? Do you need to scale up transactions per second? If so, are these primarily reads or writes? The answer to that question can make a huge difference in your clustering and replication strategy.

Clustering read-mostly data for performance reasons is relatively easy; for many applications, where a second or two of staleness on the replicated databases is acceptable, you can make do with a bunch of independent copies of the database, with all updates going to an authoritative database and getting replicated out from there asynchronously.

If your data can be partitioned cleanly -- that is, if you have groups of tables that are never joined with tables in other groups -- then you can perhaps get some benefit from putting different data on different servers, with no replication required. Obviously that's only worthwhile if the query load is comparable between groups.
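
One way to check whether your tables split cleanly is to treat every join in your query set as an edge and look for connected groups. A minimal Python sketch using union-find; the table names in the usage note are invented, and extracting the join lists from your real queries is left to you.

```python
from typing import Dict, Iterable, List, Set


def table_groups(tables: Iterable[str],
                 joins: Iterable[Iterable[str]]) -> List[Set[str]]:
    """Group tables so that no query joins tables from two different groups."""
    parent: Dict[str, str] = {t: t for t in tables}

    def find(t: str) -> str:
        # Walk up to the group's representative, shortening the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for joined in joins:
        names = list(joined)
        for name in names:
            parent.setdefault(name, name)  # tolerate tables not listed up front
        for other in names[1:]:
            parent[find(other)] = find(names[0])  # merge the two groups

    groups: Dict[str, Set[str]] = {}
    for t in parent:
        groups.setdefault(find(t), set()).add(t)
    return sorted(groups.values(), key=lambda group: sorted(group))
```

Given tables users, orders, items and logs, where queries join users with orders and orders with items, this yields two groups: {items, orders, users} and {logs}. Only logs could move to its own server without breaking a join.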

If, on the other hand, you require ACID-compliant updates of all the replicants as a unit, you're entering difficult territory and you might have no choice but to go with a commercial solution depending on the specifics of your needs.

At just about all of the places where I've done database programming where this has come up, we ended up buying a much beefier database server (lots of processors and memory, good I/O bandwidth, redundant networking and power supplies) with disk mirroring, rather than get into the headaches of replication. A big Sun or HP server is certainly more expensive than some mid-range Dell or no-name PC, but it may end up being cheaper than the engineering time you'd spend getting anything nearly as robust and high-performance on less expensive hardware.

I've also found that very often when there's a database bottleneck that looks like it requires bigger hardware, the problem is the data model or the queries (unnecessary joins, no indexes where they're needed, poorly-thought-out normalization, etc.) or the physical layout of the data (indexes competing with data for access to the same disk, fragmentation in indexes/data, frequently-used tables spaced far apart on disk.)
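
A quick way to see the "no indexes where they're needed" case is to compare query plans before and after adding one. The sketch below uses SQLite from Python's standard library purely as a stand-in (it is not the database under discussion, and other engines have their own EXPLAIN syntax); the table and index names are invented.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(1000)],
    )
    sql = "SELECT * FROM orders WHERE customer_id = ?"
    before = query_plan(conn, sql, (42,))  # full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))   # lookup through the new index
    conn.close()
    return before, after
```

Calling demo() returns the two plan strings; the exact wording varies between SQLite versions, but the first describes a scan of the whole table and the second names idx_orders_customer.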

If I'm dealing with Oracle, sometimes the solution is as simple as adding an optimizer hint to make the query do its joins in a sensible way. Sometimes denormalization is helpful, though you want to be careful with that. Sometimes a small amount of data caching in the application can mean a tremendous decrease in database load. And so on.

If you can tell us more about the specifics of your situation, there are lots of people here who can offer more specific advice.

DB2 ICE sets TPC-H performance standard on Linux by Anonymous Coward · 2003-09-11 13:03 · Score: 2, Informative

Don't know how DB2 ICE would do compared to Open Source solutions, but take a look at the interesting results of the recent TPC-H benchmark performance testing on Clustered and non-Clustered 100GB and 300GB configurations. It appears that the IBM DB2 Integrated Cluster Environment (DB2 ICE) for Linux is head and shoulders above the rest.

Re:DB2 ICE sets TPC-H performance standard on Linu by dougnaka · 2003-09-11 13:23 · Score: 2, Interesting

I run DB2 on Linux.
It's been the largest pain in the ass I've ever had managing servers.
MySQL spanks DB2, as does postgreSQL.
Our DB2 on Linux crashed so much we spent months before we had a production ready system. We were replacing PostgreSQL and we had to rethink everything. It couldn't handle our insert load, and we were going from 4 dual 733 intel boxes to two large quad xeon boxes with 15,000 rpm disks.
We spent $100,000 on the DB2 license (and that was with the discounted half-price DB2 EEE for Linux). We are now in the process of migrating to MySQL after some large benchmarks. With a few simple indexes MySQL inserts twice as fast as DB2 and selects in 0.00 seconds on any row, vs. DB2's .460 seconds for any row in a 22 million row table.
Throw in the support scam they pulled on us, and IBM is a joke of a company. If they weren't pushing Linux they'd annoy me more than Microsoft does. The support scam went like this. We purchased 8 CPU licenses for DB2 EEE in 8/02. In 3/03 we start receiving calls from salesmen to get our upgrade business since our 1 year support contract expires on 5/1/03. I call IBM with a serious chip on my shoulder and get the story that our anniversary date automatically defaults to any dates held by previous contracts, "it's easier that way". We had some AS/400's (talk about poor performing overpriced junk). So they wanted about $50K for "support" for another year. We declined their offer and considered suing. At $50k/year losing 4 months of support isn't acceptable to a small business.
So I am bitter at IBM. But not without reason. During our first 3 painful months deploying DB2 I opened 15 PMR trouble tickets. Of the 15 I resolved 14 while either on hold or waiting for a call back from them. ALL of the PMR's were opened with status "critical, production down". The last PMR IBM claimed to either be a bug in the Linux kernel or in DB2, they didn't know, but when I pressed, they did offer a patched version that we could "try out" on our production box to see if it worked. Throw in that clustering didn't work as advertised (not at all under moderate load), and DB2 is a pile of junk.

As the IT geek the fault landed squarely at my feet, so I did some thorough investigation and benchmarking. Default-config DB2 is considerably faster than both PostgreSQL and MySQL at everything but inserts. But throw in a few indexes and MySQL and PostgreSQL own DB2's sorry excuse for a database.

I AM bitter, and this probably is flamebait. But I'm past caring about IBM and their scam operation. I'm sticking with what works, and so far NOTHING from IBM has worked.

I wasted 3 months of 7 day work weeks averaging 12 hour days on DB2 and its so-called Linux support.

end

--
My Linux Command of the Day site : LCOD

Off topic, but it's not. Why PDF and not HTML? by jerryasher · 2003-09-11 13:42 · Score: 2, Offtopic

PDF blows.

I hate PDF links. On Windows the experience is great, let's come to a complete halt as I watch CPU load hit 100%, wait for a splash screen, and watch the damned thing decide to show me the text at 245% zoom.

What a load of shit.

What's wrong with HTML as a virus free, pleasant to experience, documentation format?

Just say no to PDF.

Right tool for the job? by kpharmer · 2003-09-11 14:06 · Score: 2, Interesting

> The right tool for the job people

Right, and a myopic application of the above advice would lead to a dozen different database products in a typical department. They'd all be the right tool for some job - unless you're hoping to reuse skills, reuse backup solutions (TRM for DB2, Veritas for Oracle, etc), have any hope of reliable integration, etc.

So, yeah - get the right tool for the job. But before you write that out you need to take a big step back and get a sense of what your strategic direction is, and what all the implications of such a decision are.

I know a lot of folks converting mysql to other solutions right now - because some junior guy figured it was the best solution. It might have been for the app - but it wasn't for the department. Which is like winning a battle but losing the war.

Re:DB2 ICE sets TPC-H performance standard on Linu by kpharmer · 2003-09-11 14:20 · Score: 2, Informative

Ouch, sounds like you should have gotten an experienced DBA to set it up for you. DB2's too complex to go with simple defaults, and clustering is definitely a high-skills endeavor.

As far as insert loads go, we've seen 500 rows / second on five year old hardware without any problems. Although that's far short of what DB2 is capable of, it's fine for a sustained load. Beyond that batch loads hit 15,000 rows per second easily on the same box.

And as far as pricing goes, today you could get DB2 Express for those little dual-cpu boxes for just $500. A really fast four-way will cost you $32,000 - still way shy of $100k. You don't need to hit that kind of pricing unless you're doing inter-partition parallelism. And as I mentioned above - that's just not worth doing unless you've got the right skills to pull it off.

Here kiddie, kiddie by C10H14N2 · 2003-09-11 14:25 · Score: 2, Insightful

Great, we had to go there. "Kiddie" and then resort to recommending MS SQL, which has proven itself to be, shall we say, not the most "reliable" on the planet and elaboration shouldn't be necessary in these parts. Oddly enough, although MS SQL was "licensed" (a-la CPM, sigh) from Sybase, people often simultaneously disparage Sybase and praise MS SQL, despite the fact that Sybase licenses cost roughly half as much as MS SQL's. So we're left, essentially, with Oracle, which the vast majority of businesses will find not the least bit cost effective, or for that matter even necessary, certainly not at twice to four times the price of the next commercial competition. No wonder people look for open source solutions.

At any rate, many "big kids" are using the most unfairly bullied product, slandered most likely because it is a software boy-named-sue, MySQL. Why not have a read before taking childish pot-shots:

http://www.mysql.com/press/MySQL_userlist.pdf

In the end this silly "I'm a big boy because I use oracle and you're a little gurly kiddie because you don't" bullshit is just empty bravado. Businesses generally attempt to find the most cost effective means to meet a need and often Oracle ends up being like buying a stealth fighter to deliver a pizza. It often just doesn't make sense even for a big kid with billions of dollars, which might be why the $30B+ multinational BASF uses PostgreSQL.

Frankly, after the named-user license Oracle sold the State of California, no matter how idiotic the clearly comatose contract negotiators were, one would be remiss to not consider other companies with slightly less egregious behavior on record.

Cluster for MySQL Described by Mine_Field · 2003-09-11 14:25 · Score: 4, Informative

Here is a description of a Cluster created on MySQL with Linux boxes - similar to Google. http://www.dwreview.com/Product_Reviews/Review_Data.html and http://www.dwreview.com/Data_mining/Intelligent_DataMining.html

I've used MySQL and PHP on a reasonably big site. by Anonymous Coward · 2003-09-11 14:50 · Score: 4, Informative

I maintain a site that does a fair bit of traffic (Daily avgs: files served = 1.8 Million, bandwidth = 20 Gigs)

We have 1 "master" MySQL server which gets all updates and inserts, etc. We have 2 "slave" servers which each take a significant portion of the select queries. All machines run the same 4.0.x version of MySQL. (Web access is PHP on Apache) All machines are dual x86s packed with RAM.

Setting up replication is pretty easy. And for the most part things are pretty nice. The load average drops a lot on each machine when we add a new slave. (Oh don't forget to enable query caching.)

We have had some problems though. Because the site gets so much traffic sometimes queries take a while to run and to propagate to the slave servers. This means if you update your data (via the master) and then do a select from one of the slaves your change may not show up yet. For most web apps this might not seem really big.

But it leads to the web users changing things and not seeing the results right away. So they figure the site is "broken" and they repeat what they just did only to have it take place twice. If you have your browser "refresh" the page first usually the data has come through but many people don't do this. The result is they don't feel their account has been credited or something. These kinds of bugs are hard to track down too.

I wrote a program to check repeatedly (sleeping from 1/4 to 1/25 of a second in between) and the slaves were almost always in perfect sync with the master (as per MySQL's binary log position indicator). That was really impressive; however, there are times when the servers are under load that the slaves will be out of sync for 30 to 60+ seconds! (Measuring in the tens of thousands of byte offset differences in the binary log position.)
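
The comparison such a checker makes can be sketched in Python as below. This is not the program described above, just an illustration; fetching the positions (for instance from the output of SHOW MASTER STATUS and SHOW SLAVE STATUS) is deliberately left out, and the type and function names are made up.

```python
from typing import NamedTuple, Optional


class BinlogPos(NamedTuple):
    file: str    # binary log file name
    offset: int  # byte offset within that file


def byte_lag(master: BinlogPos, slave_executed: BinlogPos) -> Optional[int]:
    """Bytes of binary log the slave has not executed yet.

    Returns None when the slave is still on a different log file,
    because offsets are only comparable within one file.
    """
    if master.file != slave_executed.file:
        return None
    return master.offset - slave_executed.offset


def in_sync(master: BinlogPos, slave_executed: BinlogPos,
            tolerance_bytes: int = 0) -> bool:
    """True if the slave is within tolerance_bytes of the master."""
    lag = byte_lag(master, slave_executed)
    return lag is not None and lag <= tolerance_bytes
```

A byte offset says how far behind a slave is, not how long it will take to catch up, so a time-based alarm still needs the repeated sampling described above.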

The solution we've been using is that any time there is an update to the database and the immediate page seen next by the user relies on the changed data we do the selects from the master server. This seems to work for now but I'm not sure how long we will be able to scale this way.
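
A sketch of that routing rule in Python. It is a variation rather than the site's actual PHP code: instead of deciding page by page, it pins a session's reads to the master for a fixed window after each write. The class name, the server labels and the five-second default are all assumptions; the window would have to exceed the worst replication lag you are willing to tolerate.

```python
import itertools
import time


class ReadWriteRouter:
    """Send writes to the master and reads to slaves, except just after a write."""

    def __init__(self, master, slaves, pin_seconds=5.0, clock=time.monotonic):
        self.master = master
        self._has_slaves = bool(slaves)
        self._slaves = itertools.cycle(slaves) if slaves else None
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._pinned_until = {}  # session id -> time until which reads use the master

    def for_write(self, session_id):
        # Remember that this session just changed data.
        self._pinned_until[session_id] = self.clock() + self.pin_seconds
        return self.master

    def for_read(self, session_id):
        pinned = self.clock() < self._pinned_until.get(session_id, float("-inf"))
        if pinned or not self._has_slaves:
            return self.master
        return next(self._slaves)  # simple round-robin over the slaves
```

Only the user who just made a change pays the cost of hitting the master; everyone else keeps reading from the slaves, which is what keeps the master's load down.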

In summary, so long as the load on the machines stays around 1.0 or lower everything runs pretty smooth. If the loads hit 3 to 5 or higher then people notice (or rather mention) that things seem odd. (By the way those are linux load averages which IIRC are different than under Solaris.)

What I would like to see is a virtual server type system where one machine accepts all queries and hands them out to a set of replicating servers without requiring the application to know about it. This is nice for developing applications but the real reason is the master can then prevent the syncing issues discussed above.

SF

What's the application? by BanjoBob · 2003-09-11 14:52 · Score: 2, Informative

Enterprise database solutions are quite varied. Is it a data warehouse or something financial or ???

You pick the right tool for the job. I've seen massive databases on Sun Enterprise E-6500s and Oracle do a LOT if the database is properly configured. But one structure doesn't work for all applications. Do you use stored procedures? Do you index? Do you require triple replication so you can reindex one system while keeping a backup and a live production system? Do you need remote fail-safe operation? These types of questions need to be answered before you settle into one solution.

--
Banjo - The more I know about Windoze, the more I love *nix

Re:DB2 ICE sets TPC-H performance standard on Linu by dougnaka · 2003-09-11 15:18 · Score: 2, Informative

Our fast four way was under $32k, I threw in the price of the x300 storage array we bought with it and the 4 CPU licenses for DB2 EEE 7.2 we bought for $11.5K each (half off the SRP).
Our db2 does over 15,000 rows/second in BATCH mode. It was a sad day when we had to log our transactions to text files for batch processing.
We did end up hiring a good DBA to help us with our DB2. It's worth noting that we didn't have this extra cost or need with our PostgreSQL setup.
I'm curious about another's experience with DB2 on Linux, which I assume you're running on. Tell me, what versions do you run? What kernel? What kind of reliability do you get?
We initially ran DB2 on a Redhat 7.3 setup with a severely modified kernel, 2.4.15 I think it was. We went to RHAS 2.1 with the RedHat kernel after so many stability problems in DB2. The new version didn't fix the problems, it only threw in a slew of new problems relating to the hardware. Our new setup is Gentoo 1.4 with a 2.4.21 kernel. It runs much faster, sees all HT enabled processors and throws no APIC errors, and hasn't crashed.
So, what are your experiences with DB2 on Linux? If you're not running on Linux, what are you running on?

--
My Linux Command of the Day site : LCOD

Re:Off topic, but it's not. Why PDF and not HTML? by shibashaba · 2003-09-11 15:45 · Score: 2, Interesting

You can't embed fonts and images inside of html documents.

On linux (with my blazingly fast duron 650) a 500 page pdf I made with OpenOffice takes a few seconds to load in konqueror. I had downloaded all the indiv. web pages that made up the book (wasn't avail as one file), used cat to put them together and then waited 20 minutes for open office to load it. Mozilla took about the same time to load the same file, and konqueror was a little bit less than ten minutes. God knows how long IE would have taken, if it would have loaded at all. While we're getting off topic here, Word2000 would only bring up the first web page and ignored all the rest in the file. God only knows what IE would have done.

Basically, for big files PDF is the only option as far as I'm concerned. I am sorry that Microsoft and the creators of pdf can't provide you with a decent computing experience for such basic tasks. There's only $50 billion dollars and decades of experience between the two companies, these poor guys are doing all they can.

--
---------- Open Source is capitalism applied to IP.

Plenty of poorly managed SQL Server installations. by tjstork · 2003-09-11 15:56 · Score: 2, Interesting

One of the telltale signs of a SQL Server installation is the frequent "deadlock" messages. I would say that if you are going to complain about transaction handling in MySQL, even the standard version that doesn't have it, you should probably complain about the transaction handling in SQL Server. If it deadlocks, and does not do deadlock avoidance, then it ain't an enterprise solution.

--
This is my sig.

SAPDB seems right by christooley · 2003-09-11 16:01 · Score: 2, Informative

According to the FAQ it supports clusters/high availability of several types (towards the bottom), has Oracle 7 compatibility, and has the option to upgrade to commercial support (something available for Postgres, MySQL and most others as well). It's got an install base of users used to large environments and has been reasonably proven in the field. Just a thought.

Java's turn... by jerryasher · 2003-09-11 16:18 · Score: 2, Funny

Here's another experience I love.

I'm in mozilla, and I middle click a link. Somewhere in the background a page starts loading in a new tab. No problem. I will continue reading. Ah, background loading into new tabs.

Then.

The.machine.comes.to.a.halt.

It.takes.me.awhile.to.realize.this.

but.

I.can.not.scroll.I.can.do.nothing.

And I know what's happened. Some moron has a java applet displaying something wonderfully important like the fucking time in their little corner of hell, and if I wait about thirty more seconds, I'm going to hear a little pop, and I just know the sound of that pop is like the sound of a dick popping out of my anus, cause I know that java has just ass raped me, my browser, and my machine.

Pop! Your clock is now ready sir!

O U C H ! ! ! ! Rapist.

MySql Clusters work great!!! by texasrocket · 2003-09-11 16:24 · Score: 3, Informative

I have personally installed, set up and maintained a 5-node cluster (3 slaves, 1 master/slave, 1 slave/master) using Heartbeat and MySQL replication. It works great!! My guess is that 80% of MY MySQL usage is content and needs READ-ONLY access. So I have 3 slaves that are used in a Read-Only cluster. The master is one of 2 other machines and ALL WRITES go to it. In the event of a MASTER db going down, the remaining slave promotes itself and updates the other slaves to point to itself. Been working great for 8 months!!!
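A rough sketch of the read/write split described above, as it might look on the application side. This is not the poster's actual setup or any Heartbeat or MySQL API; the class, the host names and the list of read verbs are all assumptions made for illustration.

```python
import itertools

# Assumption: statements starting with these verbs are safe to send to a slave.
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


class ReplicatedPool:
    """Sends reads round-robin to slaves and every write to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_slave = itertools.cycle(self.slaves)

    def route(self, sql):
        """Return the host that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in READ_VERBS and self.slaves:
            return next(self._next_slave)
        return self.master

    def promote(self, new_master):
        """After a master failure, make new_master the write target."""
        self.slaves = [host for host in self.slaves if host != new_master]
        self.master = new_master
        self._next_slave = itertools.cycle(self.slaves)


# Hypothetical host names
pool = ReplicatedPool("db-master", ["db-slave1", "db-slave2", "db-slave3"])
```

In a real deployment the promotion step also has to repoint the surviving slaves at the new master, which is the part Heartbeat and the replication setup take care of in the cluster above.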

You need a license only if: by infernalC · 2003-09-11 17:38 · Score: 2, Informative

You need a license only if you choose to distribute the software. If this is an in-house application, simply obtain copies of MySQL Standard/Max (GPL) directly from MySQL mirrors for each server. Since you do not perform the distribution, you are in good shape (see MySQL License Policy - Licensing -2).

However, the folks at MySQL AB are very decent folks who offer great support and warranty for their product and who have to feed their families, and licenses are cheap. IMHO, buy at least one license for a master and one for a slave. That way you get support for the program in each role.

Re:You need a license only if: by toddhunter · 2003-09-11 18:36 · Score: 2

The problem I have found with MySQL is the 'internal' distribution clause in their license. This is different to the standard concept of distributing it externally.
Also from their examples page
http://www.mysql.com/products/licensing-examples.html
If you are selling a non-GPL application that requires MySQL and works with a webserver, then you need to provide commercial MySQL license(s) to your customers.
Now this is all well and good, the MySQL guys should earn good money for what they do. But if you do contact them, all you will get is a response saying 'well, you better buy a commercial license just in case'. I don't like that. If they are going to charge for something then do it and make it clear what they are trying to do. IMHO they are using doubt over the GPL to sell licenses.

Well you sort of can! by codepunk · 2003-09-11 18:02 · Score: 3, Interesting

I run two types of clusters, one of them is a RAC 9i on Linux. Nothing and I mean nothing has the functionality of RAC 9i. You can put a bullet through one of the nodes right in the middle of a query being returned and still get your records just like nothing ever happened. The other database I run is a postgresql on redhat advanced server and the database files are symlinked into the SAN (this is high availability only). If I had to do it again I would not use postgresql because it scales for shit and I cannot under any circumstances keep it up in a 24/7 configuration. The database needs to have vacuum run on it once a day and I have to do that manually because half the time it fails. Running a vacuum on the database while clients are connected basically locks everyone tight until it is finished.
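For what it's worth, the daily vacuum could be scripted with a retry instead of being run by hand. A minimal sketch, assuming the stock vacuumdb client is on the PATH and can connect without a password prompt; the function names and the retry count are made up for illustration, and this does nothing about the locking complaint.

```python
import subprocess


def vacuum_command(database=None, analyze=True):
    """Build a vacuumdb command line for one database, or all of them."""
    cmd = ["vacuumdb"]
    if database is None:
        cmd.append("--all")
    else:
        cmd.extend(["--dbname", database])
    if analyze:
        cmd.append("--analyze")
    return cmd


def run_with_retries(cmd, attempts=3, runner=subprocess.run):
    """Run cmd until it exits with status 0; return the attempt that worked."""
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
    raise RuntimeError("command failed %d times: %s" % (attempts, " ".join(cmd)))
```

Called from cron once a night as run_with_retries(vacuum_command()), this would at least turn "half the time it fails" into an error someone gets told about.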

If you cannot spend any money and want a fast, scalable and highly available system, my advice is first sapdb and/or mysql and advanced server on some sort of shared SCSI.

Now all of you big postgresql advocates flame away but it does not change the facts. I love the database but if you need heavy lifting it just does not cut the mustard.

--

Got Code?

Re:Well you sort of can! by Sxooter · 2003-09-12 04:40 · Score: 2, Interesting

Just wondering, but are you on an older flavor of Postgresql? Most of the issues you mention (i.e. vacuum slowing things down) have been fixed for quite some time.

Also, if you haven't bothered to tune your postgresql.conf file on an older install, it will run for shit. I.e. the default settings are for a small workgroup type setup, not enterprise class stuff.

Keep in mind, Afilias runs the .org and .info tlds on postgresql, so it can't be impossible to get 24/7 operation out of, or the .org domain would be offline several times a day.

--

--- It is not the things we do which we regret the most, but the things which we don't do.

MNESIA from ERLANG by WetCat · 2003-09-11 18:21 · Score: 2, Informative

Check Mnesia DB from the Erlang package. It's not relational, but has high-availability replication, conflict management, etc. It's reliable and tested. By Ericsson.
Good license.

Livejournal.com is clustering MySQL by BiOFH · 2003-09-11 18:36 · Score: 2, Informative

http://www.livejournal.com/community/lj_maintenance/60984.html

--
- I am made of meat.

Re:In databases, you get what you pay for by La+Camiseta · 2003-09-11 19:24 · Score: 2, Informative

But if you care about your data so much that you are seriously going into replicated systems, the couple of most popular free packages at least aren't there yet even in basic ACID reliability.

What are you talking about? PostgreSQL has supported ACID reliability for years.

Plus, PostgreSQL also now supports replication, the same as the one that PostgreSQL, INC. has been selling as an add-on for years (they finally opensourced it).

Another database that I would check out is SAPdb. SAP originally created it to be a competitor to Oracle, so that their customers wouldn't have to buy Oracle databases (read: pretty complicated setup, but worth it). But now they've opensourced it too, and as far as I know, it supports replication. And in the next release when MySQL takes over (Q4 2003, it'll be renamed MaxDB, and MySQL will be working on the code as well as SAP), it will have a proxy available so that you can just use MySQL database drivers to access it.

Re:I've used MySQL and PHP on a reasonably big sit by DaMoose · 2003-09-11 19:35 · Score: 2, Insightful

OMG - as I was reading your post, I started to think you must be one of my developers - we had the identical configuration and issues with Superdudes.net.

It got to the point where the slave servers (P4, 2GB RAM, Hardware RAID) could not keep up with the Master replication _and_ service SELECT queries. The data was too big for RAM (filesort) and the drives were not fast enough (2 drives mirrored). The Master is dual PIII 2Ghz, 2GB RAM , and fast RAID 5 hardware.

I ended up solving the problem with a hardware upgrade. I replaced all 4 servers with 1 Quad-Opteron 1.8GHz, 16 GB RAM, and _VERY_ Fast RAID 10 across 9 fast drives.

Please feel free to check it out. For the first time in a long time, I'm not afraid that the MySQL server will be the bottleneck in this very dynamic web site.

We use Linux, Apache 1.3.x, MySQL 4.0.x, and PHP 4.x to build the pages and generate XML to our Flash MX applications.

Superdudes.Net

Flash-heavy site. Free registration required to access the coolest features (those which beat up the MySQL server).

replication by juraj · 2003-09-12 00:35 · Score: 2, Informative

Actually, I've been looking for replication in a Free database for months. The things I don't get with the various "proxy" solutions:

If I do an insert delayed (or a simple insert) to a database which is replicated to two databases, and I have an autoincrement field, who guarantees that both copies will get the same value? If insert delayed is performed, how does the "proxy" guarantee the inserts are actually issued in the same order? I don't care what the order is, but I want the order to be the same on both databases.
If a database falls down, how do I get instant resynchronization? I don't want to copy the full *GB database back and forth while holding a read lock. I want instant resynchronization from the point where the database fell down.
I want to write to both databases and have the changes replicated to the others. I want a peer-to-peer database, not master-server, because that involves always knowing who is the master on the application level (okay, I can grab an IP address, but a much easier and nicer solution is having a cloud of servers and these things solved in the cluster, not by my hacks).
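The first two wishes above are usually answered with a single ordered log: every write gets one position number, replicas apply entries strictly in that order, and a replica that was down asks only for the entries after the last position it applied. A minimal in-memory sketch of that idea follows; it is not the mechanism of any particular database or proxy, and every class and method name is made up for illustration. It does not cover the peer-to-peer wish.

```python
class ReplicationLog:
    """Ordered list of writes; the position doubles as the auto-increment id."""

    def __init__(self):
        self.entries = []  # (position, row) pairs, positions 1, 2, 3, ...

    def append(self, row):
        position = len(self.entries) + 1
        self.entries.append((position, row))
        return position

    def since(self, position):
        """Entries written after the given position."""
        return self.entries[position:]


class Replica:
    """Applies log entries in order and remembers how far it got."""

    def __init__(self):
        self.applied_position = 0
        self.rows = []

    def apply(self, position, row):
        if position != self.applied_position + 1:
            raise ValueError("out-of-order entry %d" % position)
        self.rows.append((position, row))
        self.applied_position = position

    def catch_up(self, log):
        """Resynchronize from the point where this replica stopped."""
        for position, row in log.since(self.applied_position):
            self.apply(position, row)
```

Because the id comes from the log and not from each replica's own counter, both copies end up with the same values in the same order, and catching up after a crash costs only the missed entries instead of a full copy.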

Another thing -- has anyone had a look at SAPdb and Interbase? They are Free too and there's not much talk about them. Are they usable? Do they provide replication?

Slashdot Mirror
