Power Outage Takes Wikimedia Down

← Back to Stories (view on slashdot.org)

Power Outage Takes Wikimedia Down

Posted by ryuzaki0 on Monday February 21, 2005 @02:28PM from the problem-with-non-absolute-power dept.

Baricom writes "Just a few weeks after a major power outage took out well-known blogging service LiveJournal for several hours, almost all of Wikimedia Foundation's services are offline due to a tripped circuit breaker at a different colo. Among other services, Wikimedia runs the well-known Wikipedia open encyclopedia. Coincidentally, the foundation is in the middle of a fundraising drive to pay for new servers. They have established an off-site backup of the fundraising page here until power returns."

7 of 577 comments (clear)

Min score:

Reason:

Sort:

mysql bad at disaster recovery? by bdigit · 2005-02-21 14:34 · Score: 4, Interesting

This is not a troll or a flame at all but between this and the livejournal servers, it sure sounds like hell if your mysql servers ever go down unexpected.

Is mysql the only dbase like this or does postgres get corrupted as well during unplanned downtime? If I recall from using MSSQL servers , we never had a problem like this. We would simply reboot the servers and not worry about tables being left in unrecoverable states. Please correct me if I am wrong though.

Is there any way around this or will this always be a problem with mysql?
1. Re:mysql bad at disaster recovery? by ctr2sprt · 2005-02-21 14:50 · Score: 5, Interesting
  
  We have a similar problem at work. There we don't endure database corruption, we just get broken replication. It appears to be working, but it actually isn't. So we have to take the master offline (actually just acquire a write lock on the DB, it can still answer SELECTs), tar up its (massive) database, scp it to the slaves, start the master, stop the slaves, untar the database, restart the slaves, and restart replication. The entire process can take several hours and it's easy to make mistakes. We put stickers on our MySQL servers saying "DO NOT REBOOT WITHOUT CONTACTING OPS MANAGEMENT," though unfortunately faulty DIMMs are illiterate.
  I don't know if PostgreSQL has similar problems, but I very much doubt that Oracle or DB2 do. I know that improved failover support has been a target of the PSQL developers for a little while now, so while it may not be on par with Oracle and DB2 it's probably closer than MySQL. At least for now.
  I wish this had prompted management to consider alternatives to MySQL, at least for our mission-critical database servers, but unfortunately it hasn't. They don't even see that we could sell an enterprise-level RDBMS as a significant feature - we're a webhosting company - and charge through the nose for it. Oh well. They don't listen to peons like me, they just make me fix MySQL replication every two weeks.
2. Re:mysql bad at disaster recovery? by ctr2sprt · 2005-02-21 15:48 · Score: 4, Interesting
  
  Mysql can handle reboots well.
  No. It can't. We have two concrete examples in this very page - one provided by Wikimedia, one provided by me - which directly contradict your statement. Maybe under some circumstances MySQL can handle reboots, but it's been proven already that it can't always do so. Perhaps your MySQL experience is not with high-load applications (at least not the level of load Wikimedia and my employer see).
  BTW, write a damn script. Mysql was written for unix, unix thrives on scripts. If you can't handle writing a script, why the hell are you a DB admin?
  Because the process doesn't lend itself well to scripting. For example, MySQL automatically releases locks when you close your connection to the DB. Presumably this is to avoid deadlocks and for other good reasons, but it's not trivial to write a script to do that. Also, since this is an important system, we don't like the idea of trusting computers to handle its repair: we want someone knowledgeable monitoring every step in case something doesn't work exactly right. I can of course sit there and watch the script do its thing, but that defeats the purpose of scripting the process in the first place.
  Regardless, the difficulty of the task is not the main issue. The main issue is that we are dealing with north of 1GB of data here, and on busy servers on a busy network that means restarting replication takes an hour or longer. So not only is performance reduced by 33% when we take the slaves offline one at a time, performance is reduced further by the traffic of tar/scp in the background. Not to mention the fact that, because we have a lock on the master's DB, so you can't even consider the DB cluster fully functional.
Re:Coincidence... ;) by daveo0331 · 2005-02-21 14:40 · Score: 5, Interesting

On the other hand, subjecting the donation page to the Slashdot effect seems like a great way to reach the fundraising goal in no time. Assuming of course the page itself stays up.

Seriously though, if you like wikipedia, consider donating, even if it's just 5 bucks. I think it's even tax deductible if you itemize.

--
Remember the days when Republicans were the party of fiscal responsibility?
Re:Another indictment of MySql by ergo98 · 2005-02-21 14:51 · Score: 5, Interesting

No database can guarantee data integrity in the case of a power failure.

Barring a couple of extreme exceptions, of course a modern database system should protect integrity in the case of a power failure, or any other sudden system failure (kernel panic, GPF, whatever). In the case of the much maligned SQL Server, you can hit the power button all you want mid-transaction and you're going to get a blister on your finger before the database is corrupted.
Re:They should ask for more... by brion · 2005-02-21 16:01 · Score: 4, Interesting

Our database masters do have dual power supplies. The circuit breakers were tripped on both sides.

--
Chu vi parolas Vikipedion?
Re:Coincidence... ;) by fredrikj · 2005-02-21 22:21 · Score: 4, Interesting

On the other hand, subjecting the donation page to the Slashdot effect seems like a great way to reach the fundraising goal in no time. Assuming of course the page itself stays up.

You do know that Wikipedia receives something like 100 times the traffic Slashdot does, right?