Power Outage Takes Wikimedia Down

Posted by ryuzaki0 on Monday February 21, 2005 @02:28PM from the problem-with-non-absolute-power dept.

Baricom writes "Just a few weeks after a major power outage took out well-known blogging service LiveJournal for several hours, almost all of Wikimedia Foundation's services are offline due to a tripped circuit breaker at a different colo. Among other services, Wikimedia runs the well-known Wikipedia open encyclopedia. Coincidentally, the foundation is in the middle of a fundraising drive to pay for new servers. They have established an off-site backup of the fundraising page here until power returns."

Another indictment of MySql by Anonymous Coward · 2005-02-21 14:34 · Score: 5, Insightful

Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state

Ya know, I just don't understand why so many projects with such high visibility and requirements for reliability use a toy database like MySQL.

Someone PLEASE tell me why. Because right now the only thing I can think is that people just don't know how to pronounce "Postgres".
1. Re:Another indictment of MySql by Anonymous Coward · 2005-02-21 14:55 · Score: 5, Insightful
  
  No database can guarantee data integrity in the case of a power failure
  
  This is false. SQL Server 2000 (yeah, I know, instant mod-down) has a transaction log and so does Oracle and I'm sure every other half-decent database. ALL committed transactions are preserved and the data is in a consistent state.
  
  MySQL does not have this and the developers don't seem to care much about it. This is the problem with open-source in general, if someone is just doing it for fun they aren't going to spend any time on the stuff they don't care about personally.
2. Re:Another indictment of MySql by Anonymous Coward · 2005-02-21 16:56 · Score: 2, Insightful
  
  I realise since you seem to be involved with wikipedia, you'll be modded up no matter what. However, what you just said makes no logical sense. The grandparent basically said that mysql's transaction support sucks and consequently it can't guarantee db integrity over a power failure. You said that because *one* server came back up with no problems that he should reassess mysql. You could have *all* your servers come back with no problems and it still wouldn't change the grandparent's assessment. You would just be getting lucky.
3. Re:Another indictment of MySql by Tough+Love · 2005-02-21 17:35 · Score: 4, Insightful
  
  Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?
  
  But one didn't. That's a much more informative data point.
  
  --
  When all you have is a hammer, every problem starts to look like a thumb.
4. Re:Another indictment of MySql by Jamesday · 2005-02-22 06:51 · Score: 2, Insightful
  
  Depends on the cause. If the database server software was being lied to by the OS, controller or drives I'm not sure just how much I'm inclined to blame the database server software.
  
  I am inclined to ask the database server vendor to see if they can find ways to protect against it and I've briefly discussed that already.
Re:They should ask for more... by Raul654 · 2005-02-21 14:39 · Score: 2, Insightful

Right, because we all know money grows on trees...

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Re:mysql bad at disaster recovery? by YU+Nicks+NE+Way · 2005-02-21 14:46 · Score: 4, Insightful

There's a simple way around this: stick to PostgreSQL, MSSQL, Oracle, DB/2, or some other real database. MySQL doesn't make the grade, precisely because things like this can happen.
Re:Coincidence... ;) by Raul654 · 2005-02-21 14:57 · Score: 3, Insightful

No no, but with the google deal looming, the tin-foil-hatters are paying close attention to wikipedia, and every little thing gets overly-scrutinized.

--

To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
Re:They should ask for more... by man_ls · 2005-02-21 15:05 · Score: 3, Insightful

IIRC, that's the Fire Code. The breaker needs to be able to unconditionally kill all power inside the facility. Thus -- it kills the power post-UPS.
Re:They should ask for more... by PornMaster · 2005-02-21 15:08 · Score: 3, Insightful

Sometimes it costs more to do things wrong, in the long term, than to do them right.

--
500GB of disk, 5TB of transfer, $5.95/mo
Re:They should ask for more... by mboverload · 2005-02-21 15:11 · Score: 2, Insightful

Hey man, they have their traffic doubling every 4 months, they NEVER planned for this success this early. Building infrastructure is hard when you never plan for it.
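
To put "doubling every 4 months" in numbers, here is a quick back-of-the-envelope sketch. The function name and the assumption of smooth exponential growth are illustrative only, not anything Wikimedia published.

```python
def growth_factor(months, doubling_months=4):
    """Traffic multiplier after `months`, assuming it doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


# Three doublings fit in one year, six in two years.
one_year = growth_factor(12)
two_years = growth_factor(24)
```

Under that assumption, capacity bought for today's load is short by a factor of 8 a year later, which is why planning for it is hard.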
Re:What Happened. by wakejagr · 2005-02-21 15:18 · Score: 2, Insightful

Kudos to Wikimedia for actually explaining what happened and not just putting a "This page is down, please try again later" message up. Many people/companies/groups/etc would be too proud or too afraid of bad publicity to actually explain the problem.

--
Don't save Windows XP! http://www.petitiononline.com/jjw1xp/petition.html
Re:Distributed Wikipedia? by midom · 2005-02-21 15:31 · Score: 3, Insightful

Well, distributing a wiki is a bit more complex than distributing a search index (async!) or seti@home (async). With async data arrays you don't care whether the packet you sent to some node is an hour or a day old. You do care about that in a wiki, because every user will be pressing the 'edit' button, and data should be consistent everywhere. We are working on distribution.
- Distributed caches - the majority of hits are now served by caches, and some of them are offsite. It was a pilot project for a while and now we're trying to design and build scalable infrastructure for that. But still, lots of edits are served uncached.
- Distributed file systems - are there any? NFS is a single-server system, MS has something, PVFS has no redundancy, GoogleFS is closed and not released, Coda, AFS, all of those just don't work. Right now we're trying to develop a store on MogileFS (the Perl-based app-level file storage by LiveJournal), and sure, there are other ideas.
- Distributed database - there are no proper large multimaster open-source database solutions. MySQL with replication and a transactional data store is used. In this event it would be great to have a second datacenter nearby with additional DB replicas and a gigabit interconnection, but that costs money. App-level bidirectional replication is in the plans for both MySQL and PostgreSQL. And SAN deployment is too costly.
And yes, MediaWiki code has PostgreSQL support, but migrating from one database to another without proper tests, benchmarks and insurance isn't very mature.
Re:What Happened. by Anonymous Coward · 2005-02-21 15:33 · Score: 2, Insightful

You do know that in real datacenters you don't have a UPS on each PC, but a UPS for the ROOM, and between this UPS and your servers you are going to need breakers, so if you put too many things on a circuit it may cause a problem, as simple as that.
Re:Stupid question... by ScrewMaster · 2005-02-21 15:34 · Score: 2, Insightful

Something still doesn't add up. Even if a backup generator autostarts successfully, there's a significant delay between mains failure, switchover, and the generator picking up the load. That's usually a few seconds or more, too long for a computer to run off the residual charge in its power supply filter caps. There would still have been an inverter-charger somewhere to keep the equipment running until the generator was fired up. Sounds like somebody screwed up, either by tripping the wrong breaker, or by designing the facility improperly to begin with.

--
The higher the technology, the sharper that two-edged sword.
why, why, why? by CAIMLAS · 2005-02-21 15:42 · Score: 2, Insightful

Why were they not using battery backup on their database servers (i.e., their critical servers)? That way the servers would have the necessary 10 minutes (or whatever) so that they can shut down the DBs and power off the systems.

This is a negligible cost for something as integral as an active sync with the work that people have performed - for free.

Why is this not seen as important? "The wiki users will just recreate the material"? That's somewhat presumptuous.

Now, livejournal I can understand not doing this (as there are many clients which allow people to sync with their online journals and the material is fairly culturally worthless), but wikipedia? It's one of the better things on the Internet.

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
Re:Distributed Wikipedia? by InfiniteWisdom · 2005-02-21 15:47 · Score: 2, Insightful

170GB isn't that big and people routinely run far more critical stuff without any kind of exotic seti@home-like distribution. What's really inexcusable is the fact that a power failure caused database corruption that turned a 2 minute power outage into major downtime.
Re:mysql bad at disaster recovery? by fimbulvetr · 2005-02-21 16:09 · Score: 2, Insightful

I'd rather just agree to disagree on this one, at this point it's all just what we have observed. It heavily depends on the situation, how the db is setup, etc.

As far as the script, yes, it does have locks, and rightly so. It's not terribly tough to write a lock aware script. In my opinion, the replication setup is extremely easy to script. I'd much rather script it than sit in front of the console. Once I see it work, I know it will work every time, and I won't worry about something like me or a peer mistyping the server-id at 4:00am. Even at 20GB, it can't be terribly long at 100Mb/s.

You only need the lock on the master while you're tar'ing the snapshot for distribution to the other servers. Once it's tar'd, unlock master, gzip, redistribute, tar zxvf, setup slave and it will catch up.
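
A rough sketch of the ordering described above, in Python rather than the poster's actual script (which isn't shown). The `lock` and `unlock` callbacks are hypothetical stand-ins; on a real master they would presumably issue `FLUSH TABLES WITH READ LOCK` and `UNLOCK TABLES` over a single connection that stays open while tar runs. The transfer estimate assumes decimal units and an ideal link with no protocol overhead.

```python
import os
import tarfile
from contextlib import contextmanager


@contextmanager
def read_lock(lock, unlock):
    """Hold the master's read lock only for the duration of the with-block."""
    lock()
    try:
        yield
    finally:
        unlock()  # always release, even if tar fails


def snapshot(data_dir, tar_path, lock, unlock):
    """Tar data_dir while locked; gzip and distribution happen later, unlocked."""
    name = os.path.basename(os.path.normpath(data_dir))
    with read_lock(lock, unlock):
        # Uncompressed archive, so the lock is held for as short a time as possible.
        with tarfile.open(tar_path, "w") as tar:
            tar.add(data_dir, arcname=name)
    return tar_path


def transfer_seconds(size_gb, link_mbit_per_s):
    """Ideal time to copy size_gb gigabytes over a link_mbit_per_s link."""
    return size_gb * 8 * 1000 / link_mbit_per_s
```

By this estimate 20GB at 100Mb/s is 1600 seconds, a bit under 27 minutes, and none of that time is spent holding the lock.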
Re:What Happened. by Anonymous Coward · 2005-02-21 16:09 · Score: 1, Insightful

The sticky point is the database servers, where all the important stuff is. Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state.

I don't get it, then why the fuck bother with InnoDB. Transactions/ACIDity imply a performance penalty over just cache and async write of a direct image. One pays this penalty for the benefits (usually critical for many applications) of data integrity and robustness. How would you like your bank to run on MySQL?

This is the dumbest thing I've ever heard. I used to tell MySQL weenies that their DBMS sucked because it had no transaction support, then recently these annoying inbred fuckwits tell me that MySQL is just as good as Oracle because it has InnoDB support (we'll let the fact that the schema is kept in the shitball format slide)...Well apparently these morons don't have a fucking clue what transaction processing really means. Usually COMMIT and ROLLBACK are supposed to actually mean something... and even working 90% of the time doesn't cut it.

I would never donate to this goddamn Wikipedia project as long as I know that the funds are going to end up being sapped to support their crippled shitball database.
Re:URI to the Rescue by J'raxis · 2005-02-21 16:23 · Score: 2, Insightful

URIs are a superset of URLs and URNs. I think what you're talking about is a URN, isn't it? These are the URIs that specifically name something uniquely (for example, urn:isbn:1902593790 or urn:oid:1.3.6.1.4.1.20115) but don't necessarily help you locate it at a specific place.

--
Liberty in your lifetime
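
For anyone who wants to poke at the URI/URL/URN distinction from the comment above, a minimal sketch using Python's standard `urllib.parse`. The `classify` helper and its three labels are made up for illustration; the rule (a `urn` scheme means a name, a scheme plus a host means a locator) is a simplification, not the full specification.

```python
from urllib.parse import urlsplit


def classify(uri):
    """Rough split of a URI into 'URN', 'URL' or 'other URI'."""
    parts = urlsplit(uri)
    if parts.scheme == "urn":
        return "URN"  # names a thing, says nothing about where to fetch it
    if parts.scheme and parts.netloc:
        return "URL"  # scheme plus host: tells you where to look
    return "other URI"


examples = [
    "urn:isbn:1902593790",
    "urn:oid:1.3.6.1.4.1.20115",
    "http://slashdot.org/",
]
labels = [classify(u) for u in examples]
```

Both `urn:` examples from the comment come out as names, while the `http:` address comes out as a locator.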
Re:This is why you don't turn Google down by multisync · 2005-02-21 16:42 · Score: 2, Insightful

They should talk about his work and his contribution to American culture. They shouldn't be making fun of him. He deserves better.

If Hunter S. Thompson were still alive, he'd be making fun of himself for killing himself.

--
I don't care why you're posting AC
MySQL not ACID by Tough+Love · 2005-02-21 17:06 · Score: 2, Insightful

From the wikipedia page:

At about 14:15 PST some circuit breakers were tripped in the colocation facility where our servers are housed. Although the facility has a well-stocked generator, this took out power to places inside the facility, including the switch that connects us to the network and all our servers. (Yes, even the machines with dual power supplies -- both circuits got shut off.)

After some minutes, the switch and most of our machines had rebooted. Some of our servers required additional work to get up, and a few may still be sitting there dead but can be worked around.

The sticky point is the database servers, where all the important stuff is. Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state.

(Bolding mine.) This proves that MySQL is not ACID, there is no way that a power outage is supposed to cause corruption in a database. This is not a troll, this is a simple conclusion. I really think that Wikipedia should switch to PostgreSQL, which is considerably more mature in terms of ACID compliance.

--
When all you have is a hammer, every problem starts to look like a thumb.
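
As a rough illustration of what "committed data survives a power cut" is supposed to mean, here is a toy write-ahead log in Python. It is a sketch of the general idea only, not how InnoDB or PostgreSQL implement it; the `TinyLog` class and its record format are invented for this example. Note that `os.fsync` only helps if the OS, controller and drives honour it.

```python
import json
import os


class TinyLog:
    """Toy write-ahead log: a change counts only once its commit record is on disk."""

    def __init__(self, path):
        self.path = path

    def _append(self, record, sync):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            if sync:
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to stable storage

    def write(self, txid, key, value):
        self._append({"tx": txid, "op": "set", "key": key, "value": value}, sync=False)

    def commit(self, txid):
        self._append({"tx": txid, "op": "commit"}, sync=True)

    def recover(self):
        """Rebuild state from the log, applying only committed transactions."""
        pending, state = {}, {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from the crash: stop replaying here
                if rec["op"] == "set":
                    pending.setdefault(rec["tx"], []).append((rec["key"], rec["value"]))
                elif rec["op"] == "commit":
                    for key, value in pending.pop(rec["tx"], []):
                        state[key] = value
        return state
```

After a crash, `recover()` replays the log: writes from a transaction whose commit record never reached the disk are simply dropped, and a half-written last line is ignored rather than treated as corruption.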
Re:mysql bad at disaster recovery? by Jugalator · 2005-02-21 19:22 · Score: 2, Insightful

They were lucky with that server?

I mean, if a few servers' databases survived, that may speak more of random luck: they happened to be in a state where nothing bad happened when the power outage occurred. If all of the databases survived, that speaks of MySQL being resistant to this sort of thing.

--
Beware: In C++, your friends can see your privates!
Re:This is why you don't turn Google down by mdecarle · 2005-02-21 21:14 · Score: 2, Insightful

Must you really know what the money is being spent on?

If you donate money, you are asking them to continue to offer their great service to you and other people. How they achieve that goal, is up to them, no?

You don't ask the Red Cross what they use your money for, do you? The organisation usually tells you afterwards.
Indeed. by Anonymous Coward · 2005-02-22 01:21 · Score: 1, Insightful

Sometimes the history of an article says just as much (if not more) than the article itself.