Power Outage Takes Wikimedia Down
Baricom writes "Just a few weeks after a major power outage took out well-known blogging service LiveJournal for several hours, almost all of Wikimedia Foundation's services are offline due to a tripped circuit breaker at a different colo. Among other services, Wikimedia runs the well-known Wikipedia open encyclopedia. Coincidentally, the foundation is in the middle of a fundraising drive to pay for new servers. They have established an off-site backup of the fundraising page here until power returns."
Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state.
Ya know, I just don't understand why so many projects with such high visibility and requirements for reliability use a toy database like MySQL.
Someone PLEASE tell me why. Because right now the only thing I can think is that people just don't know how to pronounce "Postgres".
There's a simple way around this: stick to PostgreSQL, MSSQL, Oracle, DB2, or some other real database. MySQL doesn't make the grade, precisely because things like this can happen.
No, no, but with the Google deal looming, the tin-foil-hatters are paying close attention to Wikipedia, and every little thing gets overly scrutinized.
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
IIRC, that's the fire code: the breaker needs to be able to unconditionally kill all power inside the facility. Thus, it kills the power post-UPS.
Sometimes it costs more to do things wrong, in the long term, than to do them right.
500GB of disk, 5TB of transfer, $5.95/mo
- Distributed caches: the majority of hits are now served by caches, and some of those caches are off-site. It was a pilot project for a while; now we're trying to design and build scalable infrastructure for it. Still, many edit requests go uncached.
- Distributed file systems: are there any? NFS is a single-server system, Microsoft has something, PVFS has no redundancy, GoogleFS is closed and unreleased, and then there are Coda and AFS; none of them really works. Right now we're trying to build a store on MogileFS (LiveJournal's Perl-based, application-level file storage), and there are certainly other ideas.
- Distributed database: there are no proper open-source multi-master solutions for large databases. We use MySQL with replication and a transactional storage engine. In an event like this it would be great to have a second datacenter nearby with additional DB replicas and a gigabit interconnect, but that costs money. Application-level bidirectional replication is planned for both MySQL and PostgreSQL, and a SAN deployment is too costly.
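The caching point above (most hits answered by caches, edits still going to the backend) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not Wikimedia's actual cache setup or MediaWiki code: the class `PageCache`, its methods, and the plain dict standing in for the database are all hypothetical.

```python
from typing import Dict


class PageCache:
    """Hypothetical toy front-end cache: views come from memory, edits bypass it.

    `backend` is an assumed stand-in for the database and application servers.
    """

    def __init__(self, backend: Dict[str, str]) -> None:
        self.backend = backend
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _render(self, title: str) -> str:
        # In a real wiki this is the expensive path (parse text, query the DB);
        # here it just wraps the stored text in a paragraph tag.
        return "<p>" + self.backend[title] + "</p>"

    def view(self, title: str) -> str:
        """Return rendered HTML, serving it from the cache when present."""
        if title in self._cache:
            self.hits += 1
            return self._cache[title]
        self.misses += 1
        html = self._render(title)
        self._cache[title] = html
        return html

    def edit(self, title: str, text: str) -> None:
        """Write through to the backend, then purge the stale cached copy."""
        self.backend[title] = text
        self._cache.pop(title, None)
```

Repeated views of the same page cost one render; an edit always reaches the backend and forces the next view to miss, which is why edit traffic cannot be absorbed by caches the way read traffic can.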
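For the file storage point, here is a toy sketch of application-level replication in the spirit of MogileFS, which keeps copies of each file on several storage hosts so one host can fail without losing data. It does not use MogileFS's real interfaces; `ReplicatedStore`, the CRC32-based placement, and the in-memory per-host dictionaries are hypothetical stand-ins chosen only to show the idea.

```python
import zlib
from typing import Dict, List, Set


class ReplicatedStore:
    """Hypothetical toy store: each key is written to `copies` distinct hosts."""

    def __init__(self, hosts: List[str], copies: int = 2) -> None:
        if copies > len(hosts):
            raise ValueError("need at least as many hosts as copies")
        self.hosts = hosts
        self.copies = copies
        # One in-memory dict per host stands in for that host's disks.
        self.data: Dict[str, Dict[str, bytes]] = {h: {} for h in hosts}
        self.down: Set[str] = set()  # hosts currently unreachable

    def placement(self, key: str) -> List[str]:
        # Deterministic choice of `copies` distinct hosts, starting at a hash of the key.
        start = zlib.crc32(key.encode()) % len(self.hosts)
        return [self.hosts[(start + i) % len(self.hosts)] for i in range(self.copies)]

    def put(self, key: str, value: bytes) -> None:
        # Toy assumption: all placement hosts are up at write time.
        for host in self.placement(key):
            self.data[host][key] = value

    def get(self, key: str) -> bytes:
        # Read from the first live replica; fail only if every copy is unreachable.
        for host in self.placement(key):
            if host not in self.down and key in self.data[host]:
                return self.data[host][key]
        raise KeyError(key)
```

With two copies, losing any single host leaves every file readable; a real system also has to re-replicate after a failure, which this sketch leaves out.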
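The database point describes single-master MySQL replication. A minimal sketch of the usual read/write split follows, assuming a hypothetical `QueryRouter` that sends writes to the one master and rotates reads across replicas; the host names and the simple verb check are illustrative assumptions, not how MediaWiki's load balancer is written. It also shows why the master remains a single point of failure without a multi-master setup: every write has only one place to go.

```python
import itertools
from typing import List


class QueryRouter:
    """Hypothetical router: writes go to the master, reads rotate over replicas."""

    # Leading SQL verbs treated as writes (a simplification for the sketch).
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, replicas: List[str]) -> None:
        self.master = master
        # Round-robin iterator over replicas; None if there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Writes, or any query when no replica exists, must hit the master.
        if verb in self.WRITE_VERBS or self._replicas is None:
            return self.master
        return next(self._replicas)
```

Reads scale by adding replicas, but if the master's power goes out, writes stop regardless of how many replicas survive.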
And yes, the MediaWiki code has PostgreSQL support, but migrating from one database to another without proper tests, benchmarks and insurance wouldn't be a very mature move.