Slashdot Mirror


Wikipedia Explains Today's Global Outage

gnujoshua writes "The Wikimedia Tech Blog has a post explaining why many users were unable to reach Wikimedia sites due to DNS resolution failure. The article states, 'Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries. However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.'"
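
The "caching effects" in the quote can be illustrated with a toy caching resolver. This is only a sketch: the one-hour TTL is an assumption chosen to match the "up to an hour" figure, and `CachingResolver`, `wiki.example`, and the answer strings are hypothetical names, not anything Wikimedia actually uses.

```python
from dataclasses import dataclass

ASSUMED_TTL = 3600  # seconds; assumed, chosen to match "up to an hour"


@dataclass
class CachedAnswer:
    value: str        # the answer the resolver stored (possibly a bad one)
    cached_at: float  # time the answer was stored, in seconds
    ttl: float        # how long the resolver may keep serving it


class CachingResolver:
    """Toy resolver that reuses an answer until its TTL expires."""

    def __init__(self, authoritative):
        self.authoritative = authoritative  # callable: name -> (value, ttl)
        self.cache = {}

    def resolve(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry.cached_at + entry.ttl:
            return entry.value  # still fresh, so the origin is not asked again
        value, ttl = self.authoritative(name)
        self.cache[name] = CachedAnswer(value, now, ttl)
        return value


# Hypothetical zone data: the broken failover publishes a bad answer.
zone = {"wiki.example": ("BROKEN", ASSUMED_TTL)}
resolver = CachingResolver(lambda name: zone[name])

first = resolver.resolve("wiki.example", now=0)             # caches the bad answer
zone["wiki.example"] = ("florida-cluster", ASSUMED_TTL)     # operators fix the zone
during = resolver.resolve("wiki.example", now=600)          # still the cached bad answer
after = resolver.resolve("wiki.example", now=ASSUMED_TTL)   # cache expired, fix is visible
```

A resolver that happened to cache the bad answer just before the fix keeps serving it for the full TTL, which is why the fix reaches different users at different times.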

6 of 153 comments

  1. Run both systems live at half capacity by Colin+Smith · · Score: 2, Interesting

    active/passive systems are a pain in the arse. The whole concept of testing failover in an active/passive situation is wrong. Anything which relies on human beings doing this and that and that and that is a bad solution.

    Just run active/active and load balance over both sites. If one fails its tests, you just pull it.
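
    A minimal sketch of that idea, assuming a simple round-robin balancer and a health check that returns True or False per site. The class name, the site names, and the `status` dict are made up for illustration; a real setup would do this in the load balancer itself.

```python
import itertools


class ActiveActivePool:
    """Round-robin over every site that currently passes its health check."""

    def __init__(self, sites, health_check):
        self.sites = list(sites)
        self.health_check = health_check  # callable: site -> bool
        self._counter = itertools.count()

    def healthy_sites(self):
        # A site that fails its test is simply left out; no manual failover step.
        return [site for site in self.sites if self.health_check(site)]

    def pick(self):
        healthy = self.healthy_sites()
        if not healthy:
            raise RuntimeError("no healthy site available")
        return healthy[next(self._counter) % len(healthy)]


# Hypothetical health state, flipped by hand to simulate a failure.
status = {"europe": True, "florida": True}
pool = ActiveActivePool(["europe", "florida"], lambda site: status[site])

before = [pool.pick() for _ in range(4)]  # traffic alternates between both sites
status["europe"] = False                  # e.g. servers shut down from overheating
after = [pool.pick() for _ in range(3)]   # everything goes to the remaining site
```

    The catch is capacity: each site has to be able to carry the full load on its own, otherwise pulling one just moves the outage.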

     

    --
    Deleted
  2. Re:Test, and Test Again by geniice · · Score: 4, Interesting

    Going by past statistics, the cost of downtime to Wikipedia tends to be negative, since donations rise. Not that this is something Wikimedia aims to do.

  3. Denmark is still without Wikipedia by Anonymous Coward · · Score: 1, Interesting

    20:47 UTC+1, and we are still without Wikipedia, probably due to poor DNS propagation.

  4. Distributed Wikipedia by RAMMS+EIN · · Score: 2, Interesting

    Speaking of Wikipedia, an idea that has long been in my mind, but that I have never sat down and worked out, is distributed hosting of Wikipedia. The idea is that volunteers each contribute some resources (network capacity, storage space, RAM, and CPU cycles) to host and serve part of the content.

    This way, we should be able to reduce the load on the (donation supported) Wikimedia servers, as well as increase the redundancy in the system.
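
    One way the "each volunteer hosts part of the content" step could be sketched is rendezvous (highest-random-weight) hashing, where every article is kept by the few volunteers that score highest for it. This is an assumption about how it might be done, not an existing Wikipedia mechanism; the node names and the `copies=2` default are invented, and the sketch ignores node capacity and trust entirely.

```python
import hashlib


def score(node, article):
    """Deterministic pseudo-random weight for a (node, article) pair."""
    digest = hashlib.sha256(f"{node}:{article}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas_for(article, nodes, copies=2):
    """Return the `copies` nodes with the highest score for this article."""
    ranked = sorted(nodes, key=lambda node: score(node, article), reverse=True)
    return ranked[:copies]


# Hypothetical volunteer nodes.
volunteers = ["node-a", "node-b", "node-c", "node-d"]
holders = replicas_for("DNS", volunteers)

# If the first holder disappears, the second holder still has the article
# and only one new copy has to be made elsewhere.
remaining = [node for node in volunteers if node != holders[0]]
holders_after_loss = replicas_for("DNS", remaining)
```

    Anyone who knows the volunteer list can compute where an article lives without a central index, and losing one volunteer only moves the articles that volunteer held.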

    Is anybody already working on this or are there perhaps even already implementations of this idea?

    --
    Please correct me if I got my facts wrong.
    1. Re:Distributed Wikipedia by u38cg · · Score: 2, Interesting

      Attempts have been made at the general case, but it is a hard problem: how do you ensure fair resource sharing and reliability?

      --
      [FUCK BETA]
  5. Jokes aside, how do you feel when you lost Wiki? by porky_pig_jr · · Score: 2, Interesting

    I was rather pissed. And the only thing I was going to do was look up a few math terms. I ended up using PlanetMath and a few other sites, but when Wiki came back, I checked them there as well, and guess what: they had the most comprehensive and informative articles. That's the first outage I remember since I started using Wiki.