Slashdot Mirror


Facebook Unveils Details of Downtime

An anonymous reader writes "Facebook officially gave out more technical details on the endless loop in a database control mechanism that forced a 2.5-hour shutdown of the social site, and the resulting combination of a productivity burst, increased fertility (check back on June 25, 2011) and mass hysteria all around the world."

15 of 103 comments

  1. not very technical by datapharmer · · Score: 5, Interesting

    The technical details are that I have an incompatible browser? Really, Slashdot? Did you even check the links? Of course not...

    --
    Get a web developer
    1. Re:not very technical by Anonymous Coward · · Score: 5, Informative

      Correct link to technical details:

      http://www.facebook.com/note.php?note_id=431441338919&id=9445547199&ref=mf

      (anon because I'm not a karma whore)

    2. Re:not very technical by ProdigyPuNk · · Score: 4, Funny

      You've got to read some of the comments posted in that thread; they're hilarious.

      Anoesj Sadraee It's great to hear and see that big companies like Facebook are so open with what they do. That's rare, very rare. Thanks!

      Anne Uriarte ~facebook is stiLL sooo sLow for uz irr! >;'((

      Phil McBride this site is becoming less secure lately... hackers are becoming more and more intelligent, i would know, cuz im a white hat lol

    3. Re:not very technical by elewton · · Score: 5, Funny

      You could spend it on Karma whores or play Karma poker.

  2. OH NOES by Pojut · · Score: 5, Insightful

    Meh, it happens...just like a power company, no one says a word when the thing works fine for weeks or months at a time...but when it goes down for a couple hours, people act like it never works.

  3. Link to Facebook Blog Post by ryanleary · · Score: 5, Informative

    Since the link in the summary is broken, here is the Facebook blog post.

    Post contents:
    Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we’ve had in over four years, and we wanted to first of all apologize for it. We also wanted to provide much more technical detail on what happened and share one big lesson learned.

    The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

    The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.

    Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

    To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.

    The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

    This got the site back up and running today, and for now we’ve turned off the system that attempts to correct configuration values. We’re exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.

    We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.
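The feedback loop described in the post can be sketched with a small simulation. This is a hedged toy model, not Facebook's code: the function `simulate`, its parameters, and the single shared cache key are all hypothetical. It assumes every client reads the cache at the same moment on each tick, that the database cluster serves at most `db_capacity` queries per tick, and (pessimistically) that one failed query is enough to leave the cache key deleted.

```python
def simulate(clients, db_capacity, ticks, fix_at, delete_on_error=True):
    """Return the number of database queries issued on each tick.

    Toy model with hypothetical names; it only illustrates the loop
    described in the post.
    """
    store_is_valid = False  # the bad configuration value has just been written
    cache = "invalid"       # cached copy: "valid", "invalid", or None (key deleted)
    queries_per_tick = []

    for tick in range(ticks):
        if tick == fix_at:
            store_is_valid = True  # the root cause is repaired in the persistent store

        # Every client that does not see a valid cached value queries the database.
        queries = clients if cache != "valid" else 0
        queries_per_tick.append(queries)
        if queries == 0:
            continue

        failed = max(0, queries - db_capacity)
        served = queries - failed

        # Served queries copy whatever the persistent store holds into the cache.
        if served:
            cache = "valid" if store_is_valid else "invalid"

        # The flaw: a query error is treated as an invalid value, so the client
        # deletes the cache key and the next tick starts the stampede again.
        if failed and delete_on_error:
            cache = None

    return queries_per_tick


if __name__ == "__main__":
    # Overloaded cluster: the queries continue even after the fix at tick 2.
    print(simulate(clients=1000, db_capacity=100, ticks=6, fix_at=2))
    # Same load, but errors no longer delete the key: the load drains after the fix.
    print(simulate(clients=1000, db_capacity=100, ticks=6, fix_at=2,
                   delete_on_error=False))
```

Under these assumptions, the first run never recovers while the number of clients exceeds the cluster's capacity, which is consistent with the post's statement that traffic had to be stopped entirely. The second run shows that not treating a query error as an invalid value lets the load drain once the store is fixed.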

  4. Wifey was gutted... I wasn't by rabidjoe · · Score: 5, Funny

    In the downtime I tried to show the woman the virtues of an IRC channel she could share with family and friends, and how much less bullshit she would find there... "does it have farmville?". There is no hope for the sheeples; the planet is doomed.

    1. Re:Wifey was gutted... I wasn't by Allnighte · · Score: 4, Insightful

      I was going to disagree with you until I read one of the Facebook comments on the blog post talking about the error:

      "John Marshall: how do i get job workin with facebook i live in newcastle in uk can any one from facebook staff get or can some one give me a email address that i can use to contact facebook please"

      :|
      very doomed.

  5. "mass hysteria"? by SuperBanana · · Score: 4, Insightful

    and mass hysteria all around the world.

    [citation needed].

    First I knew was when I read about it on another tech blog, hours after it'd happened...and I use Facebook. And I work with a ton of people who use it (grad students.)

    There wasn't mass hysteria; there was mass ambivalence. I'm now reading all these blog/news postings about how "everyone" went crazy. Nobody was talking about it where I ate dinner. Nobody was talking about it where I had coffee that evening. It didn't make my city newspaper- no "Facebook down, residents in despair" stories to be found.

    All this coverage claiming that everyone went nuts seems like a desperate attempt by Facebook PR to make something positive out of this...namely, trying to convince us that Facebook is so integral to the people who use it, it must, of course, be to us as well.

    1. Re:"mass hysteria"? by Anonymous Coward · · Score: 4, Funny

      To me, Facebook is about as integral to my life as the toilet is. It's there, it's gonna be used every once in a while and it involves a bit of dirty business that you just can't avoid.

  6. No one is thinking about the big losses here... by ProdigyPuNk · · Score: 5, Funny

    Let's look at the important thing here with this outage: How many cows, pigs, chickens, cats, goldfish, etc. were made to suffer? I know my girlfriend couldn't take care of her virtual cats, and their litterbox ended up full. They were not at all happy. I'm sure the same thing played out across thousands of FarmVille, MyPets, etc. accounts. Please, won't someone think of the animals?

  7. Twitter... by PmanAce · · Score: 4, Interesting

    I wonder if Twitter, or other social portals, had a noticeable increase in usage during the Facebook outage?

    --
    Tired of my customary (Score:1)

  8. My Favorite Downtime by WankersRevenge · · Score: 5, Funny

    My favorite server downtime story occurred back in early 2000 when I was working for Disney's Internet Group. All the message boards for the film and television websites ended up crashing. No one knew the cause, and as the web-ops team investigated, we learned that the message board server wasn't even housed in any of Disney's server farms. After a lot of hair pulling, we found the server was located in a satellite office in Sunnyvale. Evidently, the server was just on an engineer's desk. When that engineer left the company, he neglected to tell anyone about the box, so when the new engineer took his spot, she found she didn't like the noise from the machine. So one day, she pulled the plug and put it in some out-of-the-way spot in the office. There wasn't a lot of traffic on it, but it still makes me laugh to think of all the Tim Allen fans in distress over a misplaced box.

  9. Eerie sight by NicknamesAreStupid · · Score: 5, Funny

    As Facebook went off-line, I witnessed the unthinkable at an Internet cafe. Young men and women, innocently engaging in social networking intercourse, were suddenly thrown out of their Facebook world and into the reality of the real world, as though all had taken the red pill. Images distorted into 3D with a startling range of colors, sounds beyond stereo, and smells -- odors for the new fifth sense. Everyone looked around to witness "super high def" of each other, and some actually stood to experience a new perspective. Then, as if in concert, the unplugged Facebookers began to touch each other. Immediately untapped hormones raged as ancient primal urges emerged for the first time. Just as it was about to become an orgy of primal lust, the cafe manager flipped on a YouTube video of Elmo, http://www.youtube.com/watch?v=UZHSDjtD-dg, and a disaster was avoided.

  10. it's only facebook by thetoadwarrior · · Score: 5, Funny

    Who cares if it's down, even for a day? Just talk about your pointless activities twice as long the next day.