Slashdot Mirror


Facebook Unveils Details of Downtime

An anonymous reader writes "Facebook officially gave out more technical details on the endless loop in a database control mechanism that forced a 2.5-hour shutdown of the social site, and the resulting combination of a productivity burst, increased fertility (check back on June 25, 2011) and mass hysteria all around the world."

28 of 103 comments

  1. not very technical by datapharmer · · Score: 5, Interesting

    The technical details are that I have an incompatible browser? Really, Slashdot? Did you even check the links? Of course not...

    --
    Get a web developer
    1. Re:not very technical by Anonymous Coward · · Score: 5, Informative

      Correct link to technical details:

      http://www.facebook.com/note.php?note_id=431441338919&id=9445547199&ref=mf

      (anon because I'm not a karma whore)

    2. Re:not very technical by ProdigyPuNk · · Score: 4, Funny
      You've got to read some of the comments posted in that thread; they're hilarious.

      Anoesj Sadraee It's great to hear and see that big companies like Facebook are so open with what they do. That's rare, very rare. Thanks!

      Anne Uriarte ~facebook is stiLL sooo sLow for uz irr! >;'((

      Phil McBride this site is becoming less secure lately... hackers are becoming more and more intelligent, i would know, cuz im a white hat lol

    3. Re:not very technical by elewton · · Score: 5, Funny

      You could spend it on Karma whores or play Karma poker.

  2. OH NOES by Pojut · · Score: 5, Insightful

    Meh, it happens...just like a power company, no one says a word when the thing works fine for weeks or months at a time...but when it goes down for a couple hours, people act like it never works.

    1. Re:OH NOES by Mitchell314 · · Score: 3, Insightful

      But it's a helluva lot more important for a power company to stay up than FB; no power can cause serious problems. But FB down for two hours? Man, the gods forbid you actually are productive or something . . .

      --
      I read TFA and all I got was this lousy cookie
  3. Official Technical Details by ProdigyPuNk · · Score: 3, Funny
    You are using an incompatible web browser. Sorry, we're not cool enough to support your browser. Please keep it real with one of the following browsers:

    Obviously, the error was caused by too many people not keeping it real.

    1. Re:Official Technical Details by jdong · · Score: 3, Insightful

      I've got a great idea! Why don't we have every slashdot reader go in and try to fix the broken link? Then the problem will correct itself in no time!

  4. Link to Facebook Blog Post by ryanleary · · Score: 5, Informative

    Since the link in the summary is broken, this is the facebook blog post.

    Post contents:
    Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we’ve had in over four years, and we wanted to first of all apologize for it. We also wanted to provide much more technical detail on what happened and share one big lesson learned.

    The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

    The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.

    Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

    To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.

    The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

    This got the site back up and running today, and for now we’ve turned off the system that attempts to correct configuration values. We’re exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.

    We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.

    1. Re:Link to Facebook Blog Post by Charliemopps · · Score: 3, Interesting

      Wow, I'm impressed by the detail they provided. More companies should handle outages like this. It makes them look like they know what they're doing: they figured it out, and it won't happen again. That beats the typical stance of pretending it never happened.
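
The feedback loop described in the blog post quoted in comment 4 can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Facebook's actual system: the names `ToyDatabase`, `OverloadedError`, `is_valid`, and `simulate`, the single `"config"` key, and the client and capacity numbers are all hypothetical. It models concurrent clients that all check a shared cache at the same moment, query an overloadable database when the cached value looks invalid, and (in the failing design) delete the cache key whenever the database returns an error.

```python
class OverloadedError(Exception):
    """Raised by the toy database once it exceeds its per-round capacity."""


class ToyDatabase:
    """Hypothetical persistent store that can serve only `capacity` reads per round."""

    def __init__(self, value, capacity):
        self.value = value
        self.capacity = capacity
        self.queries_this_round = 0

    def start_round(self):
        self.queries_this_round = 0

    def read(self):
        self.queries_this_round += 1
        if self.queries_this_round > self.capacity:
            raise OverloadedError("too many queries")
        return self.value


def is_valid(value):
    # Assumption: a missing key or the marker "INVALID" counts as invalid.
    return value is not None and value != "INVALID"


def simulate(delete_on_error, clients=100, capacity=10, rounds=5):
    """Return the number of database queries issued in each round."""
    cache = {"config": "INVALID"}
    db = ToyDatabase(value="INVALID", capacity=capacity)
    queries_per_round = []
    for round_no in range(rounds):
        if round_no == 1:
            db.value = "good"  # the persistent copy is fixed after round 0
        db.start_round()
        # All clients look at the cache at the same moment, as concurrent
        # clients would, so every one of them sees the same (bad) value.
        snapshot = cache.get("config")
        clients_fixing = 0 if is_valid(snapshot) else clients
        for _ in range(clients_fixing):
            try:
                fresh = db.read()
            except OverloadedError:
                if delete_on_error:
                    # The flaw from the post: a database error is treated as
                    # an invalid value, so the cache key is deleted.
                    cache.pop("config", None)
                continue
            cache["config"] = fresh
        queries_per_round.append(db.queries_this_round)
    return queries_per_round


print(simulate(delete_on_error=True))   # [100, 100, 100, 100, 100]
print(simulate(delete_on_error=False))  # [100, 100, 0, 0, 0]
```

With `delete_on_error=True`, the query storm never subsides even though the persistent value is fixed after the first round, because failed clients keep wiping the key that successful clients just repaired. With `delete_on_error=False`, the cache recovers one round after the fix. A real design would presumably also need backoff or rate limiting to soften the initial spike, which this sketch does not attempt.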

  5. Wifey was gutted... I wasn't by rabidjoe · · Score: 5, Funny

    In the downtime I tried to show the woman the virtues of an IRC channel she could share with family and friends and how much less bullshit she would find there..... "does it have farmville?". There is no hope for the sheeples; the planet is doomed.

    1. Re:Wifey was gutted... I wasn't by Allnighte · · Score: 4, Insightful

      I was going to disagree with you until I read one of the Facebook comments on the blog post talking about the error:

      "John Marshall: how do i get job workin with facebook i live in newcastle in uk can any one from facebook staff get or can some one give me a email address that i can use to contact facebook please"

      :|
      very doomed.

  6. "mass hysteria"? by SuperBanana · · Score: 4, Insightful

    and mass hysteria all around the world.

    [citation needed].

    First I knew was when I read about it on another tech blog, hours after it'd happened...and I use Facebook. And I work with a ton of people who use it (grad students.)

    There wasn't mass hysteria; there was mass ambivalence. I'm now reading all these blog/news postings about how "everyone" went crazy. Nobody was talking about it where I ate dinner. Nobody was talking about it where I had coffee that evening. It didn't make my city newspaper- no "Facebook down, residents in despair" stories to be found.

    All this coverage claiming that everyone went nuts seems like a desperate attempt by Facebook PR to make something positive out of this...namely, trying to convince us that Facebook is so integral to the people who use it, it must, of course, be to us as well.

    1. Re:"mass hysteria"? by Anonymous Coward · · Score: 4, Funny

      To me, Facebook is about as integral to my life as the toilet is. It's there, it's gonna be used every once in a while and it involves a bit of dirty business that you just can't avoid.

    2. Re:"mass hysteria"? by ForexCoder · · Score: 3, Insightful

      [citation needed].

      [citation]

    3. Re:"mass hysteria"? by kurokame · · Score: 2, Interesting

      To be fair, most grad students use Facebook primarily to help remind them that they need to come up for air occasionally. If it goes down and temporarily stops vying for their attention, they're likely to continue being absorbed with analyzing the data from their last attempt to apply an epicycle-based model to the sociology of small town karaoke sessions given a behavioral-political tensor formulation of motivation in a multidimensional vector space representing cheese.

    4. Re:"mass hysteria"? by microbee · · Score: 3, Insightful

      It's a toilet all right, but a very annoying one: you hear every flush coming from your friends' toilet as well.

    5. Re:"mass hysteria"? by neumayr · · Score: 2, Insightful

      Uh. I took the summary as sarcasm of sorts. Which made your reaction seem like quite the overreaction. Then you got +5 Insightful...

      --
      Truth arises more readily from error than from confusion. -Francis Bacon
    6. Re:"mass hysteria"? by IANAAC · · Score: 2

      It's a toilet all right, but a very annoying one: you hear every flush coming from your friends' toilet as well.

      Really?

      The first time I see notice of anyone's flushes, I block the flush notices.

      Seems to me, if you're complaining about it, you don't really know what you're doing (and that's considering all you have to do is hover your mouse over their post).

  7. No one is thinking about the big losses here... by ProdigyPuNk · · Score: 5, Funny

    Let's look at the important thing here with this outage: How many cows, pigs, chickens, cats, goldfish, etc. were made to suffer? I know my girlfriend couldn't take care of her virtual cats, and their litterbox ended up full. They were not at all happy. I'm sure the same thing played out across thousands of FarmVille, MyPets, etc. accounts. Please, won't someone think of the animals?

  8. The metaphysical reason was karmic payback by Anonymous Coward · · Score: 2, Insightful

    If you suck so bad on a global scale long enough, eventually the universe tries to step in.

  9. Farmville continued to run? by Animats · · Score: 2, Insightful

    Does the clock stop for Farmville if Facebook goes down?

  10. Twitter... by PmanAce · · Score: 4, Interesting

    I wonder if Twitter, or other social portals, had a noticeable increase in usage during the Facebook outage?

    --
    Tired of my customary (Score:1)
  11. less productivity actually by Valpis · · Score: 3, Funny

    Instead of people checking Facebook every 5 minutes for the latest, very important, updates as they always do, they were now constantly hitting reload for 2.5 hours.

    --
    who shot the cat in the hat to experiment is insane
  12. Fertility was not affected by michaelmalak · · Score: 3, Informative

    Unless the particular arrangement of pixels on a Facebook webpage caused a powerful alignment of EM radiation, fertility was not affected. Perhaps fecundity was, but not fertility.

  13. My Favorite Downtime by WankersRevenge · · Score: 5, Funny

    My favorite server downtime story occurred back in early 2000 when I was working for Disney's Internet Group. All the message boards for the film and television websites ended up crashing. No one knew the cause, and as the web-ops team investigated, we learned that the message board server wasn't even housed in any of Disney's server farms. After a lot of hair pulling, we found the server was located in a satellite office in Sunnyvale. Evidently, the server was just on an engineer's desk. When that engineer left the company, he neglected to tell anyone about the box, so when the new engineer took his spot, she found she didn't like the noise from the machine. So one day, she pulled the plug and put it in some out-of-the-way spot in the office. There wasn't a lot of traffic on it, but it still makes me laugh to think of all the Tim Allen fans in distress over a misplaced box.

  14. Eerie sight by NicknamesAreStupid · · Score: 5, Funny

    As Facebook went off-line, I witnessed the unthinkable at an Internet cafe. Young men and women, innocently engaging in social networking intercourse, were suddenly thrown out of their Facebook world and into the reality of the real world, as though all had taken the red pill. Images distorted into 3D with a startling range of colors, sounds beyond stereo, and smells -- odors for the new fifth sense. Everyone looked around to witness "super high def" of each other, and some actually stood to experience a new perspective. Then, as if in concert, the unplugged Facebookers began to touch each other. Immediately untapped hormones raged as ancient primal urges emerged for the first time. Just as it was about to become an orgy of primal lust, the cafe manager flipped on a YouTube video of Elmo, http://www.youtube.com/watch?v=UZHSDjtD-dg, and a disaster was avoided.

  15. it's only facebook by thetoadwarrior · · Score: 5, Funny

    Who cares if it's down, even for a day? Just talk about your pointless activities twice as long the next day.