Facebook Unveils Details of Downtime
An anonymous reader writes "Facebook officially gave out more technical details on the endless loop in a database control mechanism that forced a 2.5-hour shutdown of the social site, and the resulting combination of a productivity burst, increased fertility (check back on June 25, 2011) and mass hysteria all around the world."
The technical details are that I have an incompatible browser? Really, Slashdot? Did you even check the links? Of course not...
Get a web developer
Meh, it happens...just like a power company, no one says a word when the thing works fine for weeks or months at a time...but when it goes down for a couple hours, people act like it never works.
Living With a Nerd
Obviously, the error was caused by too many people not keeping it real.
Since the link in the summary is broken, this is the Facebook blog post.
Post contents:
Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we’ve had in over four years, and we wanted to first of all apologize for it. We also wanted to provide much more technical detail on what happened and share one big lesson learned.
The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn’t allow the databases to recover.
The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we’ve turned off the system that attempts to correct configuration values. We’re exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.
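For anyone who wants to see the feedback loop from the post in miniature, here is a rough Python sketch. It is a toy model, not Facebook's code: the names (`PersistentStore`, `naive_fix`, `careful_fix`, `simulate`), the single `"config"` key, the client count, and the per-tick capacity are all made up for illustration. It assumes the persistent value has already been fixed, that all clients check a shared cache at the same moment, and that the store fails every query beyond its capacity within a tick.

```python
class DatabaseOverloaded(Exception):
    """Raised by the toy store when it gets more queries than it can serve."""


class PersistentStore:
    """Hypothetical stand-in for the database cluster behind the cache."""

    def __init__(self, value, capacity_per_tick):
        self.value = value
        self.capacity_per_tick = capacity_per_tick
        self.queries_this_tick = 0

    def read(self):
        # Every query counts toward load, including the ones that fail.
        self.queries_this_tick += 1
        if self.queries_this_tick > self.capacity_per_tick:
            raise DatabaseOverloaded()
        return self.value

    def next_tick(self):
        self.queries_this_tick = 0


def is_valid(value):
    # Assumption: "GOOD" is the only valid configuration value in this toy.
    return value == "GOOD"


def naive_fix(cache, store):
    """Behavior described in the post: a query error is treated as an
    invalid value, so the cache key is deleted."""
    try:
        value = store.read()
    except DatabaseOverloaded:
        value = None  # error conflated with "invalid value"
    if is_valid(value):
        cache["config"] = value
    else:
        cache.pop("config", None)


def careful_fix(cache, store):
    """One possible alternative: a failed query leaves the cache alone."""
    try:
        value = store.read()
    except DatabaseOverloaded:
        return  # transient failure, keep whatever is cached
    if is_valid(value):
        cache["config"] = value


def simulate(fix, clients=1000, capacity=100, ticks=5):
    """Return the number of store queries issued in each tick."""
    store = PersistentStore("GOOD", capacity)  # root cause already fixed
    cache = {}                                 # but the cache key is gone
    load = []
    for _ in range(ticks):
        store.next_tick()
        # All clients look at the shared cache at the same moment.
        needs_fix = 0 if is_valid(cache.get("config")) else clients
        for _ in range(needs_fix):
            fix(cache, store)
        load.append(store.queries_this_tick)
    return load


print("naive:  ", simulate(naive_fix))
print("careful:", simulate(careful_fix))
```

With the naive handler the clients that get errors keep deleting the key that the lucky ones just repaired, so the load never drops. With the careful handler the first successful read sticks and the queries stop. The sketch does nothing about the initial spike itself; limiting that would need something like backoff or coalescing requests, which the post does not describe.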
In the downtime I tried to show the woman the virtues of an IRC channel she could share with family and friends and how much less bullshit she would find there..... "does it have farmville?". There is no hope for the sheeples, the planet is doomed
and mass hysteria all around the world.
[citation needed].
First I knew was when I read about it on another tech blog, hours after it'd happened...and I use Facebook. And I work with a ton of people who use it (grad students.)
There wasn't mass hysteria; there was mass ambivalence. I'm now reading all these blog/news postings about how "everyone" went crazy. Nobody was talking about it where I ate dinner. Nobody was talking about it where I had coffee that evening. It didn't make my city newspaper- no "Facebook down, residents in despair" stories to be found.
All this coverage claiming that everyone went nuts seems like a desperate attempt by Facebook PR to make something positive out of this...namely, trying to convince us that Facebook is so integral to the people who use it, it must, of course, be to us as well.
Please help metamoderate.
If the woman has a GameCube or Wii, and you want to get her off FarmVille, try buying her a copy of Harvest Moon: Magical Melody.
Let's look at the important thing here with this outage: How many cows, pigs, chickens, cats, goldfish, etc. were made to suffer? I know my girlfriend couldn't take care of her virtual cats, and their litterbox ended up full. They were not at all happy. I'm sure the same thing played out across thousands of FarmVille, MyPets, etc. accounts. Please, won't someone think of the animals?
If you suck so bad on a global scale long enough, eventually the universe tries to step in.
Does the clock stop for Farmville if Facebook goes down?
I wonder if Twitter, or other social portals, had a noticeable increase in usage during the Facebook outage?
Tired of my customary (Score:1)
Instead of checking Facebook every 5 minutes for the latest, very important updates as they always do, people were now constantly hitting reload for 2.5 hours.
who shot the cat in the hat to experiment is insane
What happened? Someone friended themselves?
You know that will make you go blind.
The drones just have a slow uptake speed. Not to worry, it will be replaced soon enough with a different, equally less real, relationship framework. Sad, yes. But the followers will be calmed and I won't have to look at any more images of their freakish Imperial Leader who doesn't seem to be able to relate easily to others.
Posting anonymously because I care about slashdot even less than do the moderators.
I spend more time on DubLi. Facebook is just a waste of time. If I want to know what's going on with my friends, I'll talk to them. Simpl
2.5 hours of my life. I was so out of touch, so isolated, so alone. When it finally came back up it was such a relief. I could function again, I knew what was going on, I was back in touch. I never want to feel like that again.
What is this Facebook thing? Isn't that something kids do on computers?
Unless the particular arrangement of pixels on a Facebook webpage caused a powerful alignment of EM radiation, fertility was not affected. Perhaps fecundity was, but not fertility.
My favorite server downtime story occurred back in early 2000 when I was working for Disney's Internet Group. All the message boards for the film and television websites ended up crashing. No one knew the cause, and as the web-ops team investigated, we learned that the message board server wasn't even housed in any of Disney's server farms. After a lot of hair pulling, we found the server was located in a satellite office in Sunnyvale. Evidently, the server was just on an engineer's desk. When that engineer left the company he neglected to tell anyone about the box, so when the new engineer took his spot, she found she didn't like the noise from the machine. So one day, she pulled the plug and put it in some out-of-the-way spot in the office. There wasn't a lot of traffic on it, but it still makes me laugh to think of all the Tim Allen fans in distress over a misplaced box.
best summary on /. ever!
$ unzip, strip, touch, finger, grep, mount, fsck, more, yes,fsck,fsck,fsck,umount, sleep
As Facebook went off-line, I witnessed the unthinkable at an Internet cafe. Young men and women, innocently engaging in social networking intercourse, were suddenly thrown out of their Facebook world and into the reality of the real world, as though all had taken the red pill. Images distorted into 3D with a startling range of colors, sounds beyond stereo, and smells -- odors for the new fifth sense. Everyone looked around to witness "super high def" of each other, and some actually stood to experience a new perspective. Then, as if in concert, the unplugged Facebookers began to touch each other. Immediately untapped hormones raged as ancient primal urges emerged for the first time. Just as it was about to become an orgy of primal lust, the cafe manager flipped on a YouTube video of Elmo, http://www.youtube.com/watch?v=UZHSDjtD-dg, and a disaster was avoided.
Who cares if it's down even for a day. Just talk about your pointless activities twice as long the next day.
FB needs perspective [New York Post].
That'd be a bit more convincing if it were more than just an ad for a $3.95 article.
Any chance that this problem is based on the choice of database? Does the error handling for NoSQL-style databases, probably an entirely proprietary idea per implementation, lend to the problem? I would like to hear the thoughts on this from people in-the-know (i.e. people who actually use NoSQL).
Yes. There was a surge of idiots posting that Facebook was down.
It's just Facebook... If people really massively get hysterical over the unavailability of Facebook, that should count as yet another thing horribly, horribly wrong with the world...
I am not devoid of humor.