Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we've had in over four years, and we wanted to first of all apologize for it. We also wanted to provide much more technical detail on what happened and share one big lesson learned.
The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn't work when the persistent store is invalid.
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error attempting to query one of the databases it interpreted it as an invalid value, and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn't allow the databases to recover.
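The loop described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the actual system: the class `OverloadableStore`, the validity rule in `is_valid`, the key name `cfg`, and all the numbers are hypothetical. Time is modelled in ticks, and every client that sees a bad or missing cache entry queries the store within the same tick.

```python
class OverloadableStore:
    """Hypothetical stand-in for the database cluster.

    It answers at most `capacity` queries per tick and fails the rest.
    """

    def __init__(self, value, capacity):
        self.value = value
        self.capacity = capacity
        self.queries_this_tick = 0

    def query(self):
        self.queries_this_tick += 1
        if self.queries_this_tick > self.capacity:
            raise TimeoutError("store overloaded")
        return self.value


def is_valid(value):
    # Assumed validity rule, for illustration only.
    return isinstance(value, int) and value > 0


def run_tick(cache, store, key, num_clients):
    """All clients look at the cache at once; if it is bad, all of them query."""
    store.queries_this_tick = 0
    if is_valid(cache.get(key)):
        return 0  # healthy cache entry, nobody touches the store
    for _ in range(num_clients):
        try:
            cache[key] = store.query()
        except TimeoutError:
            # The flaw: a query error is treated like an invalid value,
            # so the cache key is deleted and the next tick retries.
            cache.pop(key, None)
    return store.queries_this_tick


def simulate(num_clients, capacity, ticks=5):
    """Return the number of store queries issued in each tick."""
    store = OverloadableStore(value=-1, capacity=capacity)  # bad value pushed
    cache = {"cfg": -1}
    history = []
    for tick in range(ticks):
        if tick == 1:
            store.value = 42  # root cause fixed in the persistent store
        history.append(run_tick(cache, store, "cfg", num_clients))
    return history
```

With these made-up numbers, `simulate(100, 10)` keeps issuing 100 queries per tick even after the value is fixed at tick 1, because the failed queries keep deleting the cache key. `simulate(5, 10)` stays under the store's capacity and goes quiet once the fixed value is cached.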
The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we've turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
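One common way to deal more gracefully with this kind of feedback loop is to keep serving the existing cached value when a refresh fails and to space out retries with jittered exponential backoff. The sketch below illustrates that general pattern only; it is not the design Facebook adopted, and the names `refresh`, `backoff_delay` and `OverloadedError` are hypothetical.

```python
import random


class OverloadedError(Exception):
    """Hypothetical error raised when the persistent store cannot answer."""


def backoff_delay(attempt, base=0.1, cap=30.0):
    """Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def refresh(cache, key, fetch, is_valid, attempt=0):
    """Try to refresh one cache key.

    Returns (value_to_serve, next_attempt). A failed or invalid fetch leaves
    the cache untouched, so an error never deletes a key, and the caller is
    expected to wait backoff_delay(next_attempt) before trying again.
    """
    try:
        fresh = fetch()
    except OverloadedError:
        return cache.get(key), attempt + 1  # keep the old value, back off
    if not is_valid(fresh):
        # The store itself holds a bad value; retrying immediately cannot help.
        return cache.get(key), attempt + 1
    cache[key] = fresh
    return fresh, 0  # success resets the backoff
```

The two design choices that matter here are that an error is kept distinct from an invalid value, and that repeated failures make each client slower to retry rather than faster.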
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.
While I am all for looking for life beyond Earth (I am a firm supporter of the concept of extraterrestrial intelligence), I have difficulty comprehending why we keep looking for signs of life that resemble life on Earth.
We (life on Earth) had a unique set of conditions and situations that led to life forming here and eventually to humans evolving.
There doesn't seem to be (to my knowledge; correct me if I am wrong) anything to support the idea that a similar chain of events has taken place on Mars... so why should life there resemble, in any way, what we expect it to?
The article mentions key molecules associated with life. How do we know that Martian life forms have these molecules? For all we know, one of the rocks sitting alongside one of the many instruments we sent there may be representative of Martian life. For all we know, they might be moving or exhibit some phenomenon that we don't observe, simply because we either don't know how to look for it or can't observe it. We can see visible light, UV and IR (with instruments), but what if life there exists in some pure energy form that we can't see, feel or observe?
I am all in support of this mission, but I believe that the space exploration agencies should try to keep a more open mind about what they expect life to resemble outside our planet.
Unbelievable....I'm the first to comment on this article!!!!
What's up, people? No one interested in a Star Trek parody? Or is this the way we geeks are supposed to pay tribute to the Star Trek series, by not posting any comments on an article/link that makes fun of one of the greatest TV series of all time? If that's so, then I'm with you..
Just a thought: you did mention you wanted a "work in progress" kind of solution (OSS), but if you could get the data into MySQL or some other OSS database, a set of PHP/SQL queries could read off the required data (addresses), and then you could probably format it and send it to the printer (I am not too familiar with printer drivers and the related formatting).
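A rough sketch of the query-and-format step, for illustration only. It uses Python with the standard-library `sqlite3` module as a stand-in for the PHP/MySQL setup suggested above; the `addresses` table, its columns and the sample rows are all made up.

```python
import sqlite3


def format_labels(conn):
    """Read every address and return one text block per label, separated by blank lines."""
    rows = conn.execute(
        "SELECT name, street, city, postcode FROM addresses ORDER BY name"
    ).fetchall()
    return "\n\n".join(
        f"{name}\n{street}\n{city} {postcode}"
        for name, street, city, postcode in rows
    )


# Hypothetical in-memory database with two sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (name TEXT, street TEXT, city TEXT, postcode TEXT)"
)
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?, ?)",
    [
        ("B. Sample", "2 Side Rd", "Shelbyville", "67890"),
        ("A. Example", "1 Main St", "Springfield", "12345"),
    ],
)
labels = format_labels(conn)
```

The resulting `labels` string is plain text, so it could be written to a file and handed to whatever print command the system provides; that part depends on the printer setup and is left out here.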
http://www.computerworld.com/action/article.do?command=printArticleBasic&taxonomyName=Careers&articleId=9115616&taxonomyId=10
http://www.extremetech.com/print_article2/0,1217,a=226537,00.asp
With so many posts on the older PCs still running around, I thought I'd add my contribution..
Mine is a 486DX2 @66MHz, with 8MB RAM and a 1GB HDD.
It still manages to run Win95 (though really slowly compared to my current desktop)
My current desktop is a PIII @ 600MHz with 192MB RAM and 2x 40GB HDD.
Mod one up for a worthwhile comment. Thx for the link
Er... Mr. Overlord...
I too have attempted to post several times, but to no avail; all were rejected. This is my first approved post in a loooong time...
-C'nut
FYI, Firefox 1.0.3 also fails the Acid2 test.