How Facebook Keeps Messenger From Crashing On New Year's Eve (ieee.org)
Wave723 quotes IEEE Spectrum: On New Year's Eve, millions of people will use Facebook's Messenger app to wish friends and family a 'Happy New Year!' If everything goes smoothly, those messages will reach recipients in fewer than 100 milliseconds, and life will go on. But if the service stalls or fails, a small team of software engineers based in the company's New York City office will have to answer for it.
The article says the team "tested and tweaked the app throughout the year and will soon face their biggest annual performance exam," since Messenger's 1.3 billion monthly active users send more messages on New Year's Eve than any other day of the year. Many of them hit "send" at the exact moment when their clock strikes midnight, "and people often try to resend messages that don't appear to make it through right away, which piles on more requests."
The solution appears to be load testing, redirecting traffic, batching messages, discarding "read receipts," and temporarily disabling other minor Facebook functions -- or, more generally, what their engineering manager describes as "graceful degradation."
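To make "graceful degradation" concrete, here is a minimal sketch of two of the ideas the summary names: dropping low-priority traffic such as read receipts once a queue backs up, and draining messages in batches. It is an illustration only, not Facebook's implementation. The `Dispatcher` class, the priority constants, and the threshold values are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical priority levels; the article does not describe the real scheme.
ESSENTIAL = 0  # e.g. the chat messages themselves
OPTIONAL = 1   # e.g. read receipts


@dataclass
class Dispatcher:
    shed_threshold: int = 1000  # queue depth at which optional traffic is dropped
    batch_size: int = 100       # maximum number of items drained per batch
    queue: deque = field(default_factory=deque)
    dropped: int = 0

    def submit(self, priority, payload):
        """Accept an item, shedding optional work when the queue is backed up."""
        if priority == OPTIONAL and len(self.queue) >= self.shed_threshold:
            self.dropped += 1
            return False
        self.queue.append(payload)
        return True

    def next_batch(self):
        """Remove and return up to batch_size items, oldest first."""
        count = min(self.batch_size, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]
```

Under load, essential messages are still queued while optional ones are counted and discarded, so the core function keeps working while minor features briefly disappear. A real system would presumably also bound the essential queue and apply backpressure, which this sketch leaves out.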
Simple: do awful things that will make people avoid using any of your services.
AC comments get piped to
"The solution appears to be ..." Stuff we've known since 1999?
It's one thing to say you know how to do it...
Quite another when literally BILLIONS of people are using your services all at once - especially around NYE where it's not even spread through the day, it's a huge DDOS equivalent with billions of messages at midnight exactly...
Planning for that kind of load and super-extreme bursting is not easy, at all. No matter how much you "know".
"There is more worth loving than we have strength to love." - Brian Jay Stanley
"Graceful degradation" is the unsung hero of properly engineered systems.