How Facebook Keeps Messenger From Crashing On New Year's Eve (ieee.org)
Wave723 quotes IEEE Spectrum: On New Year's Eve, millions of people will use Facebook's Messenger app to wish friends and family a 'Happy New Year!' If everything goes smoothly, those messages will reach recipients in fewer than 100 milliseconds, and life will go on. But if the service stalls or fails, a small team of software engineers based in the company's New York City office will have to answer for it.
The article says the team "tested and tweaked the app throughout the year and will soon face their biggest annual performance exam," since Messenger's 1.3 billion monthly active users send more messages on New Year's Eve than any other day of the year. Many of them hit "send" at the exact moment when their clock strikes midnight, "and people often try to resend messages that don't appear to make it through right away, which piles on more requests."
The solution appears to be load testing, redirecting traffic, batching messages, discarding "read receipts," and temporarily disabling other minor Facebook functions -- or, more generally, what their engineering manager describes as "graceful degradation."
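To make "graceful degradation" and message batching concrete, here is a minimal sketch of the general idea. It is not Facebook's implementation: the feature names, priority numbers, and load thresholds are all hypothetical, chosen only to show non-essential features (such as read receipts) being dropped first as load rises while message sending stays up.

```python
# Hypothetical feature priorities: 0 is the most essential.
# Names and numbers are illustrative assumptions, not taken from the article.
FEATURE_PRIORITY = {
    "send_message": 0,
    "delivery_receipt": 1,
    "read_receipt": 2,
    "typing_indicator": 3,
}


def max_priority_allowed(load: float) -> int:
    """Map load (fraction of capacity, 1.0 = full) to the least
    essential priority level that is still served.
    The thresholds are assumptions for the sketch."""
    if load < 0.7:
        return 3  # everything on
    if load < 0.85:
        return 2  # drop typing indicators
    if load < 1.0:
        return 1  # also drop read receipts
    return 0      # only message sending survives


def should_serve(feature: str, load: float) -> bool:
    """True if the feature is still enabled at the given load."""
    return FEATURE_PRIORITY[feature] <= max_priority_allowed(load)


def batch(items: list, size: int) -> list:
    """Group items into chunks of at most `size`, so many small
    messages can travel in one request instead of one request each."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    for load in (0.5, 0.9, 1.2):
        enabled = [f for f in FEATURE_PRIORITY if should_serve(f, load)]
        print(load, enabled)
    print(batch(["m1", "m2", "m3", "m4", "m5"], 2))
```

The point of the design is that the core function degrades last: at any load level, sending a message is the final thing to be switched off.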
"The solution appears to be ..." Stuff we've known since 1999?
That should be Facebook's new corporate slogan: "Graceful Degradation."
Simple, do awful things that will make people avoid using any of your services.
AC comments get piped to
"The solution appears to be ..." Stuff we've known since 1999?
It's one thing to say you know how to do it...
Quite another when literally BILLIONS of people are using your services all at once - especially around NYE where it's not even spread through the day, it's a huge DDOS equivalent with billions of messages at midnight exactly...
Planning for that kind of load and super-extreme bursting is not easy, at all. No matter how much you "know".
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Want to keep it from crashing on New Year's Eve? Just don't load the damn thing. There, simple. Problem solved.
I read at +2. If your post doesn't reach that level I will not see or respond to it.
what their engineering manager describes as "graceful degradation."
If they'd just use SystemD their problems would be solved! For that matter though, I wish FaceBook would gracefully degrade to /dev/null.
... for a week, it's that missing Net Neutrality thing that routinely hits and throttles NetFlix. Yeah, that's the ticket.
Good luck to them though, it's a good engineering textbook problem. Stupid, yet necessary. (We have specific peak load times because we just do. Same thing with water supply and SuperBowl breaks, or 8AM/5PM rush-hour traffic.)
FB should also offer a "delivery within 100ms or your money back!" guarantee. See? The timestamp says it was _delivered_ to _our_ servers in 100ms; it's not OUR fault that the carrier couldn't get thru
OTOH they could use one of the internet broadcast functions -- "Happy New Year" simulcast everywhere. And actually, I bet they've got an embedded HNY compression bit somewhere to slightly lessen the transfer load, and the same for a few other extremely common phrases.
If the universe is someone's simulation -- does that mean the stars are just stuck pixels?
These boys are clever
... couldn't they simply split their users up into, say, 24 groups, and reduce the load that way?
http://harridanic.com
A modified version of ejabberd, including what they got through the Whatsapp acquisition. Guess PHP/C++ never solved all of Facebook's problems.
facebook messenger is brittle, poorly tuned garbage that cannot handle an ordinary upsurge in human use. AIM never had to be reengineered to survive new years eve without crashing, and it wasn't really all that good, it just wasn't a flaming heap of shit
Snowden and Manning are heroes.
By stealing dimes from the Elves?
Yeah, it's called money. All you need is money.
How many years did we all have to suffer through Twitter Fail Whales while they were flush with cash?
There are plenty of examples of giant well funded enterprises with websites that utterly suck and can handle just about no load - especially if you look at websites where tech is secondary, any kind of unexpected load and BAM they are usually down.
Money can indeed help to buy the servers you may really need to handle load. Money can even help hire the people that understand how to handle load.
But money does not ENSURE you will have either thing.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I remember my father dealing with the same issue (everyone wants to communicate at midnight) for New York Telephone half a century ago.
I thought that facebook would remedy the problem by censoring half of them.
There's something like 26 midnights (timezones) around the world. Where's the problem?
The vast majority of messages avoid that peak: hardly anyone waits for the exact midnight to send a message. So the load gets smeared onto quite a chunk of time.
Look around you at the next NYE party and you will see just how wrong you are. Most people queue them up ahead of time and lots of people are hitting Send as the ball drops... (hint to devs, if someone has typed a partial message transmit that to the server in case they come back and hit send later - course Facebook was just screwed by that recently when it was found they had cached images on the server from never sent messages...).
At least it is spread across time zones but that is still a LOT of people, especially from the U.S. coasts.
The engineering problem boils down to: send short messages between pairs of arbitrary sources and destinations (although usually the source and destination are close to each other), with message size usually within 50-100 bytes. Let's be generous and say that with metadata they fit within 1500 bytes
Come on man, you know that modern web APIs are not that compact, and we are talking Facebook here. You are off by an order of magnitude at least, way more when you stop to think that on NYE way more people are sending images also... One single response to a post on Facebook I just did with 14 words had a 9.5 KB body going out, and a 21.2 KB response.
Let's estimate the flow: after everyone raises the toast, exchanges hugs and kisses, says greetings, then sits down with the phone -- sending, let's say, 10 messages. This should take around half an hour. You get 300K messages per second. Not so impressive...
Think MILLIONS, possibly BILLIONS and you might be closer to the mark. On a *normal* day, Messenger and WhatsApp process over 60 billion messages a day... so that is 2.5 billion messages every hour *normally*.
And that was from 2016. Do you think people send more, or fewer messages now than then?
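A quick sanity check of the two estimates above, as a rough sketch. The inputs are the figures quoted in this thread (1.3 billion monthly active users, 10 messages each over half an hour, 60 billion messages a day across Messenger and WhatsApp). Dividing evenly across 24 time zones is my assumption about how the parent reached roughly 300K per second; the parent did not show the working.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(total_messages: float, window_seconds: float) -> float:
    """Average message rate over a time window."""
    return total_messages / window_seconds


# Parent's New Year's Eve estimate (figures quoted in the thread).
users = 1.3e9            # Messenger monthly active users
messages_each = 10       # parent's guess
window = 30 * 60         # half an hour, in seconds
time_zones = 24          # assumption: load split evenly by time zone

parent_peak = per_second(users * messages_each, window) / time_zones

# Ordinary-day average from the 2016 figure quoted above
# (Messenger and WhatsApp combined).
normal_average = per_second(60e9, SECONDS_PER_DAY)

if __name__ == "__main__":
    print(f"parent's NYE peak estimate: {parent_peak:,.0f} msg/s")
    print(f"ordinary-day average:       {normal_average:,.0f} msg/s")
```

On these numbers the ordinary-day average (about 694K per second) is already more than double the parent's estimated New Year's Eve peak (about 301K per second), so the peak estimate looks too low, even allowing that the 60 billion figure includes WhatsApp.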
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Millions of people.. sending a small TCP packet... containing a couple of hundred characters...
Wow. Gosh. The infrastructure that must take to handle...
Like... a couple of servers in a rack and a few gigabits of uplink at worst.
Honestly, has modern technology come to this?
One single YouTube video probably has more bandwidth, more data transferred, more CPU usage and less latency.
There's no way to reply to all the misinformed commenters here, but I'm really surprised at how naive the majority are. Clearly most of you have never worked with problems at this scale...this is a far more difficult problem to solve than you all think.
Ever hear of time zones?
this wouldn't be worthy of an explanation, but i suppose given the shit quality of facebook's code it's nothing short of a miracle. don't believe me? look at what they have on github. it's trash. don't believe that's representative of the quality of the code inside the company? if you can find one, ask any decent software engineer that works there and they'll let you know it is in fact worse.
if this type of load takes them any more than a few machines, they should be embarrassed.
Facebook having a bad year... let's finish it off with a feel good about how facebook is the only company that helps all these billions communicate...
"Graceful degradation" is the unsung hero of properly engineered systems.
well obviously not every device with its own messenger app gets a dedicated physical connection to the server. somewhere along the line, those 10 devices send to an antenna node, and the tower maybe has 10 elements, and then the tower aggregates these into ONE physical link to a sub-station, which itself is connected to another 10 towers. from this sub-station another ONE physical link aggregates to a county sub-station etc etc. ...
what i would look out for (and marvel at) is how these aggregation routers spin-and-weave all these single connections into bigger and bigger SINGLE connections
Another factor may be that FB pissed so many people off by abusing their privacy that they deleted Messenger altogether. I did, anyway. Come on, people. Invite your friends for a gathering or accept another friend or family member's invitation. A messenger greeting blast has about as much impact and is about as memorable as a highway billboard encountered at 80 miles per hour. Do something meaningful.
"...who search the reason of things
Are those who bring the most sorrow on themselves." --Euripides, The Medea