Scaling Facebook To 140 Million Users
1sockchuck writes "Facebook now has 140 million users, and in recent weeks has been adding 600,000 new users a day. To keep pace with that growth, the Facebook engineering team has been tweaking its use of memcached, and says it can now handle 200,000 UDP requests per second. Facebook has detailed its refinements to memcached, which it hopes will be included in the official memcached repository. For now, their changes have been released to github."
Myspace used to run on cold fusion but switched to .NET. facebook runs on LAMP, though they have a customized MySQL and a customized linux kernel with support for the hierarchial page pinning algorithm.
Do you even lift?
These aren't the 'roids you're looking for.
From hardware perspective, Facebook uses 10,000 web servers and 1800 database servers to handle the massive traffic.
Nope. Facebook has more unique visitors per month, MySpace had approximately 106 millions users as of 8th September 2008, and FTFS, facebook has 140 million (Wikipedia says 120 million.)
And they also use about 200 memcached servers to speed things up.
Source: http://frro.net/blog/2008/04/26/just-how-big-is-facebooks-infrastructure/
2. If you'd read the next sentence right after your bold line, you'd notice they were talking about a kernel lock. Not a lock in memcached. Thats a totally valid reason to blame linux.
How do you hope to architect a fix for this? Thought I don't know the specifics, they said that they were using the same UDP socket to transmit from multiple threads. That means you have one kernel space data structure across the entire UDP/IP stack being shared by multiple threads. Therefore you need a lock around updates to that data structure.
Until we see some atomic sendto() operations this is not going to change.
I recently had trouble with my copy of Firefox on my home desktop. Even though adblock and filterset updater were installed i wasn't blocking any ads (i've since fixed it).
I was amazed at how many sites i regularly frequent that are now plastered in ads and horrible to use.
Mutexes aren't always slow. In the uncontended case they don't require a system call (although they do require an atomic operation which involves some inter-processor signalling).
Lockless algorithms are generally harder to get right, from what I've seen. It's not just locking the cpus for a cycle, but you also need to worry about using memory barriers (generally written in assembly) to enforce correct visibility across all cpus in the system.
There are guys on comp.programming.threads that spend a *lot* of time trying to perfect them, and there are often subtle errors that pop up later on. Given the number of problems that regular lock-based algorithms cause, I'd only use lockless if it's absolutely necessary.