The Computer Science Behind Facebook's 1 Billion Users
pacopico writes "Much has been made about Facebook hitting 1 billion users. But Businessweek has the inside story detailing how the site actually copes with this many people and the software Facebook has invented that pushes the limits of computer science. The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.' To keep Facebook moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."
The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'
Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video? Web search? Watson?
Facebook is utterly trivial compared to many problems out there.
Fsck Facebook.
I totally believe that Facebook has 1 billion users... because I am 4 of them.
Oh yes, please tell me all about the computer geniuses that wrote the PHP scripts that power facebook!
...is looking for meaningful computer science discussion in a business magazine article.
There's no -1 for "I don't get it."
Probably out of context, as this whole site could be flushed down the toilet and
not much would happen - ads wouldn't get fed to the gullible. Oh, dear.
Now the mastercard and visa credit cards networks - that is for real and makes
fb look like child's play. Which it is.
Bits of management, but definitely no CS in that story!
Everything you need to know in only 6 weeks!
Social networking maps very nicely to decentralized resources.
(I know who my friends are, and I can scrape their RSS feeds by myself.)
When you try to cram all that into one data center, and then try to replicate that across many data centers in real time ... yep, you've got a problem.
The mistake is in the belief that it's an "information technology" problem.
I'm kinda disappointed... I am truly interested in how Facebook scales and was hoping there would be actual Computer Science related material in the article... Any Facebook employees care to comment? What do you guys do to scale stuff? How about /.'ers from other companies that have to deal with scaling?
Hell, how do porn sites scale?
I've done the traditional Distributed Systems courses in University but I really wanted to know how it's done in the real world by AWS, Facebook etc...
Here's the breakdown.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
The print version is available.
I don't recommend reading it. There is absolutely nothing in this article about the actual engineering problems behind scaling for this number of users and how these problems are solved. In fact, there is nothing technical at all in this article except for some vague descriptions of the "bootcamp".
"What lies behind us, and what lies before us are tiny matters compared to what lies within us." Ralph Waldo Emerson
I'm sure there are some smart people working on how to mine every last drop of money out of our private lives at facebook, but IT?
Last I heard, fb uses mysql. That's not cutting edge CS.
It's actually a rather impressive setup. Some Facebook architects gave a talk in EE380 at Stanford a few years back. Originally, Facebook's architecture assumed that most "friends" would be regionally local, reflecting Facebook's college-campus origin. That's not how it worked out after some growth. So they have to assemble pages across regions and data centers. There's caching, but there's also active cache invalidation, which they can do because they control both sides of the cache. There's extensive inter-process communication, and it's not HTTP. There's a lot of PHP for the user-facing stuff, but it's compiled with their in-house compiler, not interpreted.
Facebook's purpose is banal, but the technology behind it is non-trivial.
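For anyone wondering what "active cache invalidation" looks like in practice, here is a minimal sketch, assuming a single in-memory cache in front of a single store. The names BackingStore and InvalidatingCache are made up for illustration and are not Facebook's code or API. The point is only that the writer, because it controls both sides, can drop the cached copy at write time instead of waiting for it to expire.

```python
class BackingStore:
    """Hypothetical stand-in for the database; counts reads for illustration."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


class InvalidatingCache:
    """Read-through cache whose write path actively invalidates entries."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        # Serve from the cache when possible; otherwise fetch and remember.
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        self.cache[key] = value
        return value

    def write(self, key, value):
        # Update the store first, then drop the stale cached copy so the
        # next read is forced back to the store.
        self.store.put(key, value)
        self.cache.pop(key, None)


store = BackingStore()
cache = InvalidatingCache(store)
cache.write("status:bob", "at lunch")
first = cache.read("status:bob")    # miss: goes to the store
second = cache.read("status:bob")   # hit: served from the cache
cache.write("status:bob", "back at work")
third = cache.read("status:bob")    # miss again, because the write invalidated it
```

In a real multi-data-center setup the invalidation would have to be sent to every cache holding the key, which is where the hard part starts; the sketch ignores that entirely.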
I always wondered why it seemed like Facebook was written by a bunch of 16 yr old hackers, shipping half-baked buggy code - but now I know the truth: it's written by _thousands_ of engineers shipping half-baked code - every day!
Still, the Zuck is now worth billions of dollars, so maybe I have something to learn from the whole experiment... ... Nah, I'm still going to test my own code.
Not much computer science here, I was expecting more technical details from the summary.
Nothing I ever made got popular enough to even require more than one server.
Is what you meant.
With the way Facebook runs, surely it doesn't take much more than a six-hour lecture to learn.
Facebook's 1 Billion Users is very simple -- There aren't 1 Billion (unique) Living, Breathing, Computer Using Humans on Facebook.
If you believe there are actually 1 Billion, completely unique users on Facebook, then I need to ask that each of you turn over your Internet Licenses, power down your computers, and find a new hobby. You are just too dangerous to be allowed on the Internet without adult supervision.
Ever hear of bot nets?
You know...all those virus infected zombie computers that have been using networks on IRC for years. Well...these bots are now using Facebook, Twitter, and probably any other social network now.
If you still refuse to believe the facts. Well...I guess you must be OK living your lives being gullible -- unable to reason or use logic to derive a truth that actually makes sense.
As for the author, Ashlee Vance -- you are a failure as a journalist. You fail because you are unable to do any real research to uncover real information, and you only write stories that have potential for getting published, hoping someone will notice you and throw fame and fortune at you. Keep Dreaming!
The HackDefendr
Keywords for the NSA overthrow oppressive regime true believers marathon Manhatten the financial district blueprints I
How do they count these 'users'? I have six accounts myself and most people I know have at least two. Now, there is a poll for Slashdot: How many Facebook accounts do you have?
I don't think I'm giving away the store when I tell you the bits were '0' and '1'.
Bad News: There will be a test.
Good News: it is true / false.
Let's see how your scan-tron scores... R.I.P.
This issue is a bit more complicated than you think.
Facebook acts like these are all live humanoids. Well, as anyone who posted a signup sheet for IM sports in college can attest, 1/3 of everyone who signs up for anything, anywhere, ever, is fake. See also, "Google+"
Facebook actually thinks these are one billion distinct humanoids? Zuck is stoopider than his investors look. As anyone who ever posted a signup sheet in a college dorm for IM softball can attest, at least a third of people who sign up for anything, anywhere, ever, are fake.
---------------------------------------
Rotate the pod, please, HAL....
You misinterpreted the heading - Facebook has the hardest information technology problem on the planet.
That information technology problem has nothing to do with servers and storage.
The hardest information technology problem on the planet is: How do the Facebook execs stop the company going the way of Silicon Graphics (NYSE: SGI) - oh wait, no, (DELISTED by NYSE because the share price couldn't stay above $1: SGI); since the company creates no real value, and has done nothing but drop its price since IPO.
*THAT* is the problem that Google isn't facing.
"The complexity and joins of various database tables must be insane."
Nah. You simply put one user's data in one place (well, more than one place for redundancy, but two or three, not lots of places).
To build a page you ask the machine processing that person's data, then ask the machines processing their friends' data, and build the page. Arrange your network so that groups of machines are in subnets, and place each user's data, based on connectivity, onto machines in the subnet, so more connected users are on the same subnet.
The idea that you'd chuck everything in some massive database and make everything an SQL query, well that's not a good design.
Instead of connecting 1 billion people to 1 billion other people, it's really just connect Bob to his 10 friends * 1 billion page serves which is just a scaled up version.
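The scheme above can be sketched roughly as follows, assuming each user's record lives on exactly one shard and a page is assembled by asking the shards that hold the friends' records. Everything here (NUM_SHARDS, shard_for, Cluster) is a hypothetical name for illustration. Note one substitution: placement uses a plain hash of the user name, not the connectivity-based placement described above, and there is no replication.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(user):
    """Map a user name to a shard index with a stable hash."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class Cluster:
    """Toy cluster: one dict per shard, one record per user."""

    def __init__(self):
        self.shards = [dict() for _ in range(NUM_SHARDS)]

    def save(self, user, friends, posts):
        # All of one user's data goes to a single shard.
        record = {"friends": list(friends), "posts": list(posts)}
        self.shards[shard_for(user)][user] = record

    def load(self, user):
        return self.shards[shard_for(user)].get(user)

    def build_page(self, user):
        # Ask the user's shard for the friend list, then ask each
        # friend's shard for that friend's posts. No joins anywhere.
        record = self.load(user)
        if record is None:
            return []
        page = []
        for friend in record["friends"]:
            friend_record = self.load(friend)
            if friend_record is not None:
                page.extend((friend, post) for post in friend_record["posts"])
        return page


cluster = Cluster()
cluster.save("bob", ["alice", "carol"], ["bob's post"])
cluster.save("alice", ["bob"], ["a1"])
cluster.save("carol", ["bob"], ["c1", "c2"])
page = cluster.build_page("bob")
```

Each page costs one lookup for the user plus one per friend, which is the "Bob and his 10 friends" arithmetic; placing connected users near each other would cut the network hops but not the number of lookups.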
facebook.pl
it's just one script in perl.
Some drink at the fountain of knowledge. Others just gargle.
In the 1960's we sent a man to the moon using the caveman's equivalent of today's technology tools. And now people think that Facebook is the top of the complexity pyramid? Sure they have a big scaling issue which nobody says is easy to solve, but to claim that it's the most complex technology problem the world faces today is laughably stupid.
It just means it doesn't work well enough.
Facebook is the worst performing and most opaque large scale site with the worst interface that I use regularly.
Browsing photos, the most basic Facebook activity, is still a pain and buggy as hell on a slowish connection, and they keep changing the damn interface just when you figured out the previous unintuitive change. The mobile website sucks, their Android app sucks, I don't know what the new iOS app is like. The interface has gone from simplicity to being cluttered and horrible with multiple streams throwing information at you.
If Gmail worked like that I would have quit ages ago. If Amazon worked like that they wouldn't sell shit. Facebook still feels like a damn experiment coded by a few kids in a basement. If Youtube worked like that they would have been replaced long ago as the de facto video hosting site.
engineers it takes to keep such massive infrastructure up and running. If all it takes today is 2000 people, to manage the data of a billion people, then I really can't see a very __large__ need for software developers in the future.
It's not really one billion users. As any developer in any online service knows, the real figure is around 30% of the actual reported total. Still, it's no small challenge.
Kriston
I would be interested in learning more about the software and hardware side of Facebook. But after 15 seconds of scrolling I hadn't seen any ... just a lot of tedious "gotta do this" journalism ... and gave up. LOOOOOONG BOOOORING
"You must try to forget all you have learned. You must begin to dream." -- Sherwood Anderson
Good grief, editors, anywhere? Can we at least have a non-dupe in between the dupes, or interleave the dupes? Previous story was Why Worms In the Toilet Might Be a Good Idea.
Yeah, YouTube really nailed the comment system...
.. hearing day after day about Facebook or Zuckerberg, seeing Zuckerberg's face in some *cough* "creative" way or hearing him heralded as some business guru (let's just say I disagree).
I think the face is the worst. I can live with the claims of him being an innovator, I got inured to that after decades worth of Microsoft marketing.
Hell, I may switch back to a text only browser for my news - speeds things up as well.
Insert
The article was interesting but has nothing to do with computer science. It was about their development and release process and some of their management and datacenter ideas.
If you are interested in the real computer science there are writings out there about their database and memcached setup.