Slashdot Mirror


The Computer Science Behind Facebook's 1 Billion Users

pacopico writes "Much has been made of Facebook hitting 1 billion users. But Businessweek has the inside story detailing how the site actually copes with this many people and the software Facebook has invented that pushes the limits of computer science. The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.' To keep Facebook moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six-weeks learning every bit of Facebook's code."

31 of 113 comments

  1. Oh bullshit. by Anonymous Coward · · Score: 5, Insightful

    The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'

    Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?

    Facebook is utterly trivial compared to many problems out there.

    1. Re:Oh bullshit. by AdamWill · · Score: 2

      Indeed. Handily proven by "To keep Facebook moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six-weeks learning every bit of Facebook's code."

      a) that's a terrible idea, and b) the fact that it's even possible (if it is, sounds like business magazine bs to me) speaks volumes. I only work for Red Hat, we're pretty cool but we're hardly the biggest fish out there, and you can imagine the chaos if we tried that...I'm sure others can apply it to their companies with similar results.

    2. Re:Oh bullshit. by stephanruby · · Score: 4, Informative

      "Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six-weeks learning every bit of Facebook's code."

      Ah, that's Zuckerberg's secret sauce apparently: plenty of overtime for six weeks so that a new engineer can learn every bit of Facebook's code. This way, they can push the limits of computer science (or disregard them completely) and ignore the lessons from The Mythical Man-Month.

      I cringe to think that many business people will actually take Businessweek's article seriously.

    3. Re:Oh bullshit. by Dan+East · · Score: 5, Informative

      Actually, Facebook's problem isn't trivial in any sense of the word. The complexity and joins of various database tables must be insane. With YouTube it's all about raw bandwidth, which is actually a fairly easy problem to solve, especially since 99% of that data is static. You just physically distribute it and throw money / resources at the problem. As far as database structure goes, any CS student should be able to reproduce the bulk of it in a single day. You have videos associated with users, and comments associated with videos, etc. The gist of it is straightforward.
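To make the "videos associated with users, comments associated with videos" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not YouTube's actual schema; every table, column, and function name is made up for illustration.

```python
import sqlite3

# Hypothetical minimal schema for a video site: users own videos,
# and comments hang off videos. All names here are illustrative.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE videos   (id INTEGER PRIMARY KEY,
                       owner_id INTEGER NOT NULL REFERENCES users(id),
                       title TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       video_id INTEGER NOT NULL REFERENCES videos(id),
                       author_id INTEGER NOT NULL REFERENCES users(id),
                       body TEXT NOT NULL);
"""


def build_demo_db():
    """Create an in-memory database with a couple of demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    conn.execute("INSERT INTO videos VALUES (1, 1, 'cat video')")
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?, ?)",
                     [(1, 1, 2, "nice cat"), (2, 1, 1, "thanks")])
    return conn


def comments_for_video(conn, video_id):
    """Return (author name, comment body) pairs for one video, oldest first."""
    rows = conn.execute(
        "SELECT u.name, c.body FROM comments c "
        "JOIN users u ON u.id = c.author_id "
        "WHERE c.video_id = ? ORDER BY c.id",
        (video_id,),
    )
    return rows.fetchall()
```

Everything needed for one page is keyed by a single video id, which is what makes this kind of data easy to partition and cache.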

      Now let's talk about Facebook. There is no compartmentalization of the data. You've heard of "six degrees of separation", whereby any two people on the planet can be socially connected to one another in at most 6 steps. Well, with Facebook, the average degree of separation between any two people is 3.74. What that means is everyone is very closely networked, all the data is dynamic (or more specifically, the data the users really care about is the dynamic and most recent data), and since many people (myself included) open up their information to "friends of friends", there is a tremendous amount of data that any one person can potentially have access to. Even Google searches don't have this problem, because the bulk of the common search terms can be preprocessed for easy retrieval, and having data that's an hour or two old isn't a huge issue.
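The "friends of friends" reach can be sketched as a bounded breadth-first search over a friendship graph. This is only a toy illustration: the graph, the names, and the `within_hops` helper are all hypothetical, and nothing here reflects how Facebook actually stores or traverses its graph.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {person: hop_count} for everyone reachable in <= max_hops steps."""
    seen = {start: 0}
    pending = deque([start])
    while pending:
        person = pending.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                pending.append(friend)
    return seen


# Hypothetical toy friendship graph (undirected, so each edge is stored both ways).
GRAPH = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

# Everyone "ann" can reach as a friend or a friend of a friend.
friends_of_friends = within_hops(GRAPH, "ann", 2)
```

In this toy graph "eve" is three hops from "ann" and so falls outside the two-hop set. At real scale the set grows quickly: assuming, purely for illustration, 200 friends per person, two hops can touch up to 200 × 200 = 40,000 accounts before removing overlaps.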

      So you have this massive database (1 billion users, each with many different types of associated data - posts, images, videos, things they've liked, things they've shared, etc, etc), and each of those 1 billion users has an entirely different set of friends from which recent (basically real-time) data must be polled - over, and over, and over again, all day long. Now, throw in the very complex privacy rules, as to which types of posts can be seen by which types of friends, groups, block lists, etc, and the problem becomes very, very complex. Sure, most of us could bang out something with that core functionality without too much difficulty, but to make it work nearly real-time for 1 billion users at once? That's an incredible undertaking.
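The combination of "poll recent data from each friend" and "apply privacy rules per post" can be sketched as a naive read-time feed builder. This is a deliberately simple assumption-laden model: the `Post` and `User` classes, the three audience levels, and the block-list rule are invented for illustration and are far cruder than any real privacy system.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    author: str
    text: str
    timestamp: int
    audience: str = "friends"  # hypothetical levels: "public", "friends", "only_me"


@dataclass
class User:
    name: str
    friends: set = field(default_factory=set)
    blocked: set = field(default_factory=set)
    posts: list = field(default_factory=list)


def can_see(viewer, author, post):
    """Apply the (simplified, hypothetical) privacy rules for one post."""
    if viewer.name == author.name:
        return True
    if viewer.name in author.blocked:
        return False
    if post.audience == "public":
        return True
    if post.audience == "friends":
        return viewer.name in author.friends
    return False  # "only_me" or any unknown level


def build_feed(viewer, users, limit=10):
    """Gather visible posts from every friend, newest first."""
    candidates = []
    for friend_name in viewer.friends:
        author = users[friend_name]
        candidates.extend(p for p in author.posts if can_see(viewer, author, p))
    candidates.sort(key=lambda p: p.timestamp, reverse=True)
    return candidates[:limit]


# Hypothetical demo data.
USERS = {
    "ann": User("ann", friends={"bob", "cat", "dan"}),
    "bob": User("bob", friends={"ann"},
                posts=[Post("bob", "lunch", 1), Post("bob", "diary", 3, "only_me")]),
    "cat": User("cat", friends={"ann"}, posts=[Post("cat", "party", 2)]),
    "dan": User("dan", friends={"ann"}, blocked={"ann"},
                posts=[Post("dan", "secret", 4, "public")]),
}

feed = build_feed(USERS["ann"], USERS)
```

Every feed request walks every friend and checks every candidate post, so the cost grows with both friend count and posting rate. That is the part that is easy to write and hard to run for a billion users.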

      --
      Better known as 318230.
    4. Re:Oh bullshit. by toastking · · Score: 2

      Facebook isn't just about status updates. They have a whole robust API they use to interact with apps and other websites. It hosts music, events, photos, videos, app data, along with tons of user data with Timeline. You can share anything from a cat video to a milestone of you losing weight. Serving up all that data in a quick and well-presented manner to millions of people around the world is very difficult.

    5. Re:Oh bullshit. by russotto · · Score: 4, Funny

      Hard to believe it takes so long to learn Facebook's code. I work at Google, and I learned every bit of Google's code in one day.

      I don't think I'm giving away the store when I tell you the bits were '0' and '1'.

    6. Re:Oh bullshit. by jittles · · Score: 2

      Oh yeah. I worked on an embedded project that had custom kernel code as well as over 2 million lines in system libraries. No one could possibly know every single line of that. The project I was in charge of there had maybe 200,000 lines of code, and I often had to rely on comments to remember what goes where! I had the misfortune of being the only team on an embedded processor and had to fix cross-platform issues with the system libraries too. It was a lot of work.

    7. Re:Oh bullshit. by kestasjk · · Score: 2, Funny

      Pff. Apache + Hadoop + MySQL + Varnish. Easy.

      The other day I had to write a red-black tree in my CS152 class, now that's a tough problem!

      --
      // MD_Update(&m,buf,j);
    8. Re:Oh bullshit. by jittles · · Score: 4, Interesting

      Except that I don't believe they have 1 billion real users. They probably have 100m users and another 900m users in fake accounts people use to play Farmville, etc.

    9. Re:Oh bullshit. by rtaylor · · Score: 5, Interesting

      It's made infinitely easier by being asynchronous and 99% reads. There are no timing issues. If a post is delayed to someone's screen by a minute or two, nobody dies.

      It's not terribly difficult to make numerous (near infinite) read-only replicas of a database which are within tens of milliseconds of the primary; so that takes care of 99% of their problems.

      Handling their write load is harder but keep in mind the vast majority of their accounts are idle; and again asynchronous writes make it much much easier. They can shove everything through a message queue and put heavy-weight sharding of the data behind that.

      I think handling 100 Million banking customers in 2000 was infinitely harder than Facebook has it from a technical standpoint.
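The pattern described above (accept writes into a message queue, apply them later to sharded primaries, serve reads from slightly stale replicas) can be sketched in a single process. This is an illustrative toy, not Facebook's design: the `ShardedStore` class, the shard count, and the CRC-based shard function are all assumptions, and plain dicts stand in for databases.

```python
import queue
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(user_id):
    """Map a user id to a shard; crc32 is stable across runs, unlike hash() on str."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_SHARDS


class ShardedStore:
    def __init__(self):
        self.write_queue = queue.Queue()
        self.primaries = [dict() for _ in range(NUM_SHARDS)]
        self.replicas = [dict() for _ in range(NUM_SHARDS)]

    def post(self, user_id, text):
        """Accept the write immediately; it is applied later by drain()."""
        self.write_queue.put((user_id, text))

    def drain(self):
        """Worker step: apply all queued writes to the owning shard's primary."""
        applied = 0
        while True:
            try:
                user_id, text = self.write_queue.get_nowait()
            except queue.Empty:
                return applied
            shard = self.primaries[shard_for(user_id)]
            shard.setdefault(user_id, []).append(text)
            applied += 1

    def replicate(self):
        """Copy each primary's state to its read replica (stand-in for replication)."""
        for primary, replica in zip(self.primaries, self.replicas):
            for user_id, posts in primary.items():
                replica[user_id] = list(posts)

    def read(self, user_id):
        """Reads only ever touch the replica, so they may lag behind writes."""
        return self.replicas[shard_for(user_id)].get(user_id, [])
```

A write is invisible to readers until both `drain()` and `replicate()` have run, which is exactly the "delayed by a minute or two, nobody dies" trade-off. A bank ledger could not tolerate that window, which is the contrast the comment draws.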

      --
      Rod Taylor
    10. Re:Oh bullshit. by Intrepid+imaginaut · · Score: 4, Insightful

      You think so? One person in six on this earth, including infants and the elderly in developing countries without regular internet access, has an active Facebook account, do they? Facebook's numbers have never been properly audited; it's not in their best interests to do so. The more users they can claim, the better for them. I would agree with possibly a couple hundred million, but I have a really hard time believing much more than that.

    11. Re:Oh bullshit. by hairyfish · · Score: 2

      It's actually 1 in 7 now ;) I used to work for an ISP back when there were lots of them, and we used to offer one month free for new members. Most people quit after the first month, but that didn't stop us advertising how many customers we had in our database. I'd be willing to bet a lot of money FB is doing the same thing. I have 3 FB accounts myself: one I use, one I use for signing up to all those crap services which only let you use an FB account (hello Spotify) and like to spam your status, and another for testing things that has no connection to anything else of mine. None of them have real names.

    12. Re:Oh bullshit. by dzfoo · · Score: 3, Interesting

      I wrote a red-black tree for fun the other day. What's the problem?

      --
      Carol vs. Ghost
      ...Can you save Christmas?
  2. 1 billion users by Anonymous Coward · · Score: 5, Funny

    I totally believe that Facebook has 1 billion users... because I am 4 of them.

    1. Re:1 billion users by flimflammer · · Score: 2

      Put me down for 6.

    2. Re:1 billion users by PeanutButterBreath · · Score: 2

      Would it make you feel worse if the number was a "mere" 250m? Or 100m?

      I am currently ignoring 2 different accounts, FWIW. Facebook keeps sending notifications of various uninteresting types to both, so I assume that they are both considered "active".

      I joined with a bunch of real-life friends years ago, and it appears that about 1 in 10 ever post anything on a regular basis.

      [Shrugs]

    3. Re:1 billion users by L3370 · · Score: 3, Insightful

      If he can make 4, so can the bozo that wants to create fake accounts for his pets, browsing ex-girlfriends, gaming Farmville perks, and avoiding his boss's prying eyes.

      In short, there aren't a billion people on Facebook, nowhere near it. An important fact for businesses that are looking to tap into a network of "real" people.

  3. PHP by Coolhand2120 · · Score: 3, Insightful

    Oh yes, please tell me all about the computer geniuses that wrote the PHP scripts that power Facebook!

    1. Re:PHP by Anonymous Coward · · Score: 3, Insightful

      Yea. Because everyone knows no real website could possibly be written in structured, maintainable PHP. Well, except the biggest site on the Internet.

    2. Re:PHP by Anonymous Coward · · Score: 5, Insightful

      PHP has proven to be the best web development kit. Its only persistent failure is the legacy growth of inconsistent API calls. For the rest, it's Turing complete, does scale well, and most of all is the best tuned hammer for the job. It delivers.

      In effect, PHP is a huge C API with its own C-like language constructs, a layer of abstraction which takes away the mundane and gets you building web sites.

      Now C is hailed for its great power, and not made fun of because of its ability to make real crappy, insecure code.
      PHP however is not hailed for its great power, and made fun of because of its ability to make real crappy, insecure code.

      It's all a matter of perspective. The problem is low level programmers who can't live with the fact that people make a billion dollars without obsessing over pointers or garbage collection.

    3. Re:PHP by phantomfive · · Score: 4, Informative

      Their PHP compiler isn't secret. It's open source and freely available.

      --
      "First they came for the slanderers and i said nothing."
  4. Your first mistake... by MrEricSir · · Score: 5, Funny

    ...is looking for meaningful computer science discussion in a business magazine article.

    --
    There's no -1 for "I don't get it."
    1. Re:Your first mistake... by Pseudonym · · Score: 3, Insightful

      At the risk of stating the obvious, an information technology problem is not the same as a computer science problem.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    2. Re:Your first mistake... by amoeba1911 · · Score: 5, Funny

      What the hell did I ever do to you?

  5. Read the article, not much CS inside... by file_reaper · · Score: 2

    I'm kinda disappointed... I am truly interested in how Facebook scales and was hoping there would be actual Computer Science related material in the article... Any Facebook employees care to comment? What do you guys do to scale stuff? How about /.'ers from other companies that have to deal with scaling? Hell, how do porn sites scale? I've done the traditional Distributed Systems courses at university but I really wanted to know how it's done in the real world by AWS, Facebook etc...

  6. 1 billion users, analyzed by damn_registrars · · Score: 4, Funny
    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
  7. Terrible by thePsychologist · · Score: 5, Informative

    The print version is available.

    I don't recommend reading it. There is absolutely nothing in this article about the actual engineering problems behind scaling for this number of users and how these problems are solved. In fact, there is nothing technical at all in this article except for some vague descriptions of the "bootcamp".

    --
    "What lies behind us, and what lies before us are tiny matters compared to what lies within us." Ralph Waldo Emerson
  8. It's rather clever by Animats · · Score: 5, Informative

    It's actually a rather impressive setup. Some Facebook architects gave a talk in EE380 at Stanford a few years back. Originally, Facebook's architecture assumed that most "friends" would be regionally local, reflecting Facebook's college-campus origin. That's not how it worked out after some growth. So they have to assemble pages across regions and data centers. There's caching, but there's also active cache invalidation, which they can do because they control both sides of the cache. There's extensive inter-process communication, and it's not HTTP. There's a lot of PHP for the user-facing stuff, but it's compiled with their in-house compiler, not interpreted.

    Facebook's purpose is banal, but the technology behind it is non-trivial.
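"Active cache invalidation" when you control both sides of the cache can be sketched as a read-through cache whose write path evicts the affected key. This is a minimal single-process illustration, not the system described in the talk: the `InvalidatingCache` class and its key names are hypothetical, and a plain dict stands in for the database.

```python
class InvalidatingCache:
    """Read-through cache in front of a dict that stands in for the database."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.db_reads = 0  # counts how often a read fell through to the database

    def get(self, key):
        """Serve from cache when possible; otherwise load and remember the value."""
        if key not in self.cache:
            self.db_reads += 1
            self.cache[key] = self.database.get(key)
        return self.cache[key]

    def put(self, key, value):
        """Write to the database, then actively evict the stale cache entry."""
        self.database[key] = value
        self.cache.pop(key, None)


db = {"profile:1": "old bio"}
cache = InvalidatingCache(db)
first = cache.get("profile:1")    # falls through to the database
second = cache.get("profile:1")   # served from cache
cache.put("profile:1", "new bio")  # write evicts the cached copy
third = cache.get("profile:1")    # reloads the fresh value
```

Because the writer itself evicts the entry, readers never have to wait for a time-to-live to expire. Doing the same thing across regions and data centers, as the comment describes, is where the real difficulty lies.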

  9. Billion Users? by macbeth66 · · Score: 2

    How do they count these 'users'? I have six accounts myself and most people I know have at least two. Now, there is a poll for Slashdot: how many Facebook accounts do you have?

  10. the secret is simple by goombah99 · · Score: 3, Funny

    facebook.pl

    it's just one script in perl.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  11. Not really one billion by kriston · · Score: 2

    It's not really one billion users. As any developer in any online service knows, the real figure is around 30% of the actual reported total. Still, it's no small challenge.

    --
    Kriston