Slashdot Mirror


The Computer Science Behind Facebook's 1 Billion Users

pacopico writes "Much has been made about Facebook hitting 1 billion users. But Businessweek has the inside story detailing how the site actually copes with this many people and the software Facebook has invented that pushes the limits of computer science. The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.' To keep Facebook moving fast, Mark Zuckerberg apparently instituted a program called Boot Camp in which engineers spend six weeks learning every bit of Facebook's code."

9 of 113 comments

  1. Oh bullshit. by Anonymous Coward · · Score: 5, Insightful

    The story quotes database guru Mike Stonebraker saying, 'I think Facebook has the hardest information technology problem on the planet.'

    Really? You think keeping track of some people's dinner plans is the hardest IT problem on the planet? How about YouTube storing and serving truly ludicrous amounts of video. Web search? Watson?

    Facebook is utterly trivial compared to many problems out there.

    1. Re:Oh bullshit. by Dan+East · · Score: 5, Informative

      Actually, Facebook's problem isn't trivial in any sense of the word. The complexity of the joins across its various database tables must be insane. With YouTube it's all about raw bandwidth, which is actually a fairly easy problem to solve, especially since 99% of that data is static. You just physically distribute it and throw money / resources at the problem. As far as database structure goes, any CS student should be able to reproduce the bulk of it in a single day. You have videos associated with users, and comments associated with videos, etc. The gist of it is straightforward.
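
      To show how little there is to that structure, here is a toy version using Python's built-in sqlite3. The table and column names are made up for illustration; this is not YouTube's actual schema.

```python
import sqlite3

def make_video_db():
    """Build a toy in-memory schema: users own videos, videos have comments."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE videos (
            id       INTEGER PRIMARY KEY,
            owner_id INTEGER NOT NULL REFERENCES users(id),
            title    TEXT NOT NULL
        );
        CREATE TABLE comments (
            id        INTEGER PRIMARY KEY,
            video_id  INTEGER NOT NULL REFERENCES videos(id),
            author_id INTEGER NOT NULL REFERENCES users(id),
            body      TEXT NOT NULL
        );
    """)
    return conn

def comments_for(conn, video_id):
    """Return (author name, comment body) pairs for one video, oldest first."""
    rows = conn.execute(
        """
        SELECT users.name, comments.body
        FROM comments
        JOIN users ON users.id = comments.author_id
        WHERE comments.video_id = ?
        ORDER BY comments.id
        """,
        (video_id,),
    )
    return rows.fetchall()
```

      Every lookup is keyed on a single video, so the data partitions cleanly by video id.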

      Now let's talk about Facebook. There is no compartmentalization of the data. You've heard of "six degrees of separation", whereby any two people on the planet can be socially connected to one another in at most 6 steps. Well, with Facebook, the average degree of separation between any two people is 3.74. What that means is everyone is very closely networked, all the data is dynamic (or more specifically, the data the users really care about is the dynamic, most recent data), and since many people (myself included) open up their information to "friends of friends", there is a tremendous amount of data that any one person can potentially have access to. Even Google searches don't have this problem, because the bulk of the common search terms can be preprocessed for easy retrieval, and having data that's an hour or two old isn't a huge issue.

      So you have this massive database (1 billion users, each with many different types of associated data - posts, images, videos, things they've liked, things they've shared, etc, etc), and each of those 1 billion users has an entirely different set of friends from which recent (basically real-time) data must be polled - over, and over, and over again, all day long. Now, throw in the very complex privacy rules, as to which types of posts can be seen by which types of friends, groups, block lists, etc, and the problem becomes very, very complex. Sure, most of us could bang out something with that core functionality without too much difficulty, but to make it work nearly real-time for 1 billion users at once? That's an incredible undertaking.
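
      To make that concrete, here is a toy Python sketch of the "core functionality" I mean: a friends-of-friends lookup, a privacy check, and a feed sorted newest first. Everything in it (the in-memory graph, the post dictionaries, the two audience names, the block list) is a made-up assumption for illustration and has nothing to do with Facebook's real code.

```python
def friends_of_friends(graph, user):
    """Everyone within two hops of `user` in the friendship graph, excluding the user."""
    direct = graph.get(user, set())
    reach = set(direct)
    for friend in direct:
        reach |= graph.get(friend, set())
    reach.discard(user)
    return reach

def can_see(graph, viewer, post, blocked):
    """Apply a (hypothetical) per-post privacy rule plus a block list."""
    author = post["author"]
    if (author, viewer) in blocked:
        return False
    if post["audience"] == "friends":
        return viewer in graph.get(author, set())
    if post["audience"] == "friends_of_friends":
        return viewer in friends_of_friends(graph, author)
    return False  # unknown audience: deny by default

def feed(graph, viewer, posts, blocked, limit=10):
    """Newest-first list of other people's posts the viewer is allowed to see."""
    visible = [
        p for p in posts
        if p["author"] != viewer and can_see(graph, viewer, p, blocked)
    ]
    visible.sort(key=lambda p: p["ts"], reverse=True)
    return visible[:limit]
```

      Note that this scans every post and recomputes the two-hop set on every check. That is fine for four users and hopeless for a billion, which is exactly the gap between the toy and the real thing.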

      --
      Better known as 318230.
    2. Re:Oh bullshit. by rtaylor · · Score: 5, Interesting

      It's made infinitely easier by being asynchronous and 99% reads. There are no timing issues. If a post is delayed to someone's screen by a minute or two, nobody dies.

      It's not terribly difficult to make numerous (near infinite) read-only replicas of a database which are within tens of milliseconds of the primary; so that takes care of 99% of their problems.

      Handling their write load is harder but keep in mind the vast majority of their accounts are idle; and again asynchronous writes make it much much easier. They can shove everything through a message queue and put heavy-weight sharding of the data behind that.
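
      Roughly what I have in mind, as a single-process Python sketch. The shard count, class name, and the explicit drain() step are assumptions made up for illustration; a real deployment would use an actual message broker and separate database servers.

```python
import hashlib
from collections import deque

NUM_SHARDS = 4  # assumed shard count, purely illustrative

def shard_for(user_id):
    """Map a user id to a shard with a stable hash (same answer on every run)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

class ShardedStore:
    """Writes go through a queue to sharded primaries; reads hit replicas."""

    def __init__(self):
        self.queue = deque()
        self.primaries = [dict() for _ in range(NUM_SHARDS)]
        self.replicas = [dict() for _ in range(NUM_SHARDS)]

    def write(self, user_id, post):
        # Accept the write immediately; it is applied later by drain().
        self.queue.append((user_id, post))

    def drain(self):
        # Apply queued writes to the owning primary shard, in arrival order.
        while self.queue:
            user_id, post = self.queue.popleft()
            shard = self.primaries[shard_for(user_id)]
            shard.setdefault(user_id, []).append(post)
        # Stand-in for replication: copy each primary to its read-only replica.
        for i, primary in enumerate(self.primaries):
            self.replicas[i] = {u: list(p) for u, p in primary.items()}

    def read(self, user_id):
        # Reads never touch the primaries, so they may be slightly stale.
        return self.replicas[shard_for(user_id)].get(user_id, [])
```

      A read issued between write() and drain() simply returns the old data, which is the "delayed by a minute or two, nobody dies" trade-off.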

      I think handling 100 million banking customers in 2000 was infinitely harder than Facebook has it from a technical standpoint.

      --
      Rod Taylor
  2. 1 billion users by Anonymous Coward · · Score: 5, Funny

    I totally believe that Facebook has 1 billion users... because I am 4 of them.

  3. Your first mistake... by MrEricSir · · Score: 5, Funny

    ...is looking for meaningful computer science discussion in a business magazine article.

    --
    There's no -1 for "I don't get it."
    1. Re:Your first mistake... by amoeba1911 · · Score: 5, Funny

      What the hell did I ever do to you?

  4. Terrible by thePsychologist · · Score: 5, Informative

    The print version is available.

    I don't recommend reading it. There is absolutely nothing in this article about the actual engineering problems behind scaling for this number of users and how these problems are solved. In fact, there is nothing technical at all in this article except for some vague descriptions of the "bootcamp".

    --
    "What lies behind us, and what lies before us are tiny matters compared to what lies within us." Ralph Waldo Emerson
  5. It's rather clever by Animats · · Score: 5, Informative

    It's actually a rather impressive setup. Some Facebook architects gave a talk in EE380 at Stanford a few years back. Originally, Facebook's architecture assumed that most "friends" would be regionally local, reflecting Facebook's college-campus origin. That's not how it worked out after some growth. So they have to assemble pages across regions and data centers. There's caching, but there's also active cache invalidation, which they can do because they control both sides of the cache. There's extensive inter-process communication, and it's not HTTP. There's a lot of PHP for the user-facing stuff, but it's compiled with their in-house compiler, not interpreted.
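
    As a rough illustration of what active invalidation means (my own toy Python sketch, not anything shown in the talk; the class names are made up): because the same code path handles both the write and the cache, it can evict the stale entry at write time instead of waiting for a timeout.

```python
class Database:
    """Stand-in for the backing store; counts reads so cache hits are visible."""

    def __init__(self):
        self.rows = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value

class InvalidatingCache:
    """Read-through cache that evicts an entry as soon as its key is written."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def get(self, key):
        # Cache miss: fetch from the database and remember the result.
        if key not in self.cache:
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def put(self, key, value):
        # Write to the database first, then actively invalidate the cached copy.
        self.db.put(key, value)
        self.cache.pop(key, None)
```

    The next get() after a put() goes back to the database, so readers do not see the old value once the write has gone through this path.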

    Facebook's purpose is banal, but the technology behind it is non-trivial.

  6. Re:PHP by Anonymous Coward · · Score: 5, Insightful

    PHP has proven to be the best web development kit. Its only persistent failure is the legacy growth of inconsistent API calls. For the rest, it's Turing complete, does scale well, and most of all is the best tuned hammer for the job. It delivers.

    In effect, PHP is a huge C API with its own C-like language constructs, a layer of abstraction which takes away the mundane and gets you building web sites.

    Now C is hailed for its great power, and not made fun of because of its ability to make real crappy, insecure code.
    PHP however is not hailed for its great power, and made fun of because of its ability to make real crappy, insecure code.

    It's all a matter of perspective. The problem is low level programmers who can't live with the fact people make a billion dollars without obsessing over pointers or garbage collection.