Slashdot Mirror


Facebook's Corona: When Hadoop MapReduce Wasn't Enough

Nerval's Lobster writes "Facebook's engineers face a considerable challenge when it comes to managing the tidal wave of data flowing through the company's infrastructure. Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years, and that growth isn't going to end anytime soon. Until early 2011, those engineers relied on a MapReduce implementation from Apache Hadoop as the foundation of Facebook's data infrastructure. Still, despite Hadoop MapReduce's ability to handle large datasets, Facebook's scheduling framework (in which a large number of task trackers handle duties assigned by a job tracker) began to reach its limits. So Facebook's engineers went to the whiteboard and designed a new scheduling framework named Corona." Facebook is continuing development on Corona, but they've also open-sourced the version they currently use.

8 of 42 comments

  1. Re:Junk. by isopropanol · · Score: 2

    But between you and 1000 other people who care about slightly different sets, much of it is stuff that someone cares about.

  2. Re:Junk. by Daniel Dvorkin · · Score: 4, Insightful

    Too bad that's 99.9% junk I don't care about.

    But between you and 1000 other people who care about slightly different sets, much of it is stuff that someone cares about.

    This. 99.9% (at least) of the entire internet is junk that any one person doesn't care about. But every bit has someone who cares about it (or did at one time) or it wouldn't be there.

    Well. I opened the story expecting some reflexive Facebook-bashing, and I wasn't disappointed. When are people going to realize that FB's just another internet company with a reasonably successful business model, and worthy of neither adulation nor hatred?

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  3. Re:Junk. by Revotron · · Score: 4, Funny
    Yes, Facebook sure would be a lot more successful if 99.9% of people's posts got deleted and replaced with an on-screen notification that reads,

    This post has been removed because it is of no interest to Anonymous Coward. Please try posting things more in line with the following categories:

    1. Linux
    2. Open-source software
    3. Richard M Stallman
    4. OMG!!! PONIES!!!

  4. Re:Misleading headline by ArcadeMan · · Score: 4, Funny

    And why the fuck should I care about Windows 8 tablets? You are not making any sense!

  5. Facebook by gman003 · · Score: 3, Interesting

    I have to admit, while I hate using Facebook, and hate most of their business practices, I like how they're not just writing new infrastructure software, but are open-sourcing it all. I don't think it quite makes up for everything else, but it helps.

  6. Re:Junk. by SolitaryMan · · Score: 2

    So, you care about (1 - 0.999) * 500 TB = 500 GB of Facebook information every day??? Dude, where do you get the time?

    --
    May Peace Prevail On Earth
  7. Re:What? by Em Adespoton · · Score: 3, Informative

    Hadoop: massive data storage system framework... "Apache Hadoop is an open-source software framework that supports data-intensive distributed applications"
    MapReduce: a way of managing distributed clusters of data sets... "MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers"

    Scheduling framework: a framework for providing optimal scheduling of something such that events are handled in an optimal manner.

    Or, to put it another way:
    http://lmgtfy.com/?q=hadoop
    http://lmgtfy.com/?q=mapreduce
    http://lmgtfy.com/?q=scheduling+framework
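
For readers who would rather see the "programming model" part of that MapReduce definition than search for it, here is a minimal single-process sketch in Python. It is only an illustration of the map, shuffle, and reduce steps using word counting; it is not Hadoop's API and says nothing about how Corona schedules work. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in order, all inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records, standing in for files spread across a cluster.
counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

In a real cluster the map and reduce calls would run on many machines at once, and deciding which machine runs which task is the job of the scheduling framework the story is about.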

  8. Re:Junk. by martin-boundary · · Score: 2

    When are people going to realize that FB's just another internet company with a reasonably successful business model, and worthy of neither adulation nor hatred?

    Wrong. FB is worthy of hatred because what they do is inherently evil. They spy on people, and sell off that information.

    The "it's just a job/business" excuse doesn't work when the job/business is evil. For example, when the local Mafia goons come to collect protection money, it's "just a job" for them, right? Nothing personal. They're just regular people who are trying to make ends meet, like everybody else. Don't hate them. Wrong, it's evil, and the goons display a singularly bad sense of judgement in agreeing to do this kind of work.

    Similarly, spies are evil, James Bond notwithstanding. They steal secrets, and betray people in the process. And FB are a spying organization. They treat users' rights as a joke, and due to their size and ubiquity, are substantially responsible for the state of privacy on the internet today.