Facebook's Corona: When Hadoop MapReduce Wasn't Enough
Nerval's Lobster writes "Facebook's engineers face a considerable challenge when it comes to managing the tidal wave of data flowing through the company's infrastructure. Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years, and that growth isn't going to end anytime soon. Until early 2011, those engineers relied on a MapReduce implementation from Apache Hadoop as the foundation of Facebook's data infrastructure. Still, despite Hadoop MapReduce's ability to handle large datasets, Facebook's scheduling framework (in which a large number of task trackers handle duties assigned by a job tracker) began to reach its limits. So Facebook's engineers went to the whiteboard and designed a new scheduling framework named Corona."
Facebook is continuing development on Corona, but they've also open-sourced the version they currently use.
What do you mean "Hadoop MapReduce isn't enough"? It's the same fucking framework with a better scheduler.
"Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years"
Too bad that's 99.9% junk I don't care about.
With lime. Had to do it.
In layman's terms:
What is Hadoop?
What is MapReduce?
From the article, I derive that it is a scheduling framework. What the hell is a scheduling framework?
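For anyone else asking the same thing, here is a rough layman's sketch of the MapReduce idea in plain Python. It is not Hadoop code and not anything from Corona; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. The point is only the shape: a "map" step turns each input record into key/value pairs, the pairs are grouped by key, and a "reduce" step collapses each group into one result. In a real cluster, as the summary describes, a job tracker would hand the map and reduce tasks out to many task trackers on different machines instead of running them in one process.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in one process (a cluster would spread them out)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["the cat sat", "the dog sat"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, the work splits naturally across machines. Deciding which machine runs which piece is the scheduling part that the article is about.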
after paging through the code a bit, i found it interesting that they use java in their implementation (not just corona, but hadoop as well). i was wondering why, and after some googling found this link which helped explain the situation a bit more clearly.
pretty interesting stuff. but i'd be willing to bet google's mapreduce is written in c/c++
I have to admit, while I hate using Facebook, and hate most of their business practices, I like how they're not just writing new infrastructure software, but are open-sourcing it all. I don't think it quite makes up for everything else, but it helps.
They went from a couple GB to 5 TB. Impressive!!!
Half a PB of data flows through the infrastructure, but how much of that actually has to be stored? That's the real question.
9 Coronas
How many projects have been code-named Corona these last few years? Seems like every org's got a project named Corona nowadays.
Mmm. Snarky sarcasm, industry lingo, and obtuse responses. Thanks for nothing.
So it's some sort of file system or database? Or is it simply yet another programming language abstraction layer upon other programming language abstraction layers?
Then Facebook's map reduce is what, a programming method, as described above? It's made out by the article to sound like something more tangible than a methodology. Even reading the Wikipedia article doesn't make anything clear in layman's terms.
Thanks again.
Hard to believe how much technology goes into such a shitty website. Even right now over 90% of profile images aren't loading for me.
They could start by actually deleting deleted content. Seems simple to me. Let's hope their shortsightedness continues when everyone jumps ship for the next social fad, and continuing this rat race becomes far too costly.
So, my inability to derive that a massive data storage system framework is a "Database management architecture" in this case is a demonstration of my inability to comprehend technical stuff.
Sort of like your inability to understand that Let Me Google That For You with obtuse results is pure snark when someone asks for a layman's description. You didn't have to respond if you thought that the response was pointless or that you lacked sufficient understanding to explain it in a coherent fashion. Yet you felt compelled to provide a smart-ass (not smart) response.
So, by your example, your inability to understand the technicalities of language and your absence of social skills makes it nearly impossible for you to understand the textbook definition of snarky. Allow me to try to explain it to you, by example.
http://www.lmgtfy.com/?q=define+snarky