Facebook's Prism, Soon To Be Open Sourced, Gives Hadoop Delay Tolerance
snydeq writes "Facebook has said that it will soon open source Prism, an internal project that supports geographically distributed Hadoop data stores, thereby removing the limits on Hadoop's capacity to crunch data. 'The problem is that Hadoop must confine data to one physical data center location. Although Hadoop is a batch processing system, it's tightly coupled, and it will not tolerate more than a few milliseconds delay among servers in a Hadoop cluster. With Prism, a logical abstraction layer is added so that a Hadoop cluster can run across multiple data centers, effectively removing limits on capacity.'"
...but when?
Love to see useful stuff open sourced, but part of me is annoyed it is Facebook doing it.
"If any question why we died, Tell them because our fathers lied."
Hard drives typically have O(ms) seek latency. Does this mean that Facebook had all its data in RAM before they made Prism?
I misread the title, and was disappointed until I saw the word Hadoop. It's such a silly name.
I'm a fruit pirate. I bought a watermelon once, and spat the seeds in the back yard. They grew into another watermelon,
What is the sub-problem when running a Hadoop job that has this bottleneck and requires such low latency? Is it something that could have been avoided for a start?
And how does (or, if the media reports predictably don't explain it, how *would*) a logical abstraction layer solve this problem in a way that Hadoop's programmers couldn't have more easily implemented within the application's own code?
Information theory is life. The rest is just the KL divergence.