Slashdot Mirror


Yahoo Releases Open Source Hadoop Distribution

ruphus13 writes "Yahoo has been a vociferous Apache Hadoop user and supporter for several years now, and uses it extensively within its Search technologies. Hadoop has been gaining popularity in the Cloud Computing space, with companies like the NYTimes converting 4TB of scanned images into 11 million article PDFs in under 24 hours using Hadoop and EC2 in late 2007. Hadoop has been made available in Amazon's cloud and Yahoo has now released its own Hadoop version. From the article: 'At today's Hadoop Summit in Silicon Valley, Yahoo! announced the availability of the Yahoo! Distribution of Hadoop, a source-only version of Apache Hadoop that Yahoo! uses within its own search engine. [Hadoop] is an open source software framework that helps process very large data sets, and is widely used in large-scale data mining applications as well as in search tools at sites like Facebook and many others. For developers and users interested in Hadoop, it's worth noting that the Yahoo! Distribution of Hadoop has been widely tested and developed at Yahoo! for years now.'"

7 of 49 comments

  1. Timely article by C18H27NO3+ · · Score: 3, Informative

    Perhaps the Ask Slashdot inquirer in this thread will find this news useful.

  2. Hadoop is awesome by fancellu · · Score: 5, Informative

    Not only is it used by Yahoo, but also by Facebook, which gets 15TB of new data a day to handle. Check out the very useful free vids from Cloudera: http://www.cloudera.com/hadoop-training-thinking-at-scale You can also download a canned VM preloaded with Hadoop/Pig/Hive goodness, even a copy of Eclipse preconfigured: http://www.cloudera.com/hadoop-training-virtual-machine

    1. Re:Hadoop is awesome by DamnStupidElf · · Score: 2, Informative

      "Hive/Hadoop cluster at Facebook stores more than 2PB of uncompressed data and routinely loads 15 TB of data daily."

  3. Re:Hadoop? by fancellu · · Score: 2, Informative

    It's the name of the main developer's kid's toy elephant.

  4. Re:Why is this a big deal? by shadow42 · · Score: 3, Informative

    As far as I can tell, the distribution Yahoo is offering is just the vanilla Hadoop, but with Yahoo's patches on top of it. Yahoo is very involved in Hadoop's development (the project's founder is now employed by them), so a lot of their patches get incorporated back into Hadoop's source tree. Most of the changes Yahoo made are just performance/stability patches that haven't been incorporated into an official release yet. You could probably get the same distribution just by grabbing SVN trunk.

  5. Re:Obligatory Java is Slow Comment by dintech · · Score: 2, Informative

    Java has its faults, but being slow is no longer one of them. You should do some googling.

  6. Re:open source software by blueskies · · Score: 2, Informative

    Fail!

    ISO 32000-1:2008: PDF

    Because stitching together numerous TIFF files on your own is so much better!