Slashdot Mirror


MapReduce For the Masses With Common Crawl Data

New submitter happyscientist writes "This is a nice 'Hello World' for using Hadoop MapReduce on Common Crawl data. I was interested when Common Crawl announced themselves a few weeks ago, but I was hesitant to dive in. This is a good video/example that makes it clear how easy it is to start playing with the crawl data."
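For anyone who hasn't met it, the classic MapReduce "Hello World" is a word count. Below is a minimal sketch that simulates the map, shuffle and reduce phases in plain Python on one machine. It is not the tutorial's code, since Hadoop jobs like the one in the video are typically written in Java. It also assumes the input is already plain-text lines, not the crawl's archive files. The names `mapper`, `shuffle`, `reducer` and `run_word_count` are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: "words" are runs of lowercase letters, digits or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one line of text."""
    for word in WORD_RE.findall(line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort phase: group all emitted values by key, sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one word."""
    return word, sum(counts)


def run_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines, all in memory."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["Hello World", "hello Common Crawl", "crawl the web"]
    for word, count in run_word_count(sample).items():
        print(word, count)
```

Running the same logic over a small local sample like this is a cheap way to check a job's output before paying for a cluster run over the full dataset.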

1 of 29 comments

  1. Re:Thanks for posting this.. by InsightIn140Bytes · Score: 5, Informative

    Then you probably want to use it with some local data so you don't rack up a huge bill. A single Hadoop job over the whole dataset costs at least around $200, and that's for simple stuff.