MapReduce For the Masses With Common Crawl Data
New submitter happyscientist writes "This is a nice 'Hello World' for using Hadoop MapReduce on Common Crawl data. I was interested when Common Crawl announced themselves a few weeks ago, but I was hesitant to dive in. This is a good video/example that makes it clear how easy it is to start playing with the crawl data."
Then you probably want to try it on some local data first so you don't rack up a huge bill. One Hadoop job over the whole dataset costs at least $200 or so, and that's for simple stuff.
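For anyone who wants to see the shape of the thing before paying for a cluster, here is a rough sketch of a word-count "Hello World" run entirely in local memory. This is not the code from the video and it doesn't touch Hadoop or the real crawl files; the function names (`map_words`, `shuffle`, `reduce_counts`, `run_local`) and the sample strings are made up for illustration. It just mimics the map, shuffle, and reduce phases in plain Python.

```python
from collections import defaultdict
from itertools import chain


def map_words(doc):
    """Map phase (hypothetical helper): emit (word, 1) per whitespace-separated token."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def run_local(docs):
    """Run map, shuffle, and reduce over an in-memory list of documents."""
    pairs = chain.from_iterable(map_words(doc) for doc in docs)
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Stand-in for a small local sample of page text, not real crawl data.
    sample = ["hello world", "hello common crawl"]
    print(run_local(sample))
```

Once the mapper and reducer give sane results on a handful of local documents, the same logic is what you would port to a real Hadoop job.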