Slashdot Mirror


The Joys and Hype of Hadoop

theodp writes "Investors have poured over $2 billion into businesses built on Hadoop," writes the WSJ's Elizabeth Dwoskin, "including Hortonworks Inc., which went public last week, its rivals Cloudera Inc. and MapR Technologies, and a growing list of tiny startups. Yet companies that have tried to use Hadoop have met with frustration." Dwoskin adds that Hadoop vendors are responding with improvements and additions, but for now, "It can take a lot of work to combine data stored in legacy repositories with the data that's stored in Hadoop. And while Hadoop can be much faster than traditional databases for some purposes, it often isn't fast enough to respond to queries immediately or to work on incoming information in real time. Satisfying requirements for data security and governance also poses a challenge."

5 of 55 comments

  1. Well No Shi... by bigdady92 · · Score: 5, Informative

    Hadoop is not a magic thing that can all of a sudden produce reams of new data sets. The setup, on an enterprise scale, takes thousands or tens of thousands of dollars in hardware. Then you have to write the Map/Reduce jobs and point all your data at the new clusters. Then the tweaking starts, and then your pointy-haired Boss or Accounting PencilTwit comes to you and demands results for all of this capital expense you just had them buy for some pinhead to get a better dashboard in sales.

    Hadoop, done right, takes many departments to work on and organize in a big enterprise. Small shops may have one guy who is both SA and programmer and who could get enough of the job done to make a difference. Furthermore, you NEED a full install from a big vendor. Installing Hadoop from open source is a nightmare, and the big vendors make it painfully simple to get the job done quickly. Can you do it by hand? Sure. Do you have the time? Not when you have other projects to work on and you can spend the company's capital to get the install and config done in 1/10th the time.

    /Cloudera Certified
    //A year later and they still don't know how to get data through the pipeline
    ///Setting up the hardware was a BLAST!

    --
    Wheel of Time: Book by Book and Sumview (summary review) Bigdady92 style: http://bigdady92.blogspot.com/
  2. Apache Spark > Hadoop by Code+Herder · · Score: 5, Informative

    I used to be a big fan of Hadoop until I gave Apache Spark a try. My god, the speed, ease of use and install simplicity were just ridiculous. I mean, words failed me the first time I used it: I got it installed and working in under 2 hours, and it was so blazing fast it was just a joke.

    For people who took a look a few years back, it has matured a lot, from an interesting prototype to something I now use in production on my clients' data. Documentation is still a bit sketchy for niche functions, but it has improved a lot too.

    https://spark.apache.org/

  3. More accurate by BitZtream · · Score: 5, Informative

    Hadoop isn't a database.

    It's a data processing system for massive quantities of data processed and distilled in large batches. If you're trying to treat it as a database, you're doing it wrong. The article is simply using Hadoop for the wrong purpose.

    You use Hadoop to reduce large amounts of data into smaller, more manageable collections of useful data, which can then be queried in real time.
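
    To make the batch "reduce, then query" idea concrete, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the log format, the sample records, and the names map_phase, shuffle, and reduce_phase are all assumptions made up for illustration. It only mimics the map, shuffle, and reduce steps on a tiny in-memory list.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (page, 1) pair per raw log line (hypothetical "user,page" format)."""
    for line in records:
        _user, page = line.split(",")
        yield page, 1


def shuffle(pairs):
    """Group every emitted value under its key, as the shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Made-up stand-in for the "large amounts of data" processed in one batch.
raw_log = [
    "alice,/home",
    "bob,/home",
    "alice,/pricing",
    "carol,/home",
]

# The batch job distills the raw log into a small summary...
page_hits = reduce_phase(shuffle(map_phase(raw_log)))

# ...and the summary, not the raw log, is what gets queried interactively.
print(page_hits["/home"])  # prints 3
```

    On a real cluster the map and reduce steps run in parallel across many machines and the summary would typically be loaded into a separate store for fast lookups; the sketch only shows the shape of the data flow.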

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  4. Paywalled by LordLimecat · · Score: 3, Informative

    Since when is it acceptable to post articles that are paywalled?

    We're not even going to pretend to care about the article?

  5. Free mirror on nasdaq.com by michaelmalak · · Score: 4, Informative

    Free nasdaq.com mirror of this particular article.