Is Big Data Leaving Hadoop Behind?
knightsirius writes: Big Data was seen as one of the next big drivers of the computing economy, and Hadoop was seen as a key component of the plans. However, Hadoop has had a less than stellar six months, beginning with the lackluster Hortonworks IPO last December and the security concerns raised by some analysts. Another survey records only a quarter of big data decision makers actively considering Hadoop. With rival Apache Spark on the rise, is Hadoop being bypassed in big data solutions?
They need to refer to the pieces of Hadoop. HDFS is the storage piece, and many things can interface with it; it isn't great but is often good enough, especially if you just have a couple of local disks per node. YARN is the scheduler piece; it is mostly awful performance-wise but is fairly easy to use. In the long run it'll lose to something like Mesos, I think.
That's a good call. With Cloudera and Hortonworks both adding new components to the Hadoop stack, the number of components has exploded in the last year or two, and that can be a bad thing. The complexity of the whole ecosystem is getting horrendous: in the last year a typical configuration file has doubled from 250 or so to 500 configuration items, almost all of them undocumented (unless you read the code, which scarcely qualifies as "documented"). For a practical deployment you are pretty much forced to use a commercial stack to get something up and running in a manageable fashion. And then there is the fact that the HDFS foundation is showing its age.
MR is the MapReduce piece that everyone thinks of when you say Hadoop. Almost everything will run quicker in Spark (still using a map/reduce methodology) than in Hadoop MR.
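For anyone who hasn't seen the map/reduce methodology spelled out, here is a minimal single-process sketch in plain Python. It uses neither the Hadoop nor the Spark API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would spread each phase across many nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: fold each key's list of values into a single total."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence (hypothetical helper, in-memory only)."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The model is the same whichever engine runs it; what differs is how each engine executes these phases.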
Spark on Mesos is looking mighty awesome.
As a side note, I don't know anyone who still writes MR jobs directly; they are all doing Pig or HiveQL.
MapReduce is still viable for stable production jobs, but not in a dynamic requirements environment.
Although HiveQL is alive and kicking, the complete replacement of Hive Server with Hive Server 2, while possibly an improvement in usability overall (I am not convinced), trashes your skill investment in the (now) obsolete Hive stack component. Maybe I am just grousing, but I start having reservations about technology planning in the data center when a key stack component changes so much in a relatively short period of time.
Did I trip into a time warp and come out a decade in the past?
Who the fuck is actually talking about hadoop or map reduce in 2015? The same retards that were creaming their little cunts about it in 2005?
Even when you ignore the joke that is Java, hadoop is unwieldy, unreliable shit if you actually care about storing and retrieving correct, synchronized data.
If you're fine with throwing all of your data in a pot and getting some sort of result that looks mostly correct, then knock yourself out and use hadoop.
If your data needs to be correct, define it and its relationships then use SQL. You will have to pay someone decent money to do this correctly.
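To make "define it and its relationships" concrete, here is a small sketch using Python's built-in sqlite3 module. The customers/orders schema and the helper names (build_db, try_insert_order) are invented for illustration and do not come from anyone's real system; the point is only that the database itself rejects rows that break the declared rules.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               amount_cents INTEGER NOT NULL CHECK (amount_cents >= 0)
           )"""
    )
    return conn


def try_insert_order(conn, customer_id, amount_cents):
    """Insert an order; return False if the schema's constraints reject it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, amount_cents) VALUES (?, ?)",
                (customer_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_db()
    db.execute("INSERT INTO customers (id, name) VALUES (1, 'acme')")
    print(try_insert_order(db, 1, 500))   # valid customer and amount
    print(try_insert_order(db, 99, 500))  # no such customer
```

An order pointing at a nonexistent customer, or carrying a negative amount, never makes it into the table, which is the kind of guarantee being argued for here.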
I would strongly disagree. In 1995 relational theory and practice were well understood by a large set of developers and had stable, well-documented implementations. Raw Hadoop and the associated computational model are not at that level of stability, documentation and usability. In addition, the relational model applies to many business problems, large and small. Hadoop is generally applicable and cost efficient only for larger, more complex problems.