Apache Hadoop Has Failed Us, Tech Experts Say (datanami.com)
It was the first widely-adopted open source distributed computing platform. But some geeks running it are telling Datanami that Hadoop "is great if you're a data scientist who knows how to code in MapReduce or Pig...but as you go higher up the stack, the abstraction layers have mostly failed to deliver on the promise of enabling business analysts to get at the data." Slashdot reader atcclears shares their report:
"I can't find a happy Hadoop customer. It's sort of as simple as that," says Bob Muglia, CEO of Snowflake Computing, which develops and runs a cloud-based relational data warehouse offering. "It's very clear to me, technologically, that it's not the technology base the world will be built on going forward"... [T]hanks to better mousetraps like S3 (for storage) and Spark (for processing), Hadoop will be relegated to niche and legacy statuses going forward, Muglia says. "The number of customers who have actually successfully tamed Hadoop is probably less than 20 and it might be less than 10..."
One of the companies that supposedly tamed Hadoop is Facebook...but according to Bobby Johnson, who helped run Facebook's Hadoop cluster before co-founding behavioral analytics company Interana, the fact that Hadoop is still around is a "historical glitch. That may be a little strong," Johnson says. "But there's a bunch of things that people have been trying to do with it for a long time that it's just not well suited for." Hadoop's strengths lie in serving as a cheap storage repository and for processing ETL batch workloads, Johnson says. But it's ill-suited for running interactive, user-facing applications... "After years of banging our heads against it at Facebook, it was never great at it," he says. "It's really hard to dig into and actually get real answers from... You really have to understand how this thing works to get what you want."
Johnson recommends Apache Kafka instead for big data applications, arguing "there's a pipe of data and anything that wants to do something useful with it can tap into that thing. That feels like a better unifying principal..." And the creator of Kafka -- who ran Hadoop clusters at LinkedIn -- calls Hadoop "just a very complicated stack to build on."
One of the companies that supposedly tamed Hadoop is Facebook...but according to Bobby Johnson, who helped run Facebook's Hadoop cluster before co-founding behavioral analytics company Interana, the fact that Hadoop is still around is a "historical glitch. That may be a little strong," Johnson says. "But there's a bunch of things that people have been trying to do with it for a long time that it's just not well suited for." Hadoop's strengths lie in serving as a cheap storage repository and for processing ETL batch workloads, Johnson says. But it's ill-suited for running interactive, user-facing applications... "After years of banging our heads against it at Facebook, it was never great at it," he says. "It's really hard to dig into and actually get real answers from... You really have to understand how this thing works to get what you want."
Johnson recommends Apache Kafka instead for big data applications, arguing "there's a pipe of data and anything that wants to do something useful with it can tap into that thing. That feels like a better unifying principal..." And the creator of Kafka -- who ran Hadoop clusters at LinkedIn -- calls Hadoop "just a very complicated stack to build on."
Indeed. I went though their "interview-process" a while back at the request of a friend that was there and desperately wanted me for his team. Interestingly, I failed to get hired, and I think it is because I knew a lot more about the questions they asked than the people that created (and asked) these questions. For example, on (non-cryptographic) hash-functions my answer was to not do them yourself, because they would always be pretty bad, and to instead use the ones by Bob Jenkins, or if things are slow because there is a disk-access in there to use a crypto hash. While that is what you do in reality if you have more than small tables, that was apparently very much not what they wanted to hear. They apparently wanted me to start to mess around with the usual things you find in algorithm books. Turns out, I did way back, but when I put 100 Million IP addresses into such a table, it performed abysmally bad. My take-away is that Google prefers to hire highly intelligent, but semi-smart people with semi-knowledge about things and little experience and that experienced and smart people fail their interviews unless they prepare for giving dumber answers than they can give. I will never do that.
On the plus side, my current job is way more interesting than anything Google would have offered me.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
That's because the mediocre programmers are the ones giving the interviews. Close friend interviewed last year only to sit in front of a bunch of know it all elitists. One douche rambled on about how he wishes there were monads in c++ and how great functional design is. Now my friend and his roommate are CS geeks and their spare time is doing shit like build a lisp interpreter in c++ just for fun. So he asked mr monad if the project used a functional approach which was a solid no. Idiot just wanted to show off the fact he knew what functional programming is and wasted time. He passed on the Google job for a big local company doing back end dev work. Job pays as good as Google without the pompous know nothing's with the ability to remotely work. Fuck working for Google.
For example, on (non-cryptographic) hash-functions my answer was to not do them yourself, because they would always be pretty bad, and to instead use the ones by Bob Jenkins, or if things are slow because there is a disk-access in there to use a crypto hash. While that is what you do in reality if you have more than small tables, that was apparently very much not what they wanted to hear. They apparently wanted me to start to mess around with the usual things you find in algorithm books.
No offense, but "I'd rather just use a library" seriously brings into question what you bring to the table and whether you'll just be searching experts-exchange for smart stuff other people have done..Like everybody knows you shouldn't use homegrown cryptographic algorithms, but if a cryptologist can't tell me what an S-box is and points me to using a library instead it doesn't really tell me anything about his skill, except he didn't want to answer the question. In fact, dodging the question like that would be a pretty big red flag.
Don't get me wrong, you can get there. But start off with roughly what you'd do if you had to implement it from scratch, what's difficult to get right, then suggest implementations you know or alternative ways to solve it. Because they're not that stupid that they think this is some novel issue nobody's ever looked at before or found decent answers to. They want to test if you have the intellect, knowledge and creativity to sketch a solution yourself. Once you've done that, then you can tell them why it's probably not a good idea to reinvent the wheel.
Live today, because you never know what tomorrow brings
No offense, but you miss the point entirely. What I answered is very far from "use a library". First, it is an algorithm, not a library. That difference is very important. Second, it is a carefully selected algorithm that performs much better than what you commonly find in "libraries" in almost all situations. And third, the hash-functions by Bob Jenkins (and the newer ones bu DJB, for example) are inspired by crypto, but much faster in exchange for reduced security assurances. In fact so fast that they can compete directly with the far worse things commonly in use. "Do not roll your own crypto" _does_ apply_ though.
So while I think you meant to be patronizing, you just come across as incompetent. A bit like the folks at Google, come to think of it...
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.