Hadoop 1.0 Released
darthcamaro writes "There has been a tonne of hype about Big Data and specifically Hadoop in recent years. But until today, Hadoop was not a 1.0 release product. Does it matter? Not really, but it's still a big milestone. The new release includes a new web interface for the Hadoop filesystem, security, and HBase database support. '"At this point we figured that as a community we can support this release and be compatible for the foreseeable future. That makes this release an ideal candidate to be called 1.0," Arun C. Murthy, vice president of Apache Hadoop, said.'"
From Wikipedia:
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.[1] It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
Hadoop is a top-level Apache project being built and used by a global community of contributors,[2] written in the Java programming language. Yahoo! has been the largest contributor[3] to the project, and uses Hadoop extensively across its businesses.[4]
Hadoop was created by Doug Cutting,[5] who named it after his son's toy elephant.[6] It was originally developed to support distribution for the Nutch search engine project.[7]
...in case you're as ignorant as I am. Post anonymously to avoid karma whoring.
Um, what the heck is Hadoop? A filesystem? Linux distro? Database software? Something to do with web servers? Throw me a bone here, man. Why does this 'Big Data' need capitalization?
And most importantly, why did they go with the British spelling for 'tonne'? Is this a product of the UK?
What, read the article? Are you mad?
It was actually released over a week ago, but I guess the announcement got lost over the holidays. I am actually a bit surprised they did a 1.0 version before solving the "NameNode is a single point of failure" problem with HDFS. I know for a fact that big companies (one of which was a client) are sometimes hesitant to deploy Hadoop because of this.
In theory, you can also use Hadoop with purportedly more robust distributed file systems, like KFS (Kosmos File System, I think it's called). I've never seen this in the wild though.
Now that it has reached 1.0, it can start increasing version numbers Mozilla style.
I am a little ignorant of it, because reading a PDF I downloaded is a world away from experience. However, the chief systems architect is (at least) conceptually enthralled, especially with how easily load can be distributed. (I am not keen on replication and use it sparingly, because setting the rules for merge replication is a real minefield in a fast-evolving environment.) I personally think the strict adherence to structure that our regular database enforces has been nothing but a godsend when a buggy release of software has interacted with the database. I also think that in this instance we shouldn't set any trends and should let others iron out the bugs, because as a company data is everything to us, so unusually I am the conservative one. I'll need convincing long before I can convince myself.
Off-topic, but this reminds me of something I found out yesterday: "HABOOBS" is a playable word in Words With Friends. Seriously, try it.
"There has been a tonne of hype about Big Data and specifically Hadoop in recent years. But until today, Hadoop was not a 1.0 release product. Does it matter? Not really
Wasn't there a slogan about "news for nerds, stuff that matters" around here somewhere?
Seems a fair number of you are unaware of what Hadoop is.
Hadoop is a platform that enables distributed computing. Specifically, it supports map/reduce programming in a manner similar to Google's MapReduce, except that it is open source. It supports distributing data for redundancy and/or scalability (in other words, you can have multiple copies of each data item on multiple computers, or you can split a data set across multiple computers, or both, with the data set sharded across multiple machines but with copies of each shard on multiple machines).
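For anyone who wants the "sharded, but with copies of each shard" idea spelled out, here is a minimal Python sketch. The function name `place_shards`, the node names, and the round-robin placement rule are all made up for illustration; this is not how HDFS actually chooses where blocks go.

```python
def place_shards(items, nodes, num_shards, replicas):
    """Split items into shards, then copy each shard onto `replicas` distinct nodes."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    # Sharding: item i goes to shard (i mod num_shards).
    shards = [items[i::num_shards] for i in range(num_shards)]
    placement = {node: {} for node in nodes}
    for shard_id, shard in enumerate(shards):
        # Replication: shard k is stored on nodes k, k+1, ... (wrapping around).
        for r in range(replicas):
            node = nodes[(shard_id + r) % len(nodes)]
            placement[node][shard_id] = list(shard)
    return placement


# Hypothetical three-node cluster: 3 shards, 2 copies of each.
placement = place_shards(list(range(10)), ["node-a", "node-b", "node-c"],
                         num_shards=3, replicas=2)
for node, held in placement.items():
    print(node, held)
```

With two copies of every shard, losing any single node in this toy setup still leaves one copy of each shard somewhere else.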
There is a distributed filesystem that comes with Hadoop called HDFS. There is a distributed key/value store (somewhat analogous to a database...actually, scratch that, it's a distributed hash map) called HBase. There are also a number of distributed computing libraries built on top of Hadoop, like Mahout (for machine learning), Hive (for ad-hoc querying of large data sets), and Pig (another distributed computing model that some consider to be easier than map/reduce).
The whole setup provides a distributed computing model similar to Google's distributed environment, supporting very large clusters, map/reduce programming, and distributed storage of very large and/or sparse matrices and tables.
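If "map/reduce programming" is still abstract, the classic word count shows the shape of it. This is a single-process Python sketch of the idea only, not the Hadoop API; the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are invented here, and a real cluster would run the map and reduce steps on many machines at once.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return (key, sum(values))


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big hype", "hadoop big data"])
print(counts)
```

The point of the split is that each map call only sees one document and each reduce call only sees one key, so both steps can be farmed out independently.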
Hadoop is solid, has a lot of features, and big companies use it. It seems great... And yet I prefer GridGain due to its simplicity, ease of use and development speed. It's like EJB vs. a service coded in PHP: for most pages/services EJB is just overkill.
ZooKeeper is a subproject of Hadoop (and BookKeeper a sub-subproject, so to say). I have been using both for a while now, and must say I am astonished by their resilience. Great products. Moreover, ZooKeeper is a valiant attempt at solving one of computer science's oldest standing problems: leader election in a ring. Hooray Hadoop, keep the good work going!
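For readers who have not met "leader election in a ring" before, here is a small Python simulation of the textbook Chang-Roberts approach: every process passes IDs clockwise, drops any ID smaller than its own, and the process that sees its own ID come back is the leader. To be clear, this is the classroom algorithm, not ZooKeeper's actual protocol, and the function name `ring_leader` and the assumption of unique IDs are mine.

```python
def ring_leader(ids):
    """Simulate Chang-Roberts election on a one-way ring of unique process IDs."""
    if not ids:
        raise ValueError("ring must contain at least one process")
    n = len(ids)
    # Round 0: every process sends its own ID to its clockwise neighbour.
    inbox = [[] for _ in range(n)]
    for i, pid in enumerate(ids):
        inbox[(i + 1) % n].append(pid)
    while True:
        next_inbox = [[] for _ in range(n)]
        for i, pid in enumerate(ids):
            for msg in inbox[i]:
                if msg == pid:
                    # Own ID travelled all the way around: this process wins.
                    return pid
                if msg > pid:
                    # Larger ID: forward it. Smaller IDs are silently dropped.
                    next_inbox[(i + 1) % n].append(msg)
        inbox = next_inbox


leader = ring_leader([3, 7, 2, 5])
print(leader)
```

Only the largest ID survives a full trip around the ring, so every run on the same ring elects the same process.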
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
it runs on GNU HURD as well.
Amazing that nobody mentioned Cascalog https://github.com/nathanmarz/cascalog
Why not JHadoop? It's good to know it's another made-up word for Java stuff.