What are some of the cost/performance metrics of Splunk when data gets large (common for game developers).
How does Splunk do on data sizes in the 500 Gig range? And how much does it cost?
First of all, this article isn't a comparison or matchup - it's just a speculative post by someone who has done very little research and obviously lacks domain knowledge in the space. There is no mention of use cases, data sizes, performance, or costs.
Hadoop is an open-source framework for distributed data processing, specifically an implementation of the MapReduce programming model. BigQuery is a hosted service that allows you to run queries over massive datasets via an API. There are tools built on top of Hadoop that allow for fast querying over large datasets (Impala), and there are even tools that are not Hadoop-based that provide this as well (Spark + Shark). However, actually using these tools is a whole different game - the author makes no mention of how many nodes/VMs would be required to match the query performance of BigQuery.
Then there's data size. The author makes a strange claim that BigQuery "queries don’t run instantly; one of the samples took 3.3 seconds to grind through 3.49 Gigabytes of data. But that’s clearly fine for quick lookups." Huhn? What tool(s) are you comparing against? BigQuery allows users to run ad-hoc aggregate queries over full tables in really, really big datasets (i.e., terabytes). In public talks, Google has demonstrated regular expression match queries, with sums and aggregations, running over several terabytes of data in under a minute. To do this with a MapReduce-based system, what needs to be done (perhaps use something like Hive, or write a custom MapReduce job), and what is the performance in that case? For the same use case, what does it cost to use some of the "OLAP" tools the author describes? Would love to see some benchmarks.
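To make the "write a custom MapReduce job" option concrete, here is a minimal in-memory sketch of a regex-match-plus-sum job in the MapReduce style. It is not Hadoop's API: the log format (`<country> <bytes> <url>`), `PATTERN`, and all function names are hypothetical, and a real job would run the map and reduce tasks across many nodes, with the framework doing the shuffle. In SQL terms it computes roughly what `SELECT country, SUM(bytes) ... WHERE <url matches regex> GROUP BY country` expresses in one line.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical filter: count only search requests.
PATTERN = re.compile(r"/search\?q=")


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (country, bytes) for each log line whose URL matches PATTERN."""
    # Assumed line format: "<country> <bytes> <url>"
    country, size, url = line.split(maxsplit=2)
    if PATTERN.search(url):
        yield country, int(size)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapped values by key, as the framework does between the phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum every value that shares a key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the input lines on a single machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For example, `run_job(["US 512 /search?q=hadoop", "DE 128 /index.html", "US 256 /search?q=bigquery", "DE 64 /search?q=dremel"])` returns `{"US": 768, "DE": 64}`. Even this toy version shows the gap the comment is pointing at: you write, package, and schedule code for every ad-hoc question, rather than submitting one query.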
Re: "In the end, BigQuery is just another database."
Huhn? BigQuery is not a database at all - it doesn't support CRUD operations on data; rather, it is an append-only analytics tool. Conversely, databases, relational or not, aren't really the right tools for ad-hoc full-table-scan queries over many terabytes, which is exactly what BigQuery is designed for. BigQuery is a developer's product, one that can be integrated with existing web apps via a RESTful API. Hadoop has its own development role and story (and tools like Cascading are really great), but it isn't designed out of the box to be a backend for interaction via a RESTful API - it takes a bit more work to offer Hadoop as a service that developers can integrate with an application.
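As a rough illustration of the "integrate via a RESTful API" point, here is a minimal sketch that builds, but does not send, a POST request to BigQuery's `jobs.query` REST method using only Python's standard library. The endpoint path and the `query`/`useLegacySql` fields follow the current public BigQuery v2 REST API, which may differ from what was available when this comment was written. The project ID and access token are placeholders, and obtaining a real OAuth 2.0 token is out of scope here.

```python
import json
import urllib.request

# Placeholder project ID; substitute your own Google Cloud project.
PROJECT_ID = "my-example-project"
# BigQuery v2 REST endpoint for synchronous queries (jobs.query).
ENDPOINT = "https://bigquery.googleapis.com/bigquery/v2/projects/{project}/queries"

# A regex-match-plus-aggregation query over a public sample table.
SQL = """
SELECT corpus, SUM(word_count) AS total
FROM `bigquery-public-data.samples.shakespeare`
WHERE REGEXP_CONTAINS(word, r'^the')
GROUP BY corpus
ORDER BY total DESC
"""


def build_query_request(project_id: str, sql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for BigQuery's jobs.query method."""
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT.format(project=project_id),
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_query_request(PROJECT_ID, SQL, access_token="placeholder-token")
    print(req.get_method(), req.full_url)
    # Sending it needs a real token; once the job has completed, the JSON
    # response carries the result rows:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp).get("rows"))
```

The point of the sketch is that a web app needs nothing more than an HTTP client. Offering a comparable endpoint on top of a Hadoop cluster means building and running that service layer yourself.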
Re: "The public version of BigQuery probably isn't even used by Google, which likely has something bigger and better that we'll see in five years or so."
BigQuery is based on Google's internal Dremel, which Google uses every day. There is a very public research paper describing Dremel (much like how Google described MapReduce years ago). Read about what is available in Dremel versus what is available in BigQuery: http://research.google.com/pubs/pub36632.html
Oops, I meant to say "feeding poor people bioengineered rice?" Wow, it's late; I should sleep.
The symptoms here are caused by things like inadequate access to clean water and a lack of means to purchase food that provides proper nutrition. Hmmm, why don't people who live in places with clean water supplies (e.g., Northern California, most of Western Europe) eat bioengineered rice? Could we solve the actual problem (working on issues of economic equality and proper utilities and civil infrastructure) instead of feeding poor people? A very bigoted solution to our global problems.
Press release from MIT... The $100 iPod project will let every child in every developing country download Sheryl Crow's new single to their own U2-branded iPod.
More information on Magneto:
Years later Magnus reemerged as Magneto, who was determined to conquer the human race to prevent their oppression of mutants. Xavier's original team of X-Men thwarted his first public move in his war with humanity, the takeover of the Cape Citadel missile base. When they next clashed, Magneto was leading his original Brotherhood of Evil Mutants, which included Wanda and Pietro, now known as the Scarlet Witch and Quicksilver.
Not until years later would Magneto learn that they were actually his children. At one point Magneto genetically engineered a being called Alpha the Ultimate Mutant, who rebelled against him and turned Magneto back into an infant. Magneto thereafter had a series of battles with Xavier's new team of X-Men.
When scientists figure out how this process works, we should start a fund to genetically enhance the memories of the Slashdot editors, in order to prevent DUPES
Speaking of portable multimedia, I was lucky enough to get a Nintendo DS for Xmas! I hear that there will be an MP3/MP4 player for this great system, but I hope to actually be able to run Linux (or maybe an alternative OS like Contiki) on the DS and play Theora movies and Vorbis audio... somehow.
Has anyone heard of any new progress on the Linux/Nintendo DS front? I heard that there was a $1,000 bounty for getting Linux to load on the DS, but that site is now down! I think it uses a dual ARM7/ARM9 processor setup, so it might not be impossible!
Thanks!
True, I should have mentioned this: the boot program (was it called yaboot?) is kind of strange on the OldWorld Macs. You do have to have some OS 9 installed to actually boot into Linux, so it is not perfect... but luckily I don't have to reboot often!
Yes, actually, this is a great niche for YDL.
I have also witnessed YDL 3 turn throwaway G3 Macs into stable, useful desktop systems, running Firefox, snappy word processors like AbiWord, and things like XMMS and MPlayer for multimedia.
I have "brought back to life" a fairly useless 6100-series PowerPC via Yellow Dog. I use it at work as an "everything" server (I know you have a machine like this too!): file server, internal web server, mailing list server, and probably a dozen other things as I need them. Basically, its performance has been excellent, and it has run for months at a time without any problems.
What surprised me was how solid the old PowerPC Macs were in terms of hardware. The old Apple OS 9 crashed so much, I could not believe it was ALL software. I thought it must be poorly written OS code plus some sloppy RAM/processor/drive bus engineering! But lo and behold, with YDL on the machine, it is as stable as granite.
Oakland can use, oh, say $20 million of that. That's all. Geez.
Oh yeah, and can it stop dirty bombs in suitcases, or monitor Oakland's ports for suitcase nukes? Nope.
Ballistics, while scary, are not our biggest problem.
"A Beowulf Cluster of these!!!"
What? No one has posted that already?
Photo blogging is already boring...
Check out my Video Blog - WAY more interesting...
http://www.m3blog.com
Now, if only we could get a really good universal streaming video codec!!! (Dirac, perhaps?)
Yeah, I have been playing with a vBlog (video blog) here: m3blog.com, and my original idea was to post unedited video quickly.
However, I soon found out that it was more fun to do a little editing, since people weren't watching my raw posts; they quickly grew bored! And it wasn't very hard to make quick little edits, especially time-shifting, to make events seem like they took place before or after a certain point.
After 4 years, my shared libraries have all been linked to...! Finally I can start using it!
Just Kidding...!
I actually love OpenOffice though. I prefer the Calc spreadsheet to Excel!
I'm with you, but you know, my users at work will run ANYTHING...
Users can be psychotic sometimes...!
But I was running Lockout, and I couldn't access port 80.
I realize that good information access can really help people make advances in their lives, but really, economic inequality is a much bigger problem than the Digital Divide.
Perhaps it is more important to defeat one-sided trade agreements such as the FTAA and the WTO Agreement on Agriculture, which put economic power in the hands of the industrialized north.
If more people had access to fair wages, self-sufficient farming methods and nutritious food, people wouldn't need to work so hard at creating Microsoft-funded landfill-bound consumer boxes...!
whom you'll all remember from Star Trek: First Contact and Enterprise's "Broken Bow" episode as Dr. Zefram Cochrane
Uhhh no, I don't recall...
Yeah, I wonder if it will have one of those burgundy phones for when it gets stuck...
First organization to actually help people wins! $300 million for a damn gravity probe would go a long way toward our underfunded public school system.
After a base install, here is what I add to my Slackware-current system:
(1) Firefox
(2) Mplayer
(3) Xmame
(4) XMMS
(5) Ethereal
(6) Blender
(7) OpenOffice.org
(8) XCDroast
(9) Audacity
(10) The newest version of GIMP!
Sounds like you have some kind of sexual insecurity.
I enjoyed reading about Turing's sexuality in Crypto*; what a shame that in real life he was hurt professionally by his sexual orientation.
Gay rights are an important issue; don't pretend they're not. Are you also of the opinion that there is no race problem in the world? Should we not talk about it? I say, talk about it as much as possible.
Maybe you should take some social science classes, put down the sci-fi and go outside.