Google's BigQuery Vs. Hadoop: a Matchup

Posted by samzenpus on Monday April 15, 2013 @06:29AM from the in-this-corner dept.

Nerval's Lobster writes "Ready to 'Analyze terabytes of data with just a click of a button?' That's the claim Google makes with its BigQuery platform. But is BigQuery really an analytics superstar? It was unveiled in Beta back in 2010, but recently gained some improvements such as the ability to do large joins. In the following piece, Jeff Cogswell compares BigQuery to some other analytics and OLAP tools, and hopefully that'll give some additional context to anyone who's thinking of using BigQuery or a similar platform for data. His conclusion? In the end, BigQuery is just another database. It can handle massive amounts of data, but so can Hadoop. It's not free, but neither is Hadoop once you factor in the cost of the hardware, support, and the paychecks of the people running it. The public version of BigQuery probably isn't even used by Google, which likely has something bigger and better that we'll see in five years or so."

37 comments


Hadoop is much better and stable by stinkyworm · 2013-04-15 06:30 · Score: 1, Interesting

With Google's tendency to randomly quit working on products and techs I would never use them. That is why Hadoop is a much better option.
1. Re:Hadoop is much better and stable by Anonymous Coward · 2013-04-15 06:37 · Score: 0
  
  Hah. Not saying this BigQuery thing is any better, but Hadoop could greatly improve their reputation by not emulating PHP in the passing-your-own-unit-tests realm.
2. Re:Hadoop is much better and stable by Anonymous Coward · 2013-04-15 08:00 · Score: 0
  
  yeah, Google has a giant dart board where they randomly decide what to stop working on next. I hear google.com will be shut down next month too! Oh wait, maybe Reader was quit for one of the reasons they ALREADY STATED? Butt hurt much?
  I bet you spell Microsoft with a $ too, and can explain to me why this is also Sony's fault, cuz rootkit.
  *looks at UID*
  oh wait.. nevermind....
3. Re:Hadoop is much better and stable by Anonymous Coward · 2013-04-15 08:19 · Score: 0
  
  Looks at UID... coward
4. Re:Hadoop is much better and stable by Anonymous Coward · 2013-04-15 09:49 · Score: 0
  
  Google decides to stop allowing their customers to use products after an average of 1,459 days:
  http://www.guardian.co.uk/technology/2013/mar/22/google-keep-services-closed
  This means that in less than four years on average we can expect Google to no longer allow us access to our data or do any processing on that data. If you're working with worthless, short-term data then using Google makes sense. If you need a product for more than four years, then history has proven Google to be unreliable.
5. Re:Hadoop is much better and stable by Anonymous Coward · 2013-04-15 09:58 · Score: 0
  
  No, their decision isn't completely random. Your dart board analogy is not correct. There is a random component that they've talked about in the past, but if the metrics that they've made public are correct then the random part is less than 50% of the reason for their decision. Out of the 39 products they've killed so far, there was over three years on average between the announcement and the decision to no longer allow public access to the product. For a big data project that typically has a life span of less than two years, depending on Google's somewhat random and frequent killing of products, it is not a major risk. Only eight of their products screwed over their customers by revoking access in a period of less than two years.
  Again, you're wrong about the dart board part, or were you trying to make a funny analogy about Google's giving up on Dart? It is so dead that they haven't published a roadmap.
6. Re:Hadoop is much better and stable by PylonHead · 2013-04-15 10:07 · Score: 3, Insightful
  
  You understand that that number is flawed, right? He only figures in the average lives of products that Google has killed. It's kind of like looking at all the people who died of heart attacks, finding out they lived to an average of 48 years old, and then telling the general population that, on average, they're going to die of a heart attack when they're 48 years old.
  But please, jump on the anti-google circle jerk. It seems to be the thing to do at the moment.
  
  --
  # (/.);;
  - : float -> float -> float =
7. Re:Hadoop is much better and stable by Dahamma · 2013-04-15 10:39 · Score: 1
  
  Big whoosh on that one! His entire post was obviously sarcasm, and was pointing out the GP post was a typical anti-Google article troll...
8. Re:Hadoop is much better and stable by markhb · 2013-04-16 04:50 · Score: 1
  
  I just turned 48, you insensitive clod!
  
  --
  Save Maine's economy: write stuff down. All comments are exclusively my own, not my employer.
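
PylonHead's objection in this thread (averaging only over the products that were killed) can be sketched in a few lines. The launch and shutdown years below are made-up values for illustration, not Google's actual product history, and the function names are hypothetical.

```python
from statistics import mean

AS_OF_YEAR = 2013

# Hypothetical products; "killed" is None when the product is still running.
products = [
    {"launched": 2009, "killed": 2010},
    {"launched": 2005, "killed": 2013},
    {"launched": 2007, "killed": 2011},
    {"launched": 1998, "killed": None},
    {"launched": 2004, "killed": None},
    {"launched": 2005, "killed": None},
]


def mean_age_of_killed(items):
    """Average lifespan over discontinued products only (the biased figure)."""
    return mean(p["killed"] - p["launched"] for p in items if p["killed"] is not None)


def mean_age_of_all(items, as_of):
    """Average age over every product, counting survivors up to `as_of`.

    Survivors are still running, so this is only a lower bound on the
    true average lifespan.
    """
    return mean(
        (p["killed"] if p["killed"] is not None else as_of) - p["launched"]
        for p in items
    )


print(mean_age_of_killed(products))
print(mean_age_of_all(products, AS_OF_YEAR))
```

With this sample the killed-only average comes out lower than the all-products average, which is the whole point: the first number says nothing about how long a product that has not been killed will last.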
Pathetic summary by geek · 2013-04-15 06:34 · Score: 4, Insightful

"The public version of BigQuery probably isn't even used by Google, which likely has something bigger and better that we'll see in five years or so"
With in-depth analysis like that who needs the full article.........
1. Re:Pathetic summary by Animats · 2013-04-15 07:07 · Score: 2
  
  With in-depth analysis like that who needs the full article.........
  Right. This is just a troll for a short blog posting. There are no benchmarks or examples at all. This looks like "sponsored content".
click of a button? by schlachter · 2013-04-15 06:42 · Score: 1

That's so 2008. Wake me up when I can process terabytes of data with the sound of my voice, the wave of my hand, or the wave pattern of my brain. ;P

--
My God can beat up your God. Just kidding...don't take offense. I know there's no God.
1. Re:click of a button? by gandhi_2 · 2013-04-15 07:22 · Score: 1
  
  meh.
  the triggering of my cron job.
  
  --
  THL phish sticks
or you can by nimbius · 2013-04-15 06:45 · Score: 0, Flamebait

avoid the protracted outages, painful licensing, free access by federal authorities and data mining by a private multinational and just do this instead http://hypertable.org/

--
Good people go to bed earlier.
wrong link by Anonymous Coward · 2013-04-15 06:45 · Score: 0

wrong link in article
test by Anonymous Coward · 2013-04-15 07:01 · Score: 0

test to see how this works.
1. Re:test by Anonymous Coward · 2013-04-15 11:23 · Score: 0
  
  You tested idiot-positive! Congrats!
Splunk by Tuki · 2013-04-15 07:01 · Score: 1

IMHO, Splunk crushes all of these solutions. While it is not "free", it has an incredibly short time to value. The first time I stood it up in production, it took less than two hours - most of our time was spent checking our work. Now, I can quickly analyze large volumes of data, and only have to manage a single software component. I love it!

--
robots obey what the children say - TMBG
1. Re:Splunk by Anonymous Coward · 2013-04-15 07:16 · Score: 0
  
  You are totally clueless.
2. Re:Splunk by mr_don't · 2013-04-15 07:35 · Score: 1
  
  What are some of the cost/performance metrics of Splunk when data gets large (common for game developers)?
  How does Splunk do on data sizes in the 500 Gig range? And how much does it cost?
3. Re:Splunk by Tuki · 2013-04-16 01:51 · Score: 1
  
  There are big tech companies processing over 100 TBs a day. There are also game developers like Zynga that use Splunk for "big data" analytics. As far as cost, I am not sure what to say... contact your local sales guy ;). We process 500 GB a day, but incrementally upgraded our license. I'd have to dig up the costs of each...
  
  --
  robots obey what the children say - TMBG
Re:Google is too ubiquitous by Dancindan84 · 2013-04-15 07:09 · Score: 2

What, you think the big telecoms are more benevolent? With the big telecoms they honestly don't care how good their service is most of the time. They'll bill you either way, and your other option(s) are either non-existent or more of the same with a different name.
If Google's model is making me the product, at least they have an investment in keeping their service up. I can't view their ads if I'm offline. Plus competition is going to drive relative costs down and service up. If they can monetize the fact that I did a search trying to figure out what this "Harlem Shake" thing is on April 3rd, they're welcome to. Anything important I do is encrypted.

--
"Always forgive your enemies; nothing annoys them so much." - Oscar Wilde
Re:Google? Really? by Anonymous Coward · 2013-04-15 07:32 · Score: 0

You, sir, should be commended for having common sense. Sadly, too many others do not and will worship at the altar of the ad company because they are cool.
Article contains plenty of misleading comments by mr_don't · 2013-04-15 07:34 · Score: 5, Insightful

First of all, this article isn't a comparison or matchup - it's just a speculative post by someone who has done very little research and obviously lacks domain knowledge in the space. There is no mention of use cases, data sizes, performance, costs.
Hadoop is an open-source framework for distributed data processing, specifically an implementation of the MapReduce framework. BigQuery is a hosted service that allows you to run queries over massive datasets via an API. There are tools built on top of Hadoop that allow for fast querying over large datasets (Impala), and there are even tools that are not Hadoop based that provide this as well (Spark + Shark). However, actually using these tools is a whole different game - the author makes no mention of how many nodes/VMs are required to reach query performance comparable to BigQuery's.
Then there's data sizes. The author makes a strange claim that BigQuery "queries don’t run instantly; one of the samples took 3.3 seconds to grind through 3.49 Gigabytes of data. But that’s clearly fine for quick lookups." Huhn? What tool(s) are you comparing against? BigQuery allows users to run full table aggregate ad-hoc queries over really really big datasets (i.e. terabytes). In public talks, Google has demonstrated that it is possible to run regular expression match queries, with sums and aggregations, over several terabytes of data in under a minute. In order to do this with a MapReduce-based system, what needs to be done - perhaps use something like Hive, or write a custom MapReduce function - and what is the performance in this case? For the same use case, what is the cost of using some of the "OLAP" tools that the author describes? Would love to see some benchmarks.
Re: "In the end, BigQuery is just another database."
Huhn? BigQuery is not a database at all - it doesn't support CRUD operations on data - rather it is an append-only analytics tool. And conversely, databases, relational or not, aren't really the right tools for full table scan ad-hoc queries over many terabytes, which is what BigQuery is designed to do. BigQuery is a developer's product, and one that can be integrated with existing web apps via RESTful API. Hadoop has its own development role and story (and tools like Cascading are really great) but it's not designed as the backend for interaction via a RESTful API out of the box - it takes a bit more work to provide Hadoop as a service for developers to integrate with an application.
Re: "The public version of BigQuery probably isn't even used by Google, which likely has something bigger and better that we'll see in five years or so."
BigQuery is based on Google's internal Dremel, which is used every day by Google. There is a very public research paper describing Dremel (much the same as how Google described MapReduce years ago). Read about what is available in Dremel versus what is available in BigQuery: http://research.google.com/pubs/pub36632.html
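
For readers unfamiliar with the workload described in the comment above (a regular expression match with sums and aggregations over a full table), here is a toy sketch of its shape as a map step and a reduce step. This is plain Python over a four-row list, not the BigQuery API, Hadoop, or Hive; the rows, the pattern, and the function names are all assumptions made up for illustration.

```python
import re
from collections import Counter
from functools import reduce

# Hypothetical (title, views) rows; a real table would hold terabytes.
ROWS = [
    ("Hadoop", 120),
    ("Hadoop Distributed File System", 30),
    ("BigQuery", 75),
    ("MapReduce", 50),
]

PATTERN = re.compile(r"^Hadoop")


def map_partition(rows):
    """Map step: scan one partition, keep regex matches, sum views per title."""
    partial = Counter()
    for title, views in rows:
        if PATTERN.search(title):
            partial[title] += views
    return partial


def reduce_partials(left, right):
    """Reduce step: merge two partial per-title sums."""
    return left + right


# Pretend the table is split across two workers.
partitions = [ROWS[:2], ROWS[2:]]
totals = reduce(reduce_partials, map(map_partition, partitions), Counter())
total_views = sum(totals.values())
print(totals, total_views)
```

Every row is scanned once and only small partial sums travel between steps, which is why this kind of query parallelizes well. How many workers it takes to do that over terabytes in under a minute is exactly the benchmark the article leaves out.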
1. Re:Article contains plenty of misleading comments by gajop · 2013-04-15 08:05 · Score: 1
  
  This comment contains more information than the article.
  Thanks :)
2. Re:Article contains plenty of misleading comments by BlueItalian · 2013-04-16 05:46 · Score: 1
  
  Thanks for this, I was going to post pretty much the same thing. In the same vein as Dremel, my company recently open sourced a similar tool called Druid. If you want to play with it, it's available on GitHub at https://github.com/metamx/druid
Re:Google is too ubiquitous by Anonymous Coward · 2013-04-15 08:12 · Score: 0

So instead of a dialog, this post got a -1. Great. Slashdot really has drunk the Kool-Aid.
Courtesy Devops_Borat by Drunkulus · 2013-04-15 08:55 · Score: 1

Mining of Big Data is problem solve in 2013 with zgrep.
1. Re:Courtesy Devops_Borat by Anonymous Coward · 2013-04-17 09:27 · Score: 0
  
  twat.
What are you even talking about? by sidragon.net · 2013-04-15 10:01 · Score: 2

[I]nstead of a dialog, this post got a -1.
You're talking about politics and conspiracy theories in an article about big data. Yes, that is off topic.

Why does the Internet always have to be about "monetization"? I'd like to see open, standards-compliant offerings that are truly "free" as in freedom and very low cost...
You're living in a dreamland. Like it or not, electricity, hardware, and wires cost money.

I'm hoping Firefox OS proves to be one of these. Let's hope as a non-profit...
FYI, Mozilla Foundation is funded, in large part, by Google.

Look at OpenBSD, for example. Not much better in terms of a secure server environment.
And it has scant adoption. Meanwhile, the rest of us are charging ahead and getting stuff done with steadily advancing tools rather than messing around with arcane operating systems that have 10-year-old feature sets.
1. Re:What are you even talking about? by Anonymous Coward · 2013-04-15 12:10 · Score: 0
  
  I'll step in here...
  OpenBSD does not have 10-year-old feature sets. OpenBSD has pioneered about every advance in security known to modern operating systems.
Re:Google? Really? by Anonymous Coward · 2013-04-15 10:31 · Score: 1

Bad analogy fail. Google doesn't create the ads, they serve them. Calling them an ad agency would be like calling TV networks ad agencies.
Not that it matters. The fact is Google runs the largest "cloud" computing network in the world. Of course, that doesn't necessarily make them the best platform for other businesses. But given Hadoop is based on Google's MapReduce and GFS designs, they clearly have expertise in the field, and to pretend otherwise is a complete and utter troll.
And when you have Petabytes by Anonymous Coward · 2013-04-15 11:08 · Score: 0

You use Vertica
1. Re: And when you have Petabytes by Anonymous Coward · 2013-04-15 11:11 · Score: 0
  
  http://willsllc.github.com/blog/how-we-use-vertica-at-gsn/