New Google Tool To Find Trend Correlations
Kilrah_il writes "In 2008 Google found a correlation between seasonal flu activity and certain search terms, a finding that allowed it to track flu activity better and more rapidly than previous methods. Now, Google is offering a new tool, Google Correlate, that allows researchers to do the same for other trends. 'Using Correlate, you can upload your own data series and see a list of search terms whose popularity best corresponds with that real world trend.' Of course, Google reminds us that correlation does not imply causation."
This is a wonderful tool. In the short term, it should allow a lot of people to track interesting trends.
In the long term, though, Heisenberg rules. If I may paraphrase, "Knowledge of the model invalidates the model."
Want a real world example today? Stock market. This is why automated make-money tools don't work nearly as well as they should.
Don't take life too seriously; it isn't permanent.
Unfortunately the service appears to be limited to US search data. Hopefully this will be extended in the future.
I'm really starting to like this company. Free web browser, free word processor (and spreadsheet?), free language translation, free nudie pics, free scanned books, free email, free Usenet reader, and now this cool Dataset research tool.
Still not sure I want to store my documents on the internet though. (1) Not secure. (2) Government can review the documents without having to ask a judge for a warrant. But overall I guess Google is a decent company. Why pay for stuff you can get for free and legally?
My AC stalker: " I personally agree with your posts most of the time, but that won't keep me from modding you troll"
So when do they release the next product: Google Causation?
Just think of all the things I'll be able to prove with this!
From TFA: "like Google Trends but in reverse."
I8-D
This tool finds an association between categorical data, namely a search word and counts for searches of that word. "Correlation" refers to a special type of association, i.e. between two quantitative variables, which, correct me if I'm wrong, this tool does not measure. Am I being pedantic here? Or should we take a stand for correct and precise usage of statistical terms?
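For reference, here is a minimal sketch of correlation in the strict sense used above: the Pearson coefficient between two quantitative series. The weekly numbers are invented for illustration, and `pearson_r` is a hypothetical helper name, not anything taken from Google Correlate.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly numbers, made up for this example only.
flu_cases = [12, 15, 30, 55, 80, 60, 35, 20]
search_counts = [100, 130, 260, 500, 700, 540, 300, 190]
r = pearson_r(flu_cases, search_counts)
print(round(r, 3))
```

Both inputs here are plain numeric series indexed by week, which is the shape of data the quoted description says you upload.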
Oh, yeah, it's not easy to pad these out to 120 characters.
Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.
http://xkcd.com/552/
Correlation is one of those simple statistical terms that lots of non-technical people like to throw around without actually knowing what it means. It's a wonderful tool that Google has provided for everyone, but people need to remember the basic assumptions of correlations, namely a relatively normal distribution of scores and independence of observations. Independence is especially important if you're tracking search engine results: if you were to look at how many times people Googled Randy Savage's name the day he died, that count would influence the subsequent day's, ultimately biasing the correlation with whatever other variable you decided to compare it against.
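One rough way to see the independence problem is to measure how strongly each day's count predicts the next day's. The sketch below computes a lag-1 autocorrelation; the daily counts are invented, and `lag1_autocorrelation` is a hypothetical helper written for this example.

```python
def lag1_autocorrelation(series):
    """Correlation of a series with itself shifted by one step (standard estimator)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Products of deviations for each pair of neighbouring observations.
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom

# Hypothetical daily search counts: a news spike that decays over several days.
spike = [10, 10, 10, 200, 120, 70, 40, 20, 10, 10]
# A series whose neighbours move in opposite directions, for contrast.
alternating = [10, 20, 10, 20, 10, 20, 10, 20]

print(lag1_autocorrelation(spike))        # positive: each day carries over into the next
print(lag1_autocorrelation(alternating))  # negative
```

A value well away from zero suggests neighbouring observations are not independent, so a significance test that assumes independence would overstate the evidence.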
Carl Sagan quotes get you an automatic +5 on all posts.
I assume no one bothered to proof read the summery?
"Microsoft" corresponds heavily with "Windows", "software", "updates", and the like, while Apple corresponds with "Apple Store", "large dog", "extra large dog", and ... "muleys"... wtf?
Hmmm. The trends for both Correlation and Causation have a graph similar to the ribosome example given by Google, with peaks and troughs of interest that seem to match the semesters of the school year: less interest in summer (though smaller schools in the southern hemisphere would still be running) and a massive drop-off around Christmas, when schools worldwide would be on holidays. But the two terms have an R value less than 0.91 (I haven't bothered to work out how much less though). So I guess there is some truth in the age-old saying, Correlation != Causation.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
Trends in online web search query data have been shown useful in providing models of real world phenomena. However, many of these results rely on the careful choice of queries that prior knowledge suggests should correspond with the phenomenon.
Yes, that is how science is done; hypothesis, predict, test, evaluate.
Here, we present an online, automated method for query selection that does not require such prior knowledge. Instead, given a temporal or spatial pattern of interest, we determine which queries best mimic the data. These search queries can then serve to build an estimate of the true value of the phenomenon.
So we have a backwards type of science: Evaluate, test, predict, hypothesis. Cuz hey, if there's a correlation, there must be a relation, and if there's a relation, we can build an estimate of the value of the relation, right? The marketing manager is gonna LOVE this....
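To make the quoted method concrete, here is a brute-force sketch: score every candidate query series against the target pattern and sort. The series and the names `candidate_a` etc. are made up, and nothing here claims to be how Google does the search at scale.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(target, candidates):
    """Return (name, r) pairs, most positively correlated with the target first."""
    scored = [(name, pearson_r(target, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical target pattern and candidate query series (invented data).
target = [1, 2, 4, 8, 4, 2]
candidates = {
    "candidate_a": [2, 4, 8, 16, 8, 4],   # same shape as the target, scaled
    "candidate_b": [8, 4, 2, 1, 2, 4],    # roughly the mirror image
    "candidate_c": [3, 3, 4, 3, 3, 4],    # mostly unrelated
}

for name, r in rank_by_correlation(target, candidates):
    print(name, round(r, 3))
```

Note that the ranking always returns a "best" match, whether or not any real relationship exists, which is exactly the worry raised above.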
sysadmins and parents of newborns get the same amount of sleep.
This is great! Now we can finally analyse what people are correlating in Google Trends that tells us what people are searching, then we can use this correlation search data to build Google Correlate Correlate, then we can use this to analyse what people are correlating on things that other people are correlating, then.. then the thing goes on and on and on..
1) Google Search
2) Google Trends
3) Google Flu Trends
4) Google Correlate
5) Google Correlate Correlate
6) Google Correlate Correlate Correlate
7) ???
8) Profit!!!
OMG my head is so dizzy now!!
>>>Explain how is local storage would be more secure than remote storage?
HUGE difference. It requires a warrant to enter my house and obtain the files. A warrant requires probable cause (we suspect he's a murderer, because we smell dead bodies), and review by an impartial judge to approve the warrant.
Remote storage is subject to random snooping by a bored FBI agent browsing through Google's or Apple's or Microsoft's servers. (Thanks to the Patriot Act.)
With local storage, you have choices. Who can use my computer? Do I use an encrypted volume? Do I use Windows, Linux/*nix, or Mac? What program(s) do I do it with?
With Google Docs, your spreadsheet is in their format. Your letter is in their format. You can export it, print it, and whatever else makes you feel good. They retain your browsing and activity history. They have every email you've sent and received. In theory it's all yours privately. In reality, it's yours, and viewable by everyone at Google, assuming they have the permissions. I won't say it's *all* Google employees that have access, but it is greater than 1, which is more than you'd want.
If law enforcement wants to see your remote data, they serve Google with a subpoena or warrant. Google hands the information over, and you may never know. It would likely be done in a court outside of your area anyways.
If law enforcement wants to see your local data, they serve you with the warrant or subpoena. You can choose to contact an attorney immediately. You know what they took, and what they viewed.
Serious? Seriousness is well above my pay grade.
How much time until they launch Google Causation?
If you type "recession" into Google Correlate, it tells you there is a correlation factor of 0.9059 with "microsoft word 2008".
Here's a quick game. Try and find a term with the highest weekly search volume when normalized against the usual search volume for that term.
Here are a few that I tried:
http://correlate.googlelabs.com/search?e=inauguration&t=weekly# - 19.637
http://correlate.googlelabs.com/search?e=Michael+Jackson&t=weekly# - 14.537
http://correlate.googlelabs.com/search?e=Olympics&t=weekly# - 11.656
http://correlate.googlelabs.com/search?e=new+year's+eve&t=weekly# - 8.355
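If you want to play this game on your own data, one plausible reading of "normalized against the usual search volume" is a z-score: how many standard deviations the peak week sits above the term's mean. That reading is an assumption about what the scores above represent, and the counts below are invented.

```python
from statistics import mean, pstdev

def peak_z_score(weekly_counts):
    """Standard deviations by which the busiest week exceeds the series mean."""
    mu = mean(weekly_counts)
    sigma = pstdev(weekly_counts)  # population standard deviation
    if sigma == 0:
        raise ValueError("z-score is undefined for a constant series")
    return (max(weekly_counts) - mu) / sigma

# Hypothetical term: 99 quiet weeks followed by one enormous spike.
quiet_then_spike = [10] * 99 + [1010]
print(round(peak_z_score(quiet_then_spike), 3))
```

Under this definition a single spike in an otherwise flat series of n weeks scores sqrt(n - 1), so the best terms for the game are ones that are almost never searched except in one week.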
Also, check out the "Search by Drawing" option: http://correlate.googlelabs.com/draw - it's great. Draw your own graph and see what search terms correlate with it.
A proof that correlation is evidence of causation,
even though correlation does not imply causation:
http://kim.oyhus.no/CorrelationAndCausation.html
Correlated with State Obesity Rates 2009
So, why do people stop caring about autism at Christmas? http://correlate.googlelabs.com/search?e=autism&e=christmas&t=weekly#
I uploaded the closing stock prices of GOOG for the last two years. It showed fairly poor correlations with several random phrases. "Eye won't stop twitching" was my favorite.
I hope I don't catch the flu using that thing.
Try searching "cellulose". You'll see that Christmas is also a nadir, extending up to New Year. In fact, most of the correlations Google gives you have the same exact pattern: the major peak is every September, and it then falls to the baseline around November. A second peak is found in January. I think what we're seeing is students searching for homework answers on the Internet. The correlated words are all ones that students are likely to search for, most of them unrelated to cellulose. In effect, we're not seeing ethereal "trends" or a "Zeitgeist emerging" but, obviously in 20/20 hindsight, only what normal random people do daily. So, congratulations: based on this correlation, you've developed a theory called college semesters!