Google Can Predict the Flu
An anonymous reader mentions Google Flu Trends, a newly unveiled initiative of Google.org, Google's philanthropic arm. The claim is that this Web service, which aggregates search data to track outbreaks of influenza, can spot disease trends up to two weeks before the Centers for Disease Control's data can. The New York Times write-up begins: "What if Google knew before anyone else that a fast-spreading flu outbreak was putting you at heightened risk of getting sick? And what if it could alert you, your doctor and your local public health officials before the muscle aches and chills kicked in? That, in essence, is the promise of Google Flu Trends, a new Web tool ... unveiled on Tuesday, right at the start of flu season in the US. Google Flu Trends is based on the simple idea that people who are feeling sick will tend to turn to the Web for information, typing things like 'flu symptoms' or 'muscle aches' into Google. The service tracks such queries and charts their ebb and flow, broken down by regions and states."
Buy stock in companies that sell treatments for Beri-Beri, Trench Foot, and Jungle Rot, and then have your botnet look them up on Google.
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
Due to the /. effect, thousands of /.ers started googling flu symptoms, causing the predictor to indicate a flu outbreak.
Thousands of hypochondriacs responded by checking themselves into hospitals complaining about flu-like symptoms.
As a capitalist and an incubator, I've spent tens of thousands of dollars (per project) on market analyses. For me, finding out whether a particular good or service, even a niche or very specific one, is wanted in a given area is expensive. It's often the MOST expensive thing I do before starting a business.
I've always harbored the idea that Google's grasp of data, even just raw data, is their most important resource. As they make this information available, the market will prosper. I've been able to use Google Trends (national, not local) to profit from the so-called "long-tail" and enter a business market I might otherwise not have.
When Google starts making trend data available based on region, it will be a huge boon for guys like me -- the risk takers. I'd love to know if a certain term is growing in popularity in given regions, or even in given regions at certain times (say "Where can I get vegan food?" in Chicago after 10pm but before 4am). I'd love to know if it's from a desktop or mobile, or even a Mac versus PC. By digging deeper into a customer-base's desire, Google trending can offer me a profitable business, but it can also offer the customer base more competition (or even a product that isn't readily available in their market).
The flu trending is just an eyewash to showcase Google's strength in retaining raw data over time. That's their reason for doing it. Will it help people? Certainly. But to the anti-capitalists out there: this is exactly where capitalism reaches those in need while still providing a profit for the charitable person or company.
It doesn't predict anything reliably. Too many variables.
Simply put: If you're looking for help online for flu symptoms, that doesn't correlate with an 'outbreak' of flu.
And what defines outbreak anyway?
Well, the way flu works, if you have it, you're likely to give it to someone else. You may google it when you don't actually have it, but how often does that happen? The number of false-positive searches would probably be fairly low, and either way it would be roughly constant. Google serves millions of search results a day, if not more. Almost everything "random" would, over time, look constant. When non-random things happen, like people from a certain region (remember, Google knows your IP) getting the flu, even a 1% increase in flu-related searches is extremely significant if the count otherwise doesn't vary much.
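To put rough numbers on the signal-versus-noise argument above, here is a minimal sketch that assumes weekly query counts behave like Poisson counts (variance equal to the mean). That is a simplifying assumption: real search volume also swings with the seasons and with news coverage. The counts are hypothetical.

```python
import math


def poisson_z_score(observed: int, baseline_mean: float) -> float:
    """How many standard deviations `observed` sits above `baseline_mean`,
    using the normal approximation to a Poisson count (std = sqrt(mean))."""
    return (observed - baseline_mean) / math.sqrt(baseline_mean)


# Hypothetical counts: the same 1% rise in two regions of very different size.
big_region_z = poisson_z_score(1_010_000, 1_000_000)   # 10,000 extra queries
small_region_z = poisson_z_score(101, 100)             # 1 extra query

print(f"large region: z = {big_region_z:.1f}")    # large region: z = 10.0
print(f"small region: z = {small_region_z:.1f}")  # small region: z = 0.1
```

Under that assumption, a 1% rise is about ten standard deviations out when the baseline is a million queries, but indistinguishable from noise when the baseline is a hundred, which is why sheer volume matters to the argument.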
YOU googling for flu symptoms doesn't necessarily indicate whether you have the flu, but a large increase in the number of people googling it probably does, especially if you can compare your data to the CDC data to check your theories.
-Taylor
Worldwide Military budgets: $2100 billion. Worldwide Space Exploration budgets: $38 billion. Really, world? Really?
A very long time. How on earth is this "interesting?" Is crazed paranoia on /. really the most interesting thing you've seen all day? I think some of the mods need to get out more.
Simply put: If you're looking for help online for flu symptoms, that doesn't correlate with an 'outbreak' of flu.
If many, many others in your area are doing the same, it just might indicate a local outbreak.
Graph this over time, and you might see trends happening.
Do this for a couple of years, and compare to actual CDC data, and you might just find it works.
Well, if you RTFA, you'll see that Google's method, applied to the past four years, very closely matches trend data collected by physicians in coordination with the CDC. The proof is in the pudding.
Exactly, which is why I'll be impressed when they can do this ahead of time. I'm not holding my breath. Analysing data trends in existing data and concluding you can predict them is not impressive.
These posts express my own personal views, not those of my employer
Simply put: If you're looking for help online for flu symptoms, that doesn't correlate with an 'outbreak' of flu.
I think you need to look up 'correlate' in a dictionary; you obviously have no idea what the word means. A correlation is not a one-to-one relation: if A correlates with B, all that means is that A is more likely if B is true.
Sure, the fact that I just went and searched for flu stuff out of curiosity doesn't mean there's an outbreak near me, but people presumably perform these searches at a pretty steady rate, and a flu outbreak ought to cause a spike. The occasional false positive will happen in a region, say if there's a news story about the flu, but to say there's no correlation is ridiculous.
You could, I suppose, argue that the correlation is too weak to pick out from the noise; however, if you RTFA, it is clear that the correlation is strong enough to produce useful results.
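The "more likely, not one-to-one" point can be made concrete with a 2x2 table. A sketch with an invented population: searching is far more common among people with the flu, yet plenty of flu sufferers never search and some healthy people do, so the correlation (measured here with the phi coefficient for two yes/no variables) is clearly positive without being anywhere near 1.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Correlation between two yes/no variables from a 2x2 table:
         a = flu & searched,     b = flu & didn't search,
         c = no flu & searched,  d = no flu & didn't search."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Hypothetical population of 1,000 people, 100 of whom have the flu.
a, b, c, d = 40, 60, 45, 855
p_search_given_flu = a / (a + b)       # 0.4
p_search_given_no_flu = c / (c + d)    # 0.05
phi = phi_coefficient(a, b, c, d)

print(f"P(search | flu)    = {p_search_given_flu:.2f}")
print(f"P(search | no flu) = {p_search_given_no_flu:.2f}")
print(f"phi = {phi:.2f}")
```

Any one search says little about any one person, but across a whole region the eightfold difference in search rates is exactly what shows up as a spike.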
Dogs and other animals do random things that might seem a little odd all the time. Most of the time, you don't give it a second thought, but when something unpredictable happens, like an earthquake, if you believe in the supernatural powers of dogs, then you might connect the random odd acts with the earthquake after the fact, in much the same way that plagues of locusts and floods get connected with the actions of people leading up to their occurrence, and ascribed to "punishment from God" in the bible.
If you did this for four years, then you'd be on to something.
How about all the people who haven't gotten smallpox? How about the people who haven't been crippled from polio? Or maybe the people who have avoided tetanus, measles, mumps, and rubella?
Oh right, you forgot about all of those people, even though that pretty much describes everybody.
This sort of thing has been floated around for a while under the banner of 'syndromic surveillance'. I spent most of the last three years working on a research project that involved gathering data on water quality and developing statistical software to find subtle indications of contamination. The intent was always to extend the approach to syndromic data, incorporating things like over-the-counter medicine sales, ER visits, and so forth.
Unfortunately, it turns out that none of us on the team knew enough about statistics to manage a fantasy football league. I'm now happily self-employed doing stuff absolutely unrelated to statistics. I think some of my hair has grown back, and I hardly even cringe when someone says 'generalized least squares'.
If you're interested, though, here is a paper from the CDC on the subject. I'm pretty sure they have a better idea what they're talking about. Or at any rate, they've got nicer graphics.
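For readers curious what "finding subtle indications" can look like in practice, one common textbook technique for spotting a small but sustained rise in a monitored series is a one-sided CUSUM. This is a generic sketch, not the method the project above used; the daily counts, target mean, slack, and threshold are all hypothetical.

```python
def cusum_alarms(values, target_mean, slack, threshold):
    """One-sided (upper) CUSUM: accumulate how far each observation exceeds
    target_mean + slack, never letting the running sum drop below zero, and
    record the index of every point where the sum crosses threshold."""
    s = 0.0
    alarms = []
    for i, x in enumerate(values):
        s = max(0.0, s + (x - target_mean - slack))
        if s > threshold:
            alarms.append(i)
            s = 0.0  # restart the sum after raising an alarm
    return alarms


# Hypothetical daily counts (e.g. over-the-counter cold-medicine sales) with a
# small sustained rise starting at day 10.
daily = [50, 52, 49, 51, 48, 50, 53, 47, 50, 51,
         55, 56, 54, 57, 55, 56, 58, 55, 57, 56]

alarms = cusum_alarms(daily, target_mean=50, slack=2, threshold=10)
print(alarms)  # [13, 16, 19]
```

No single day in the second half is dramatic on its own; the alarm comes from the excess piling up over several days, which is the property that makes CUSUM-style methods attractive for slow-building outbreaks.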
That's why we have a "flu season": it is very cyclical. Trends will very likely be a good indicator when localized spikes in new queries act as a precursor that the previous seasonal trend can reinforce.
In other words, a spike in localized flu-related searches that falls well within flu season for a given locale is likely a precursor to a growing flu outbreak. It's really not that hard to imagine, especially once you consider the incubation time of your typical flu virus.
The lead time of Google's method versus the CDC's after-the-fact reporting is easy to explain. The CDC's numbers measure reported cases. Google's method measures localized interest (the signal). Develop a metric to distinguish it from baseline interest (the noise), apply it against trend data (the signal has velocity), and you have likely identified a growing flu outbreak. Once you account for incubation time, Google's numbers likely correlate strongly with the CDC's reported numbers.
The CDC charts reported flu cases. Flu cases are only reported if you seek medical care. If you just go to the drug store and buy a bottle of NyQuil, the CDC doesn't know you had the flu. I should not have to tell you this...
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"