Slashdot Mirror


Researchers Forecast the Spread of Diseases Using Wikipedia

An anonymous reader writes: Scientists from Los Alamos National Laboratory have used Wikipedia logs as a data source for forecasting disease spread. The team was able to successfully monitor influenza in the United States, Poland, Japan, and Thailand, dengue fever in Brazil and Thailand, and tuberculosis in China and Thailand. The team was also able to forecast all but one of these, tuberculosis in China, at least 28 days in advance.

61 comments

  1. Tickle! by Anonymous Coward · · Score: 0

    The wizard will cast a spell on your ass to make it tickle! Why the fuck did you go in the living room!?

  2. Interesting by cbhattarai · · Score: 1

    This is really interesting stuff. I guess we have every single thing in the wiki.

  3. forecast using /. by Selur · · Score: 1

    Wondering when they'll start trying to predict diseases (or maybe PC sales) from /. posts.

    1. Re:forecast using /. by Anonymous Coward · · Score: 0

      Not a bad idea, but don't expect that method to detect any trace of paranoia or sociopathy here.

    2. Re:forecast using /. by Anonymous Coward · · Score: 0

      obesity, shortsightedness and what else?

    3. Re:forecast using /. by Anonymous Coward · · Score: 0

      lol - there's an apsie epidemic!

  4. Sounds familiar by Anonymous Coward · · Score: 1

    Sounds familiar, hasn't someone already done that half a year or a year ago using Google search string mapping?

    1. Re:Sounds familiar by Anonymous Coward · · Score: 2, Informative

      Thought so, it was Google, and they even created a page with real-time stats.
      http://www.google.org/flutrends/us/#US

    2. Re:Sounds familiar by umghhh · · Score: 1

      It works the way fighting evil regimes by clicking the 'like' button on FB and its ilk does.

    3. Re:Sounds familiar by sumdumass · · Score: 1

      Which is ancient magic compared to the power of hashtags. ... #duh

  5. How? by Qbertino · · Score: 1

    How did they do it? I started reading the linked paper, but my brain started hurting two sentences in. I couldn't extract any useful information on the 'how'.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re:How? by Anonymous Coward · · Score: 0

      Same way economists predict market crashes: Lots and lots of predictions.

    2. Re:How? by ctrl-alt-canc · · Score: 4, Informative
      They made the assumption that if a disease is spreading somewhere, people there start looking for information about the disease on Wikipedia.
      This implicitly makes some big assumptions, among which the facts that people are aware of the disease and that they have internet access.

      You can easily understand why their approach is of very limited usefulness, and scientifically questionable. I think that it is not by chance that their method fails to work when analyzing data for Uganda (where internet usage probably isn't widespread) and does not score well for China (where censorship limits both information about disease outbreaks and internet access).

      They also state in their paper: "With these constraints in mind, we used our professional judgement to select diseases and countries.", and this raised my eyebrows a lot...

      I would like to put their approach to the test by sifting Wikipedia access data for the Ebola keyword in Slovenian, and then forecasting the spread of Ebola in Slovenia (nil up to now...), but I would rather spend my time testing methods that are better posed.

      "There are three kinds of lies: lies, damned lies, and statistics."

    3. Re:How? by NoNonAlphaCharsHere · · Score: 1

      I don't think you're being fair. This research extends their ground-breaking study that searching Google for "Jennifer Lawrence iCloud hack" predicted fapping with 100% accuracy.

    4. Re:How? by Anonymous Coward · · Score: 0

      Their main assertion is that the data contained in Wikipedia logs contains information about the spread of disease. They then prove fairly substantively that it does. You may well object that under some circumstances it contains less but that doesn't really affect the validity of the result that it does contain information. Nobody is asking for 5 sigma significance on a future prediction, all they are asking for is moderately accurate forecasts, in particular early warning signals. It's not by coincidence that they compare the technique to weather forecasting. You don't need to know that a hurricane will definitely occur, you just need a warning to maybe get your boat out of the water.
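
      A minimal sketch of what an early-warning fit of this kind could look like, assuming a plain linear model with a fixed lag. The weekly numbers, the 4-week lag, and the function names are made up for illustration; this is not the paper's actual model or data.

```python
# Hypothetical illustration only: made-up data, assumed 4-week lag.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_lagged(views, cases, lag):
    """Fit cases[t + lag] against views[t], so views lead cases by `lag` steps."""
    return fit_line(views[:len(views) - lag], cases[lag:])


def forecast(views_now, slope, intercept):
    """Predicted case count `lag` steps after a week with `views_now` views."""
    return slope * views_now + intercept


# Made-up weekly page view counts for one disease article.
weekly_views = [100, 120, 180, 260, 400, 380, 300, 220, 150, 110]
# Made-up case counts, constructed so that cases four weeks later
# are exactly 0.5 * views + 10 (real data would be far noisier).
weekly_cases = [0, 0, 0, 0] + [0.5 * v + 10 for v in weekly_views[:6]]

slope, intercept = fit_lagged(weekly_views, weekly_cases, lag=4)
predicted = forecast(weekly_views[-1], slope, intercept)
```

      With real data the fit would be noisy, which is the point of the weather analogy: the output is a warning, not a certainty.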

    5. Re:How? by wisnoskij · · Score: 2

      That was my thought. The only way I can think of to use Wikipedia log data to predict outbreaks would also have predicted that America was in the grip of a huge Ebola epidemic a few weeks ago. Perhaps this wiki data is just an easy way to measure media attention to a subject, which often is correlated with an epidemic? It is measuring the public's attention, not actually making a prediction.

      --
      Troll is not a replacement for I disagree.
    6. Re:How? by Rosco+P.+Coltrane · · Score: 3, Funny

      They made the assumption that if a disease is spreading somewhere, people there start looking for information about the disease on Wikipedia

      Imagine the potential: if a lot of search logs contain "EBOL-AAAARGH", they'll know a particularly fast-acting variant of the virus has emerged.

      --
      "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    7. Re:How? by Rosco+P.+Coltrane · · Score: 1

      I think the most important piece of news of this story is that Wikipedia is no better than Google or Facebook, and exploits/sells search data too.

      --
      "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    8. Re:How? by Anonymous Coward · · Score: 0

      "They then prove fairly substantively that it does"

      Pearson's r between two time series... No. Either you predict the form of the relationship via some derived theory, or you just kind of think maybe there might be something there.
      http://www.tylervigen.com/
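
      A small sketch of the objection, assuming two synthetic series that have nothing to do with each other except that both trend upward. The series and helper names are invented for illustration. Pearson's r on the raw series is close to 1, while r on the week-to-week changes is close to 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def diff(series):
    """First differences: the change from each point to the next."""
    return [b - a for a, b in zip(series, series[1:])]


# Two unrelated synthetic series that share nothing but an upward trend.
n = 50
series_a = [i + math.sin(i) for i in range(n)]
series_b = [2 * i + math.cos(3 * i) for i in range(n)]

r_raw = pearson_r(series_a, series_b)               # dominated by the trend
r_diff = pearson_r(diff(series_a), diff(series_b))  # trend removed
```

      Differencing is only one way to take the shared trend out; it shows why a high r between two raw time series is weak evidence by itself.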

    9. Re:How? by del_diablo · · Score: 1

      Which raises the question: if you search for the symptom keywords (rash, boils, bleeding, coughing), can Wikipedia actually list diseases with those keywords?

      From experience I do know that a lot of food names can be typed in a native language, and the search will still go to the correct page on English Wikipedia, roughly.
      But if I start searching for terms and keywords, Wikipedia tends to be worse than Google.

    10. Re:How? by Anonymous Coward · · Score: 0

      He actually typed out the "aaaauuuuugh"?

      Maybe he was dictating.

    11. Re:How? by Anonymous Coward · · Score: 0

      Wikipedia's traffic stats are freely available to everyone. The information is aggregated such that the only question that can be answered is "how many times was page X viewed on day Y?"; things like which country the page is being viewed from need to be guessed based on things like what language it's written in.
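
      A rough sketch of what working with counts like these could look like, assuming records of the form (language edition, article, day, views) and a hand-written language-to-country table. The record layout, the table, and the numbers are all hypothetical; the real log format may differ.

```python
from collections import defaultdict

# Assumed proxy: which country a language edition is taken to represent.
# English is deliberately absent because it maps to no single country.
LANGUAGE_TO_COUNTRY = {"pl": "Poland", "ja": "Japan", "th": "Thailand"}

# Assumed list of articles to track, per language edition.
TRACKED = {"pl": {"Grypa"}, "ja": {"インフルエンザ"}}

# Made-up aggregated records: (language edition, article, day, views).
records = [
    ("pl", "Grypa", "2014-01-01", 120),
    ("pl", "Grypa", "2014-01-02", 150),
    ("pl", "Warszawa", "2014-01-01", 999),   # untracked article, ignored
    ("ja", "インフルエンザ", "2014-01-01", 300),
    ("en", "Influenza", "2014-01-01", 900),  # no country proxy, ignored
]


def views_by_country(records, tracked):
    """Sum views of tracked articles per (country, day) using the language proxy."""
    totals = defaultdict(int)
    for lang, article, day, views in records:
        if lang in LANGUAGE_TO_COUNTRY and article in tracked.get(lang, ()):
            totals[(LANGUAGE_TO_COUNTRY[lang], day)] += views
    return dict(totals)


totals = views_by_country(records, TRACKED)
```

      Nothing here identifies a reader; the country is a guess made from the language edition alone.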

    12. Re:How? by radarskiy · · Score: 1

      'They made the assumption'
      They made a hypothesis, then tested that hypothesis against the null hypothesis. This is otherwise known as science. Why do you hate science?

  6. Its happening! by Anonymous Coward · · Score: 0

    28 days later...

  7. 28 days in advance of later? by Anonymous Coward · · Score: 0

    I predict China will be ground zero for the next big zombie pandemic. Some sort of pandemic, to make Ebola look like a Hawaiian vacation.

    1. Re: 28 days in advance of later? by Anonymous Coward · · Score: 0

      The NEXT big zombie pandemic? Maybe I've been living under a rock, but when/where was the first one??

    2. Re: 28 days in advance of later? by Anonymous Coward · · Score: 2, Funny

      Look, we're onto your game. The suggestion that you've been living under a rock was a dead giveaway that you're a zombie...

  8. business plan... by Anonymous Coward · · Score: 0

    1) buy shares in a pharmaceutical company with a unique and unprofitable vaccine for disease X
    2) make bots that automate Wikipedia searches for disease X, deploy
    3) ....
    4) PROFIT

  9. Re:Wats poppin my negroes by NoNonAlphaCharsHere · · Score: 1

    Jack Bauer found out who was there, who they worked for, and where the goddamn bomb was.

  10. How on earth... by Anonymous Coward · · Score: 0

    ...are diseases using Wikipedia? Those little rascals are getting smart.

  11. "... Spread of Diseases Using Wikipedia" by garutnivore · · Score: 1

    Wait... what? Diseases now use Wikipedia?

    1. Re:"... Spread of Diseases Using Wikipedia" by NoNonAlphaCharsHere · · Score: 1

      Why not? Viruses use Outlook.

    2. Re:"... Spread of Diseases Using Wikipedia" by Anonymous Coward · · Score: 0

      No, silly - the diseases themselves are not using Wikipedia; people are going to use Wikipedia to spread diseases.

      (I rather enjoy the triple meaning ambiguity in this headline)

    3. Re:"... Spread of Diseases Using Wikipedia" by Anonymous Coward · · Score: 0

      Yeah, but they are diseases of the mind. They spread through Wikipedia and Tumblr.

    4. Re:"... Spread of Diseases Using Wikipedia" by QilessQi · · Score: 0

      +MAX_INT. GP and Parent have officially won the thread.

    5. Re:"... Spread of Diseases Using Wikipedia" by arth1 · · Score: 1

      No, silly - the diseases themselves are not using Wikipedia; people are going to use Wikipedia to spread diseases.

      (I rather enjoy the triple meaning ambiguity in this headline)

      Wouldn't it be nice if headlines used commas and reflexive pronouns?
      Or if there were someone who checked them over before publishing, like a proofreader?

      I too read it as using Wikipedia to spread the diseases. Which is, I guess, doable, if logging gene sequences there, which someone else can splice into harmless but compatible bacteria.
      Would publishing that kind of information be illegal?

    6. Re:"... Spread of Diseases Using Wikipedia" by sumdumass · · Score: 1

      Oh noes.. when will we be able to get wikicondums and how would that work?

  12. Useless now that it's known? by fygment · · Score: 1

    Now that they've spread the word, will the approach start to be 'gamed' by big pharma or gov't trying to sow the seasonal flu panic?

    --
    "Consensus" in science is _always_ a political construct.
  13. Take that Educators! by gunner_von_diamond · · Score: 1

    And teachers always say not to use Wikipedia for research. "Wikipedia is the devil!" When used correctly Wikipedia is a valuable resource.

    1. Re:Take that Educators! by terbo · · Score: 1

      The teachers might not know about 'Talk Pages', 'Revisions', and 'What Links Here':
      things that make wikipedia much more advanced than traditional encyclopedias.

      --
      If you're interested in facts I'll tell you what they are and I'll give you sources - Chomsky on The Big Idea
    2. Re:Take that Educators! by tehcyder · · Score: 1

      The teachers might not know about 'Talk Pages', 'Revisions', and 'What Links Here':
      things that make wikipedia much more advanced than traditional encyclopedias.

      No, teachers know that lazy students will just blindly copy and paste stuff from wikipedia.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
    3. Re:Take that Educators! by Anonymous Coward · · Score: 0

      Yeah? Follow @congressedits for a while. You'll start to see just how valuable those revisions really are.

  14. Man! Wikipedia is mean. by 140Mandak262Jamuna · · Score: 1

    I thought Wikipedia was spreading just misinformation and biased information. Now they are spreading actual biological diseases using Wikipedia? I'm not surprised. The Internet is a lawless frontier and anything goes there.

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
  15. umm by superwiz · · Score: 1

    Why not google trends? It's already categorized.

    --
    Any guest worker system is indistinguishable from indentured servitude.
    1. Re:umm by necro81 · · Score: 1

      Google has been working on that, it's called Flu Trends. But it hasn't really proven itself out yet. See my post below.

    2. Re:umm by superwiz · · Score: 1

      You can cross-correlate multiple medical term searches and conditions and see the search trends broken down by region. It's not limited to flu. You can do it for other (some slowly spreading) medical conditions.

      --
      Any guest worker system is indistinguishable from indentured servitude.
  16. Wikisneezia? by MagickalMyst · · Score: 0

    Wikisneezia.

    --
    Political correctness is really just herd psychology pushed by insecure people who desperately seek social conformity.
  17. Geolocation languages? by Anonymous Coward · · Score: 0

    Using linear models, language as a proxy for location

    I'm not sure language is such a good indicator for where people are located. I usually use the English pages because the length and quality of the text tend to be better. Also quite a number of pages only exist in English. I'm quite sure this "language statistic corruption" is quite widespread and that English native speakers are unaware of the great quality difference between languages. The data is likely bogus unless this is taken into account.

    Having said that, there is something odd about the article. The abstract mentions language as the indicator for where people are. However, the first figure has both language and country columns. Most match as expected (Polish for Poland etc.), but there are exceptions. French and Haiti are in the same row, and Haiti isn't the first country I think of if people use French (that would be France). This means they are likely using IPs too for geolocation. It seems natural, but the abstract doesn't mention using anything other than language for this task.

  18. It's been done, sort of by necro81 · · Score: 1

    Google tried (is still trying?) to track the spread of influenza by watching the trends in searches for information about the disease. It's a very interesting bit of work, but as I recall, it failed to be meaningfully predictive. The trouble is, there are lots of prosaic reasons why someone might search out information about the flu (or any other disease) other than actually having it. Separating that noise (general interest in the flu) from the genuine signal (particular interest from people who are infected) is the hard part. Doesn't mean it can't work, just that it hasn't been made to work yet.

  19. Amazing by Anonymous Coward · · Score: 0

    Like wow. Diseases can figure out how to use Wikipedia to spread more quickly.

  20. This is why... by CODiNE · · Score: 1

    I always wash my hands after using Wikipedia.

    --
    Cwm, fjord-bank glyphs vext quiz
  21. Wikipedia the vector by Bruce+Perens · · Score: 1

    Like others I found the headline confusing. I read it as "Researchers are predicting the use of Wikipedia as a vector for the spread of disease". This may mean that:

    • Disinformation and ignorance are diseases.
    • Memes and computer viruses are diseases.
    • Wikipedia contains information that leads to depression.
    • Instructions on Wikipedia lead to substance abuse.
    • This is getting entertaining, fill in your own reason here.
  22. Re: Wats poppin my negroes by electrosoccertux · · Score: 1

    who's there?

  23. PRIVACY? by Anonymous Coward · · Score: 0

    Did Wikipedia provide the data? Does Wikipedia make the data public?

    Connecting page load to IP address seems like extremely sensitive information, and not something Wikipedia should record or share.

  24. google flu trends by Alphons+Clenin · · Score: 1

    google has been forecasting flu through search data for a while.

    http://www.google.org/flutrends/us/

    It doesn't work perfectly though:

    http://www.nature.com/news/when-google-got-flu-wrong-1.12413

    1. Re:google flu trends by Fpdx · · Score: 1

      yes, but google does not share its log files!

      Google published a Nature paper out of it. AFAIK the data (Google queries) on which that research is based is kept well secret. Therefore it is not possible to validate what they did. Science cannot be based on secret data, and the journal Nature in this case published an advertisement ("how awesome is Google"), not a scientific paper ("these are the data, this is our method, check out our conclusions").

      As the authors here say, relying on closed sources like Google greatly limits the usefulness of this kind of approach. So they chose a free-software mindset: Wikipedia because the data is public, plus their software is free software. Good work.