Slashdot Mirror


Using Twitter Data To Approximate a Telephone Survey

cremeglace writes "A team led by a computer scientist at Carnegie Mellon University has used text-analysis software to detect tweets pertaining to various issues — such as whether President Barack Obama is doing a good job — and measure the frequency of positive or negative words ranging from 'awesome' to 'sucks.' The results were surprisingly similar to traditional surveys. For example, the ratio of Twitter posts expressing either positive or negative sentiments about President Obama produced a 'job approval rating' that closely tracked the big Gallup daily poll across 2009. The analysis also produced classic economic indicators like consumer confidence." By averaging several days' worth of tweets on presidential job approval, the researchers got results that correlated 79% with daily Gallup polling. Lead researcher Noah Smith said, "The results are noisy, as are the results of polls. Opinion pollsters have learned to compensate for these distortions, while we're still trying to identify and understand the noise in our data. Given that, I'm excited that we get any signal at all from social media that correlates with the polls." Here is CMU's press release.
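The method described in the summary can be sketched roughly in Python. This is not CMU's code: the word lists, the function names (`daily_ratio`, `moving_average`, `pearson`), and the tokenizing are all assumptions made for illustration. The summary only says that positive and negative words were counted, that several days of tweets were averaged, and that the result was correlated with Gallup.

```python
from statistics import mean

# Hypothetical tiny lexicons; the study's real word lists are not given here.
POSITIVE = {"awesome", "good", "great"}
NEGATIVE = {"sucks", "bad", "awful"}

def daily_ratio(tweets):
    """Ratio of positive to negative word counts across one day's tweets."""
    pos = neg = 0
    for tweet in tweets:
        for word in tweet.lower().split():
            word = word.strip(".,!?\"'")  # drop surrounding punctuation
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    if neg == 0:
        raise ValueError("no negative words seen; ratio undefined")
    return pos / neg

def moving_average(values, window):
    """Trailing average over the last `window` values (shorter at the start)."""
    return [mean(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

In use, one would compute `daily_ratio` for each day, smooth the series with `moving_average`, and pass the smoothed series and the poll series to `pearson`. Counting words this way ignores negation and sarcasm entirely.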


  1. In fiction... by emkyooess · · Score: 1

    Has anyone read Neal Stephenson's novel "Interface"? We're getting oh so close to it.

  2. New tweet trend by BagOBones · · Score: 1

    No thank-you, please take me off your list... click

    --
    EA David Gardner -"... but the consumers have proven that actually what they want is fun."
  3. Demographics Anyone by mrtwice99 · · Score: 4, Insightful

    It seems that the age demographics of twitter users wouldn't be very representative of the population as a whole.

    1. Re:Demographics Anyone by tpstigers · · Score: 3, Insightful

      Let's not forget all other demographics. Ethnicity, gender, income, employment - just to name a few. Twitter is an amazing resource, but it's hardly representative of humanity or the nation. That said - it can still yield useful data if its limitations are taken into account.

    2. Re:Demographics Anyone by Yold · · Score: 2, Insightful

      Not to mention economic indicators; most poor people don't have iPhones from which they can tweet their every whim. Some people also don't twitter their political views. This whole thing screams selection bias.

    3. Re:Demographics Anyone by antifoidulus · · Score: 4, Insightful

      Hell, even a telephone poll, provided they picked landlines out of a phone book, is increasingly less representative of the population as a whole. Young people are abandoning land-line phones to go cell phone only, most of them unlisted. I wonder how the pollsters are adapting to these demographics.

      Hell, I have been considering even getting rid of the phone part of the cell phone and going data only, with Skype et al, is there even any point in paying the $30 or $40 a month for voice service?

    4. Re:Demographics Anyone by jonadab · · Score: 1

      > This whole thing screams selection bias.

      How is that different from phone polls?

      --
      Cut that out, or I will ship you to Norilsk in a box.
    5. Re:Demographics Anyone by Bearhouse · · Score: 2, Funny

      > Hell, I have been considering even getting rid of the phone part of the cell phone and going data only, with Skype et al, is there even any point in paying the $30 or $40 a month for voice service?

      I've thought about that too, but:
      1. You've got to support non-tech people 'calling in' (OK, you've got 'SkypeIn'), but
      2. Whadda you do when you cut your leg off @ home, and either Skype or your local data link is down? POTS is very reliable..

    6. Re:Demographics Anyone by silverglade00 · · Score: 1

      > 2. Whadda you do when you cut your leg off @ home, and either Skype or your local data link is down?

      Agreed. This has happened to me five times already.

    7. Re:Demographics Anyone by asukasoryu · · Score: 2, Interesting

      I think telephone polls only reach one demographic - people willing to take a survey over the phone, who don't instantly hang up on random callers, who don't have anything better to do, and who think other people give a crap what they think. This demographic does not represent me and I doubt it covers most of the US.

      --
      There are more things in heaven and earth than are dreamt of in your philosophy.
    8. Re:Demographics Anyone by Timmmm · · Score: 1

      You still have a phone number... Just get a PAYG SIM with a data plan, e.g. Three have 3 GB/month for £5/month.

      Not sure how it would work with paying for receiving calls in the US. It's a pretty crazy system if you ask me - what happens if someone calls you and you have no money on your phone?

    9. Re:Demographics Anyone by CapnStank · · Score: 1

      @#2 I'm not sure if you're aware of this or not but in the case of a 911 emergency cellular phones without a SIM or account are still capable of dialing out. (At least phones I'm aware of can). Basically if you're stranded somewhere without a land-line and your account is frozen you can still dial 911 and it will go through. You can't dial other numbers however. Also a thing to note is that your phone will be more aggressive when fetching a signal. I've been able to get a 911 connection when my phone reported "no service". On top of that if you're not a preferred carrier (your carrier rents towers from a larger corporation) you sometimes will not be able to call out if the towers are 'occupied', but with 911 it forces your connectivity.

    10. Re:Demographics Anyone by Phyvo · · Score: 1

      The thing is, even if a telephone poll can never represent the people who don't like taking telephone polls, is not taking telephone polls really correlated with other opinions, e.g. Obama approval? If it isn't, then the difference between poll haters and poll participants will be insignificant. If it is, I'm no statistician, but it might be possible to measure the difference and apply it to a telephone poll.

      My guess is that they probably have been doing this already for quite a while, if it's at all possible.
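One common form of the correction speculated about above is post-stratification: respondents are grouped by a measurable trait, and each group's average is re-weighted to that group's known share of the population. A minimal sketch follows, with the function name `poststratified_mean` and the example groups invented for illustration. It can only correct for traits that are actually recorded (age, say); it does nothing for "hates polls" unless that happens to track a recorded trait.

```python
def poststratified_mean(responses, population_share):
    """Re-weight group averages to known population shares.

    responses: list of (group, value) pairs from the sample.
    population_share: dict mapping group -> share of the population (sums to 1).
    """
    by_group = {}
    for group, value in responses:
        by_group.setdefault(group, []).append(value)

    total = 0.0
    for group, share in population_share.items():
        values = by_group[group]  # KeyError if a group has no respondents
        total += share * sum(values) / len(values)
    return total


# Hypothetical sample: 1 = approve, 0 = disapprove. Young people are
# under-represented in the sample (1 of 4) but are half the population.
sample = [("young", 1), ("old", 0), ("old", 0), ("old", 1)]
raw = sum(v for _, v in sample) / len(sample)
adjusted = poststratified_mean(sample, {"young": 0.5, "old": 0.5})
```

Here the raw approval is 0.5, while the adjusted figure rises to about 0.67 because the lone young respondent now stands in for half the population. That also shows the weakness: small groups get large weights and add noise.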

    11. Re:Demographics Anyone by CrazeeCracker · · Score: 1

      I don't know what makes you say this, unless you're suggesting that the "people who have listed phone numbers" demographic is somehow not representative of the population as a whole, but telephone polls, if done properly, have the benefit of taking completely random, and thus fairly representative samples. (Again, if done correctly, i.e. large enough sample base, using proper selection algorithms, and evaluating the data sensibly.)

      --
      Of course I didn't RTFA.
    12. Re:Demographics Anyone by acohen1 · · Score: 2, Insightful

      Indeed. The "people who have listed phone numbers" demographic is most certainly not representative. I don't have one, neither do 3/4 or more of my friends. I think the only ones who do are homeowners, everyone else has just a cell. So there are certainly age and economic status issues here.

    13. Re:Demographics Anyone by Shotgun · · Score: 1

      It does however cover "people who want to see what these telephone polls are about".

      The lady asks me "Are you more likely to vote Democrat or Republican?".

      I say, "Neither. I'm Libertarian."

      "You must choose either Democrat or Republican."

      "But I wouldn't vote for either of those yahoos. Your survey is flawed."

      And so it went for half an hour. It was fun.

      --
      Aah, change is good. -- Rafiki
      Yeah, but it ain't easy. -- Simba
    14. Re:Demographics Anyone by Bearhouse · · Score: 1

      > @#2 I'm not sure if you're aware of this or not but in the case of a 911 emergency cellular phones without a SIM or account are still capable of dialing out. (At least phones I'm aware of can). Basically if you're stranded somewhere without a land-line and your account is frozen you can still dial 911 and it will go through. You can't dial other numbers however. Also a thing to note is that your phone will be more aggressive when fetching a signal. I've been able to get a 911 connection when my phone reported "no service". On top of that if you're not a preferred carrier (your carrier rents towers from a larger corporation) you sometimes will not be able to call out if the towers are 'occupied', but with 911 it forces your connectivity.

      Your phone is not "more aggressive when fetching a signal"; it's just that 911 calls will be routed via ANY carrier's network.

  4. this is going to be obsolete almost immediately by the+gnat · · Score: 4, Insightful

    I'm guessing it will take no more than a month for a combination of "conservative" and "progressive" blogs to rev up their teams of dittoheads to start flooding Twitter with politically themed messages, thus totally skewing the results. Same principle as Google-bombing, I guess. As someone who already views Twitter as almost entirely content-free, I can't say I'm particularly dismayed by this possibility. . . but anything that encourages the self-absorbed political zealots of this country can't possibly be good.

    1. Re:this is going to be obsolete almost immediately by tonycheese · · Score: 1

      Exactly my first thought. Observing something will always change how that something behaves.

    2. Re:this is going to be obsolete almost immediately by causality · · Score: 1

      > I'm guessing it will take no more than a month for a combination of "conservative" and "progressive" blogs to rev up their teams of dittoheads to start flooding Twitter with politically themed messages

      Well, sure, that is to be expected. Those two groups have much arguing to do about the purpose for which the size and power of the federal government should be expanded. Twitter could be an important growth area for them.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    3. Re:this is going to be obsolete almost immediately by rm999 · · Score: 2, Insightful

      I was thinking of a solution to the selection bias problem that I think would also help with this issue. The researchers could "profile" different users by looking at their history. New users (with little history) and frequent but consistent users (several negative messages about a candidate a day, effectively users that tweet very little useful information) can be discounted, while more dynamic users that change their opinions in interesting ways and correlate with polls can be counted more.

      Pollsters often weight their results to improve accuracy, and this would be no different. It would also remove obvious attempts to influence the results.
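The profiling idea in this comment could look something like the following sketch. Everything in it is an assumption for illustration: the names `user_weight` and `weighted_sentiment`, the per-tweet scores of -1/0/+1, and the cutoff of five tweets. It covers only the "discount new and monotone accounts" half of the proposal; weighting users up by how well they track the polls is not attempted.

```python
from statistics import pstdev

def user_weight(history, min_history=5):
    """Weight for one user, given their past per-tweet sentiment scores."""
    if len(history) < min_history:
        return 0.0  # new account: too little history to trust
    if pstdev(history) == 0:
        return 0.0  # says exactly the same thing every time: no information
    return 1.0

def weighted_sentiment(users):
    """users: dict of user -> list of scores, where the last score is today's."""
    num = den = 0.0
    for scores in users.values():
        w = user_weight(scores[:-1])  # judge the user on history only
        num += w * scores[-1]
        den += w
    return num / den if den else 0.0


# Hypothetical accounts: a fresh sockpuppet, a one-note shill, a regular user.
users = {
    "newbie": [-1, -1],
    "shill": [-1] * 10,
    "regular": [1, -1, 0, 1, -1, 1],
}
today = weighted_sentiment(users)
```

With these made-up accounts only "regular" counts, so `today` comes out as 1.0 even though two of three accounts tweeted negatively. A determined astroturfer could of course vary their output to dodge the second check.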

    4. Re:this is going to be obsolete almost immediately by Shakrai · · Score: 1

      > I'm guessing it will take no more than a month for a combination of "conservative" and "progressive" blogs to rev up their teams of dittoheads to start flooding Twitter with politically themed messages, thus totally skewing the results.

      I love seeing them rally the troops to get everybody to go and vote on the unscientific polls that pop up all over the internet. I suppose one should never discount their importance. As an example, I'm sure that CNN's current quick vote poll, "Do you agree with President Obama's choice of Elena Kagan for the Supreme Court?" will determine the success or failure of the nomination process.

      --
      I want peace on earth and goodwill toward man.
      We are the United States Government! We don't do that sort of thing.
    5. Re:this is going to be obsolete almost immediately by Xtifr · · Score: 4, Insightful

      > anything that encourages the self-absorbed political zealots of this country can't possibly be good.

      I dunno. Encouraging them to waste their time on Twitter instead of doing things that might have an actual impact on the world sounds like a pretty good idea to me! :)

    6. Re:this is going to be obsolete almost immediately by djupedal · · Score: 1

      Let's see - a phone survey attempts to approximate a walk-out poll, which attempts to approximate a voting trend which attempts to approximate the outcome of an election.

      So what I'm hearing is that we'll be voting via www.twitface-bloggerspewTMI.com any day now, is that about right?

    7. Re:this is going to be obsolete almost immediately by sunderland56 · · Score: 1

      > Observing something will always change how that something behaves.

      Astute observation, Doctor Heisenberg.

    8. Re:this is going to be obsolete almost immediately by bertoelcon · · Score: 1

      If both sides do it it might end up as no gain either way.

      --
      Anything can be found funny, from a certain point of view.
    9. Re:this is going to be obsolete almost immediately by OrwellianLurker · · Score: 1

      > I'm guessing it will take no more than a month for a combination of "conservative" and "progressive" blogs to rev up their teams of dittoheads to start flooding Twitter with politically themed messages, thus totally skewing the results. Same principle as Google-bombing, I guess. As someone who already views Twitter as almost entirely content-free, I can't say I'm particularly dismayed by this possibility. . . but anything that encourages the self-absorbed political zealots of this country can't possibly be good.

      It will be similar to how most polls are biased in how they present questions and the results are manipulated to prove points.

      --
      'Political power grows out of the barrel of a gun.' - Mao Tse-tung
    10. Re:this is going to be obsolete almost immediately by yukk · · Score: 1

      > > Observing something will always change how that something behaves.

      > Astute observation, Doctor Heisenberg.

      Well done, Dr Obvious.

      --
      "The trouble with the rat race is that even if you win, you're still a rat." Lily Tomlin
    11. Re:this is going to be obsolete almost immediately by doug141 · · Score: 1

      I was surprised to see your prediction of both conservative and progressive attempts to skew results. According to examples on Wikipedia, Google-bombing and Googlewashing are propaganda tools historically used almost exclusively by progressives. http://en.wikipedia.org/wiki/Google_bomb

    12. Re:this is going to be obsolete almost immediately by russotto · · Score: 1

      > I was surprised to see your prediction of both conservative and progressive attempts to skew results. According to examples on Wikipedia, Google-bombing and Googlewashing are propaganda tools historically used almost exclusively by progressives. http://en.wikipedia.org/wiki/Google_bomb

      Makes sense; conservatives usually stick to editing Wikipedia.

  5. Brb making multiple twitter accounts by Anonymous Coward · · Score: 3, Interesting

    > Just like traditional pollsters, social media researchers will have to address how representative Twitter users are of the general population. And unlike telephone surveys, small groups of people can wildly skew the results of Internet data,

    Yes I did STFA (Skim the fucking article).

    It mentioned the two main problems I see with this, cheating the system and whether twitter really is a large enough sample and a random enough sample to be considered a viable alternative.

    Twitter has a whole range of people who don't actually use the damned thing. As with any poll though, people are going to say that the minority polled is what everyone says.

    "The American people want to do x! Our poll says 80% of the American people want it!" No. No it doesn't. It just means 80% of the people you polled want it.

    I despise how easy it is to use statistics and polls to manipulate people.

    1. Re:Brb making multiple twitter accounts by deniable · · Score: 3, Funny

      We had a report on net filtering here the other night. 95% want filters, 80% oppose them. I conclude that at least 50% are confused.

  6. 79% is not fantastic by Ed+Peepers · · Score: 5, Informative

    I've collaborated on research using Twitter traffic as a predictor so I applaud their efforts, but a 79% correlation with telephone responses is not as high as it sounds. For example, the minimum acceptable correlation for interrater reliability is typically 80%.

    Put simply, the Twitter data can only account for about two thirds of the variation in phone responses. That's useful but there's still a lot of unexplained variance -- we have a long way to go.

    1. Re:79% is not fantastic by tonycheese · · Score: 1

      Maybe I just don't understand inter-rater reliability, but where did you get 2/3 from? 79% is pretty much 4/5, not anywhere near 2/3.

    2. Re:79% is not fantastic by Anonymous Coward · · Score: 3, Informative

      Inter-rater reliability is sometimes taken as the correlation between two raters' scores. Reliability is a different concept from variance explained, which is equal to the square of the correlation. Twitter can predict 0.79 * 0.79 = 62% of the variance in phone responses.
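The arithmetic in this reply is easy to check. The snippet below squares the correlations quoted in the summary and in TFA (0.79 and 0.72); the helper name `variance_explained` is just a label chosen here.

```python
def variance_explained(r):
    """Share of variance in one series accounted for by a linear fit to the other."""
    return r ** 2

for r in (0.79, 0.72):
    print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.0%}")
```

That gives roughly 62% and 52%, which is where "about two thirds" upthread comes from, rounding generously.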

    3. Re:79% is not fantastic by pesto · · Score: 1

      I'm not sure what a "79% correlation" even means. The way to describe correlations is to provide estimated correlation coefficients. It appears that even the original article uses this bizarre percentage notation ("r = 63.5%"), which suggests that perhaps the authors don't understand correlation as well as they think they do. Sigh. This is what happens when computer scientists try statistics without any training...

  7. Other applications by optimarcusprime · · Score: 1

    I bet you could use this same system to assemble data about all kinds of interesting subjects. TV show viewership, web application downtime, top news articles by reader interest, etc. Really cool.

  8. 79% by barfy · · Score: 1

    That is NOT nearly correlated? That is BARELY correlated. And will not get you meaningful results. This will also stop being meaningful as soon as it is publicized that people are paying attention to the content of the tweets.

    Cripes.

    1. Re:79% by Shotgun · · Score: 1

      Are you making a claim that telephone surveys give meaningful results when the surveyors try to shoehorn respondents to accept answers that are on their multiple choice questionnaire?

      --
      Aah, change is good. -- Rafiki
      Yeah, but it ain't easy. -- Simba
  9. Big surprise by Cryacin · · Score: 5, Insightful

    The noise coming from one group of twits is the same as the noise coming from another group of twits.

    Film at 11.

    --
    Science advances one funeral at a time - Max Planck
    1. Re:Big surprise by Garywit · · Score: 1

      This is nice. keep it up... Colon cleanser

  10. Now that the word is out it's useless. by pecosdave · · Score: 2, Interesting

    As soon as something like this comes to light it's only a short matter of time until turfers screw it up. Turfers are like spammers, as soon as there's a new medium they abuse it into uselessness.

    --
    The preceding post was not a Slashvertisement.
  11. from "awesome" to "sucks" by byrdfl3w · · Score: 1

    ...were they doing a poll on windows 7?

    1. Re:from "awesome" to "sucks" by selven · · Score: 1

      So something like this?

    2. Re:from "awesome" to "sucks" by byrdfl3w · · Score: 1

      I think I just went from "facetious" to "frightened"! Classic.

  12. Twitter Users =/= Average American by Anonymous Coward · · Score: 4, Insightful

    As someone who spends a lot of their time working to uncover endogeneities in statistical analysis, I feel that analyzing Tweets will never be a viable measure of general American opinions.

    Remember when The Literary Digest predicted Alf Landon would crush FDR in the 1936 presidential election based on a poll of its subscribers? Okay, you don't *remember* that, but you've probably heard of it. Same problem here.

    The readers of Literary Digest were not representative of the average American in 1936.
    The users of Twitter are not representative of the average American in 2010.

    Twitter polling is no better than straw polling, which is usually worse than nothing.

  13. Both "McCain" and "Obama" mean Obama is good? by AthanasiusKircher · · Score: 2, Interesting

    Umm... from TFA:

    > Likewise, both the Twitter-derived sentiments and the traditional polls reflected declining approval of President Obama's job performance during 2009, with a 72 percent correlation between them.

    Okay... not a great correlation, but let's continue....

    > But the researchers found that their sentiment analysis did not correlate as well with election polling during 2008. For instance, increased mentions of "Obama" tended to correlate with rises in Barack Obama's polling numbers, but increased mentions of "McCain" also correlated with rises in Obama's popularity.

    WTF? Is all of this built on how many times "Obama" or "McCain" is uttered on Twitter? And, given the obviously skewed demographics on Twitter (i.e., younger people, who tended to poll way toward Obama), increased conversations about McCain probably were bad in general.

    Well, how do they explain this? Ah, the next sentence....

    > Improved computational methods for understanding natural language, particularly the unusual lexicon of microblogs, will be necessary before Twitter feeds can be reliably mined to predict elections, the researchers concluded.

    Ah yes, the "unusual lexicon of microblogs," which probably consisted of sentiments like "I luv Obama!" and "McCain too old - WTF?"

    Perhaps if they bothered to measure more than "mentions" of a candidate's name, the data might have some (albeit still vague) meaning...

    If this is the best stuff from the study which they actually mention in a press release, how much crap results are they not reporting?

  14. it doesn't suck by kyle222234 · · Score: 1

    Hopefully these polling algorithms will take into account that when I say "Microsoft doesn't suck" I actually mean that they do suck, rather than deriving my meaning from the literal words. If they profile my use of irony they should be fine.
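To make the problem concrete, here is a small Python sketch comparing a plain word counter with one that flips the next sentiment word after a negator. The lexicons and function names are invented for illustration and are far cruder than anything a real study would use. Note what it does and does not fix: plain negation is handled, but irony of the kind described above is not.

```python
import re

# Hypothetical toy lexicons.
POSITIVE = {"awesome"}
NEGATIVE = {"suck", "sucks"}
NEGATORS = {"not", "never", "no"}

def tokens(text):
    """Lowercase words, keeping apostrophes so "doesn't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())

def naive_score(text):
    """+1 per positive word, -1 per negative word, no context at all."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in tokens(text))

def negation_aware_score(text):
    """Same counts, but a negator flips the polarity of the next sentiment word."""
    score = 0
    flip = False
    for w in tokens(text):
        if w in NEGATORS or w.endswith("n't"):
            flip = True
            continue
        polarity = (w in POSITIVE) - (w in NEGATIVE)
        if polarity:
            score += -polarity if flip else polarity
            flip = False  # the flip applies to one sentiment word only
    return score
```

For "Microsoft doesn't suck" the naive counter returns -1 and the negation-aware one returns +1, which is the literal reading. An ironic poster who means the opposite is still scored wrong by both.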

  15. I'm excited that we get any signal at all by aegl · · Score: 1

    "I'm excited that we get any signal at all from social media"

    s/excited/amazed/

  16. Twitter as a telephone survey ??? by XRedHat · · Score: 1

    OK - Maybe I don't get it - WHY are we using a Twitter survey to be representative of a telephone survey? WTF - are we stupid? A Twitter survey is one thing; a telephone survey is another... Never the twain shall meet... MS

  17. Modern Technology Approximates Mediocrity by Sir+Realist · · Score: 1

    Wow! You mean modern technology has progressed to the point where it can approximate the results of totally inaccurate guesswork derived from people so sad that they'll actually stay on the line when phoned up by a total stranger?

    Truly, technology amazes me. (Or humanity does. I can never quite keep that straight.)

  18. Didn't they already do this? by senorbum · · Score: 1

    HP already did something very similar. You can google predicted box office success twitter, or simply view HP's report. http://arxiv.org/PS_cache/arxiv/pdf/1003/1003.5699v1.pdf So congrats on being behind CMU? I guess the concept is slightly different making it new research, but not new enough to really merit a whole lot of discussion.

  19. From TFS by OrwellianLurker · · Score: 1

    > measure the frequency of positive or negative words ranging from 'awesome' to 'sucks.'

    So what about "Obama doesn't suck!" or "McCain is as awesome as my grandmother who isn't awesome at all!"

    Damn.

    --
    'Political power grows out of the barrel of a gun.' - Mao Tse-tung
  20. Slashdot must be slipping by tehcyder · · Score: 2, Funny

    There weren't enough mentions of Apple either in the summary or the comments posted.

    --
    To have a right to do a thing is not at all the same as to be right in doing it