Using Twitter Data To Approximate a Telephone Survey
cremeglace writes "A team led by a computer scientist at Carnegie Mellon University has used text-analysis software to detect tweets pertaining to various issues — such as whether President Barack Obama is doing a good job — and measure the frequency of positive or negative words ranging from 'awesome' to 'sucks.' The results were surprisingly similar to traditional surveys. For example, the ratio of Twitter posts expressing either positive or negative sentiments about President Obama produced a 'job approval rating' that closely tracked the big Gallup daily poll across 2009. The analysis also produced classic economic indicators like consumer confidence." By averaging several days' worth of tweets on presidential job approval, the researchers got results that correlated 79% with daily Gallup polling. Lead researcher Noah Smith said, "The results are noisy, as are the results of polls. Opinion pollsters have learned to compensate for these distortions, while we're still trying to identify and understand the noise in our data. Given that, I'm excited that we get any signal at all from social media that correlates with the polls." Here is CMU's press release.
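For anyone wondering what that pipeline might look like, here is a rough sketch of the three steps described in the summary: count positive and negative words in tweets that mention a topic, smooth the daily ratio over several days, and correlate the result with a poll series. This is not the CMU team's code. The tiny word lists, the function names, and the assumption that every day has at least one negative word are all made up for illustration.

```python
import re
from math import sqrt

# Hypothetical toy lexicons; a real study would use a much larger word list.
POSITIVE = {"awesome", "good", "great"}
NEGATIVE = {"sucks", "bad", "awful"}

def daily_ratio(tweets, topic):
    """Positive-to-negative word ratio over one day's tweets mentioning `topic`."""
    pos = neg = 0
    for text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        if topic not in words:
            continue  # tweet is not about the topic
        pos += sum(w in POSITIVE for w in words)
        neg += sum(w in NEGATIVE for w in words)
    # Assumption: at least one negative word was seen; otherwise this raises.
    return pos / neg

def moving_average(values, window):
    """Trailing average over the last `window` days (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)
```

Usage would be something like computing `daily_ratio` for each day, passing the list through `moving_average` with a window of several days, and then calling `pearson` on the smoothed series and the poll numbers for the same dates.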
Just like traditional pollsters, social media researchers will have to address how representative Twitter users are of the general population. And unlike telephone surveys, small groups of people can wildly skew the results of Internet data,
Yes, I did STFA (skim the fucking article).
It mentioned the two main problems I see with this: people cheating the system, and whether Twitter really is a large enough and random enough sample to be considered a viable alternative.
Twitter has a whole range of people who don't actually use the damned thing. As with any poll, though, people are going to claim that what the polled minority says is what everyone thinks.
"The American people want to do x! Our poll says 80% of the American people want it!" No. No it doesn't. It just means 80% of the people you polled want it.
I despise how easy it is to use statistics and polls to manipulate people.