Slashdot Mirror


Researchers Develop an Internet Truth Machine

Hugh Pickens writes "Will Oremus writes that when something momentous is unfolding (the Arab Spring, Hurricane Sandy, Friday's horrific elementary school shooting in Connecticut), Twitter is the world's fastest, most comprehensive, and least reliable source of breaking news, and that in ongoing events like natural disasters, the results of Twitter misinformation can be deadly. During Sandy, for instance, some tweets helped emergency responders figure out where to direct resources. Others provoked needless panic, such as one claiming that the Coney Island hospital was on fire, and a few were downright dangerous, such as the one claiming that people should stop using 911 because the lines were jammed. Now a research team at Yahoo has analyzed tweets from Chile's 2010 earthquake and looked at the potential of machine-learning algorithms to automatically assess the credibility of information tweeted during a disaster. The researchers developed a machine-learning classifier that uses 16 features to assess the credibility of newsworthy tweets, and they identified the features that mark information as more credible: credible tweets tend to be longer and include URLs; credible tweeters have higher follower counts; credible tweets are negative rather than positive in tone; and credible tweets do not include question marks, exclamation marks, or first- or third-person pronouns. Researchers at India's Institute of Information Technology also found that credible tweets are less likely to contain swear words (PDF) and significantly more likely to contain frowny emoticons than smiley faces. The bottom line is that an algorithm has the potential to work much faster than a human, and as it improves, it could evolve into an invaluable 'first opinion' for flagging news items on Twitter that might not be true, writes Oremus. 'Even that wouldn't fully prevent Twitter lies from spreading or misleading people. But it might at least make their purveyors a little less comfortable and a little less smug.'"
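
As a rough illustration of the kind of signals listed in the summary, the sketch below pulls a handful of them out of a single tweet. It is not the Yahoo team's classifier: the summary names only some of the 16 features, and the `Tweet` record, the `credibility_signals` function, and the tiny word lists are all made up here for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny word lists; the summary does not say
# which lexicons the researchers used.
FIRST_OR_THIRD_PERSON = {"i", "me", "my", "we", "us", "our",
                         "he", "she", "him", "her", "they", "them", "their"}
NEGATIVE_WORDS = {"closed", "collapsed", "damaged", "dead", "injured", "missing"}
POSITIVE_WORDS = {"good", "great", "happy", "safe", "thanks"}

URL_RE = re.compile(r"https?://\S+")


@dataclass
class Tweet:
    text: str
    followers: int  # follower count of the account that posted it


def credibility_signals(tweet: Tweet) -> dict:
    """Extract a few of the surface signals mentioned in the summary."""
    # Strip URLs first so a "?" inside a query string is not counted
    # as a question mark in the message itself.
    body = URL_RE.sub(" ", tweet.text)
    words = re.findall(r"[a-z']+", body.lower())
    negative = sum(1 for w in words if w in NEGATIVE_WORDS)
    positive = sum(1 for w in words if w in POSITIVE_WORDS)
    return {
        "length": len(tweet.text),
        "has_url": bool(URL_RE.search(tweet.text)),
        "followers": tweet.followers,
        "has_question_mark": "?" in body,
        "has_exclamation_mark": "!" in body,
        "has_first_or_third_person_pronoun": any(
            w in FIRST_OR_THIRD_PERSON for w in words
        ),
        # Above zero means more negative than positive words were seen.
        "negative_minus_positive": negative - positive,
    }


if __name__ == "__main__":
    report = Tweet(
        "Bridge on Route 5 is closed after the quake, "
        "details at http://example.org/roads",
        followers=12000,
    )
    for name, value in credibility_signals(report).items():
        print(f"{name}: {value}")
```

A real system would feed values like these into a trained model rather than reading them off directly; the dictionary here is only the feature-extraction step.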

6 of 87 comments

  1. Cultural bias? by Anonymous Coward · · Score: 5, Insightful

    This is really interesting research, but it's also based on one event in one country.

    Conclusions based on what may be language or cultural norms (such as "did you phrase it in the positive or the negative") might not translate well to other locales (e.g. Hurricane Sandy in the US).

    But, then, that's what's great about science. Testable predictions we can apply to data.

    1. Re:Cultural bias? by jfengel · · Score: 5, Insightful

      It's a popular denier meme: 1998 was a very hot year and if you start your data series there you can show an overall decline.

      Viewed on any other scale, this artifact goes away. But it doesn't matter how many times you tell deniers about that; they know what story they want to tell and will continue to cherry pick the data to tell it.

  2. Rating individual tweets, accurate? by JaredOfEuropa · · Score: 5, Insightful

    So it provides a first opinion on first posts, sort of. Neat, but I do wonder how accurate this is going to be at vetting individual tweets. Twitter trolls may get wise to this and game the system to get their stuff past the filter, a bit like phishers learning how to spell. In the end, the best check is still independent verification, for example other people tweeting the same thing (not just retweeting, of course). If this system could automatically group and cross-verify tweets from multiple sources on the same subject, that would be a step in the right direction.
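
One way to picture the grouping idea, as a minimal sketch and not anything the researchers describe: drop retweets, bucket the remaining tweets by word overlap, and count how many distinct accounts land in each bucket. The function names, the stopword list, and the 0.5 Jaccard threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "at", "on", "in", "of", "to"}


def tokens(text):
    """Lowercased word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Overlap of two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_retweet(text):
    """Treat the classic 'RT @user' prefix as a retweet."""
    return text.lstrip().lower().startswith("rt @")


def corroboration_groups(tweets, threshold=0.5):
    """Greedily group (author, text) pairs that say roughly the same thing.

    Each tweet is compared with the first tweet of every existing group
    and joins the first group whose overlap reaches the threshold.
    Retweets are skipped, since they are not independent reports.
    """
    groups = []
    for author, text in tweets:
        if is_retweet(text):
            continue
        toks = tokens(text)
        for group in groups:
            if jaccard(toks, group["tokens"]) >= threshold:
                group["authors"].add(author)
                group["texts"].append(text)
                break
        else:
            groups.append({"tokens": toks, "authors": {author}, "texts": [text]})
    return groups


if __name__ == "__main__":
    sample = [
        ("alice", "Coney Island hospital is on fire"),
        ("bob", "RT @alice: Coney Island hospital is on fire"),
        ("carol", "Fire at Coney Island hospital"),
        ("dave", "Power is out in lower Manhattan"),
    ]
    for group in corroboration_groups(sample):
        print(len(group["authors"]), "independent source(s):", group["texts"][0])
```

Counting distinct authors only shows that several accounts repeated a claim, not that the claim is true; a rumor copied by hand would still score well here.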

    --
    If construction was anything like programming, an incorrectly fitted lock would bring down the entire building...
  3. Chile's Earthquake by thejynxed · · Score: 5, Interesting

    It's interesting to note that a seismology student at a university in Chile finally had enough of the false information about earthquakes spreading over Twitter and elsewhere, so he wired a big batch of seismographs to post their readings directly to Twitter. The last I knew, the feed had over 1 million followers, and this particular student has been getting big thank-yous from residents of the country.

    --
    @Mindless Drivel: 100% of Twitter posts ever Tweeted.
  4. Reliable by Anonymous Coward · · Score: 5, Funny

    Twitter is the world's fastest, most comprehensive, and least reliable source of breaking news

    Twitter has dethroned Fox News?!?

  5. Gaming Reliability/Credibility Assessment by girlinatrainingbra · · Score: 5, Interesting

    Of course, in just the same way that spammers can game Bayesian spam filters or rule-matching pattern filters by knowing what the rules are, a known set of rules for assessing the credibility of a tweet allows someone to tweak their tweets so that they are assessed as highly credible:
    1 -- max out your tweet length
    2 -- include a URL [doesn't say whether to use a link shrtnr ;>(]
    3 -- use a Twitter account with a high number of followers
    4 -- use a negative tone
    5 -- no question marks or exclamation points
    6 -- use 2nd person (same as don't use 1st or 3rd person)
    7 -- don't use swear words
    8 -- use a sad emoticon
    .
    Example to maximize this:
    a - break into / hack a high follower account (e.g. justinbieber) and tweet:
    gia@sodium$ cat > finaltweet
    You should know Mayan Calendar sez: world ending this week. Confirmed@ http://netcraft.calendar.mayan/ you go hug loved 1s now. :>( beebs
    gia@sodium$ wc finaltweet
    1 20 139 finaltweet

    First iteration was:
    gia@sodium$ cat > count2
    You should know that Mayan Calendar says : world ending within week. Confirmed by http://netcraft.calendar.mayan/ , you should hug loved ones now. :>( -- beebs
    gia@sodium$ wc count2
    1 25 159 count2

    Please note that the "[netcraft.calendar.mayan]" was inserted by /.'s /-code and is not part of the wc wordcount :>(