Slashdot Mirror


Deep Neural Networks for Bot Detection (arxiv.org)

From a research paper on arXiv: The problem of detecting bots, automated social media accounts governed by software but disguised as human users, has strong implications. For example, bots have been used to sway political elections by distorting online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that caused health epidemics. Most techniques proposed to date detect bots at the account level, by processing large amounts of social media posts and leveraging information from network structure, temporal dynamics, sentiment analysis, etc. In this paper, we propose a deep neural network based on a contextual long short-term memory (LSTM) architecture that exploits both content and metadata to detect bots at the tweet level: contextual features are extracted from user metadata and fed as auxiliary input to LSTM deep nets processing the tweet text.
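The summary describes the architecture only in outline. The sketch below is one plausible reading of "metadata fed as auxiliary input", not the paper's implementation: a single-layer LSTM reads token embeddings, and its final hidden state is concatenated with a metadata feature vector before a logistic output. The class name `ContextualLSTM`, the dimensions, the random untrained weights, and the example metadata values are all hypothetical; the paper's real layer sizes, embeddings, and training procedure are not reproduced.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ContextualLSTM:
    """Hypothetical toy tweet-level classifier: LSTM over the text, then
    the final hidden state is joined with metadata (auxiliary input)."""

    def __init__(self, emb_dim, hidden, meta_dim, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        concat = emb_dim + hidden
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: rand_matrix(hidden, concat, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden for g in "ifoc"}
        self.b["f"] = [1.0] * hidden  # common trick: start with the forget gate open
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden + meta_dim)]
        self.b_out = 0.0

    def step(self, x, h, c):
        z = x + h  # list concatenation of current input and previous hidden state
        pre = {g: [wz + bias for wz, bias in zip(matvec(self.W[g], z), self.b[g])]
               for g in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        c_hat = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * cc for fv, cv, iv, cc in zip(f, c, i, c_hat)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def predict(self, token_embeddings, metadata):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in token_embeddings:
            h, c = self.step(x, h, c)
        features = h + metadata  # text summary joined with metadata features
        logit = sum(w * v for w, v in zip(self.w_out, features)) + self.b_out
        return sigmoid(logit)  # score in (0, 1); higher means "more bot-like"


model = ContextualLSTM(emb_dim=4, hidden=3, meta_dim=2)
tweet = [[0.1, 0.0, -0.2, 0.3], [0.0, 0.5, 0.1, -0.1]]  # two made-up token embeddings
meta = [0.7, 0.2]  # hypothetical normalized account features
p = model.predict(tweet, meta)
print(0.0 < p < 1.0)  # True
```

Joining the metadata after the LSTM keeps the recurrent part focused on the text while still letting account-level signals shift each tweet's score.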

3 of 39 comments

  1. Wew by negRo_slim · · Score: 4, Insightful

    That's putting way too much confidence in bots' ability to do any of the things listed in the summary.

    --
    On the Oregon Coast born and raised, On the beach is where I spent most of my days
    1. Re:Wew by sg_oneill · · Score: 2

      If you read the actual paper, you'd know exactly how much confidence one can place in it. (Hint: it's extremely high.) 96% from reading a single tweet's text, up to over 99% once network, metadata and other factors are taken into account.

      6 CONCLUSIONS
      Given the prevalence of sophisticated bots on social media platforms such as Twitter, the need for improved, inexpensive bot detection methods is apparent. We proposed a novel contextual LSTM architecture allowing us to use both tweet content and metadata to detect bots at the tweet level. From a single tweet, our model can achieve an extremely high accuracy exceeding 96% AUC.
      We show that the additional metadata information, though a weak predictor of the nature of a Twitter account per se, when exploited by LSTM decreases the error rate by nearly 20%. In addition to this, we propose methods based on synthetic minority oversampling that yield a near perfect user-level detection accuracy (> 99% AUC).
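The quoted conclusions name synthetic minority oversampling without describing it. The classic version of that idea (SMOTE) creates new minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, hypothetical variant that picks base points at random and uses Euclidean distance on plain feature vectors; the function name `smote` and the toy data are illustrative only, and how the paper applies oversampling at the user level is not shown in this excerpt.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Nearest neighbours of base within the minority class, excluding itself.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


bots = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy minority-class features
new_bots = smote(bots, n_new=5, k=2)
print(len(new_bots))  # 5
```

Because every synthetic point is an interpolation between two real minority samples, the oversampled class stays inside the region the real bots occupy instead of simply duplicating existing rows.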

      So how much should we trust this? Unless that under 1% really upsets you, I'd say "almost completely."
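The figures quoted above are AUC (area under the ROC curve) values, which measure how well a model ranks bots above humans rather than the fraction of tweets it labels correctly, so 1 − AUC is not directly a misclassification rate. A minimal sketch, using the pairwise (Mann-Whitney) definition of AUC and made-up scores, shows a model with perfect AUC but only 50% accuracy at a 0.5 threshold; the helper names `roc_auc` and `accuracy` are hypothetical.

```python
def roc_auc(scores, labels):
    """Probability that a random positive (bot) outscores a random
    negative (human); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples labelled correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


scores = [0.9, 0.8, 0.7, 0.6]  # every score is above 0.5
labels = [1, 1, 0, 0]          # 1 = bot, 0 = human
print(roc_auc(scores, labels))   # 1.0
print(accuracy(scores, labels))  # 0.5
```

The ranking is perfect, yet a 0.5 cut-off flags both humans as bots; turning a high AUC into a low error rate still depends on choosing the threshold well.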

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
  2. Hmm by TFlan91 · · Score: 2

    How does this resolve the case of my political uncle, who posts extreme ideas every week or two?

    Anyone outside the family would rightfully think he's a bot. He isn't; he's just that uncle.

    The First Amendment protections required for a system like this would make it far too cumbersome for practical use. Yeah, Twitter is proving the opposite case with its manual interventions, but there must be a middle ground.