Slashdot Mirror


Deep Neural Networks for Bot Detection (arxiv.org)

From a research paper on arXiv: The problem of detecting bots, automated social media accounts governed by software but disguising as human users, has strong implications. For example, bots have been used to sway political elections by distorting online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that caused health epidemics. Most techniques proposed to date detect bots at the account level, by processing large amounts of social media posts, and leveraging information from network structure, temporal dynamics, sentiment analysis, etc. In this paper [PDF], we propose a deep neural network based on a contextual long short-term memory (LSTM) architecture that exploits both content and metadata to detect bots at the tweet level: contextual features are extracted from user metadata and fed as auxiliary input to LSTM deep nets processing the tweet text.
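
A minimal sketch of the idea in the summary, in plain Python: an LSTM reads the tweet's token embeddings, and the user-metadata vector is joined to the LSTM's final hidden state before a sigmoid output. This is only one plausible reading of "auxiliary input". The class name, dimensions, random untrained weights, made-up inputs, and the omission of bias terms are all assumptions for illustration; the paper's actual layers and training setup are not reproduced here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyContextualLSTM:
    """Hypothetical toy model: untrained, no biases, illustration only."""

    def __init__(self, embed_dim, hidden_dim, meta_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t, h_prev].
        self.gates = {
            name: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng)
            for name in ("input", "forget", "output", "candidate")
        }
        # The output layer sees the LSTM state plus the metadata vector.
        self.out_weights = rand_matrix(1, hidden_dim + meta_dim, rng)[0]

    def encode(self, embeddings):
        """Run the LSTM over the token embeddings; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in embeddings:
            z = list(x) + h
            i = [sigmoid(a) for a in matvec(self.gates["input"], z)]
            f = [sigmoid(a) for a in matvec(self.gates["forget"], z)]
            o = [sigmoid(a) for a in matvec(self.gates["output"], z)]
            g = [math.tanh(a) for a in matvec(self.gates["candidate"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h

    def bot_probability(self, embeddings, metadata):
        """Combine the text encoding with account metadata into one score."""
        features = self.encode(embeddings) + list(metadata)
        score = sum(w * x for w, x in zip(self.out_weights, features))
        return sigmoid(score)


model = TinyContextualLSTM(embed_dim=3, hidden_dim=4, meta_dim=2)
tweet = [[0.1, 0.0, -0.2], [0.3, 0.1, 0.0]]  # made-up token embeddings
account = [0.5, 1.0]  # made-up, pre-scaled metadata features
probability = model.bot_probability(tweet, account)
```

With untrained weights the output is meaningless as a prediction; the point is only the data flow, where text and metadata meet in a single per-tweet score.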

17 of 39 comments

  1. NO COLLUSION! more than 1 time in a thought.. by Anonymous Coward · · Score: 1

    That should be a pretty high ranking flag in the algorithm seed data.

  2. Wew by negRo_slim · · Score: 4, Insightful

    Putting way too much confidence in bots' ability to do any of those things listed in the summary.

    --
    On the Oregon Coast born and raised, On the beach is where I spent most of my days
    1. Re:Wew by sg_oneill · · Score: 2

      If you read the actual paper you'd know exactly how much confidence one can place. (Hint: it's extremely high.) 96% on a single tweet text read, up to over 99% once network, metadata and other factors are taken into account.

      6 CONCLUSIONS
      Given the prevalence of sophisticated bots on social media platforms such as Twitter, the need for improved, inexpensive bot detection methods is apparent. We proposed a novel contextual LSTM architecture allowing us to use both tweet content and metadata to detect bots at the tweet level. From a single tweet, our model can achieve an extremely high accuracy exceeding 96% AUC.
      We show that the additional metadata information, though a weak predictor of the nature of a Twitter account per se, when exploited by LSTM decreases the error rate by nearly 20%. In addition to this, we propose methods based on synthetic minority oversampling that yield a near perfect user-level detection accuracy (> 99% AUC).
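
The "synthetic minority oversampling" mentioned in the quoted conclusion can be sketched roughly as follows. This is an assumed, minimal version of the general SMOTE idea (interpolating between a minority-class sample and one of its nearest minority-class neighbours), not the paper's exact variant; the function name, the default `k`, and the toy points are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Nearest neighbours of the chosen sample within the minority class.
        others = [p for j, p in enumerate(minority) if j != idx]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy features
extra = smote(minority_points, n_new=5)
```

Every synthetic point lies on a line segment between two real minority samples, so the rarer class is padded out without copying existing rows verbatim.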

      So how much should we trust this? Unless that under 1% really upsets you, I'd say "almost completely".
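
For context on the figures quoted from the paper: AUC is not the fraction of tweets classified correctly. It is the probability that a randomly chosen positive example (here, a bot) gets a higher score than a randomly chosen negative one (a human). A minimal sketch of that definition, with made-up labels and scores; the function name is hypothetical.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])  # every bot outranks every human
mixed = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])    # one of four pairs is misordered
```

So a 99% AUC means about 1 in 100 bot/human pairs is ranked the wrong way round, which is related to, but not the same as, a 1% error rate at a fixed threshold.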

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    2. Re: Wew by Sique · · Score: 1

      We will get a bot arms race, where bots are fighting bots, and we humans get left alone. So all back to normal, except for trillions of CPU cycles wasted.

      --
      .sig: Sique *sigh*
    3. Re:Wew by IamTheRealMike · · Score: 1

      We should distrust it completely, as the paper gives no examples of any of the tweets or accounts they classified as being "bots". None whatsoever. Lots and lots of stats about their model and many implausible claims of it being perfect, but nothing that could be used to actually verify their claims.

      Indeed their claims are completely implausible. Extraordinary claims require extraordinary evidence and they provide none.

    4. Re: Wew by HiThere · · Score: 1

      Well...not exactly back to normal. The faker bots will improve their ability to fool people into thinking they're real.

      Also, the intent is probably impossible for even a superhuman AI to accomplish (except by judging something like volume of posts, which ordinary recipients don't have access to). A Twitter post often doesn't contain enough information to decide whether it was posted by a human or by a bot. As the faker bots improve, they'll be able to handle longer segments of connected text, and possibly even to respond reasonably. (Eliza, Parry, Doctor, etc. show that ordinary human responses are often shallow enough to easily fake. And Eliza was *supposed* to be a counter-example.)

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    5. Re:Wew by sg_oneill · · Score: 1

      We should distrust it completely, as the paper gives no examples of any of the tweets or accounts they classified as being "bots". None whatsoever. Lots and lots of stats about their model and many implausible claims of it being perfect, but nothing that could be used to actually verify their claims.

      Or alternatively you could have read the paper and seen that it used the Cresci/Di Pietro/Petrocchi/et al. dataset, which is publicly available and has been for a while now.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
  3. Hmm by TFlan91 · · Score: 2

    How does this resolve the case of my political uncle posting extreme ideas every week or two?

    Anyone outside the family would rightfully think he's a bot. He isn't, he's just that uncle.

    The first amendment protections required for a system like this would make it far too cumbersome for practical use. Yeah, Twitter is proving the opposite case with their manual interventions, but there must be a middle ground.

    1. Re:Hmm by DNS-and-BIND · · Score: 1

      Then he gets wrongfully accused and his rantings stop. There's no great loss to civilization. It's not worth it to let 100 guilty men go free rather than accuse a single innocent. Those are Enlightenment values - the same ones that created racism and justified slavery. They're as yesterday's news as your uncle.

      --
      Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    2. Re:Hmm by cascadingstylesheet · · Score: 1

      How does this resolve the case of my political uncle posting extreme ideas every week or two?

      Anyone outside the family would rightfully think he's a bot. He isn't, he's just that uncle.

      The first amendment protections required for a system like this would make it far too cumbersome for practical use. Yeah, Twitter is proving the opposite case with their manual interventions, but there must be a middle ground.

      This is all about squelching unapproved opinions. Can't have people (or bots) "disparaging" Hillary Clinton, for example. We indict people for that now.

  4. Re:wow by Sir Lurkalot · · Score: 1

    Interesting...

  5. End result- really good bots. by Maxo-Texas · · Score: 1

    Adversarial neural networks improve the quality of both networks.

    The detector will get better and the fake will get better. Quickly.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  6. cowboyneal exposed by SchroedingersCat · · Score: 1

    I always suspected that about CowboyNeal. Now we will know the truth.

  7. Misleading definition by CustomSolvers2 · · Score: 1

    detecting bots, automated social media accounts governed by software but disguising as human users

    The expression "bot" is used to describe a wide variety of software applications, not just those emulating people in social media. In fact, the most common bots are the ones used by a large number of sites to retrieve information from the internet for different purposes (e.g., search engines retrieving what they are showing to their users); they are also called crawlers or spiders. Here you can find a detailed list of active ones (I am the proud father of one of them :)).

    So, a better version of the summary would have been:

    detecting the social media bots disguised as human users

    --
    Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
    1. Re:Misleading definition by CustomSolvers2 · · Score: 1

      It is short for 'robot'. It is used for physical robots too

      Sure. I meant that, even within this specific context of software/internet, that expression is commonly used for much more than just the malware-like subtype referred to here.

      --
      Custom Solvers 2.0 = Alvaro Carballo Garcia = varocarbas.
  8. Deep Neural Networks for Bot Detection Evasion by aglider · · Score: 1

    Easy, isn't it?

    --
    Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
  9. Overkill by An dochasac · · Score: 1

    You don't need deep neural networks when this will do:

    egrep 'MAGA|NO COLLUSION|FAKE NEWS|LIBTARD' > /russian_bots.txt
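
For anyone who wants to try the keyword approach outside a shell, here is a rough Python sketch of the same kind of filter. The keyword list comes from the one-liner; the function and variable names are made up for illustration. One thing to watch in any such pattern: an empty alternative (two adjacent `|` characters) matches the empty string, so the pattern then matches every line.

```python
import re

KEYWORDS = ["MAGA", "NO COLLUSION", "FAKE NEWS", "LIBTARD"]
PATTERN = re.compile("|".join(re.escape(word) for word in KEYWORDS))


def flag_lines(lines):
    """Return the lines that contain at least one keyword."""
    return [line for line in lines if PATTERN.search(line)]


sample = ["NO COLLUSION! NO COLLUSION!", "Nice weather today."]
flagged = flag_lines(sample)

# An empty alternative matches everything, including lines with no keyword.
too_greedy = re.compile("MAGA||FAKE NEWS")
matches_anything = too_greedy.search("Nice weather today.") is not None
```

This flags text, not accounts, so it has the same false-positive problem as the uncle upthread.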