An alias is not a fake account, it's an alias. If you have 200 aliases on Slashdot, 199 are probably fake accounts... or not. That's the problem here: how do you identify a fake user?
This doesn't always work, but it is very useful. Facebook does a good job at it (that's one use of the face detection algorithm they have), but, for example, many fake Facebook accounts use pictures taken from VK that are not on FB, so Facebook can't detect them. Twitter simply does nothing: they let anyone steal pictures and names from real users.
I started finding bots on Twitter a few years ago, first as a hobby, but then I wrote some code and started to find patterns. I even ended up in the local news because of my findings. The bots keep evolving because their creators need to keep them alive and working; there is a huge business behind it and it makes a lot of money.
My current software has a catalog of more than 50k users with political affiliations (from Argentina) and some 10k fake accounts, and fake accounts matter more than bots. The problem is this: a bot is detectable because it follows predictable patterns, but a fake account operated by a human is... very human-like. So you can't detect it easily; it isn't obvious. If you say something to them, they answer you, and there is a real human there.
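As an illustration only of what a "predictable pattern" can look like (this is not the detection code described above; the function names and the 0.1 threshold are assumptions made up for the example): one of the simplest signals is posting at near-constant intervals. A rough sketch measures how much the gaps between consecutive tweets vary relative to their average.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation (stddev / mean) of the gaps between posts.

    Returns None when there are fewer than two gaps to compare.
    """
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        # All posts share one timestamp: treat as perfectly regular.
        return 0.0
    return pstdev(gaps) / avg


def looks_scheduled(timestamps, threshold=0.1):
    """Hypothetical rule: very regular gaps suggest automated posting.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    cv = interval_cv(timestamps)
    return cv is not None and cv < threshold


# Made-up sample data: one account posts every 10 minutes exactly,
# the other posts in irregular bursts.
start = datetime(2020, 1, 1)
bot = [start + timedelta(minutes=10 * i) for i in range(6)]
human = [start + timedelta(minutes=m) for m in (0, 3, 50, 55, 400)]

print(looks_scheduled(bot))
print(looks_scheduled(human))
```

A check like this only catches the easy case. A fake account run by a person posts as irregularly as anyone else, which is exactly why it slips through.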
Fake accounts are the real problem, so my research moved from bots to fakes. I'm still capturing bots (the easy part), but identifying fakes is the hardest job here.
And I'm only working with Argentine accounts because we have a very active political Twitter bubble, and because Twitter has limits in its API. I think if I moved to a bigger country the thing would be amazingly huge. My current database has 10GB worth of tweets, many of them a nice feed for machine learning, my next development :P
Sorry for my limited English ;)
THIS is one of the main problems: the noise