Machine Learning Susses Out Social-Network Fraud
CowboyRobot writes "Machine learning techniques can be used to detect fraud and spies on social networks based on certain features, such as the number of followers and the number of devices used to access the network. Certain characteristics of social-network accounts have a high correlation with fraud and can be used to differentiate between real and fake accounts, a researcher presenting at the SOURCE Boston Conference said this week. Using machine learning techniques, Vicente Diaz, a senior security analyst with security software firm Kaspersky Lab, found that seven characteristics of Twitter profiles could identify fraudulent accounts 91% of the time. The number of devices from which a user accesses the service, the ratio of followers to people following an account, the average number of tweets to each person, and the number of tweets to an unknown receiver are all features that correlate strongly with fraudulent accounts, he says."
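Diaz's exact feature definitions and model aren't given here, so what follows is only a minimal Python sketch of how an account might be turned into a feature vector. The `Account` record, its field names, and the reading of "unknown receiver" as a recipient with no follow relationship are all assumptions. Only four of the seven characteristics are named in the summary, so only those four appear.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical snapshot of one account; field names are illustrative only."""
    followers: int
    following: int
    devices: set = field(default_factory=set)      # distinct client/device IDs seen
    tweets_to: dict = field(default_factory=dict)  # recipient handle -> tweets sent to it
    known: set = field(default_factory=set)        # handles the account follows or is followed by


def extract_features(acct: Account) -> dict:
    """Compute the four features named in the summary for a single account."""
    # Avoid division by zero for accounts that follow nobody (an arbitrary choice).
    ratio = acct.followers / max(acct.following, 1)
    total_tweets = sum(acct.tweets_to.values())
    recipients = len(acct.tweets_to)
    avg_per_recipient = total_tweets / recipients if recipients else 0.0
    # "Unknown receiver" is assumed to mean a recipient with no follow relationship.
    to_unknown = sum(n for who, n in acct.tweets_to.items() if who not in acct.known)
    return {
        "num_devices": len(acct.devices),
        "followers_per_following": ratio,
        "avg_tweets_per_recipient": avg_per_recipient,
        "tweets_to_unknown": to_unknown,
    }


# Made-up example: few followers, follows many, tweets at strangers from one client.
suspect = Account(followers=3, following=2000, devices={"api-client"},
                  tweets_to={"@a": 1, "@b": 1, "@c": 1})
print(extract_features(suspect))
```

A vector like this would then be fed to whatever classifier is trained on labeled real and fake accounts; the summary doesn't say which model was used.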
In related news, social network machine learning fraud bots get algorithm update based on current fraud detection algorithms.
AC raises a good point: people are pretty good at ignoring spam. I just ignored it. Is this really a big problem on social network sites? The article says somewhere between 9 and 20% of user accounts on Facebook are spam accounts. Who the hell adds random people they've never heard of as friends and then can't tell spam from actual communication?
My guess is that this is annoying for Facebook and for the advertising firms paying money for sanctioned spamming, who want to make sure they're not advertising to spam accounts. I mean, companies are, I guess, dropping serious money on their social media pages and accounts. Finding out that the only people following those accounts are other advertisers must really make them stop and wonder what the hell they're doing. Hilarious.
found that seven characteristics of Twitter profiles could identify fraudulent accounts 91% of the time.
Taking the 91% number as accurate for argument's sake, what are the false positive and false negative rates? Even a 1% false positive or false negative rate would be quite a lot of accounts when you consider how many millions of Twitter accounts there are out there.
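A single accuracy figure hides how the errors split between genuine accounts wrongly flagged and fakes that get through. A quick Python sketch of the arithmetic; the account count, the share of fake accounts, and the 99% rates are illustrative assumptions, not figures from the article:

```python
def misclassified(n_accounts: int, fake_share: float,
                  sensitivity: float, specificity: float) -> tuple:
    """Expected (false positives, false negatives) for a fraud classifier.

    sensitivity: fraction of fake accounts correctly flagged.
    specificity: fraction of genuine accounts correctly left alone.
    """
    fake = n_accounts * fake_share
    genuine = n_accounts - fake
    false_pos = genuine * (1 - specificity)  # genuine accounts wrongly flagged
    false_neg = fake * (1 - sensitivity)     # fake accounts that slip through
    return round(false_pos), round(false_neg)


# Hypothetical inputs: 100 million accounts, 10% fake, 99% sensitivity and specificity.
fp, fn = misclassified(100_000_000, 0.10, 0.99, 0.99)
print(f"{fp:,} genuine accounts flagged, {fn:,} fakes missed")
```

With those inputs, a "1% error rate" still means about 900,000 genuine accounts flagged and 100,000 fake ones missed.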
Most of the information I put on my Facebook account is noise. I didn't really attend 10 different universities or speak 15 different languages, and I wasn't born in that other country.
The only people who care about this are marketers. But even then, does it matter whether the account is real or not? I haven't seen any good evidence that social marketing translates directly into in-store or online purchases. It's all a scam.
Fix this crap, Slashdot!
"So I would be a fraud if I had a Facebook account."
Precisely. There are several things wrong with trying to actually use this in the real world.
(1) 91% is not nearly good enough. Period.
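(2) Even if it were 99.9% accurate, it would still not be good enough, because it runs into the base rate fallacy: when genuine accounts vastly outnumber fake ones, even a tiny false positive rate can mean that a large share of the flagged accounts are genuine.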
(3) Related to, but distinct from, the base rate fallacy: a statistical correlation across datasets of millions says nothing about any individual account.
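Point (2) can be made concrete with Bayes' theorem: the share of flagged accounts that are really fake depends on how rare fake accounts are, not just on the classifier's accuracy. A minimal Python sketch; the 1-in-1,000 prevalence is an illustrative assumption, not a measured figure:

```python
def flagged_share_actually_fake(sensitivity: float, specificity: float,
                                prevalence: float) -> float:
    """Bayes' theorem: P(fake | flagged), i.e. the positive predictive value."""
    true_pos = sensitivity * prevalence               # fake and flagged
    false_pos = (1 - specificity) * (1 - prevalence)  # genuine but flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical: 99.9% sensitivity and specificity, but only 1 account in 1,000 is fake.
ppv = flagged_share_actually_fake(0.999, 0.999, 0.001)
print(f"{ppv:.1%} of flagged accounts are actually fake")
```

With those inputs only about half of the flagged accounts are fake, even though the classifier is right 99.9% of the time on each class.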