Poison Attacks Against Machine Learning
mikejuk writes "Support Vector Machines (SVMs) are fairly simple but powerful machine learning systems. They learn from data and are usually trained before being deployed. SVMs are used in security to detect abnormal behavior such as fraud and credit card use anomalies, and even to weed out spam. In many cases they need to continue learning as they do the job, and this raises the possibility of feeding them data that causes them to make bad decisions. Three researchers have recently demonstrated how to do this, using the minimum amount of poisoned data for maximum effect. They found that their method had a surprisingly large impact on the performance of the SVMs tested. They also point out that it could be possible to direct the induced errors so as to produce particular types of error. For example, a spammer could send some poisoned data so as to evade detection for a while. AI-based systems may be no more secure than dumb ones."
Why the hell is the only link in the summary to that rather useless "I Programmer" website? The summary here at Slashdot is basically the content of the entire linked "article"!
Here is a much more useful link for anyone interested in reading the actual paper: http://arxiv.org/abs/1206.6389v1
Universities should run a number of psychology experiments to see how this can be done to human intelligence to see how susceptible it is compared to AI. Or you could just study people who tune in to .
On this side of the human / AI line, we call this propaganda. It has historically proved very effective, especially if you can control all of the "training data."
There's already a whole subfield of machine learning which concerns itself with these problems. It's called "adversarial machine learning".
The approaches are very different from usual software security. Instead of busying oneself with patching holes in software or setting up firewalls, adversarial machine learning redesigns the algorithms completely, using game theory and other techniques. The premise is "How can we make an algorithm that works in an environment full of enemies that try to mislead it?" It's a refreshing change from the usual software-security paradigm, which is all about fencing the code into some supposedly 'safe' environment.
I have this mental image that in the future not everyone will be able to pass as human (i.e., routinely solve captchas), and the ones who can may be able to rent out that service to those who can't.
So if you know the algorithm and training data, and you can feed the system new data with manipulated labels then you can confuse it. It's a little early to panic about your spam filter. Hopefully everyone realizes that if you let the spammers tell your computer what is and is not spam, they can cause it to let their spam through.
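For anyone who wants to see the "manipulated labels" point in action, here is a minimal sketch. It is not the gradient-based attack from the paper, just plain label flipping, which is much cruder and needs far more poisoned data. Everything in it is an assumption made for illustration: the toy 2D data, the poison location (6, 6), the one-third poison ratio, and the hypothetical helper names. The "SVM" is a linear soft-margin classifier trained by subgradient descent on the regularized hinge loss, written in plain Python so it runs without any library.

```python
# Sketch only: label-flip poisoning of a toy linear SVM.
# NOT the paper's attack; all names and numbers are illustrative assumptions.

def train_linear_svm(points, labels, lam=0.01, lr=0.05, epochs=500):
    """Full-batch subgradient descent on lam/2*||w||^2 + mean hinge loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        gw = [lam * w[0], lam * w[1]]  # gradient of the regularizer
        gb = 0.0
        for (x0, x1), y in zip(points, labels):
            # Only points that violate the margin contribute to the hinge loss.
            if y * (w[0] * x0 + w[1] * x1 + b) < 1.0:
                gw[0] -= y * x0 / n
                gw[1] -= y * x1 / n
                gb -= y / n
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def accuracy(model, points, labels):
    """Fraction of points whose predicted sign matches the label."""
    w, b = model
    correct = sum(
        1
        for (x0, x1), y in zip(points, labels)
        if (1 if w[0] * x0 + w[1] * x1 + b >= 0 else -1) == y
    )
    return correct / len(points)


def make_blobs(offsets):
    """Class +1 clustered around (2, 2), class -1 mirrored around (-2, -2)."""
    pos = [(2 + dx, 2 + dy) for dx in offsets for dy in offsets]
    neg = [(-x0, -x1) for x0, x1 in pos]
    return pos + neg, [1] * len(pos) + [-1] * len(neg)


def run_demo():
    train_x, train_y = make_blobs([-0.5, 0.0, 0.5])  # 9 points per class
    test_x, test_y = make_blobs([-0.25, 0.25])       # 4 points per class

    clean_model = train_linear_svm(train_x, train_y)

    # The "attacker" injects points deep inside the +1 region, labeled -1.
    poison_x = [(6.0, 6.0)] * 9
    poison_y = [-1] * 9
    poisoned_model = train_linear_svm(train_x + poison_x, train_y + poison_y)

    return (
        accuracy(clean_model, test_x, test_y),
        accuracy(poisoned_model, test_x, test_y),
    )


clean_acc, poisoned_acc = run_demo()
print("accuracy on clean test data, clean training set:   ", clean_acc)
print("accuracy on clean test data, poisoned training set:", poisoned_acc)
```

The hinge loss is unbounded, so a handful of far-away mislabeled points can outweigh the whole clean set, and the learned boundary moves to keep them happy. The paper's contribution, per the summary, is doing this with far fewer and better-placed points.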
Support Vector Machines are just a way of performing supervised classification, i.e. you feed a bunch of labelled data vectors into the algorithm and it determines a boundary that splits the classes with the widest possible margin between them.
e.g. you feed it (number of wheels, weight) pairs for a lot of vehicles, each labelled "bike", "car" or "truck", and it learns boundaries separating the light 2-wheeled vehicles, the heavy 4-wheeled ones, and the very heavy 4-wheeled ones. You could then use those boundaries to determine the category a new data point falls into.
This isn't Artificial Intelligence - it's just a data mining/classification technique.
From the article, if you have access to the training data and know the learning algorithm, you can game the machine learning (SVM, not AI) system. How is that anything but self-evident non-news?!
"Consensus" in science is _always_ a political construct.
I have this mental image that in the future not everyone will be able to pass as human (i.e., routinely solve captchas), and the ones who can may be able to rent out that service to those who can't.
The good thing is that us non-humans can then travel all around the world really cheap. I, personally, belong in healthcare products as a natural Fleshlight-substitute!
Stop talking about how easy it is to poison data collection efforts; you're going to kill the golden goose of those who insist that analyzing social data can allow you to pinpoint psychopaths and other "problematic" individuals before that goose ever takes to the air (on the wings of "black budget" funding, no doubt).
Orwell: "In a Time of Universal Deceit, telling the Truth is a Revolutionary Act"