Many Machine Learning Studies Don't Actually Show Anything Meaningful, But They Spread Fear, Uncertainty, and Doubt (theoutline.com)

Posted by msmash on Friday September 15, 2017 @04:02AM from the reality-check dept.

Michael Byrne, writing for the Outline: Here's what you need to know about every way-cool and-or way-creepy machine learning study that has ever been or will ever be published: Anything that can be represented in some fashion by patterns within data -- any abstract-able thing that exists in the objective world, from online restaurant reviews to geopolitics -- can be "predicted" by machine learning models given sufficient historical data. At the heart of nearly every foaming news article starting with the words "AI knows ..." is some machine learning paper exploiting this basic realization. "AI knows if you have skin cancer." "AI beats doctors at predicting heart attacks." "AI predicts future crime." "AI knows how many calories are in that cookie." There is no real magic behind these findings. The findings themselves are often taken as profound simply for having way-cool concepts like deep learning and artificial intelligence and neural networks attached to them, rather than because they are offering some great insight or utility -- which most of the time, they are not.

Re:It really is like human intelligence. by hey! · 2017-09-15 05:12 · Score: 3, Interesting

The usual methodology for training is to start with a big sample of data and randomly divide its records into two subsets: the first you use to train the model, and the second you use to test the results of the training.
If there is no statistical difference between your training and testing groups, a better-than-random performance on the test data indicates that your algorithm actually learned something about the original universe of data. At that point you have the same problem you always have in statistics when you try to use your results: is some set of data you encounter in the wild, so to speak, really comparable to the data you built the model on?
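For anyone who wants to see the split-then-test idea concretely, here is a rough sketch in plain Python. Everything in it is made up for illustration: the data is synthetic (two overlapping bell curves), the "model" is just a threshold halfway between the two class means, and the function names (train_test_split, fit_threshold, accuracy) are my own, not from any library or from the paper under discussion.

```python
import random


def train_test_split(records, test_fraction=0.5, seed=0):
    """Shuffle a copy of the records and cut it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train):
    """Toy 'model': the midpoint between the mean feature value of each class."""
    zeros = [x for x, label in train if label == 0]
    ones = [x for x, label in train if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(threshold, rows):
    """Fraction of rows where 'x above threshold' agrees with label 1."""
    hits = sum(1 for x, label in rows if (x > threshold) == (label == 1))
    return hits / len(rows)


# Synthetic data (an assumption for the demo): class 0 centred on 0.0,
# class 1 centred on 2.0, both with standard deviation 1.0.
data_rng = random.Random(42)
records = [(data_rng.gauss(0.0, 1.0), 0) for _ in range(500)]
records += [(data_rng.gauss(2.0, 1.0), 1) for _ in range(500)]

train, test = train_test_split(records)
threshold = fit_threshold(train)          # learned from the training half only
test_accuracy = accuracy(threshold, test)  # scored on records the model never saw

print(f"threshold={threshold:.2f}  test accuracy={test_accuracy:.2f}  (chance is 0.50)")
```

The test accuracy only tells you the model learned something about data drawn the same way as the training set. It says nothing about whether data you meet in the wild follows the same distribution, which is the comparability question raised above.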
One of the advantages of regression learning is that a classification your model produces is rebuttable. This is very important in a world where some courts are using proprietary software in sentencing to classify people by how likely they are to re-offend.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.