Slashdot Mirror


Misleading Results From Widely-Used Machine-Learning Data Analysis Techniques (bbc.com)

Long-time Slashdot reader kbahey writes: The increased reliance on machine-learning techniques used by thousands of scientists to analyze data is producing results that are misleading and often completely wrong, according to the BBC.

Dr. Genevera Allen from Rice University in Houston said that the increased use of such systems was contributing to a "crisis in science".

She warned scientists that if they didn't improve their techniques, they would be wasting both time and money. Her research was presented at the American Association for the Advancement of Science in Washington.


This is the oft-discussed 'reproducibility problem' in modern science.

The BBC writes that this irreproducibility happens when experiments "aren't designed well enough to ensure that the scientists don't fool themselves and see what they want to see in the results." But machine learning has now apparently become part of the problem.

Dr. Allen asks, "If we had an additional dataset, would we see the same scientific discovery or principle...? Unfortunately, the answer is often probably not."
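The "additional dataset" test can be illustrated with a minimal sketch. The setup is an assumption for illustration, not Dr. Allen's method: 200 candidate features and an outcome that are all pure noise, a "discovery" made by picking the feature most correlated with the outcome, and a replication attempt on a second, independent dataset drawn the same way. The names `discover_and_replicate` and `make_noise_dataset` and their parameters are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def make_noise_dataset(n_samples, n_features, rng):
    """Hypothetical null data: features and outcome are independent Gaussian noise."""
    features = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    return features, outcome


def discover_and_replicate(n_samples=50, n_features=200, seed=0):
    """Pick the 'best' feature on one dataset, then re-measure it on a fresh one."""
    rng = random.Random(seed)
    # Discovery: the feature that looks most associated with the outcome.
    feats_a, y_a = make_noise_dataset(n_samples, n_features, rng)
    scores = [abs(pearson(f, y_a)) for f in feats_a]
    best = max(range(n_features), key=scores.__getitem__)
    # Replication: an additional, independent dataset from the same process.
    feats_b, y_b = make_noise_dataset(n_samples, n_features, rng)
    return best, scores[best], abs(pearson(feats_b[best], y_b))


best, r_discovery, r_replication = discover_and_replicate()
print(f"feature {best}: |r| = {r_discovery:.2f} on discovery data, "
      f"{r_replication:.2f} on the additional dataset")
```

Because every feature is noise, the "discovery" is produced purely by searching through many candidates; on the additional dataset the same feature's correlation typically falls back toward zero.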

1 of 23 comments

  1. Not surprised by Anonymous Coward · Score: 4, Interesting

    I worked as an ML researcher in a science lab. I was often asked for the results they wanted rather than for good methodology, which I pushed back hard on, but the lab frequently contracted out analysis and then chose which results they liked best for publication. They got a few publications in Nature. Don't trust the ML results of any science paper unless they fully present their data, methodology, and statistics and you understand them, and even then take things with a grain of salt.
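The selection effect this commenter describes, where several analyses are commissioned and the most favorable one is published, can be roughly quantified. A minimal sketch, assuming each of k analyses behaves like an independent test at a 5% threshold on data with no real effect (so each p-value is uniform on [0, 1]): the chance that at least one looks "significant" is 1 − 0.95^k, which is about 0.23 for five analyses. Real analyses of the same data are correlated, which would typically inflate the rate less than this. The function names are hypothetical.

```python
import random

ALPHA = 0.05  # conventional significance threshold (assumed)


def analytic_false_positive_rate(k, alpha=ALPHA):
    """P(at least one of k independent null analyses is 'significant')."""
    return 1 - (1 - alpha) ** k


def simulated_false_positive_rate(k, trials=50_000, alpha=ALPHA, seed=0):
    """Monte Carlo estimate of the same quantity under the stated assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis a valid test's p-value is uniform on [0, 1].
        # Keep only the smallest one, as if reporting the analysis "we liked best".
        best_p = min(rng.random() for _ in range(k))
        if best_p < alpha:
            hits += 1
    return hits / trials


for k in (1, 5, 10):
    print(k, round(analytic_false_positive_rate(k), 3),
          simulated_false_positive_rate(k))
```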