MRI Software Bugs Could Upend Years Of Research (theregister.co.uk)
An anonymous reader shares a report on The Register: A whole pile of "this is how your brain looks like" MRI-based science has been invalidated because someone finally got around to checking the data. The problem is simple: to get from a high-resolution magnetic resonance imaging scan of the brain to a scientific conclusion, the brain is divided into tiny "voxels". Software, rather than humans, then scans the voxels looking for clusters. When you see a claim that "scientists know when you're about to move an arm: these images prove it", they're interpreting what they're told by the statistical software. Now, boffins from Sweden and the UK have cast doubt on the quality of the science, because of problems with the statistical software: it produces way too many false positives. In this paper at PNAS, they write: "the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results."
It's a matter of time before this happens with global warming, too. It's well known that the temperature record is adjusted, supposedly to remove biases. However, if you look at the unadjusted data, it fits the solar cycle perfectly, with temperatures declining over the past few decades, coinciding with solar dimming. The adjustment looks like a hockey stick, though, which can explain the entirety of the supposed warming. The National Climatic Data Center once had these figures on their website, though they've conveniently been removed. However, this is an example of how systematic errors can set an entire scientific field back many years.
The researchers used published fMRI results, and along the way they swipe the fMRI community for their “lamentable archiving and data-sharing practices” that prevent most of the discipline's body of work being re-analysed.
So the raw data isn't being saved so that someone else can independently verify the results. No checking the computer's math, no checking the researchers' settings on the machine. Just blanket trust for the people and the machine, and purging of any way of poking holes in someone's findings. Even if this wasn't caused by a software bug, the failure to archive the raw dataset so that it can be rerun when software improvements are made is just infuriating.
I had a university-level Statistics "professor" once tell me that I didn't need to know how my calculator created a box plot, etc., because I could just use someone else's statistics library instead of writing my own. While in general I agree that there is no point in reinventing the wheel, I felt like I ought to learn how such things work.
I do a *ton* of statistical work in my day job, and if I were to write a book or teach a class, I would recommend two things:
1) Always look at the data
2) Always write your own functions
The reason for the second point has to do with the basic nature of statistics. If you make a mistake in normal software, the error is usually patently visible or benign. Often the software works fine and does its job and the results are correct, even if it has bugs.
In statistics however, if you make a mistake the results get closer to "random". Statistics is fundamentally an attempt to extract information from data, and if you make a misstep then you get less information, which is equivalent to the data being closer to random. There is no way to tell whether the output is correct - it doesn't crash, it doesn't show an obvious flaw, it just didn't give you any information.
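To make that concrete, here is a minimal sketch in Python (standard library only). The data, the `pearson` helper, and the off-by-one "bug" are all made up for illustration; the point is only that the buggy call runs without complaint and simply reports "no relationship".

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Synthetic data (an assumption for this sketch): y depends strongly on x.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

# Correct analysis: each x is paired with its own y.
correct = pearson(xs, ys)

# Hypothetical bug: the two columns are misaligned by one row.
# Nothing crashes; the strong signal just turns into noise.
buggy = pearson(xs[:-1], ys[1:])

print(f"correct pairing:    r = {correct:.3f}")
print(f"misaligned pairing: r = {buggy:.3f}")
```

The first number comes out close to 1 and the second close to 0. If you only ever saw the second one, you would probably conclude there was nothing in the data rather than suspect a bug.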
The second thing is to always look at the data.
Many, many, *MANY* theories and research papers make simple assumptions about the data which simply aren't true, and if you can look at the data (in an appropriate visualization), you can avoid some of these pitfalls.
Researchers do linear regression when a quick glimpse of the data would tell them that it's a curve. Economists assume that if a tiny piece of a function looks linear, the entire function is linear. People do Principal Component Analysis on data that has multiple loci of causes. People use Expectation Maximization and "guess" the number and position of causes. People reverse the conditional.
The list is endless.
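The linear regression case is easy to show with a toy example. This is only a sketch with made-up data (y = x squared, no noise) and a hand-written least-squares fit; the sign string at the end stands in for the residual plot you would normally look at.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Made-up data for illustration: a pure curve.
xs = [float(x) for x in range(11)]
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# R^2 is the summary number that gets reported.
mean_y = sum(ys) / len(ys)
ss_res = sum(r * r for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

# Crude stand-in for a residual plot: the sign of each residual, left to right.
signs = "".join("+" if r > 0 else "-" for r in residuals)

print(f"fit: y = {slope:.1f} * x + {intercept:.1f}, R^2 = {r_squared:.3f}")
print(f"residual signs: {signs}")
```

R^2 comes out at about 0.93, which looks like a respectable fit if that is all you check. The residual signs, though, are positive at both ends and negative all through the middle, which is what a straight line forced through a curve looks like. Random scatter would not produce a clean run like that.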
You can use someone else's library for mundane things which can be checked. Using a library for a box plot is fine - if it crashes or if the output doesn't *look* right, then use a different library.
For doing actual statistical work, you should *first* code your own functions. You'll get a marvellous hands-on insight and a little intuition about what the results should be.
Once you've done that, you can look at (i.e., plot) the data and use your human brain to make a judgement.
Then use the big library. If it doesn't look right, you can investigate further.
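A small sketch of that workflow, using sample variance as a stand-in for "actual statistical work". The data values and the `my_sample_variance` name are made up for the example; `statistics.variance` and `statistics.pvariance` are the real Python standard library functions.

```python
import math
import statistics

def my_sample_variance(values):
    """Hand-written sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

# Made-up data for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mine = my_sample_variance(data)
lib_sample = statistics.variance(data)       # sample variance, n - 1
lib_population = statistics.pvariance(data)  # population variance, n

print(f"my function:           {mine:.4f}")
print(f"statistics.variance:   {lib_sample:.4f}")
print(f"statistics.pvariance:  {lib_population:.4f}")

# If your own result and the library disagree, that is the cue to investigate.
agree = math.isclose(mine, lib_sample)
print("matches library sample variance:", agree)
```

Having written the n - 1 version yourself, you know which of the two library functions you actually want, and you notice immediately if you call the other one by mistake.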
It's a matter of time before this happens with global warming, too.
Well-financed "skeptics" have been busting a gut for over 20 years trying to prove your conspiracy theory; they have done nothing but bring the word "skeptic" into disrepute.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.