Slashdot Mirror


MRI Software Bugs Could Upend Years Of Research (theregister.co.uk)

An anonymous reader shares a report on The Register: A whole pile of "this is how your brain looks like" MRI-based science has been invalidated because someone finally got around to checking the data. The problem is simple: to get from a high-resolution magnetic resonance imaging scan of the brain to a scientific conclusion, the brain is divided into tiny "voxels". Software, rather than humans, then scans the voxels looking for clusters. When you see a claim that "scientists know when you're about to move an arm: these images prove it", they're interpreting what they're told by the statistical software. Now, boffins from Sweden and the UK have cast doubt on the quality of the science, because of problems with the statistical software: it produces way too many false positives. In this paper at PNAS, they write: "the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results."

17 of 95 comments

  1. That's a Crappy Summary by damn_registrars · · Score: 5, Informative

    The research is on fMRI - the F stands for Functional. As it mentions later in the summary this is used to try to associate regions of the brain with specific functions. This is not the same as the structure of the brain itself. What we see in terms of actual brain structures - folds, regions, etc, is still very much valid. We're just not so sure about the functional assignments that we've held on to for a while now.

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
    1. Re:That's a Crappy Summary by JoeMerchant · · Score: 3, Funny

      Whenever I've read an fMRI "research" paper, it seems like the f should be standing for "full of ____", because the sample sizes are laughably small, the data are fuzzy and interpreted with a lot of handwaving, and the correlation between oxygen uptake and the fMRI signal itself is very weak. Finally somebody has gotten around to calling BS on the whole field.

    2. Re:That's a Crappy Summary by damn_registrars · · Score: 2

      assuming of course that you meant reproducible and not reproductible

      Sorry, that was a typo. :)

      I figured as much, but thought I'd check in case you are involved in (or want to recruit others to partake in) some sort of cutting-edge HVAC research.

      --
      Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
  2. Re:Uhm by XXongo · · Score: 2
    Nope. Thinking about moving your arm is in one part of the brain. Actually moving your arm is in a different part of the brain.

    It's like in a robot, if you're looking at electrical signals in the servo motor controller, it doesn't matter whether there are signals in the processor core.

  3. This kind of thing is way too common in science by Anonymous Coward · · Score: 5, Interesting

    A friend of mine has worked in the social sciences (cue /. laughter, but bear with me) and they were forced by the university to use a closed source statistical package for all their data processing. So anyway, she got some really dubious results and she preferred to do her own maths, so she did, and lo! completely different results. That was the start of a research project which concluded that the closed source package contained a rounding error that basically filtered all minorities out of the data set, which is kind of sad if you're doing research on minorities.
    People trust their software too much, are too lazy to do their own maths, don't really want to have got anything to do with data processing even though that's their job, and universities force bad software on their employees. This is an institutional problem that goes way beyond MRI research.

    1. Re:This kind of thing is way too common in science by jittles · · Score: 2

      A friend of mine has worked in the social sciences (cue /. laughter, but bear with me) and they were forced by the university to use a closed source statistical package for all their data processing. So anyway, she got some really dubious results and she preferred to do her own maths, so she did, and lo! completely different results. That was the start of a research project which concluded that the closed source package contained a rounding error that basically filtered all minorities out of the data set, which is kind of sad if you're doing research on minorities. People trust their software too much, are too lazy to do their own maths, don't really want to have got anything to do with data processing even though that's their job, and universities force bad software on their employees. This is an institutional problem that goes way beyond MRI research.

      I had a university level Statistics "professor" once tell me that I didn't need to know how my calculator created a box plot, etc etc because I could just use someone else's statistics library instead of writing my own. While in general I agree that there is no point in reinventing the wheel, I felt like I ought to learn how such things work.

    2. Re:This kind of thing is way too common in science by DarthVain · · Score: 2

      Problem with closed source and science.

      Similarly, there was a court case in Florida where people were suspicious of Breathalyzer results. Police use one produced by a company with closed source code. The court ordered them to open it up for inspection. They tried the "Trade Secrets" argument and refused. The court disagreed and started fining them every day until they released the code. Once they did, it was found to be horrible and inaccurate, invalidating thousands of court cases... As it turned out, they knew it was terrible, used it anyway, and were just trying to hide the fact that they were giving incorrect results much of the time for profit.

  4. Probably will happen in other science fields, too by Anonymous Coward · · Score: 2, Insightful

    It's a matter of time before this happens with global warming, too. It's well known that the temperature record is adjusted, supposedly to remove biases. However, if you look at the unadjusted data, it fits the solar cycle perfectly, with temperatures declining over the past few decades, coinciding with solar dimming. The adjustment looks like a hockey stick, though, which can explain the entirety of the supposed warming. The National Climatic Data Center once had these figures on their website, though they've conveniently been removed. However, this is an example of how systematic errors can set an entire scientific field back many years. It's a matter of time before this happens with global warming, too.

  5. Issue is likely overstated by daenris · · Score: 5, Informative

    The paper has been available as a preprint for a while now, and my lab has discussed it internally and I've also paid attention to outside coverage. The key issue that the paper reports is that false positive rates are too high for most existing software WHEN using a specific type of test under a specific set of conditions. They show that voxelwise familywise error (FWE) correction actually seems to work reasonably or even conservatively. Cluster level FWE correction (looking for groups of voxels that are active) fails when using a very liberal cluster-defining threshold, but works reasonably well when using a more stringent cluster-defining threshold. It also says nothing about the performance of another very common correction method that is frequently used in fMRI studies (false discovery rate or FDR).

    I'm not really sure how extensive the group of findings that these issues actually affect is, but it's certainly not 40,000 as is claimed in the paper's significance section. Many of the earlier papers (and even more recent ones) likely used uncorrected statistical tests, so are suspect for entirely different reasons from this issue. Of the ones that use correction, the findings in this paper only call into question the results for those that are using FWE cluster correction with a cluster-defining threshold that is too liberal (likely > 0.001; the paper's findings suggest that at 0.001 the familywise error rate is in the ballpark of the desired 5%). Those using a cluster-defining threshold of p=0.001 or lower are likely fine, and for those using a different correction method like FDR the picture is unknown, as to my knowledge there isn't currently any similar paper on that correction method.
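
    To make the familywise error idea concrete, here is a minimal Python sketch. It is not taken from the paper or from SPM, FSL, or AFNI, and it assumes the tests are independent, which neighbouring voxels in a real scan are not. The function names `familywise_error_rate` and `simulate_fwer` are made up for illustration. It only shows why an uncorrected threshold over many tests almost guarantees at least one false positive, and how a Bonferroni-style per-test threshold pulls that back down; it does not model cluster-level inference.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float) -> float:
    """Chance of at least one false positive across independent null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_fwer(num_tests: int, alpha: float, num_experiments: int, seed: int = 0) -> float:
    """Estimate the same quantity by simulation.

    Under the null hypothesis a p-value is uniform on [0, 1), so each
    simulated test is a false positive with probability alpha.
    """
    rng = random.Random(seed)
    experiments_with_false_positive = 0
    for _ in range(num_experiments):
        if any(rng.random() < alpha for _ in range(num_tests)):
            experiments_with_false_positive += 1
    return experiments_with_false_positive / num_experiments


num_tests = 100   # hypothetical number of independent tests
alpha = 0.05      # desired familywise error rate

print("uncorrected, analytic: ", familywise_error_rate(num_tests, alpha))
print("uncorrected, simulated:", simulate_fwer(num_tests, alpha, 2000))
# Bonferroni: divide the per-test threshold by the number of tests.
print("Bonferroni, analytic:  ", familywise_error_rate(num_tests, alpha / num_tests))
```

    With 100 independent tests the uncorrected rate is close to 1, while the Bonferroni-adjusted rate stays just under the nominal 5%.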

    You can also check out this technical report by some other big names in imaging that basically says that this result is known and expected for overly liberal cluster defining thresholds:
    http://www.fil.ion.ucl.ac.uk/s...

    1. Re:Issue is likely overstated by TapeCutter · · Score: 2

      Thank you, comments like yours are the reason I still come here.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  6. Re:Aha! by daenris · · Score: 2

    All of the software packages tested in the article (AFNI, FSL, SPM) are open source, including the package the authors built to do massively parallel non-parametric permutation tests (BROCCOLI).

  7. The Last Part is Important by medv4380 · · Score: 4, Insightful

    The researchers used published fMRI results, and along the way they swipe the fMRI community for their “lamentable archiving and data-sharing practices” that prevent most of the discipline's body of work being re-analysed.

    So the raw data isn't being saved so that someone else can independently verify the results. No checking the computer's math, no checking the researchers' settings on the machine. Just blanket trust for the people and the machine, and purging of any way of poking holes in someone's findings. Even if this wasn't caused by a software bug, the lack of archiving of the raw dataset so that it can be rerun when software improvements are made is just infuriating.

    1. Re:The Last Part is Important by guruevi · · Score: 3, Informative

      That is because MRI data (at least in the US) is protected by HIPAA. You can reconstruct enough identifiable features from raw data plus you have to record quite a number of other features (age, weight etc. for radiation calculations) that almost all MRI data falls under HIPAA when it comes to redistributing the raw data. If you strip all that out (skull stripping, DICOM anonymize), it's no longer raw data AND it becomes very hard to distinguish things like image orientation.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
  8. Planning vs effecting by DrYak · · Score: 2

    The claim is quite dubious in that it seems to suggest that scientists know someone is going to move their arm before the person does, simply from reading MRI images.

    Not MRI images.
    But fMRI images (where f = "functional")
    In a nutshell, those images are based around the fact that hemoglobin loaded with oxygen interacts with and distorts the magnetic field differently than hemoglobin which has discarded its oxygen.
    By measuring these signal differences, it's possible to infer where there's more oxygen consumption, and from there try to guess which parts of the brain are working more (and thus consuming more oxygen).

    Spatial resolution of such images is "not so great" (blurrier than the brain anatomy visible on the brain itself), but still acceptable.
    Reach is very good (you can see the whole inside of the skull).
    Signal strength is very weak (very subtle variations, meaning lots of noise).
    Temporal resolution is very poor (to begin with, all MRI images take a lot of time to take, and then there's the problem that you're not measuring brain activity directly, but inferring it from its indirect effect on the local blood flow).
    Still it's a useful tool under some circumstances.

    Compare it with other tools, like measuring electric (EEG) or magnetic activity from the outside:
    Spatial resolution is absolutely shitty (you must infer what's happening from a few points scattered across the surface of the scalp)
    Reach isn't deep at all (you mostly see what's happening on the surface. Deep brain structures are too deep to be visible).
    Temporal resolution is amazing (you can measure the direct electrical output ms after ms)

    The best tool is still open-skull surgery (using electrodes directly on the brain to measure activity or to very precisely stimulate some area), but such surgeries are a rare commodity (= you can only find volunteers to enroll into your studies among people getting brain surgery to remove tumors).
    The second best tool is the clinical description of psychiatric damage experienced by people who were victims of accidents in which their brain was damaged.

    EEG and fMRI are coarser tools, but much easier to set up.

    In addition to that, anatomists and histologists have had tons of other tools to explore the anatomy and connections of the brain.
    (regular MRI, dissections of cadavers, study of some viruses which climb along the nerves, some freezing-/cracking-based special technique of dissection, some special type of diffusion-MRI, etc.)

    I find that hard to believe except possibly in some limited cases.

    The whole central nervous system works in stages, from very low level (nerves controlling muscles or nerves fed by receptors) all the way to high-level (processing complex information).

    Most of the low-level (i.e.: most of the body, except the eyes, ears and a few other head organs) is connected to a region in the middle of the brain, roughly around where the headband of your headphones goes.
    Except for a little preprocessing done in the spine (or in the upper layers of the retina in the eyes), the signal is very close to raw (1 point of connection = 1 piece of information about a small group of receptors. Like an edge).

    Everything behind this "headband" handles signal input and perception. And the further you get from the point where the nerve tracts connect to the cortex, the more integration and convolution is done with the signal (from edges to shapes to objects, like "face recognition") and the more it is combined with other signals (associative regions, which aren't specific to a single sense and can't be pinned down to a precise simple role).

    Everything in front of this "headband" handles the signal output and motor control. It has the same overall organisation: the more you move to the front, away from the "headband", the more the processing is "high-level" and "multimodal" and handles high level functi

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  9. Two basic rules of statistics by Okian+Warrior · · Score: 2, Insightful

    I had a university level Statistics "professor" once tell me that I didn't need to know how my calculator created a box plot, etc etc because I could just use someone else's statistics library instead of writing my own. While in general I agree that there is no point in reinventing the wheel, I felt like I ought to learn how such things work.

    I do a *ton* of statistical work in my day job, and if I were to write a book or teach a class, I would recommend two things:

    1) Always look at the data
    2) Always write your own functions

    The reason for this has to do with the basic nature of statistics. If you make a mistake in normal software, the error is usually patently visible or benign. Often times the software works fine and does its job and the results are correct, even if it has bugs.

    In statistics however, if you make a mistake the results get closer to "random". Statistics is fundamentally an attempt to extract information from data, and if you make a misstep then you get less information, which is equivalent to the data being closer to random. There is no way to tell whether the output is correct - it doesn't crash, it doesn't show an obvious flaw, it just didn't give you any information.

    The second thing is to always look at the data.

    Many, many, *MANY* theories and research papers make simple assumptions about the data which simply aren't true, and if you can look at the data (in an appropriate visualization), you can avoid some of these pitfalls.

    Researchers do linear regression, when a quick glimpse of the data would tell them that it's a curve. Economists assume that if a tiny piece of a function looks linear, the entire function is linear. People do Principal Component Analysis on data that has multiple loci of causes. People use Expectation Maximization and "guess" the number and position of causes. People reverse the conditional.

    The list is endless.
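
    As a small illustration of the linear-regression pitfall, here is a hedged Python sketch on made-up data (y = x squared for x from 0 to 10). The helper names `fit_line` and `r_squared` are hypothetical. The straight-line fit reports a high R squared, yet the residuals are positive at both ends and negative in the middle, which is the kind of pattern a quick plot would reveal immediately.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up, clearly curved data: a parabola sampled at 0..10.
xs = list(range(11))
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print("slope:", slope, "intercept:", intercept)
print("R^2:", round(r_squared(xs, ys, slope, intercept), 3))
# The sign pattern of the residuals (+ ... - ... +) gives the curve away.
print("residuals:", [round(r, 1) for r in residuals])
```

    The summary number alone looks respectable; only the residuals (or a plot) show that a line is the wrong model.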

    You can use someone else's library for mundane things which can be checked. Using a library for a box plot is fine - if it crashes or if the output doesn't *look* right, then use a different library.

    For doing actual statistical work, you should *first* code your own functions. You'll get a marvellous hands-on insight and a little intuition about what the results should be.

    Once you've done that, you can look at (ie - plot) the data and use your human brain to make a judgement.

    Then use the big library. If it doesn't look right, you can investigate further.
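
    A minimal sketch of that workflow in Python, under the assumption that the statistic in question is something as simple as a sample standard deviation: write the function by hand (the name `my_sample_stdev` and the data are made up), then compare it with the standard library's `statistics.stdev`. A disagreement means either the hand-written code or your understanding of what the library computes needs a second look.

```python
import math
import statistics


def my_sample_stdev(values):
    """Hand-written sample standard deviation (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


# Made-up data, small enough to check by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mine = my_sample_stdev(data)
library_sample = statistics.stdev(data)        # divides by n - 1
library_population = statistics.pstdev(data)   # divides by n

print("hand-written:      ", mine)
print("statistics.stdev:  ", library_sample)
print("statistics.pstdev: ", library_population)

# The hand-written version should agree with the sample estimator,
# and visibly differ from the population one.
assert math.isclose(mine, library_sample, rel_tol=1e-9)
assert not math.isclose(mine, library_population, rel_tol=1e-3)
```

    The comparison also catches a common slip: calling the population estimator when the sample estimator was intended, or the other way round.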

  10. Re:Probably will happen in other science fields, t by TapeCutter · · Score: 3, Insightful

    It's a matter of time before this happens with global warming, too.

    Well financed "skeptics" have been busting a gut for over 20yrs trying to prove your conspiracy theory, they have done nothing but bring the word "skeptic" into disrepute.

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  11. Sounds like confirmation-bias by gweihir · · Score: 2

    I.e. people seeing what they expect to see, not what is there. With the huge egos (but not nearly as large skills) in people doing Neuro-"Science" these days, I am entirely unsurprised. The grand claims about what they know and how things work have been a dead giveaway for years. Things are not that simple in practice.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.