Reanalysis of Clinical Trials Finds Misleading Results
sciencehabit writes: Clinical trials rarely get a second look — and when they do, their findings are not always what the authors originally reported. That's the conclusion of a new study (abstract), which compared how 37 studies that had been reanalyzed measured up to the original. In 13 cases, the reanalysis came to a different outcome — a finding that suggests many clinical trials may not be accurately reporting the effect of a new drug or intervention. Moreover, only five of the reanalyses were by an entirely different set of authors, which means they did not get a neutral relook.
In one of the trials, which examined the efficacy of the drug methotrexate in treating systemic sclerosis—an autoimmune disease that causes scarring of the skin and internal organs—the original researchers found the drug to be not much more effective than the placebo, as they reported in a 2001 paper. However, in a 2009 reanalysis of the same trial, another group of researchers including one of the original authors used Bayesian analysis, a statistical technique to overcome the shortcomings of small data sets that plague clinical trials of rare diseases such as sclerosis. The reanalysis found that the drug was, as it turned out, more effective than the placebo and had a good chance of benefiting sclerosis patients.
Isn't this generally known as The Decline Effect? It's not just clinical trials; it applies to almost everything (to varying degrees). It's also been interpreted as The Half-Life of Knowledge.
Now that is an interesting observation! Mostly, in science, when someone does an experiment that supposedly proves a theory, the next step is to document and publish every detailed step. Only when a number of peers have replicated the results can they be accepted with any confidence.
Yet in clinical trials of new drugs, it seems, only a single trial is ever done. How did that ever get accepted as proper scientific evidence?
I am sure that there are many other solipsists out there.
Almost had me there article! Until you said the most evil words known to man... "statistical technique". AKA "bullshit"
Bayesian statistics is far from bullshit.
I suggest you read up on it.
You can do some really cool stuff with it.
Test whether a coin flip is fair.
Correct images.
Filter spam.
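For the coin case, here is a minimal sketch of one way to do it, assuming a uniform prior on the bias for the "not fair" alternative and even prior odds between the two hypotheses. The function names are made up for illustration.

```python
from math import comb


def bayes_factor_fair_vs_unknown(heads: int, tails: int) -> float:
    """Evidence ratio P(data | fair coin) / P(data | unknown bias, uniform prior)."""
    n = heads + tails
    # Probability of this exact sequence of flips if the coin is fair.
    p_fair = 0.5 ** n
    # Probability of the same sequence averaged over a uniform prior on the bias:
    # integral of p^heads * (1 - p)^tails dp = heads! * tails! / (n + 1)!
    p_unknown = 1.0 / ((n + 1) * comb(n, heads))
    return p_fair / p_unknown


def prob_fair(heads: int, tails: int, prior_fair: float = 0.5) -> float:
    """Posterior probability that the coin is fair, given a prior probability."""
    prior_odds = prior_fair / (1.0 - prior_fair)
    posterior_odds = bayes_factor_fair_vs_unknown(heads, tails) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


print(prob_fair(5, 5))   # balanced flips favour the fair coin
print(prob_fair(10, 0))  # ten heads in a row count strongly against it
```

Note that the answer depends on both assumptions (the uniform alternative and the 50/50 prior odds), which is exactly the kind of thing you should state up front.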
The problem is the Bayesian analysis is far from conclusive. What it does point to is that the clinical trial needs a larger sample size. Sample sizes that are too small are useless.
The problem is the Bayesian analysis is far from conclusive.
100% Wrong
What it tells you is the probability that your hypothesis is correct given your evidence and your prior knowledge.
They looked at reanalyses that had already been done for other reasons, rather than doing their own reanalyses on randomly selected trials. It occurs to me that these trials may have been subjected to reanalysis precisely *because* there were doubts about the initial analysis.
Bayesians need priors, don't they? Where do they come from, and what effect does the choice of prior have on the final outcome? I don't think Bayesian analysis is any more of a silver bullet than any other technique.
No, the GP is right. While BA gives you a probability distribution for the effectiveness, unless the effect is really strong (or you made a really bad choice of priors), that distribution is going to be quite wide for a small data set. Such results do not prove that what you were testing was effective, only that there is a decent probability it might be effective given the knowledge you gained from the test, and that you should pursue a larger test. I've found it to be quite rare for a BA result to strongly exclude a null hypothesis in a small-scale test without the effect having already been flagged by simpler tests (i.e. the effects were so obvious that it didn't take much effort to see them).
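To put rough numbers on "quite wide", here is a sketch assuming a simple yes/no response outcome, a uniform prior on the response rate, and the same observed 60% response at every sample size. The 60% figure and the function name are invented for illustration, not taken from any trial.

```python
from math import sqrt


def posterior_summary(successes: int, n: int) -> tuple[float, float]:
    """Mean and standard deviation of the Beta posterior under a uniform prior."""
    a = successes + 1       # Beta(1, 1) prior plus observed responders
    b = n - successes + 1   # Beta(1, 1) prior plus observed non-responders
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Same observed response rate, very different certainty about the true rate.
for n in (10, 100, 1000):
    mean, sd = posterior_summary(6 * n // 10, n)
    print(f"n={n:5d}  posterior mean={mean:.3f}  posterior sd={sd:.3f}")
```

The posterior standard deviation shrinks roughly like 1/sqrt(n), so a ten-patient result leaves a lot of room on both sides of the estimate.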
This seems to highlight the reasons behind the All Trials movement: http://www.alltrials.net/
Since people interpret confidence intervals as credible intervals (and they are usually close to the same for a uniform prior), any problem with "uninformed" Bayesian analysis is shared by those using frequentist techniques in practice.
Anytime you re-analyze data you run into this.
Think about it. There are a million ways you can analyze any dataset. There are millions of datasets out there to analyze. There are millions of people who can independently decide to go back and do a re-analysis.
So, the issue is that if somebody goes back and does a re-analysis and the results are boring, nobody publishes. However, if the results are controversial, it gets published. Since there are so many permutations, you're guaranteed to find something exciting.
This is why you're supposed to establish your methods BEFORE you collect the data, and then stick to the methods you established to analyze the data. Otherwise your 95% confidence turns into a more realistic 1% confidence.
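A back-of-the-envelope sketch of that erosion, assuming every extra analysis is an independent test at the 5% level on data with no real effect. Independence is a simplifying assumption (reanalyses of one dataset are correlated), and the function names are made up.

```python
from math import ceil, log


def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null tests looks 'significant'."""
    return 1.0 - (1.0 - alpha) ** k


def analyses_until_confidence_drops_to(confidence: float, alpha: float = 0.05) -> int:
    """Smallest k for which the chance of zero false positives is <= confidence."""
    return ceil(log(confidence) / log(1.0 - alpha))


for k in (1, 5, 20, 100):
    print(f"{k:3d} analyses -> P(at least one false positive) = "
          f"{prob_any_false_positive(k):.3f}")

# Number of looks needed before "95% confidence" has decayed to 1%.
print(analyses_until_confidence_drops_to(0.01))
```

Under these assumptions it takes about 90 different looks at the data to get from 95% down to 1%, which is not many once you count every subgroup, endpoint and model choice.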
In practice, though, I'm sure the initial analyses are just as prone to this kind of problem. It just gets REALLY bad when you look backwards.
Frequentists vs. Bayesians
There are many things wrong with clinical trials, but this isn't one of them. Both the original article and the reanalysis use valid statistical procedures and do not contradict each other. The original analysis didn't prove absence of an effect, it merely failed to show the existence of an effect. The new analysis shows that the drug is, in fact, more effective under some (weak, reasonable) a priori assumptions.
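Here is a sketch of how both statements can hold for the same small trial. The counts are hypothetical (7 of 15 responders on the drug, 3 of 15 on placebo), not the methotrexate data, and the Bayesian part assumes independent uniform priors on each arm's response rate.

```python
import random
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x: int) -> float:
        # Hypergeometric probability of x responders in the first row.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = pmf(a)
    # Sum over all tables at least as unlikely as the observed one.
    return sum(pmf(x) for x in range(lowest, highest + 1)
               if pmf(x) <= p_observed * (1 + 1e-9))


def prob_treatment_better(resp_t: int, n_t: int, resp_c: int, n_c: int,
                          draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_treatment > rate_control | data)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_t = rng.betavariate(resp_t + 1, n_t - resp_t + 1)
        p_c = rng.betavariate(resp_c + 1, n_c - resp_c + 1)
        wins += p_t > p_c
    return wins / draws


print("Fisher exact p-value:", fisher_two_sided(7, 8, 3, 12))
print("P(drug better | data):", prob_treatment_better(7, 15, 3, 15))
```

With these made-up numbers the frequentist test does not reach p < 0.05, while the posterior probability that the drug beats placebo is still high. Neither result contradicts the other; they answer different questions.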
Whether to use statistical hypothesis testing (frequentist methods) or Bayesian analysis is a long-running debate in statistics and medicine. Both techniques are mathematically valid. Statistical hypothesis testing makes fewer a priori assumptions, which is why people have traditionally trusted it more and why it is widely taught and used in science. But over the years people have come to realize that pessimistic assumptions can be harmful, such as when you continue clinical trials too long or reject the use of life-saving drugs. Although I personally think Bayesian methods are a better way of analyzing the data, I think the debate over which methods to use is the way scientific debate and change should happen: slowly and with careful re-analysis and re-examination of data and experimental results.
This is why the meta-research regarding the safety and efficacy of GMO food organisms is so important. We are constantly told by the industry and their online astroturf army that there are "thousands" of studies showing the safety of GMOs, but it turns out they're basically the same shallow study 2000 times. The most disturbing findings have come from the kind of reanalysis that this story describes. But now, those studies get shouted down because supposedly it's "settled science".
Whenever there's money involved, you have to be extra skeptical of terms like "settled science". Even if there's no obvious corruption involved, there's just too much energy and money pushing for a certain result (or perception) and it creates a kind of momentum. Yes, that includes climate change, though so far, the climate science has stood up to the copious additional scrutiny.
Good scientists aren't only skeptical about nature, they're also skeptical about themselves. We should follow their lead, not the way the Randi-style "pop-skeptics" have set up a tyranny of conventional wisdom, but by taking second and third looks.
Constructing rational biases, which is what a prior effectively is ("objective prior"), isn't all that easy though, is it? There's no universal method for constructing a prior. It's a big source of potential error.
but with other studies they say researchers don't have much incentive to redo trials/experiments. You don't make a name for yourself by just confirming someone else's work.
Let's compare two companies that depend on science - IBM and GlaxoSmithKline.
Let's say IBM discovers a new method of lithography for building microchips. They publish their results, and their results are replicated. More importantly, IBM gets a new, presumably better way of making microchips.
GlaxoSmithKline makes a new drug that treats a psychological illness. To some degree, because there are no objective physical tests for most psychological illnesses, the determination of effectiveness is made subjectively.
Both companies want the science to turn out right, because it makes them money. One of them has a much easier time massaging the results of any studies.
...but it's being eaten...by some...Linux or something...
A real shocker. As basically no profession has a good grip on statistics except specialized mathematicians, it is no surprise so many are wrong or misleading.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
But frequentist analyses aren't any more "objective," they just hide biases from view and include inductive biases that aren't even rationally compatible with any consistent state of belief.
With Bayesian analysis your starting point is out in the open and must be justifiable and defensible; analysts are accountable for their priors.
You can also, of course, examine what would follow from several different priors. This is much more straightforward than trying to shake the hidden biases in a frequentist model.
Try implementing that in practice on actual analysis of data.
I do it every day. Works perfectly.
An infinitely wide distribution function literally means 1: F(x) = 1.
Do you need an example?
I have some data
x y
0.188334 2.08939
0.400133 2.26874
0.723409 2.31389
0.172104 2.00783
0.430118 2.28716
0.245059 2.03828
0.0494027 2.04421
0.221342 2.15249
0.911822 2.4686
0.461583 2.24511
I have a hypothesis that this data is linear: y = m*x + b. I want to know the probability that the parameters m and b explain my data. I know nothing about m and b, so I will set P(m)=1 and P(b)=1. Any value of m from -Infinity to Infinity is equally probable. Any value of b from -Infinity to Infinity is equally probable.
I want to find the probability that a line with a given value of m and a given value of b fits my data. I need a probability distribution function for my data given those parameters.
P(x_i,y_i|m,b) = Exp(-(y_i - (m*x_i + b))^2/0.01^2), a Gaussian distribution with a sigma of ~10%, because my data has noise in it.
Now my posterior is
P(m,b|x_1,y_1,x_2,y_2,...x_n,y_n)=Product[Exp(-(y_i - (m*x_i + b))^2/0.01^2)]*1*1=Product[Exp(-(y_i - (m*x_i + b))^2/0.01^2)]
Look at that, my uniform prior worked perfectly. At the point where P(m,b|x_1,y_1,x_2,y_2,...x_n,y_n) is maximum, a line with corresponding values of m and b best describes my data.
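A minimal sketch of that fit in Python, using the ten points and the 0.01 noise scale from the formula above. With flat priors the posterior is proportional to the likelihood, so its maximum is the ordinary least-squares line; the function names are illustrative only.

```python
XS = [0.188334, 0.400133, 0.723409, 0.172104, 0.430118,
      0.245059, 0.0494027, 0.221342, 0.911822, 0.461583]
YS = [2.08939, 2.26874, 2.31389, 2.00783, 2.28716,
      2.03828, 2.04421, 2.15249, 2.4686, 2.24511]
SIGMA = 0.01  # noise scale exactly as written in the formula above


def log_posterior(m: float, b: float) -> float:
    """Log of the unnormalised posterior; flat priors contribute log(1) = 0."""
    return -sum((y - (m * x + b)) ** 2 for x, y in zip(XS, YS)) / SIGMA ** 2


def map_estimate() -> tuple[float, float]:
    """Closed-form maximiser of log_posterior (the least-squares line)."""
    n = len(XS)
    mean_x = sum(XS) / n
    mean_y = sum(YS) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(XS, YS))
    sxx = sum((x - mean_x) ** 2 for x in XS)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


m_hat, b_hat = map_estimate()
print(f"most probable line: y = {m_hat:.4f}*x + {b_hat:.4f}")
print("log posterior there:", log_posterior(m_hat, b_hat))
```

The location of the maximum does not depend on SIGMA, but the width of the posterior around it does, so the choice of noise scale still matters for any statement about uncertainty.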
There's no universal method for constructing a prior. It's a big source of potential error.
Don't they just have to go into a monastery or something like that?
Faster! Faster! Faster would be better!
Uniform probability in what scale, though? Performing a Bayesian analysis with a uniform prior will generally give different results than, say, using a log scale on the dependent variable(s) and choosing a uniform prior on *that* scale. The Jeffreys prior provides a method of computing a non-informative prior that is invariant under re-parameterization, but is generally difficult to work with, and is never a uniform prior. So yeah, the concept of an uninformative prior is more complicated than "just use a uniform distribution", and analysts need to be especially careful with priors used with small sample sizes!
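A small sketch of the scale problem for the simplest case, a response rate p estimated from yes/no outcomes. A prior that is flat in p is Beta(1, 1), the Jeffreys prior for this model is Beta(1/2, 1/2), and a prior that is flat in the log-odds of p corresponds to the improper Beta(0, 0) (Haldane) prior. The counts below are invented, and the Haldane case only gives a proper posterior when there is at least one success and one failure.

```python
def posterior_mean(successes: int, n: int, a: float, b: float) -> float:
    """Posterior mean of a rate under a Beta(a, b) prior and binomial data."""
    return (successes + a) / (n + a + b)


# Three priors that are each "uninformative" on some scale.
PRIORS = {
    "uniform in p": (1.0, 1.0),
    "Jeffreys": (0.5, 0.5),
    "uniform in log-odds (Haldane)": (0.0, 0.0),
}

for successes, n in ((1, 5), (100, 500)):
    print(f"{successes} responders out of {n}:")
    for name, (a, b) in PRIORS.items():
        print(f"  {name:30s} posterior mean = "
              f"{posterior_mean(successes, n, a, b):.4f}")
```

With 1 responder out of 5 the three "uninformative" answers differ noticeably; with 100 out of 500 they nearly coincide, which is the small-sample caution in numbers.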
The problem is the Bayesian analysis is far from conclusive. What it does point to is that the clinical trial needs a larger sample size. Sample sizes that are too small are useless.
Conversely if the sample size is too large, the cost of trials, typically a few to a few tens of millions of dollars, will go up significantly. The reflex answer to that is "so what, Big Pharma can pay for it!", except that many new therapies are developed by small companies, especially drugs for conditions with a small patient sample. For example, the condition used as an example, systemic sclerosis, has an incidence of about 1 in 100,000 in the US; it's estimated about 50,000 people in the US suffer from it. I've seen clinical trials with a 46 patient sample size cost upwards of $10M, because the condition is small in incidence so even just finding patients willing to participate is very tricky. If you upped that number to 450 patients, the cost for the trials would not justify developing a drug helping these people; the developers would lose money and these people are left to suffer. So there's a bit of a balancing act that is not just statistical significance, but also cost/benefit.
Almost had me there article! Until you said the most evil words known to man... "statistical technique". AKA "bullshit"
--RAH
No, no, you're not thinking; you're just being logical. --Niels Bohr
Please push a serious statistics course down the throats of medical students; that would benefit everyone's health!
There have been recent cries for reproducible results in science.
The scope is too limited.
There should be a cry for reproducible results in any research prior to its publication.
Long and short of it for researchers: if only you can get the results and conclusions, then the results and conclusions are not publishable.
"Consensus" in science is _always_ a political construct.