Reanalysis of Clinical Trials Finds Misleading Results
sciencehabit writes: Clinical trials rarely get a second look — and when they do, their findings are not always what the authors originally reported. That's the conclusion of a new study (abstract), which compared how 37 studies that had been reanalyzed measured up to the original. In 13 cases, the reanalysis came to a different outcome — a finding that suggests many clinical trials may not be accurately reporting the effect of a new drug or intervention. Moreover, only five of the reanalyses were by an entirely different set of authors, which means they did not get a neutral relook.
In one of the trials, which examined the efficacy of the drug methotrexate in treating systemic sclerosis—an autoimmune disease that causes scarring of the skin and internal organs—the original researchers found the drug to be not much more effective than the placebo, as they reported in a 2001 paper. However, in a 2009 reanalysis of the same trial, another group of researchers including one of the original authors used Bayesian analysis, a statistical technique to overcome the shortcomings of small data sets that plague clinical trials of rare diseases such as sclerosis. The reanalysis found that the drug was, as it turned out, more effective than the placebo and had a good chance of benefiting sclerosis patients.
Isn't this generally known as The Decline Effect? It's not just clinical trials; it applies to almost everything (to varying degrees). It's also been interpreted as The Half-Life of Knowledge.
Now that is an interesting observation! Mostly, in science, when someone does an experiment that supposedly proves a theory, the next step is to document and publish every detailed step. Only when a number of peers have replicated the results can they be accepted with any confidence.
Yet in clinical trials of new drugs, it seems, only a single trial is ever done. How did that ever get accepted as proper scientific evidence?
I am sure that there are many other solipsists out there.
Almost had me there article! Until you said the most evil words known to man... "statistical technique". AKA "bullshit"
Bayesian statistics is far from bullshit.
I suggest you read up on it.
You can do some really cool stuff with it.
Testing if a coin flip is fair.
Correct images.
Filter spam
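To make the coin example concrete, here is a minimal sketch, assuming a uniform Beta(1, 1) prior on the unknown bias. The function names are made up for illustration. It compares how well "the coin is fair" and "the bias could be anything" explain a run of flips.

```python
from math import lgamma, exp, log

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_fair_vs_unknown(heads, tails):
    """P(data | fair coin) / P(data | unknown bias with a uniform prior).

    The binomial coefficient is the same in both terms, so it cancels.
    """
    log_fair = (heads + tails) * log(0.5)
    # Marginal likelihood under a uniform prior on the bias is B(heads+1, tails+1)
    log_unknown = log_beta(heads + 1, tails + 1)
    return exp(log_fair - log_unknown)

print(bayes_factor_fair_vs_unknown(5, 5))    # above 1: data favour the fair coin
print(bayes_factor_fair_vs_unknown(10, 0))   # below 1: data favour a biased coin
```

A ratio near 1 means the flips do not discriminate between the two hypotheses, which is the usual situation with only a handful of flips.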
The problem is the Bayesian analysis is far from conclusive. What it does point to is that the clinical trial needs a larger sample size. Sample sizes that are too small are useless.
I suggest you follow up on your convictions and shun everything made with the use of "statistical technique", AKA "bullshit".
Of course, this will mean you'll have to live in a cave and subsist on rainwater and whatever you can forage.
The problem is the Bayesian analysis is far from conclusive.
100% Wrong
What it tells you is the probability that your hypothesis is correct given your evidence and your prior knowledge.
They looked at reanalyses that had already been done for other reasons, rather than doing their own reanalyses on randomly selected trials. It occurs to me that these trials may have been subjected to reanalysis precisely *because* there were doubts about the initial analysis.
Bayesians need priors, don't they? Where do they come from, and what effect does the choice of prior have on the final outcome? I don't think Bayesian analysis is any more of a silver bullet than any other technique.
No, the GP is right. While BA gives you a probability distribution for the effectiveness, unless the effect is really strong (or you made a really bad choice of priors), that distribution is going to be quite wide for a small data set. Such results are not proving that what you were testing was effective, but that there is a decent probability it might be effective given the knowledge you gain from the test, and that you should pursue a larger test. I've found it to be quite rare to have a BA result that strongly excludes a null hypothesis in a small scale test without having already been flagged as effective by simpler tests (i.e. the effects were so obvious they didn't require trying that hard to see).
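A rough illustration of how wide "quite wide" is, assuming a simple responder/non-responder outcome and a uniform Beta(1, 1) prior. The patient counts are invented for the sketch and are not from the methotrexate trial.

```python
from math import sqrt

def beta_posterior_sd(successes, failures, prior_a=1.0, prior_b=1.0):
    """Standard deviation of the Beta posterior for a response rate."""
    a = prior_a + successes
    b = prior_b + failures
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

small = beta_posterior_sd(6, 4)        # 10 hypothetical patients, 60% respond
large = beta_posterior_sd(600, 400)    # 1000 hypothetical patients, same rate

print(round(small, 3), round(large, 3))
```

With the same observed response rate, the posterior spread shrinks roughly with the square root of the sample size, so the ten-patient result leaves a lot of room for the true rate to sit near the placebo rate.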
Bayesians need priors don't they?
Correct
Where do they come from and what effect does the choice of prior have on the final outcome?
They come from, get this, prior knowledge. If you have complete ignorance of a system then you use a prior of 1, i.e. all possibilities are equally probable. Your evidence will then show you the truth. Priors just help you get there faster.
This seems to highlight the reasons behind the All Trials movement: http://www.alltrials.net/
Since people interpret confidence intervals as credible intervals (and they are usually close to the same for a uniform prior), any problem with "uninformed" Bayesian analysis is shared by those using frequentist techniques in practice.
Anytime you re-analyze data you run into this.
Think about it. There are a million ways you can analyze any dataset. There are millions of datasets out there to analyze. There are millions of people who can independently decide to go back and do a re-analysis.
So, the issue is that if somebody goes back and does a re-analysis and the results are boring, nobody publishes. However, if the results are controversial, it gets published. Since there are so many permutations, you're guaranteed to find something exciting.
This is why you're supposed to establish your methods BEFORE you collect the data, and then stick to the methods you established to analyze the data. Otherwise your 95% confidence turns into a more realistic 1% confidence.
In practice, though, I'm sure the initial analyses are just as prone to this kind of problem. It just gets REALLY bad when you look backwards.
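The arithmetic behind this is short. A sketch, assuming each alternative analysis behaves like an independent test at the 5% level (real re-analyses of one dataset are correlated, so treat this as an upper-end illustration rather than an exact figure):

```python
def chance_of_false_positive(num_analyses, alpha=0.05):
    """Probability that at least one of several independent tests
    comes out 'significant' purely by chance."""
    return 1.0 - (1.0 - alpha) ** num_analyses

for k in (1, 5, 20, 100):
    print(k, round(chance_of_false_positive(k), 3))
```

Under that assumption, twenty tries are already more likely than not to turn up something "significant" when there is no real effect at all.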
Actually... if you check the literature I am fairly certain you will find that most reports do not tell you the sample size and/or stopping rule. Since it is impossible to interpret p-values without knowledge of these, it seems extremely unlikely that statistics has played much of a role at all. We should also consider that the primary use of stats is to disprove strawmen "null hypotheses" (eg two groups of patients have exactly the same average) rather than something predicted by the researcher, thus limiting us to affirming the consequent (you cannot falsify your theory using this method). In the end I would say that most applications of statistics have made a negative contribution to science and technology.
Ronald Fisher (guy who came up with the significance test) warned us of this many years ago:
"We are quite in danger of sending highly-trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort."
Fisher, R A (1958). "The Nature of Probability". Centennial Review 2: 261–274.
http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf
Frequentists vs. Bayesians
There are many things wrong with clinical trials, but this isn't one of them. Both the original article and the reanalysis use valid statistical procedures and do not contradict each other. The original analysis didn't prove absence of an effect, it merely failed to show the existence of an effect. The new analysis shows that the drug is, in fact, more effective under some (weak, reasonable) a priori assumptions.
Whether to use statistical hypothesis testing (frequentist methods) or Bayesian analysis is a long-running debate in statistics and medicine. Both techniques are mathematically valid. Statistical hypothesis testing makes fewer a priori assumptions, which is why people have traditionally trusted it more and why it is widely taught and used in science. But over the years, people have come to realize that pessimistic assumptions can be harmful, such as when you continue clinical trials too long or reject the use of life-saving drugs. Although I personally think Bayesian methods are a better way of analyzing the data, I think the debate over which methods to use is the way scientific debate and change should happen: slowly and with careful re-analysis and re-examination of data and experimental results.
This is why the meta-research regarding the safety and efficacy of GMO food organisms is so important. We are constantly told by the industry and their online astroturf army that there are "thousands" of studies showing the safety of GMOs, but it turns out they're basically the same shallow study 2000 times. The most disturbing findings have come from the kind of reanalysis that this story describes. But now, those studies get shouted down because supposedly it's "settled science".
Whenever there's money involved, you have to be extra skeptical of terms like "settled science". Even if there's no obvious corruption involved, there's just too much energy and money pushing for a certain result (or perception) and it creates a kind of momentum. Yes, that includes climate change, though so far, the climate science has stood up to the copious additional scrutiny.
Good scientists aren't only skeptical about nature, they're also skeptical about themselves. We should follow their lead, not the way the Randi-style "pop-skeptics" have set up a tyranny of conventional wisdom, but by taking second and third looks.
Constructing rational biases, which is what a prior effectively is ("objective prior") isn't all that easy though, is it. There's no universal method for constructing a prior. It's a big source of potential error.
Constructing rational biases, which is what a prior effectively is ("objective prior") isn't all that easy though, is it. There's no universal method for constructing a prior. It's a big source of potential error.
No it's not. If you don't know a proper value for a prior you set its probability equal to 1. What is so hard to understand about that?
Even a uniform prior has issues, especially in cases where you don't know how wide to set it. It might not matter for just doing a typical analysis to find the most likely value with some rough error bars, but when doing model comparisons where you need to compare the evidence factor, the width chosen for a prior can come into play. The point of Bayesian analysis is not to completely remove problems with bias in setup like selection of a model or prior, but to provide an objective framework that clearly lays out what you started with and what you did so that others can reproduce or compare the results without missing assumptions.
but with other studies they say researchers don't have much incentive to redo trials/experiments. You don't make a name for yourself by just confirming someone else's work.
Even a uniform prior has issues, especially in cases where you don't know how wide to set it.
No it doesn't. You don't even understand what a uniform prior means when you say "how wide to set it". A uniform prior means all possibilities are equally probable. It's infinitely wide. Mathematically it means P(h) = 1. 1 times anything equals that same thing. It changes nothing.
This is Bayes' theorem:
P(h|e)=P(e|h)P(h)/P(e)
This is Bayes' theorem with a uniform prior:
P(h|e)=P(e|h)/P(e)
It changes NOTHING about your evidence.
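The cancellation in those two formulas can be checked numerically. A minimal sketch, assuming a made-up coin example (7 heads in 10 flips) and a bounded grid of hypotheses; `posterior_on_grid` is an illustrative helper, not anything from the posts above.

```python
def posterior_on_grid(likelihood, prior):
    """Bayes' theorem on a discrete grid: P(h|e) = P(e|h) P(h) / P(e)."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)          # P(e), the normalising constant
    return [u / evidence for u in unnormalized]

# Hypothetical data: 7 heads in 10 flips, bias h restricted to a grid in (0, 1)
grid = [i / 20 for i in range(1, 20)]
likelihood = [h ** 7 * (1 - h) ** 3 for h in grid]

flat_prior = [1.0] * len(grid)
flat_posterior = posterior_on_grid(likelihood, flat_prior)
normalized_likelihood = [l / sum(likelihood) for l in likelihood]

best_h = grid[flat_posterior.index(max(flat_posterior))]
print(best_h)
```

For estimating a parameter within one model, the constant value of a flat prior cancels against P(e), so the posterior is just the rescaled likelihood. Note that the sketch uses a bounded grid, and it says nothing about comparing the value of P(e) between different models.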
Let's compare two companies that depend on science - IBM and GlaxoSmithKline.
Let's say IBM discovers a new method of lithography for building microchips. They publish their results, and their results are replicated. More importantly, IBM gets a new, presumably better way of making microchips.
GlaxoSmithKline makes a new drug that treats a psychological illness. To some degree, because there are no objective physical tests for most psychological illnesses, the determination of effectiveness is made subjectively.
Both companies want the science to turn out right, because it makes them money. One of them has a much easier time massaging the results of any studies.
...but it's being eaten...by some...Linux or something...
A real shocker. As basically no profession has a good grip on statistics except specialized mathematicians, it is no surprise so many are wrong or misleading.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
A uniform prior means all possibilities are equally probable.
I've never seen a text define a continuous uniform distribution or continuous uniform prior as necessarily being infinitely wide. It is used synonymously with a "rectangular" distribution that has end points, with the infinitely wide case being a special case that is more often used for illustrating something than in actual practice.
Try implementing that in practice on actual analysis of data. Outside of some simple analytic cases where you can work it out with pencil and paper, an infinitely wide prior will not work in many practical applications of Bayesian analysis. Numerical methods for deriving distributions in more complex systems have very non-trivial ways of working out the evidence for model comparison, which is why you end up with all sorts of crappy approximations and comparisons that avoid calculating the actual evidence. This especially applies to methods that need to draw test values out of a distribution, where an infinitely wide prior is impossible (and it is even impossible for it to be as wide as your number representation due to other stability issues).
But frequentist analyses aren't any more "objective," they just hide biases from view and include inductive biases that aren't even rationally compatible with any consistent state of belief.
With Bayesian analysis your starting point is out in the open and must be justifiable and defensible; analysts are accountable for their priors.
You can also, of course, examine what would follow from several different priors. This is much more straightforward than trying to shake the hidden biases in a frequentist model.
It changes NOTHING about your evidence.
You can directly see how it factors into your calculation of the evidence term when you calculate the occam factor, which is proportional to the width of a uniform prior. If your experience is limited to simple hypothesis testing or working over a small problem with a small number of possibilities, you might not have been exposed to such things. But for those that have to deal with continuous problems and are concerned with model comparison (probably more applicable to physical sciences than biological sciences), you'll quickly run into these issues.
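A small numerical sketch of that width dependence, assuming a single parameter mu with a Gaussian likelihood (sigma = 1) and a uniform prior centred on zero. The data values and function names are invented for illustration.

```python
from math import exp, pi, sqrt

def likelihood(mu, data, sigma=1.0):
    """Product of Gaussian densities for the data given mean mu."""
    out = 1.0
    for y in data:
        out *= exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))
    return out

def evidence(data, width, steps=20000):
    """Marginal likelihood for a uniform prior on mu over [-width/2, width/2],
    computed with a midpoint rule."""
    lo = -width / 2
    dx = width / steps
    total = sum(likelihood(lo + (i + 0.5) * dx, data) for i in range(steps))
    return total * dx / width          # the prior density is 1/width

data = [0.3, -0.1, 0.4, 0.2]           # made-up measurements
narrow = evidence(data, width=10.0)
wide = evidence(data, width=100.0)
print(narrow / wide)
```

Both priors comfortably contain the likelihood peak, so the best-fit mu is the same, yet the evidence drops by the ratio of the widths. That factor only matters when the evidence itself is compared between models.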
Try implementing that in practice on actual analysis of data.
I do it every day. Works perfectly.
An infinitely wide distribution function literally means 1. F(x) = 1.
Do you need an example?
I have some data
x y
0.188334 2.08939
0.400133 2.26874
0.723409 2.31389
0.172104 2.00783
0.430118 2.28716
0.245059 2.03828
0.0494027 2.04421
0.221342 2.15249
0.911822 2.4686
0.461583 2.24511
I have a hypothesis that this data is linear: y = m*x + b. I want to know the probability that the parameters m and b explain my data. I know nothing about m and b, so I will set P(m)=1 and P(b)=1. Any value of m from -Infinity to Infinity is equally probable. Any value of b from -Infinity to Infinity is equally probable.
I want to find the probability that a line with a given value of m and value of b fits my data. I need a probability distribution function for how well my data is explained by my hypothesis.
P(x_i,y_i|m,b) = Exp(-(y_i - (m*x_i + b))^2/0.01^2) A Gaussian distribution with sigma for ~10%, because my data has noise in it.
Now my
P(m,b|x_1,y_1,x_2,y_2,...x_n,y_n)=Product[Exp(-(y_i - (m*x_i + b))^2/0.01^2)]*1*1=Product[Exp(-(y_i - (m*x_i + b))^2/0.01^2)]
Look at that, my uniform prior worked perfectly. At the point where P(m,b|x_1,y_1,x_2,y_2,...x_n,y_n) is maximum, a line with the corresponding values of m and b best describes my data.
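For anyone who wants to try the example, here is a sketch using the ten data points above. It assumes the likelihood exactly as written in the post (exponent divided by 0.01^2) and flat priors on m and b, in which case maximising the posterior is the same as ordinary least squares. The helper names are made up.

```python
xs = [0.188334, 0.400133, 0.723409, 0.172104, 0.430118,
      0.245059, 0.0494027, 0.221342, 0.911822, 0.461583]
ys = [2.08939, 2.26874, 2.31389, 2.00783, 2.28716,
      2.03828, 2.04421, 2.15249, 2.4686, 2.24511]

def sum_sq(m, b):
    """Sum of squared residuals of the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def log_posterior(m, b, sigma=0.01):
    """Log of the product of Exp(-(residual)^2 / sigma^2) terms.
    The flat priors only add a constant, which is dropped here."""
    return -sum_sq(m, b) / sigma ** 2

def best_fit():
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

m_hat, b_hat = best_fit()
print(m_hat, b_hat, log_posterior(m_hat, b_hat))
```

This only locates the peak of the posterior. It does not produce a normalised evidence value, which is the quantity the width argument elsewhere in the thread is about.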
So, here are the problems with Bayesian analysis:
1. Who decides what the prior knowledge actually is? The researcher doing the analysis! So you could have different analysts coming to different conclusions with the same data depending on their idea of what the prior knowledge actually states.
2. You could use a reference [objective] prior, true. I wish people did that more often. But the reference prior isn't always a uniform distribution (in general it isn't a uniform distribution, although it sometimes is)--sometimes the uniform prior will place more weight on an estimate than is justified given your design and model (as opposed to the data; e.g., if some values are unknowable because of your design, it doesn't make sense to weight them as heavily). Also, reference priors can be really really difficult to compute--for some really standard problems (e.g., involving the standard normal) they're undefinable in some sense (although that's overblown IMHO--those problems make assumptions that aren't met in practice, like any real value being a valid estimate).
Of course, as sample size increases, Bayesian and frequentist estimates come up the same, but then why use a Bayesian method at all in those circumstances?
It's been shown that Bayesian estimates (not predictions) are guaranteed to be biased--you're essentially taking a gamble that you can decrease overall estimation error by decreasing random variance at a cost of a little bias.
I don't mean to knock Bayesian analysis, but the grandparent post is right: it's not like Bayesian analysis is some magic cure-all, or is inherently "better" than any other estimation method. In fact, based on the summary (I need to read the paper still), I would think this scenario would caution people against Bayesian analysis: if their conclusions are changing in a favorable way based on the inclusion of a biasing prior, would you trust those changed conclusions?
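The bias-for-variance gamble can be worked out exactly for a simple case. A sketch, assuming a binomial outcome and a uniform prior, comparing the plain proportion k/n with the posterior mean (k+1)/(n+2); the function names and the values of n and p are chosen only for illustration.

```python
from math import comb

def bias_and_mse(estimator, n, p):
    """Exact bias and mean squared error of an estimator of p,
    summing over every possible number of successes k in n trials."""
    mean = 0.0
    mse = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p ** k * (1 - p) ** (n - k)
        est = estimator(k, n)
        mean += prob * est
        mse += prob * (est - p) ** 2
    return mean - p, mse

def mle(k, n):
    return k / n

def posterior_mean_uniform_prior(k, n):
    return (k + 1) / (n + 2)

for p in (0.2, 0.02):
    print(p, bias_and_mse(mle, 10, p), bias_and_mse(posterior_mean_uniform_prior, 10, p))
```

With n = 10, the posterior mean is biased in both cases. It has the smaller error at p = 0.2 and the larger error at p = 0.02, so whether the gamble pays off depends on where the true value sits relative to what the prior favours.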
There's no universal method for constructing a prior. It's a big source of potential error.
Don't they just have to go into an monastery or something like that?
Faster! Faster! Faster would be better!
1. Who decides what the prior knowledge actually is?
Here are some examples of priors.
I know nothing about a variable. I set that probability equal to 1.
I know that density can never be negative. I set my prior so that negative densities have a probability of zero and positive densities have a probability of 1.
A prior is nothing more than a probability distribution.
Outside of some simple analytic cases where you can work it out with pencil and paper
P:
Do you need an example. [simple analytic example]
Now try that on a model that needs an MCMC calculation to derive the distribution because it can't be done analytically.
I know nothing about a variable. I set that probability equal to 1.
Another researcher addressing the same problem does the exact same thing. Except in their model they have some variable y that is equal to x^2 in your model. If dealing with a small sample size, applying a uniform prior to y is not going to get the same result as applying a uniform prior to x.
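A quick numerical check of that point, as a sketch under invented assumptions: x is confined to [0, 1], there is a single noisy observation of x at 0.5 with Gaussian noise of 0.3, and the two analysts differ only in which variable gets the flat prior. A flat prior on y = x^2 corresponds to a density of 2x on x (the Jacobian dy/dx).

```python
from math import exp

def posterior_mean_x(prior_density_in_x, observed=0.5, sigma=0.3, steps=10000):
    """Posterior mean of x on [0, 1] by midpoint integration."""
    dx = 1.0 / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        weight = prior_density_in_x(x) * exp(-0.5 * ((observed - x) / sigma) ** 2)
        num += x * weight
        den += weight
    return num / den

flat_in_x = posterior_mean_x(lambda x: 1.0)
flat_in_y = posterior_mean_x(lambda x: 2.0 * x)   # uniform on y = x**2

print(flat_in_x, flat_in_y)
```

Same data, same likelihood, and both priors are "uniform", yet the two posterior means for x differ noticeably. The gap shrinks as more data narrows the likelihood.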
Now try that on a model that needs an MCMC calculation to derive the distribution because it can't be done analytically.
That's what I do every day in equilibrium reconstruction. Uniform priors work perfectly.
What algorithm do you use that allows sampling from an infinitely wide uniform probability distribution? And what algorithm do you use for model selection, considering many of them have been shown to depend on width of the prior which is problematic for comparison of models using different parameters and/or priors?
Uniform probability in what scale, though? Performing a Bayesian analysis with a uniform prior will generally give different results than, say, using a log scale on the dependent variable(s) and choosing a uniform prior on *that* scale. The Jeffreys prior provides a method of computing a non-informative prior that is invariant under re-parameterization, but is generally difficult to work with, and is never a uniform prior. So yeah, the concept of an uninformative prior is more complicated than "just use a uniform distribution", and analysts need to be especially careful with priors used with small sample sizes!
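As a small illustration of the sample-size caveat, here is a sketch for a binomial proportion, where the uniform prior is Beta(1, 1) and the Jeffreys prior is Beta(1/2, 1/2). The counts are hypothetical.

```python
def posterior_mean(successes, trials, prior_a, prior_b):
    """Mean of the Beta posterior for a proportion."""
    return (successes + prior_a) / (trials + prior_a + prior_b)

def compare(successes, trials):
    uniform = posterior_mean(successes, trials, 1.0, 1.0)    # Beta(1, 1)
    jeffreys = posterior_mean(successes, trials, 0.5, 0.5)   # Beta(1/2, 1/2)
    return uniform, jeffreys

print(compare(1, 5))        # small sample: the two answers differ visibly
print(compare(200, 1000))   # large sample: nearly identical
```

Both priors are commonly described as "non-informative", and the choice between them only shows up when there is little data.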
What algorithm do you use that allows sampling from an infinitely wide uniform probability distribution?
None
Start at a point in parameter space and move in the direction of maximum probability.
And what algorithm do you use for model selection, considering many of them have been shown to depend on width of the prior which is problematic for comparison of models using different parameters and/or priors?
None
We already know our model. The magnetohydrodynamic equations are well established.
The problem is the Bayesian analysis is far from conclusive. What it does point to is that the clinical trial needs a larger sample size. Sample sizes that are too small are useless.
Conversely if the sample size is too large, the cost of trials, typically a few to a few tens of millions of dollars, will go up significantly. The reflex answer to that is "so what, Big Pharma can pay for it!", except that many new therapies are developed by small companies, especially drugs for conditions with a small patient sample. For example, the condition used as an example, systemic sclerosis, has an incidence of about 1 in 100,000 in the US; it's estimated about 50,000 people in the US suffer from it. I've seen clinical trials with a 46 patient sample size cost upwards of $10M, because the condition is small in incidence so even just finding patients willing to participate is very tricky. If you upped that number to 450 patients, the cost for the trials would not justify developing a drug helping these people; the developers would lose money and these people are left to suffer. So there's a bit of a balancing act that is not just statistical significance, but also cost/benefit.
Almost had me there article! Until you said the most evil words known to man... "statistical technique". AKA "bullshit"
--RAH
No, no, you're not thinking; you're just being logical. --Niels Bohr
It might not matter for doing just a typical analysis to find most likely value with some rough error bars, but for when doing model comparisons where you need to compare the evidence factor, the width chosen for a prior can come into play.
And now you say you don't do model selection... therefore you seem to be agreeing with the previous post. Issues with prior boundaries can come up when evaluating the actual value of the evidence for model selection. If you are just finding the most probable value and some error bars, as already said, then you don't need the value of the evidence.
That said, model selection is still a large issue in equilibrium reconstruction, and has little to do with the validity of MHD. Determining the number and placement of currents within the plasma (and possibly the vacuum vessel, if looking at error fields, etc.) all represents model choice. Determining how temperature and pressure profiles should be represented is model selection (simplified model versus splines, how many knots for the splines, etc.). If at this point you haven't come across model selection work and discussion of various issues, even within equilibrium reconstruction within fusion experiments, then you are either just starting to look at the tip of the iceberg that is Bayesian analysis, or you are unaware of the vast amount of details and caveats of the tools you are trying to use.
All of which work perfectly well assuming uniform priors.
Start at a point in parameter space and move in the direction of maximum probability.
In other words, you are using hill climbing instead of MCMC, or you don't know what you are talking about...
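For readers unsure of the distinction: hill climbing returns a single best point, while MCMC returns samples whose spread describes the posterior. Below is a minimal random-walk Metropolis sketch, assuming a made-up one-parameter target (Gaussian likelihood centred on 2.0 with width 0.5, flat prior); it is not the sampler anyone in this thread uses.

```python
import random
from math import exp

def log_target(theta):
    """Unnormalised log posterior: Gaussian likelihood, flat prior."""
    return -0.5 * ((theta - 2.0) / 0.5) ** 2

def metropolis(n_samples, start=0.0, step=0.5, seed=0):
    """Random-walk Metropolis sampler for log_target."""
    rng = random.Random(seed)
    theta = start
    current = log_target(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta))
        if candidate >= current or rng.random() < exp(candidate - current):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples

samples = metropolis(20000)[2000:]     # discard burn-in
mean = sum(samples) / len(samples)
spread = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
print(mean, spread)
```

The sampler only ever uses ratios of the target, so a constant flat prior drops out and never has to be sampled from directly. By the same token, it does not yield the normalised evidence needed for model comparison.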
Maybe when someone says "X is a problem when you do Y" you shouldn't chime in with "X is never a problem (but I don't do Y)"
If you are doing model comparison it is perfectly acceptable to use a uniform prior if you know nothing a priori. In fact it's the only correct probability distribution. Bayesian statistics will just tell you, given your knowledge and evidence, that one model is better than another. A uniform probability distribution is never a problem. If they are assigning probabilities to stuff they don't know about then they are using Bayesian statistics wrong.
In fact it's the only correct probability distribution.
So why does the Jeffreys prior exist and why is there so much writing around it and similar families of priors? You can find writing online by Edwin Jaynes, some up to nearly 50 years old, discussing issues with choice of prior and why a uniform prior can be problematic when you know nothing. To act as if there were no other choice is either disingenuous, or betrays a naivety of the subject considering how much writing and work there is on the issue.
Bayesian statistics will just tell you, given your knowledge and evidence, that one model is better than another.
Just like calculus will "just tell you" what the solution to a differential equation is? There is a lot of math and numeric calculation involved in getting that answer, with model selection being an entire research subject of its own. A lot of numeric methods cannot handle an infinitely wide uniform prior (some struggle with improper priors in general, or require extra hoops to jump through). A bunch work with uniform priors, but the results explicitly depend on the width of the uniform priors even in analytic cases.
If they are assigning probabilities to stuff they don't know about then they are using Bayesian statistics wrong.
I can agree with this, but consider just assigning a uniform prior to a parameter without being aware of the potential consequences and biases as falling under that.
You can directly see how it factors into your calculation of the evidence term when you calculate the occam factor, which is proportional to the width of a uniform prior.
WRONG
The occam factor is a simplifying approximation.
From Data Analysis by D. S. Sivia, pages 79-80:
To proceed further analytically, let us make some simplifying approximations. Assume that, a priori, Mr B is only prepared to say that lambda must lie between the limits lambda_min and lambda_max; we can then naively assume a uniform prior within the range:
P(L|B)=1/(L_max-L_min)
He goes on further to say:
We should not lose sight of the fact that the precise form of eqn (4.8) stems from our stated simplifying approximations; if these are not appropriate, then eqns (4.2) and (4.3) will lead us to a somewhat different formula.
Basically the Occam factor as derived is invalid for a true uniform prior.
So why does the Jeffreys prior exist and why is there so much writing around it and similar families of priors?
From Data Analysis: A Bayesian Tutorial, second edition, by D. S. Sivia, page 79:
To proceed further analytically, let us make some simplifying assumptions. Assume that, a priori, Mr B is only prepared to say that lambda must lie between limits lambda_min and lambda_max; we can then naively assign a uniform prior within this range
P(L|B)=1/(L_max - L_min)
In short, when comparing models the Occam factor would give you a negative infinity to infinity in your equations. But this fundamentally comes about from a simplifying assumption to make the equations analytical. In practice this is usually not a problem, but fundamentally it makes the Occam factor invalid for a truly uniform prior.
> What it tells you is the probability that your hypothesis is correct given your evidence and your prior knowledge.
Not at all. How results are determined is just as influential and can invert or otherwise corrupt the bayesian analysis. Knowledge is not about statistics, it's about who's doing the statistics.
There was a recent interview on Australian ABC radio about the waste of money on a lot of clinical trials, due to experimenters being unable to enroll enough trial candidates to get a meaningful result.
Note the Show Transcript button. It's a fairly short read.
Godel_56 posting anonymously due to mod points.
if these are not appropriate,
Except that those "approximations" are quite appropriate to any analysis method that needs bounds on the prior, which includes a lot of numerical techniques. Otherwise, it just confirms what has been said: which uniform distribution you select can have an impact on model selection. The Occam factor is not just a simplifying approximation in general, but integral to many model selection methods, whether implicitly or explicitly, and it leads to an impact on your model selection results depending on how you handle a uniform prior. This gets especially messy in problems where you can't even get analytic integrals to converge when trying to evaluate for an unbounded prior, which comes back to the point that some methods need a proper prior and an infinite uniform prior is improper.
That has squat to do with Jeffreys prior, which deals with reparameterization. From what I remember, Sivia's book doesn't address it at all (unless there is a newer edition that I'm not familiar with). You might have to reach beyond an undergrad intro textbook (e.g. references already referred to) to be familiar with issues within the field, especially if you actually use the techniques in research. In the meantime, I'm not sure if it is worth anyone's time to discuss things if you are just going to grab quotes and subjects at random and pretend it has some relevance or shows knowledge of the topic.
Please push a serious statistics course down the throats of medical students; that would benefit everyone's health!
There have been recent cries for reproducible results in science.
The scope is too limited.
There should be a cry for reproducible results in any research prior to its publication.
Long and short of it for researchers: if only you can get the results and conclusions, then the results and conclusions are not publishable.
"Consensus" in science is _always_ a political construct.