Dozens of Recent Clinical Trials May Contain Wrong or Falsified Data, Claims Study (theguardian.com)

Posted by BeauHD on Monday June 5, 2017 @03:30PM from the suspicious-reporting dept.

John Carlisle, a consultant anesthetist at Torbay Hospital, used statistical tools to conduct a review of thousands of papers published in leading medical journals. While the vast majority of the clinical trials he reviewed were accurate, 90 of the 5,067 published trials had underlying patterns that were unlikely to appear by chance in a credible dataset.

The Guardian reports: The tool works by comparing the baseline data, such as the height, sex, weight and blood pressure of trial participants, to known distributions of these variables in a random sample of the populations. If the baseline data differs significantly from expectation, this could be a sign of errors or data tampering on the part of the researcher, since if datasets have been fabricated they are unlikely to have the right pattern of random variation. In the case of the Japanese scientist Yoshitaka Fujii, the detection of such anomalies triggered an investigation that concluded more than 100 of his papers had been entirely fabricated. The latest study identified 90 trials that had skewed baseline statistics, 43 of which had measurements with about a one in a quadrillion probability of occurring by chance.

The review includes a full list of the trials in question, allowing Carlisle's methods to be checked but also potentially exposing the authors to criticism. Previous large-scale studies of erroneous results have avoided singling out authors. Relevant journal editors were informed last month, and the editors of the six anesthesiology journals named in the study said they plan to approach the authors of the trials in question, and raised the prospect of triggering in-depth investigations in cases that could not be explained.
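The kind of baseline check described in the summary can be sketched roughly as follows. This is only an illustration of the general idea, not Carlisle's actual procedure: the function names, the reference population values, the reported trial figures, and the threshold are all made up for the example, and a plain normal approximation is assumed.

```python
import math

def baseline_p_value(sample_mean, n, pop_mean, pop_sd):
    """Two-sided p-value for a reported group mean against an assumed population."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # For a standard normal, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

def suspicious(p_values, threshold=1e-6):
    """Return the names of variables whose p-value falls below the threshold."""
    return [name for name, p in p_values.items() if p < threshold]

# Hypothetical reference population: variable -> (mean, standard deviation).
population = {"height_cm": (170.0, 10.0), "weight_kg": (75.0, 15.0)}
# Hypothetical reported baselines: variable -> (group mean, group size).
reported = {"height_cm": (170.8, 100), "weight_kg": (90.0, 100)}

p_values = {
    name: baseline_p_value(mean, n, *population[name])
    for name, (mean, n) in reported.items()
}
flagged = suspicious(p_values)
print(p_values)
print("flagged:", flagged)
```

With these invented numbers, the height figure sits comfortably within chance variation while the weight figure does not. A fuller check would presumably also look for baselines that match expectation too closely, since fabricated data can lack the normal amount of random variation.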

Thanks for that! by ls671 · 2017-06-05 15:55 · Score: 4, Insightful

Thanks for that! Now I can use that tool to generate data for my upcoming fabricated studies.

--
Everything I write is lies, read between the lines.
"90 of the 5,067" by Nutria · 2017-06-05 15:56 · Score: 4, Insightful

That's... less than 2%. Naturally, we want it to be 0%, but 1.8% is nothing to generate scare headlines over.

--
"I don't know, therefore Aliens" Wafflebox1
1. Re:"90 of the 5,067" by ShanghaiBill · 2017-06-05 16:22 · Score: 5, Insightful
  
  That's... less than 2%. Naturally, we want it to be 0%, but 1.8% is nothing to generate scare headlines over.
  They only caught the dumb ones. It would have been easy to generate fake data that fits a known distribution. For instance, in Python, just use numpy.random.normal instead of numpy.random.uniform.
  The 2% is just the floor. The actual fraud and/or incompetence rate is likely higher.
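A small sketch of the point about distributions, using the standard library's random.gauss and random.uniform as stand-ins for the numpy calls named in the comment. The height parameters, sample size, and the "fraction within one standard deviation" check are assumptions chosen for illustration; real screening tools use more careful tests.

```python
import random
import statistics

def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of the sample mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) <= sd for v in values) / len(values)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Fake "heights" drawn from a normal and from a uniform distribution.
normal_heights = [rng.gauss(170, 10) for _ in range(n)]
uniform_heights = [rng.uniform(150, 190) for _ in range(n)]

frac_normal = fraction_within_one_sd(normal_heights)
frac_uniform = fraction_within_one_sd(uniform_heights)
print(f"normal:  {frac_normal:.3f}")   # theory: about 0.68
print(f"uniform: {frac_uniform:.3f}")  # theory: about 0.58
```

Uniformly generated values put noticeably fewer observations near the mean than a normal distribution does, which is the sort of mismatch a distribution check can pick up. Data drawn from the right distribution would pass this particular check, which is the commenter's point.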
Re:Is anyone surprised? by Dunbal · 2017-06-05 23:40 · Score: 3, Insightful

Those are only the ones that are easy to prove fake. There has been a lot of research over the years whose results simply cannot be reproduced, even in an identical experiment. Big Pharma has been caught red-handed several times now, at one point even publishing their studies in their own "peer review" magazine.

--
Seven puppies were harmed during the making of this post.
Re: Only in Clinical studies ..... by KGIII · 2017-06-06 01:29 · Score: 3, Insightful

Hmm...
I am not a climate scientist. I am a retired scientist. What did I do? I modeled traffic. As strange as it might sound, there is a lot of similarity between the two. I will try to give some history, as it may help this make more sense. Sorry for the lack of brevity.
In my case, I helped bring traffic modeling to the age of computers. In this process, it was learned that you could improve the model results, significantly, by increasing the amount of data available. Even seemingly trivial things can impact throughput. Simple things, such as signage fonts, can impact throughput. Even the frequency of lane markings, reflectivity of lane markings, and coloration all have an impact on throughput.
To try to put this in perspective, I was working with data sets in the full TB size, before the turn of the century. We did distributed computing, before it even really had a name.
Why is that important?
Well, traffic is a bit like climate. It is a chaotic system. To be clear, a chaotic system is not a system that is random. It appears random but, with more data, you can tease out patterns and make deterministic predictions based on a variety of variables, with some levels of consistency and success.
I am not suggesting, for the record, that the climate science models are 100% accurate. In fact, they have confidence ratings. That goes underreported, but they will tell you how confident they are in the results.
Anyhow, that's beside the point. I just want to make it clear and avoid confusion.
What is important is that you have to massage the data. You have to make corrections to the data. You have to remove outliers.
See, we'd collect data and then run it against the models. We'd compare the model output with what was really happening. Sometimes, the results are pretty close. This means you can have greater confidence in the results. Sometimes, it isn't even remotely close.
At that point, you usually start by poking at the model itself. However, you will also poke at the data. You will throw some of that data right into the trash. You will normalize the numbers, and adjust the impact factor. You will also probably swear, like a lot. You will invent whole new languages, just to swear in them.
Either way, you will massage that data until you get the results that most closely match reality. You take existing data and run your models to see how well they match reality. When you get it to the point where you're confident, you use those methods to make predictions about the future, given new variables. This will have varied confidence levels, and pinpoint accuracy isn't expected by anyone versed in the science.
The truth is, you can model all you want but some drunk guy is still going to drive, in reverse, the wrong direction down a one way street. So, you only have so much confidence in the predictions.
The whole point is, you have to massage the data. If you don't, you get horrible results that don't match reality. The expected outcome isn't certainties. The expected outcome is predictions for which you can assign a confidence level.
I suspect part of the problem is poor communication and bad journalism. I've taken some time to examine the models, methods, and reasons. I am not a climate scientist, but I have taken a reasonable amount of time to study it in a scholastic manner. You can download their data AND their models, for free, and run them yourself. You can massage that data any way you want, too. You can apply all the adjustments you want and run the models yourself - for just the cost of hardware you already own and electricity.
Anyhow, I hope this clears a few things up. Correcting and massaging data is pretty normal. It's pretty much required, if you want meaningful results. I am pretty sure the uncorrected data sets are also available. You can get so many data sets, for free. They'll even give you the models. Hell, they'll even give you the source code for the models.
I do want to make it clear, the goal isn't a perfect pre

--
"So long and thanks for all the fish."