Why Published Research Findings Are Often False
Hugh Pickens writes "Jonah Lehrer has an interesting article in the New Yorker reporting that all sorts of well-established, multiply confirmed findings in science have started to look increasingly uncertain as they cannot be replicated. This phenomenon doesn't yet have an official name, but it's occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only anti-psychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants. 'One of my mentors told me that my real mistake was trying to replicate my work,' says researcher Jonathan Schooler. 'He told me doing that was just setting myself up for disappointment.' For many scientists, the effect is especially troubling because of what it exposes about the scientific process. 'If replication is what separates the rigor of science from the squishiness of pseudoscience, where do we put all these rigorously validated findings that can no longer be proved?' writes Lehrer. 'Which results should we believe?' Francis Bacon, the early-modern philosopher and pioneer of the scientific method, once declared that experiments were essential, because they allowed us to 'put nature to the question,' but it now appears that nature often gives us different answers. According to John Ioannidis, author of Why Most Published Research Findings Are False, the main problem is that too many researchers engage in what he calls 'significance chasing,' or finding ways to interpret the data so that it passes the statistical test of significance: the ninety-five-per-cent boundary invented by Ronald Fisher. 'The scientists are so eager to pass this magical test that they start playing around with the numbers, trying to find anything that seems worthy.'"
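Ioannidis's argument can be made concrete with the positive predictive value (PPV) formula from his paper, in its simplest form without a bias term: PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true, 1 − β is the study's power and α is the significance threshold. This is a minimal sketch; the pre-study odds and power values below are illustrative assumptions, not figures from the article.

```python
def ppv(pre_study_odds: float, power: float, alpha: float = 0.05) -> float:
    """Probability that a 'significant' finding is actually true (no-bias case).

    pre_study_odds: R, ratio of true to false relationships among those tested
    power:          1 - beta, chance that a true relationship reaches significance
    alpha:          significance threshold (false-positive rate under the null)
    """
    beta = 1.0 - power
    r = pre_study_odds
    return (power * r) / (r - beta * r + alpha)


# Assumed scenario: one true relationship for every ten false ones tested.
for power in (0.8, 0.5, 0.2):
    print(f"power={power:.1f}  PPV={ppv(0.1, power):.3f}")
```

With these assumed numbers, PPV falls from about 0.62 at 80% power to 0.50 at 50% power and about 0.29 at 20% power, so low-powered fields can publish more false positives than true ones even with nobody cheating.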
Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
"As the intrepid kobold companion continues his journey, he begins to wonder... if priests raises dead, why anybody die?"
The article says "this phenomenon doesn't yet have an official name," but it actually does. It's called "lying."
"If you want to know what happens to you when you die, go look at some dead stuff."
Even in academia, there's an establishment and people who are powerful within that establishment are rarely challenged. A new upstart in the field will be summarily ignored and dismissed for having the arrogance to challenge someone who's widely respected. Even if that respected figure is incorrect, many people will just go along to keep their careers moving forward.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
After years of speculation, a study has revealed that scientists are, in fact, human. The poor wages, long hours, and relative obscurity that most scientists dwell in have apparently caused widespread errors, making them almost pathetically human and just like every other working schmuck out there. Every major news organization south of the Mason-Dixon line in the United States, and many religious organizations, took this to mean that faith is better, as it is better suited to slavery, long hours, and no recognition than science, a relatively new kind of faith that has only recently received any recognition. In other news, the TSA banned popcorn from flights over fears that the strong smell could cause rioting from hungry and naked passengers who cannot be fed, go to the bathroom, or leave their seats for the duration of the flight for safety reasons....
#fuckbeta #iamslashdot #dicemustdie
I see this as one more planted article in the mainstream press: "Science is there to mislead you, listen to fake news instead." The rising tide against education and critical thinking in the USA is reminiscent of the Cultural Revolution in China. It is even more ironic that the argument "against" metrics that usefully determine validity is itself couched in a pseudo-analytical format. At this point in the USA, most folks reading (even) the New Yorker have no idea what a p-value is or why these things matter, and they will just recall the headline "science is wrong". And then they wonder in Detroit why they can't make $100k a year anymore pushing the button on a robot that was designed overseas by someone else; you know, overseas, where engineering, science, etc., are still held in high regard.
I'm a scientist myself. It's quite clear from where I'm standing that to get good jobs, research grants, etc., one needs plenty of published articles. Whether the conclusions of those are true or false is not something that hiring committees will delve into too much. If you are young and have a family to support, it can be tempting to take shortcuts.
This article has already been taken apart by P.Z. Myers in a blog post on Pharyngula. Here's his conclusion:
Basically, it's not like anyone's surprised at this.
The New Yorker article is well written and informative. It clearly doesn't assume that there is something wrong with the scientific method; it just asks: could there be? There is an excellent reply by George Musser at Scientific American: http://cot.ag/hWqKo2
This is what I call interesting and engaging public discussion and journalism.
user@ubuntubox:~$ stfu This server is going down for shutdown NOW!
I was actually about to feed the troll. I was 2 sentences in before going "oh... right."
This is a bit of a fallacy. Bush increased stem cell research funding, fuel cell research funding, etc. He was in office for 8 years, and I believe 2001 was the first time he cut science spending. That was part of a larger goal to cut spending across the board.
How did he respond in 2002? He asked Congress to DOUBLE science spending.
http://www.scienceprogress.org/2008/01/bush-asks-congress-to-double-science-spending/
My wife showed me a great graph during the last election that tracked science spending from administration to administration and showed that historically Republicans have spent more on science than Democrats.
http://www.youtube.com/watch?v=x7Q8UvJ1wvk
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
> 'Which results should we believe?'
What a ridiculous question. How about the results that are replicated, accurately, time and time again, rather than ones that aren't based on scientific theory, or on failed attempts at scientific theory?
That article is as flawed as the supposed errors it reports on. The author just "discovered" that biases exist in human cognition? The "effect" he describes is quite well understood, and is the very reason behind the controls in place in science. This is why we don't, in science, just accept the first study published, why scientific consensus is slow to emerge. Scientists understand that. It's journalists who jump on the first study describing a certain effect, and who lack the honesty to review it in the light of further evidence, not scientists.
> Is it possible that there has always been error, but it is just more noticeable now given that reporting is more accurate?
Precisely. As mentioned in a Scientific American blog:
"The difficulties Lehrer describes do not signal a failing of the scientific method, but a triumph: our knowledge is so good that new discoveries are increasingly hard to make, indicating that scientists really are converging on some objective truth."
This is the natural outcome of 'publish or perish.' If keeping your job depends almost solely on getting 'results' published, you will find those results.
Discovery is more prestigious than replication. I don't see how to fix that.
If you had bothered to read the fucking article instead of jumping to some half-assed conclusion, you would see that the article has nothing to do with lying.
It's not "the oil companies have paid scientists to lie about science"
It's "I'm fascinated that trends I detected early in my research seem to fall apart as I continue to investigate"
Anyway.. thanks for lowering the level of discussion on /. even further, douche.
That's not a given. Particularly in the soft sciences - psychology, for instance - it is extremely difficult to control for all factors (I'm more inclined to say nearly impossible) and so replication of results can be subsumed by other effects, or even simply not work at all. You know that whole generation gap thing? That's a good example of groups of people who are different enough that the reactions they will have to certain subject matter can be polar opposites. So something that was "definitively determined" in 1960 may be statistically irrelevant among the current generation.
That's just one example of how squishy this all is. Without having to bring lying into it at all. And then, there will be liars; and there will be people who draw conclusions without scientific rigor at all, simply because it's just too difficult, expensive or time-consuming to attempt to confirm the ideas at hand. And there is the outlier personality; the one who accounts for those other few percent -- all the declarations of "this is how it is" are false for them right out of the gate.
Hard sciences simply lend themselves a lot better to repeatability. Where I think we go wrong is assigning the same certainties to the claims of the soft scientists. I have personally seen psychiatrists, best intent not in doubt, completely err in characterizing a situation to the great detriment of the people involved, because the court took the psychiatrist's word as gospel truth.
All science is an exercise in metaphor, but soft science is an exercise in metaphor that is almost always far too flexible. One place you can see this happening is the trendy / cyclic adherence to Freud, Jung, Maslow, Rogers and so forth... the "correct" way to raise babies... Ferberizing, etc. This stuff generally isn't lies at all, but it also generally isn't "right." Good intentions do not automatically make good science.
Serious medicine is another good example. Something that might work very well for you might not work at all for me; get the wrong group of test subjects, and your results will skew or worse. This is an area that I think is fair to call a hard science, but where we just don't know enough about the systems involved. Generally speaking, I don't think our oncologist lies to us; further, I think he's pretty well aware of the limitations of his practice and the state of knowledge that informs it; but they just don't know enough. To which I hopefully add, "yet."
On a personal level - since that's all I can really affect - I treat soft science about the same way I do astrology. If you believe it, you'll probably attempt to modify your behavior because of the predictions, which in turn may, or may not, affect your actual outcome. If you don't, it's either irrelevant or too uncertain to trust anyway. So it's low confidence all the way.
I do, however, still place very high confidence in Boyle's law for gases. Hard science works very well. :)
I've fallen off your lawn, and I can't get up.
Before you can question the scientific method through experimentation you first must understand and utilize the scientific process. That last quote is a massive clue that the issue is that they are stepping away from the scientific process and trying to force an answer.
I'll go read the article, but before I do, I'll just note that in semiconductor manufacturing and development, both the scientific process and statistical significance are at the core of resolving problems, maintaining repeatable manufacturing, and developing new processes and products. In my 20 years of experience, the scientific process worked just fine: when results were not reproducible, you had more work to do, but you didn't decide that science no longer worked and that the answer had simply changed.
I can guarantee that if we throw away the scientific process and no longer rely on peer review and replication, then all those fun little gadgets everyone enjoys these days will become a thing of the past and we'll enter a second dark age.
The article can be viewed on a single page here: http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer?currentPage=all
Not surprisingly, most of the posts so far show no signs of having actually RTFA.
Lehrer goes through all kinds of logical contortions to try to explain something that is fundamentally pretty simple: it's publication bias plus regression to the mean. He dismisses publication bias and regression to the mean as being unable to explain cases where the level of statistical significance was extremely high. Let's take the example of a published experiment where the level of statistical significance is so high that the result only had one chance in a million of occurring due to chance. One in a million is 4.9 sigma. There are two problems that you will see in virtually all experiments: (1) people always underestimate their random errors, and (2) people always miss sources of systematic error.
It's *extremely* common for people to underestimate their random errors by a factor of 2. That means the 4.9-sigma result is only a 2.45-sigma result. But 2.45-sigma results happen about 1.4% of the time. That means that if 71 people do experiments, typically one of them will get a result at the 2.45-sigma level. That person then underestimates his random errors by a factor of 2, and publishes it as a result that could only have happened one time in a million by pure chance.
Missing a systematic error does pretty much the same thing.
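The arithmetic in this comment can be checked against the normal distribution. This sketch assumes the quoted sigma values are two-sided z-scores (an assumption; the comment does not say which convention it uses).

```python
import math


def two_sided_p(z: float) -> float:
    """Probability that a standard normal variable lands at least z sigma from 0."""
    return math.erfc(z / math.sqrt(2.0))


claimed_z = 4.9           # significance as reported
true_z = claimed_z / 2.0  # if the random errors were underestimated by a factor of 2

p_claimed = two_sided_p(claimed_z)
p_true = two_sided_p(true_z)

print(f"claimed: {claimed_z} sigma -> p = {p_claimed:.2e}")  # roughly one in a million
print(f"actual:  {true_z} sigma -> p = {p_true:.4f}")        # roughly 1.4%
print(f"about 1 in {1 / p_true:.0f} null experiments reach {true_z} sigma")
```

An overlooked systematic shift works the same way in this model: it inflates the apparent z without making the result any less likely to be a fluke.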
Lehrer cites an example of an ESP experiment by Rhine in which a certain subject did far better than chance at first, and later didn't do as well. Possibly this is just underestimation of errors, publication bias, and regression to the mean. There is also good evidence that a lot of Rhine's published work on ESP was tainted by his assistants' cheating: http://en.wikipedia.org/wiki/Joseph_Banks_Rhine#Criticism
Find free books.
Now, science uses different math, and the results are expressed differently, even probabilistically. But in real science those probabilities are not what most people think of as probability. A scanning tunneling microscope, for instance, works by the probability that a particle can jump an air gap. Though this is probabilistic, it is well understood, so it allows us to map atoms. There is minimal uncertainty in the outcome of the experiment.
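A sketch of why that probabilistic tunneling is nonetheless so predictable, using the textbook one-dimensional rectangular-barrier approximation T ≈ exp(−2κd) with κ = √(2mφ)/ħ. Real tips and surfaces are messier, and the 4 eV barrier height is an assumed, typical metal work function.

```python
import math

# CODATA values (SI units)
ELECTRON_MASS = 9.1093837015e-31   # kg
HBAR = 1.054571817e-34             # J*s
EV = 1.602176634e-19               # J per electronvolt
ANGSTROM = 1e-10                   # m


def decay_constant(barrier_ev: float) -> float:
    """kappa = sqrt(2 m phi) / hbar for a rectangular barrier of height phi, in 1/m."""
    return math.sqrt(2.0 * ELECTRON_MASS * barrier_ev * EV) / HBAR


def transmission_ratio(barrier_ev: float, extra_gap_angstrom: float) -> float:
    """Factor by which T ~ exp(-2 kappa d) changes when the gap widens."""
    kappa = decay_constant(barrier_ev)
    return math.exp(-2.0 * kappa * extra_gap_angstrom * ANGSTROM)


phi = 4.0  # assumed barrier height in eV
kappa_per_angstrom = decay_constant(phi) * ANGSTROM
print(f"kappa = {kappa_per_angstrom:.2f} per angstrom")
print(f"widening the gap by 1 angstrom scales the current by "
      f"{transmission_ratio(phi, 1.0):.3f}")
```

Under these assumptions the tunneling current drops by roughly an order of magnitude per ångström of extra gap, which is why the tip height, and hence the surface topography, can be read off so precisely.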
The research talked about in the article may or may not be science. First, anything having to do with human systems is going to be based on statistics. We cannot isolate human systems in a lab. The statistics used are very hard. From discussions with people in the field, I believe they are every bit as hard as the math used for quantum mechanics. The difference is that much of the math is codified in computer applications, and researchers do not necessarily understand everything the computer is doing. In effect, everyone is using the same model to build results, but may not know whether the model is valid. It is like using a constant-acceleration model for a case where there is jerk. The results will not be quite right. However, if everyone uses the same faulty model, the results will be reproducible.
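The constant-acceleration analogy can be made concrete with a hypothetical system (all numbers are made up): an object whose acceleration actually grows over time (constant jerk) is analysed with a model that assumes constant acceleration.

```python
def true_position(t: float, accel: float, jerk: float) -> float:
    """Position of an object starting at rest with acceleration `accel` at t = 0
    and a constant jerk (rate of change of acceleration)."""
    return 0.5 * accel * t**2 + jerk * t**3 / 6.0


def fitted_accel(t_obs: float, x_obs: float) -> float:
    """Acceleration inferred from one observation by (wrongly) assuming
    the constant-acceleration model x = a * t**2 / 2."""
    return 2.0 * x_obs / t_obs**2


def predict(t: float, accel: float) -> float:
    """Constant-acceleration model prediction of position at time t."""
    return 0.5 * accel * t**2


# Hypothetical system: a = 2 m/s^2 at t = 0, jerk = 0.5 m/s^3.
A0, J = 2.0, 0.5
T_OBS, T_PRED = 2.0, 6.0

# Two "independent labs" apply the same faulty model to the same system.
for lab in ("lab 1", "lab 2"):
    a_hat = fitted_accel(T_OBS, true_position(T_OBS, A0, J))
    print(f"{lab}: a_hat = {a_hat:.3f}, predicted x({T_PRED:g}) = "
          f"{predict(T_PRED, a_hat):.1f}, true x({T_PRED:g}) = "
          f"{true_position(T_PRED, A0, J):.1f}")
```

Both "labs" report a fitted acceleration of about 2.333 and predict a position of 42.0 where the true value is 54.0: perfectly reproducible, and wrong by exactly the jerk term the model leaves out.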
Second, the article talks about the drug dealers. The drug dealers are like the Catholic Church of Galileo's time. The purpose is not to do science, but to keep power and sell product. Science serves as a process to develop product and minimize legal liability, not to explore the nature of the universe. As such, calling what any pharmaceutical company does the 'scientific method' is at best misguided.
The scientific method works. The scientific method may not be completely applicable to fields of study that try to find things that often, but not always, work in a particular. The scientific method is also not resistant to group illusion. This was the basis of 'The Structure of Scientific Revolutions'. The issue here, if there is one, is the lack of education about the scientific method, which leads people to give individual results more credence than is rational, or to think that it is some sort of magic.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Did you even read the article?
This is basically about poorly designed clinical drug trials without sufficient controls. Sloppy work, even if it seemed rigorous enough at the time.
The sensationalistic "scientific method in question" stuff is pure BS, but after all, this is New Yorker magazine we're talking about, so one wouldn't expect too much scientific literacy. It was the scientific method of "predict and test" that caught these erroneous results, so the method itself is fine. The "scientist" who designed a sloppy experiment is to blame, not the method.
However, I'm not sure that psychiatric drug trials even deserve to be called science in the first place. The principle of GIGO (Garbage In - Garbage Out) applies. This is touchy-feely soft science at best. How do you feel today on a scale of 1-10? Do the green pills make you happy?
I remember that some time in the '80s, a doctor published some "research" claiming to show that abused children could be identified by how they reacted to a pencil shoved into their anus. Yes, really! Unfortunately, doctors think they are scientists, and for the most part they are not, so they did not properly evaluate the methods used for this "research". The real shame of this was that some doctors actually used this "method" to identify supposedly abused children, with all the attendant hurt and distress that these false accusations caused.
Or, to put it more charitably, medicine and psychology are describing far more complex phenomena than we like to admit.
For example, in psychiatric genetics, dozens of articles every year report a new gene associated with a common and important condition (e.g. autism, schizophrenia, depression). After each new finding comes out, dozens of labs try to replicate it; usually one or two replicate (or partially replicate) the finding, and five or six don't. Why is it so hard to replicate these findings? Probably because there are really dozens of independent genes that contribute to these complex disorders (probably in combination with each other), and some populations tend to have mutations in one set, while other populations tend to have mutations in another set.
We're moving towards understanding, but the disorders are far more complex than the assumption that there will be a single cause.
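A complementary statistical point, separate from the population-heterogeneity explanation above: even if a gene's effect were real and identical in every population, underpowered replication attempts would still fail most of the time. This sketch assumes a simple two-sided z-test at the 5% level; the "expected z" values (true effect divided by the study's standard error) and the count of seven labs are illustrative assumptions.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def power_two_sided(expected_z: float, z_crit: float = 1.96) -> float:
    """Chance that a study whose true effect sits `expected_z` standard errors
    from zero reaches two-sided significance at the 5% level."""
    return normal_cdf(expected_z - z_crit) + normal_cdf(-expected_z - z_crit)


labs = 7
for expected_z in (1.0, 1.5, 2.8):
    p = power_two_sided(expected_z)
    print(f"expected z = {expected_z}: power = {p:.2f}, "
          f"about {labs * p:.1f} of {labs} replications succeed")
```

With an expected z of 1.0 (power around 17%), only about one lab in seven would be expected to replicate an association that is entirely real, which matches the "one or two replicate, five or six don't" pattern without any lying or population differences.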
The problem is calling these fields "science" and these people "scientists".
The most hilarious one is the "Science of Economics".
It's all right if the subject is too complex and we don't yet have better ways to study it. The best people available have done their best in studying the field, whichever method they adopt. There is nothing more we can ask for.
Except, just don't call it the fucking SCIENCE.
or just afraid of your wife?
Yes, in 2008, when public pressure was on and a new election was coming he gave lip service to people about getting more money. At a time when he would have no incentive to actually back up his statement.
Seriously, learn freaking politics before mashing out nonsense.
Obama has NOT CUT NASA BUDGET. He increased it 6 billion dollars.
In 2006, NASA had its budget butchered, with 3 billion cut.
The Republicans have been slowly cutting science since Reagan. The more the religious right infects the Republican Party, the more it cuts science. Who cut the metric system conversion program? Republicans, under the guise of 'cutting the budget'. Who removed all efforts to end oil dependence? The Republicans.
Who made it so the general population could use the internet? Democrats. Who increased NASA's budget? Democrats.
And comparing cuts or increase to just the president is so naive as to be called stupid.
The Kruger Dunning explains most post on
This issue is exactly why many scientists are moving towards model selection approaches instead of significance testing. Significance testing is arbitrary and silly at some level, and even Fisher knew that. The .05 cutoff is just something he pulled out of his butt one day as an arbitrary threshold that one might use for determining whether or not to provisionally believe a result, it's not some fundamental constant of the universe that has any real external justification to it. The good news is that the younger generation of scientists is increasingly comfortable with model selection, and as a result this is a problem that is in the process of correcting itself.
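A minimal sketch of what model selection can look like in practice, using the Akaike information criterion (AIC) as one common choice; the comment does not name a specific criterion, and the data below are made up. Instead of asking whether p < .05, it scores competing models and reports their relative support as Akaike weights.

```python
import math


def rss_constant(y):
    """Residual sum of squares for the model y = mean (no dependence on x)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_linear(x, y):
    """Residual sum of squares for an ordinary least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


def aic_gaussian(rss: float, n: int, k: int) -> float:
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.
    k counts the fitted mean parameters plus one for the error variance."""
    return n * math.log(rss / n) + 2 * k


# Hypothetical measurements (made up for illustration).
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.1, 2.0, 3.9, 6.2, 7.9, 10.1, 12.0, 13.8]
n = len(x)

scores = {
    "constant": aic_gaussian(rss_constant(y), n, k=2),  # mean + variance
    "linear": aic_gaussian(rss_linear(x, y), n, k=3),   # intercept, slope + variance
}
best = min(scores, key=scores.get)

# Akaike weights: relative support for each model, summing to 1.
best_score = scores[best]
rel = {m: math.exp(-0.5 * (s - best_score)) for m, s in scores.items()}
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}
print("preferred:", best, weights)
```

The output is a graded statement of relative evidence rather than a pass/fail verdict at an arbitrary cutoff.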
http://www.youtube.com/watch?v=qNxfPAF1frM
First, science has always had a political aspect. Publication reviewers are always biased by conventional wisdom among their scientific peers, and they will become critical of any submitted paper that strays from that view. A lot of careers are based on following the conventional wisdom, and threats to those careers are met with political responses.
Second, the quest for statistical significance is based on serious misunderstanding of statistics among scientists. It has been so for decades. Publication editors are thoroughly ignorant of statistics if they demand statistical significance at the .95 or .99 levels as a condition of acceptance.
Results that are statistically significant may or may not be clinically significant. Both factors must be considered.
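To illustrate the difference between statistical and clinical significance, here is a hypothetical trial (all numbers are assumptions): a drug lowers blood pressure by 0.5 mmHg with a standard deviation of 10 mmHg. The sketch uses a simple two-sample z-test with a known, common standard deviation.

```python
import math


def two_sample_z(diff: float, sd: float, n_per_arm: int):
    """z statistic and two-sided p-value for a difference in means,
    assuming a known, common standard deviation in both arms."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Same tiny effect, increasingly large trials.
for n in (100, 10_000, 100_000):
    z, p = two_sample_z(diff=0.5, sd=10.0, n_per_arm=n)
    print(f"n per arm = {n:>7}: z = {z:5.2f}, p = {p:.2g}")
```

The effect never changes; only the p-value does, going from unremarkable at 100 patients per arm to astronomically small at 100,000. Whether half a millimetre of mercury matters to a patient is a clinical question that the p-value cannot answer.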
Significance levels are based on one model of statistical inference. There are other models, although those have been subjected to politics within the mathematical/statistical community. Although Bayesian statistics are now accepted (and form a critical basis in theories of signal processing, radar, and other technologies) they were rejected by the statistical community for many years. The rejection was almost completely political, because the concepts challenged the conventional wisdom.
The basic scientific method is not a problem. The major problem is the factors in publication acceptance and the related biases and pressures to adhere to the conventional wisdom. Rejection of papers based on politics or on ignorance of statistical methods is outside the scientific method and needs to be rooted out.
I think that people tend to underestimate the pervasive impact of regression toward the mean.
Even without "data snooping" (improperly reanalyzing your data post-hoc in multiple ways to find something that appears to be statistically significant), there is still going to be bias. If I do an experiment and I happen to "luck out" and get a large (i.e. larger than the "true" mean of an infinite number of observations) effect size just by chance, I am far more likely to do follow-up experiments than if I am unlucky and the effect size is small or the result is not statistically significant. If subsequent experiments asking the same question in different ways also give a statistically significant result, my belief in the phenomenon is reinforced even if the effect size is a bit smaller.
So I am far more likely to identify a real phenomenon if, because of a statistical fluctuation, I initially observe a larger effect size or a smaller standard error than the "true" value. And my figures from that initial study, showing a nice big effect and a small error bar, are far more likely to pass peer review than if the effect size were smaller and the error bars larger, even if the criterion for statistical significance is satisfied.
If I am unlucky, and I get a lot of variation and/or a small effect size (again, compared to the "true" value from an infinite number of experiments), there is a good chance that the experiment will go into a drawer. Perhaps I'll give up on the idea, or perhaps I'll try it again, but I'll improve the experimental design in a way that I hope will reduce the statistical variability or give me a larger effect size. Of course, if it "works," I'll pat myself on the back for solving the technical problem and go on to do follow-up studies, even though statistically speaking it may well be the case that the prettier result from the new design is itself just a statistical fluctuation.
Part of the problem is that by convention, we report a single value for effect size. Yes, some sort of estimate of standard deviation is appended, but what people remember is that single value. It simply is very hard for human beings to think in terms of statistical distributions. We tend to forget (even though we know it to be true in theory) that a statistically significant result does not show that our estimate of effect size is correct--all it tells us is that the effect size is unlikely to be zero.
Thus, we can predict, just on statistical grounds, that effect sizes will tend to decline ("regress" toward the "true" mean) over time with follow-up studies, based on the simple fact that those follow-up studies are far more likely to happen if the measured effect size was initially larger than the "true" value than if it was smaller. And as far as I know, nobody has been able to come up with any statistically rigorous way of estimating the magnitude of this unavoidable bias.
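The selection effect described in this comment (often called the "winner's curse") can be sketched with a small simulation. The true effect, the standard error and the "follow up only if significant" rule are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_EFFECT = 0.2   # assumed true effect size
SE = 0.15           # assumed standard error of each small initial study
N_STUDIES = 10_000
Z_CRIT = 1.96

estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_STUDIES)]

# Only studies with a significant positive result get followed up and published.
published = [e for e in estimates if e / SE > Z_CRIT]

print(f"true effect:                     {TRUE_EFFECT}")
print(f"mean of all estimates:           {statistics.mean(estimates):.3f}")
print(f"mean of 'significant' estimates: {statistics.mean(published):.3f}")
print(f"fraction significant (power):    {len(published) / N_STUDIES:.2f}")
```

Every selected estimate must exceed 1.96 × 0.15 ≈ 0.29, so the published effects sit well above the true 0.2. Unbiased follow-up studies of those same findings would, on average, land back near 0.2, and the effect would appear to "decline" without anyone doing anything wrong.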
Yikes, I don't know what industry your PhD biochemists are going into, but if they go to work for a drug company they can make a solid 200K+.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
Anyone familiar with the concept of "reality on demand" knows that it is constructed on a need-by-need basis by the Blue People. Now the Blue People are not 100% reliable. Sometimes they forget to put back key pieces of reality. This is the source of the problem of failure to reproduce results. The reproduction is being attempted in a reality which is simply too different.
Two possible solutions come to mind. Only conduct experiments where one never has to leave the room. Or maybe find a good lawyer who can negotiate a higher quality level contract with the Blue People.
Not really. This would be only true if all of those 2000 ways were statistically independent from one another. It would take a much larger dataset than most scientists deal with for there to be 2000 different ways of analyzing it, and even then they would not be statistically independent.
So the problem is not as bad as you suggest, but it is real. If I compare 20 different statistically independent measurements, one is expected to meet the p < 0.05 criterion by pure random chance. There are ways of correcting for this bias, by requiring a higher criterion of statistical significance (say p < 0.0025), but that also reduces the power of my study to detect a real difference.
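The numbers in this paragraph can be sketched directly, assuming (as the comment does) that the 20 tests are independent and every null hypothesis is true. The 0.0025 threshold matches the Bonferroni correction, 0.05 / 20.

```python
def family_wise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    return alpha / m


m, alpha = 20, 0.05
print(f"expected false positives: {m * alpha:.1f}")
print(f"P(at least one) at p < {alpha}: {family_wise_error(alpha, m):.3f}")
per_test = bonferroni(alpha, m)
print(f"Bonferroni threshold: p < {per_test:.4f}, "
      f"P(at least one) = {family_wise_error(per_test, m):.3f}")
```

With 20 independent tests at p < 0.05, the chance of at least one spurious hit is about 64%; at the corrected threshold it drops to just under 5%, at the cost of power, as the comment notes.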
Which is appropriate really depends upon the nature of the experiment and the question being asked. If I do 20 measurements and half of them are statistically significant, I may not much care if one of them is by chance.
If I want to minimize the likelihood of reporting an incorrect result, while maximizing the power of my study, my best bet is to decide in advance on a very few measurements and statistical tests, and stick with them. That's good for me, but it doesn't really help the reader who is looking at a bunch of different studies, because each finding reported at p = 0.05 still has one chance in 20 of being wrong. Added to that is an unknown magnitude of publication bias, because studies with significant findings are more likely to be published than those that find nothing of statistical significance.
Also: http://www.google.com/#q=peer+review+as+censorship
http://www.counterpunch.org/mazur02262010.html
http://www.suppressedscience.net/censorship-medicine.html
A key point being that keeping information from the public is not the same as modding up (or revising interactively) information like on slashdot. What would slashdot be like if every comment needed "peer review" before it was posted? Instead, slashdot uses after the fact moderation. (Nothing is perfect, of course.)
In general:
http://www.suppressedscience.net/
http://www.disciplinedminds.com/
http://www.jamesphogan.com/books/book.php?titleID=37
http://www.newciv.org/whole/schoolteacher.txt
http://www.johntaylorgatto.com/chapters/16a.htm
And from a previously posted link (from 1994, by the Vice Provost of Caltech; it has probably gotten worse since):
http://www.its.caltech.edu/~dg/crunch_art.html
"Peer review is usually quite a good way to identify valid science. Of course, a referee will occasionally fail to appreciate a truly visionary or revolutionary idea, but by and large, peer review works pretty well so long as scientific validity is the only issue at stake. However, it is not at all suited to arbitrate an intense competition for research funds or for editorial space in prestigious journals. There are many reasons for this, not the least being the fact that the referees have an obvious conflict of interest, since they are themselves competitors for the same resources. This point seems to be another one of those relativistic anomalies, obvious to any outside observer, but invisible to those of us who are falling into the black hole. It would take impossibly high ethical standards for referees to avoid taking advantage of their privileged anonymity to advance their own interests, but as time goes on, more and more referees have their ethical standards eroded as a consequence of having themselves been victimized by unfair reviews when they were authors. Peer review is thus one among many examples of practices that were well suited to the time of exponential expansion, but will become increasingly dysfunctional in the difficult future we face.
We must find a radically different social structure to organize research and education in science after The Big Crunch. That is not meant to be an exhortation. It is meant simply to be a statement of a fact known to be true with mathematical certainty, if science is to survive at all."
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
One tends to hear this sort of thing from people who don't know anything about the pharmaceutical industry, and of course this attitude is pushed very hard by people who are hawking quack cures of one sort or another, and who are thus competitors of the pharmaceutical industry.
I'm an academic pharmacologist, but I've met a lot of the people involved in industrial drug discovery, and trained more than a few of them. People tend to go into pharmacology because they are interested in curing disease and alleviating suffering. Many of them were motivated to enter the area by formative experiences with family members or other loved ones suffering from disease. They don't lose this motivation because they happen to become employed by a pharmaceutical company--indeed, many enter industry because it is there that they have the greatest opportunity to be directly involved in developing treatments that will actually cure people.
It is certainly true that pharmaceutical companies are businesses, and their decisions regarding how much to spend on treatments for different illnesses are strongly influenced by the potential profits. A potential treatment for a widespread chronic disease can certainly justify a larger investment than a one-time cure. But it can also be very profitable to be the only company with a cure for a serious disease. And it would be very bad to spend a lot of money developing a symptomatic treatment only to have somebody else find a cure. So a company passes up an opportunity for a cure at its peril. There is definitely a great deal of research going on in industry on potential cures.
The real reason why cures are rare is that curing disease is hard. Biology is complicated, and even where the cause is well understood, a cure can be hard to implement. For example, we understand in principle how many genetic diseases could be cured, but nobody in industry or academia knows how to reliably and safely edit the genes of a living person in practice. It is worth noting that the classic "folk" treatments for disease, including virtually all of the classic herbal treatments that have been found to actually be effective (aspirin, digitalis, ma huang, etc.), are not cures; they are symptomatic treatments. Antibiotics were a major breakthrough in the curing of bacterial diseases, but they were not created from scratch; they came from co-opting biological antibacterial weapons that were the product of millions of years of evolution. Unfortunately, for many diseases we are not lucky enough to find that evolution has already done the hardest part of the research for us.
Ben Goldacre, an MD from the UK, has been in the pseudoscience-detection game for a while now. I have just started reading his book, Bad Science: Quacks, Hacks, and Big Pharma Flacks. I find it refreshingly topical and well focused on the problem: evidence-based decision making.
Similar to Goldacre's findings, my experience has been that evidence produced by some test requires the nature of that test to be disclosed. Following the model of the scientific process, evidence requires the following before it is complete: a testable idea and a test (or series of tests). To address TFA's issue of replication, it is often helpful to include the test setup, the procedure for executing the test, the results of running the test given some inputs, etc.
--I apologize for any weirdness. I have been trying to edit this but apparently copy/paste is broken for my mode of /. viewing and Mac OS X 10.6.5 Safari 5.0.3.
Rather than trying to invent a whole load of new effects with psychological explanations, I wonder whether anyone has actually looked at things using basic statistics. The decline only seems to occur when the author has observed a noticeable effect. A noticeable effect is far more likely to be spotted when a statistical fluctuation makes it bigger rather than smaller. When you then repeat the measurement, you will see a smaller effect.
The article with the ESP experiment is a dead ringer for this. A student suddenly gets far more correct than they should on average so everyone takes notice and then, over time, the number of correct guesses drops to normal. Would anyone have noticed an exceptionally unlucky streak where a student got more than normal wrong and then suddenly got "better" by approaching normal?
This simple statistical effect has been well known in particle physics for years. Discoveries are typically made on upward fluctuations of data and then you will typically see them decrease with subsequent measurements. However the change is usually within reasonable uncertainty of the previous measurement and is not there for all measurements (although less likely you can still discover something on a downward fluctuation). So how about testing the simplest, statistical hypothesis first before inventing new psychological explanations...unless you think that fundamental particles are somehow subject to the positive, upbeat nature of particle physicists!
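The ESP scenario described above can be checked with a small simulation in which nobody has any ESP at all. The 25-card sessions, the one-in-five hit rate and the number of subjects are assumptions (loosely modelled on five-symbol card-guessing tests); the only "effect" is that attention goes to whoever scored highest.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

CARDS = 25       # guesses per session (assumed)
P_HIT = 0.2      # chance of a correct guess with five card symbols (assumed)
SUBJECTS = 100   # people screened in each round (assumed)
ROUNDS = 200     # repeat the whole screening many times


def session_score() -> int:
    """Number of correct guesses when every guess is pure chance."""
    return sum(random.random() < P_HIT for _ in range(CARDS))


first_scores, retest_scores = [], []
for _ in range(ROUNDS):
    scores = [session_score() for _ in range(SUBJECTS)]
    first_scores.append(max(scores))        # the subject everyone notices
    retest_scores.append(session_score())   # same person retested, still chance

print(f"chance expectation:         {CARDS * P_HIT:.1f}")
print(f"'star' subject, first test: {statistics.mean(first_scores):.2f}")
print(f"'star' subject, retest:     {statistics.mean(retest_scores):.2f}")
```

The "star" scores roughly double the chance level on the first test and falls back to about 5 on the retest: a pure selection effect, with no psychological explanation needed.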