Psychology's Replication Battle
An anonymous reader sends this excerpt from Slate:
"Psychologists are up in arms over, of all things, the editorial process that led to the recent publication of a special issue of the journal Social Psychology. This may seem like a classic case of ivory tower navel gazing, but its impact extends far beyond academia. ... Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable. ... Given the stakes involved and its centrality to the scientific method, it may seem perplexing that replication is the exception rather than the rule. The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate."
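The excerpt's last claim, that surprising findings are statistically less likely to be accurate, can be illustrated with a little Bayes arithmetic. This is only a sketch: it treats "surprising" as "low prior probability that the effect is real", and the power (0.8), significance threshold (0.05), and prior values are assumptions chosen for illustration, not figures from the article.

```python
def positive_predictive_value(prior, power=0.8, alpha=0.05):
    """Chance that a statistically significant finding reflects a real effect.

    prior: assumed probability that the tested hypothesis is true
    power: assumed probability of detecting a real effect
    alpha: false positive rate when there is no effect
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# The less plausible the hypothesis was beforehand, the less a single
# "positive" result should be trusted.
for prior in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prior)
    print(f"prior={prior:<5} PPV={ppv:.2f}")
```

Under these assumed numbers, a finding that was a coin flip beforehand is very likely real once it reaches significance, while a long-shot finding with the same p-value is more likely to be a false positive.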
good luck with that.
Perhaps they need some therapy :-)
Software engineering has a similar problem. Things that are easy to measure objectively, such as code volume (lines of code), are often only part of the picture. The psychology of developers (perception, etc.), especially during maintenance, plays a big role, but is difficult and expensive to measure objectively.
Thus, arguments break out about whether to focus on parsimony or on "grokkability". Some will also argue that if your developers can't read parsimony-friendly code, they should be fired and replaced with those who can. This gets into tricky staffing issues as sometimes a developer is valued for their people skills or domain (industry) knowledge even if they are not so adept at "clever" code.
Thus, the "my code style can beat up your style" fights involve both easy-to-measure "solid" metrics and very difficult-to-measure factors about staffing, side knowledge, people skills, corporate politics, economics, etc.
Table-ized A.I.
Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate.
Because it's always wishful thinking and the 'findings' are always BS. About time it's called out for the non-science nonsense that it is.
That's a surprise.
"The average reporter we talk to is 27 years old......They literally know nothing." - Ben Rhodes
it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable
Duh. That's because an experiment that is not replicable has *no* value.
The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings ...
This applies to all science, not just psychology.
Once is an anomaly
Twice is a coincidence
Three times is a pattern
Falling into the 'cult' category
I think that too many "studies" set out to prove a hypothesis instead of testing a hypothesis. The drive to prove something puts bias into the study and skews the outcome. No one wants to be proven wrong. This is especially important when the measurements are subjective, as in many psychology studies.
No, and it shouldn't carry the same "science" label to start with. Make it "social studies" or whatever. To call it science, one tries to put it on the same level as real science, where the processes are completely different on numerous levels. It's an insult to real science. For example, when a scientist builds a collider to find a particle, and he finds one, he puts up the results so they can be verified by peers, and if the collective brainpower finds an error and puts it down, the process is considered a success. In the meantime soft "scientists" will not be verified by peers and separate studies will have to point out the results are not even replicable, and people will bitch about and defend their research and the funding of their research.
"Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable."
No, those of us who oppose the funding of this crap recognise that if you cannot replicate your "study" then it is not an experiment. If what you are doing cannot be proved (one way or the other) by experiment then IT IS NOT SCIENCE. I don't really care what it gets called, and some of it may even be valuable for some values of valuable. However, the amount of dross that is produced by social researchers who try to call themselves scientists is truly extraordinary and a plague on our world.
"The first thing to do when you find yourself in a hole is stop digging."
Replicating scientific results (or failing to) is a good thing.
Being rude about it, as was apparently the case here, is plain old asshattery.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Define 'replicate'.
Tom? Is that you?
Do you know where I can get a copy of Dianetics? I've heard its da bomb!
And I've seen applications that are ridiculous.
Then give an example rather than just make empty allegations. Henry Ford wasn't an economist.
Sorry, real life is messy.
1 - Some replicable tests are a good idea
Some people see Aliens at Roswell when they are there at night and take drugs.
This is a replicable experiment - is it because they have taken drugs or because Aliens are sometimes there?
Generally (sadly) if you have a randomised double-blind controlled experiment that controls for the likely deciding factors, you can decide whether or not it is more likely because people take drugs (happily you cannot be sure about the presence or absence of aliens).
2 - Some replicable tests are a bad idea
Do the really expensive cancer|baby-saving|Alzheimer's etc. drugs we use really help?
This is also a replicable experiment.
Give some people the drug and some a placebo.
Not too ethical even if you disclose that there might be a placebo
3 - Some things cannot be replicated
Was it right to have QE? Did we have the right amount of QE?
This is not replicable.
You don't get to re-run an economy for the last 6 years. All you can do is watch and measure and argue about causation afterwards.
In the scope of psychology, you get a mix of all 3 experiment types. All these questions are very good questions.
What troubles me is that there will be a growing tendency to not attempt to answer the hard ones.
1) Occam's razor already tells you it's the drugs. Unless aliens show up only when people are taking drugs, or we suddenly get super-alien-viewing-powers when using drugs, aliens could be there. That's (apart from being ridiculous) such a complicated model compared to the simple "your drugs give you hallucinations" model (which we even know is true) that Occam's razor can rule out the other ones.
2) Erm.. you know that this is EXACTLY how drugs are tested every day? Not unethical. Extremely common.
3) You could run a simulation.
http://neurotheory.columbia.ed...
"It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated."
In the search of positive results, and p-hacking to get there, they're failing to demonstrate scientific integrity.
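For anyone unsure what p-hacking buys you, here is a rough simulation of one common form of it: measure many outcomes where there is no real effect at all, and report the study as "positive" if any one of them comes out significant. Everything in it is an assumption made for illustration (the group size, the number of outcomes, the simplified z-test with known unit variance, the helper names), not a description of any particular study.

```python
import random
from statistics import NormalDist, fmean


def null_p_value(rng, n):
    """Two-sided p-value for one outcome when both groups are pure noise."""
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # z-test for a difference in means, variance known to be 1 in each group
    z = (fmean(group_a) - fmean(group_b)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(outcomes, trials=2000, n=20, alpha=0.05, seed=1):
    """Share of no-effect studies that find at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(null_p_value(rng, n) < alpha for _ in range(outcomes)):
            hits += 1
    return hits / trials


print("1 outcome:  ", false_positive_rate(1))
print("10 outcomes:", false_positive_rate(10))
```

With one pre-registered outcome the rate should sit near alpha. With ten independent outcomes the expected rate is 1 - 0.95^10, or about 0.40, even though nothing real is being measured.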
To replicate an experiment, you take the description of the conditions, tasks, environment, fixed independent and dependent variables, analytical method and results provided by the original experimenter in the (peer-reviewed) paper they published.
If you can show the same results, with the same statistical significance, then it's reasonable to assume that the experiment shows a valid scientific phenomenon.
If you can't then one of the two experiments got it wrong and more work is needed.
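One possible way to make "the same results" concrete is sketched below. It is a hypothetical criterion, not a standard the field has agreed on: it asks whether the replication is itself significant in the same direction, and whether the two effect estimates differ by more than chance would explain. The function names and the example numbers are invented for illustration.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def compare_replication(orig_effect, orig_se, rep_effect, rep_se, alpha=0.05):
    """Compare an original effect estimate with a replication's estimate."""
    rep_p = two_sided_p(rep_effect / rep_se)
    same_direction = orig_effect * rep_effect > 0
    # z-test for the difference between two independent estimates
    diff_z = (orig_effect - rep_effect) / (orig_se ** 2 + rep_se ** 2) ** 0.5
    return {
        "replication_significant": rep_p < alpha and same_direction,
        "estimates_differ": two_sided_p(diff_z) < alpha,
    }


# Hypothetical numbers: a large original effect, a near-zero replication.
print(compare_replication(0.60, 0.20, 0.05, 0.10))
# Hypothetical numbers: the replication lands close to the original.
print(compare_replication(0.50, 0.20, 0.45, 0.10))
```

Note that the second check matters: a replication that merely fails to reach significance can still be statistically compatible with the original, in which case neither experiment has clearly "got it wrong".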
The basic problem with social experiments is that they are based on the judgement, feelings, or anything else that the studied group merely says it would or would not do, think, feel, or otherwise emote, all of which is completely subjective. Asking people how sad, happy, or angry something makes them feel and rating that feeling (or the difference from previous values) has no scientific merit, as none of the terms used have any hard, scientific definition and none of the participants have had their feelings "calibrated".
It's little different from a scientist (a proper one) measuring electric voltage by sticking their tongue across two electrodes, or measuring distance by eyeballing it. The level of accuracy and standardisation the social "sciences" have at present puts them on a par with chemical research in the 17th century: phlogiston and fixed air (CO2).
As for being able to determine which variables are being measured - or even what all the variables are in their experiments, the social scientists have yet to discover their subject's version of fire.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Have a journal, call it Debunker's Weekly if you want, that is divided evenly between papers on replication and papers showing negative correlation at the start. Pay authors a nominal amount, according to the thoroughness of the work as judged by referees. Provide the journal free to University libraries. Submit summaries of major stories to Slashdot, The Guardian, various Skeptical societies and other places likely to raise the extreme ire of dodgy researchers. In fact, the more ire, the better.
The journal doesn't have to last long. Just long enough to force bad researchers to improve or quit, force regular journals to publish a wider range of findings to avoid humiliation, and to correct dangerously erroneous beliefs. Since there must be a stockpile of unpublished papers of this sort, you should probably be able to get six or seven bumper editions out before anyone notices the dates, and maybe another two before the journal is sued into oblivion for defamation.
That would be plenty to make some major course corrections and to "out" a few frauds.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Fixed typo.
Agreed on study size, which is why social scientists look at meta-studies of hundreds of studies performed over as much as a decade, to eliminate the noise and other transient junk.
What they really need to do, though, is examine more hypotheses. You need 7-10 additional hypotheses, not including the null hypothesis, that are orthogonal to each other and to the hypothesis being tested. This would allow you to binary subdivide the problem space, not only showing what something isn't but also showing if the models being examined are founded on sound principles.
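As a rough picture of why pooling many studies cuts down the noise, here is a minimal fixed-effect (inverse-variance) meta-analysis sketch. It assumes every study estimates the same underlying effect and that the studies are independent, which is a strong assumption for social science data. The study values are invented for illustration.

```python
def pooled_estimate(studies):
    """Inverse-variance weighted average of (effect, standard_error) pairs.

    Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    total_weight = sum(weights)
    effect = sum(w * e for w, (e, _) in zip(weights, studies)) / total_weight
    return effect, (1.0 / total_weight) ** 0.5


# Hypothetical studies: (effect estimate, standard error)
studies = [(0.30, 0.20), (0.10, 0.20), (0.25, 0.20), (0.15, 0.20)]
effect, se = pooled_estimate(studies)
print(f"pooled effect={effect:.3f}, standard error={se:.3f}")
```

Four equally noisy studies halve the standard error. What pooling cannot do is remove a bias shared by all of them, such as journals only publishing the positive results.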
If you think Psychology has a replication problem, get a load of Economics.
When it comes to "hard" sciences, Economics is basically remote viewing with a political agenda.
You are welcome on my lawn.
You misrepresent what happened. Ford realized that first and foremost, people needed to be able to *afford* cars, so he designed and produced the T. Only after the car was ubiquitous did fashion exert any greater influence.
To replicate an experiment, you take the description of the conditions, tasks, environment, fixed independent and dependent variables, analytical method and results provided by the original experimenter in the (peer-reviewed) paper they published. If you can show the same results, with the same statistical significance, then it's reasonable to assume that the experiment shows a valid scientific phenomenon.
If you can't then one of the two experiments got it wrong and more work is needed.
Actually, it would be "at least one of the two got it wrong".
Yeah, you are going to be seriously confused if you think "rational actor" economics assumes a Straw Vulcan who won't buy the chocolate ice cream which he likes better if the vanilla is a cent cheaper. But the fault, dear AC, is not in the economics, but your own skull.
Without replication, science doesn't build on previous results. It just thrashes around. Psychology (and theology) are like that. They change, but don't improve much.
There's a practical problem. Without repeatable scientific results, a technology cannot be built based on the science. "Science is prediction, not explanation." - Fred Hoyle.
Buy any book about abreaction therapy. That's where L. Ron Hubbard cribbed the idea for Dianetics anyway.
[To repeat an experiment about international macroeconomics] You could run a simulation.
So how would you go about falsifying the accuracy of the simulation's model?
As I read it the initial flap was over the people and journal involved in the replication lying to get the cooperation of the original researcher.
They promised to give her an opportunity to review and publish a comment on their own results. They secured her cooperation, getting detailed descriptions of the methodology - far beyond what was in the publication - copies of the original film, and the like. Then, when they got differing results, they denied her the perceived-as-promised opportunity to examine their results in advance and publish a comment with them. They also published comments slamming her work, in terms like "epic fail".
The failed replication of her work might be a problem for her, career-wise. But massive ridicule is a much bigger one. So she cried "foul". This - along with similar acts by other replicators - is what brought support for her from other academics.
It is useful that the flap is also bringing to light other, very serious and systematic, problems with the replicability of attempts at performing actual science (or going through the motions) in social fields, creating a search for a measure of the actual reliability level of "social science" results, and exposing estimates of that to a broader audience. But let's not confuse opposition to unethical behavior by certain replicators for opposition to replication in general. (No doubt there is some of both. So let's keep the distinction in mind when evaluating the comments and actions of individuals in the social science fields.)
Meanwhile, it looks like "social science" results are far less reliable than political decision-makers had thought, and this flap will give ammunition to those opposing them when they try to legislate harebrained and oppressive schemes and foist them on the ruled classes. So some good is already coming out of it. B-)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
Unfortunately, it's not a strawman. Not exactly. The researchers themselves may not do that extrapolation, but those they convince of their finding do. The GP post, however, is historically inaccurate. The "rational market theory" originated some time before 1950, and was dominant during the 1950s and later. It has recently been challenged by people doing actual research that proved it an invalid model.
IIRC it originated with an economic school that was ideologically committed to the Free Market, despite the obvious fact that never throughout the course of history has there ever BEEN a free market. Some are freer than others, e.g. the market in illegal drugs, where even killing your opponents is considered a valid business move. Those markets, however, fail the normal definition because the purchaser doesn't really know what he's buying.
I think we've pushed this "anyone can grow up to be president" thing too far.
The researchers themselves may not do that extrapolation, but those they convince of their finding do.
I believe the number one lesson of economics is that people including the economists themselves have a huge capacity to rationalize all sorts of things, sometimes in very elaborate ways, when their interests are at stake. This has nothing to do with the rational market model.
The "rational market theory" originated some time before 1950, and was dominant during the 1950s and later. It has recently been challenged by people doing actual research that proved it an invalid model.
Except that the model hasn't been proven to be invalid. It works well for describing stock markets, for example.
IIRC it originated with an economic school that was ideologically committed to the Free Market, despite the obvious fact that never throughout the course of history has there ever BEEN a free market.
At some point, there weren't public sanitation, canals, railroads, or electronic computers either. We didn't let the obvious fact that these didn't exist at the time stop us from making them and benefiting from the results.
Since then, we have actually developed near-free markets, and they do work. Sometimes they are somewhat inappropriate, such as when the market in question generates substantial externalities (and none of the market participants have any incentive to price in those externalities). Then external regulation needs to be brought to bear.
Further, that "economic school" you refer to is the Austrian School, which is more philosophy than science (for example, they had "self-evident" axioms and eschewed empirical methods). But you might find it interesting that they never assumed that participants of a market were rational in the usual sense. Or rather they didn't distinguish between rational and irrational.
For example, from one of the more famous members of the Austrian School, Ludwig von Mises ("Human Action") we have this:
Human action is necessarily always rational. The term "rational action" is therefore pleonastic and must be rejected as such. When applied to the ultimate ends of action, the terms rational and irrational are inappropriate and meaningless. The ultimate end of action is always the satisfaction of some desires of the acting man. Since nobody is in a position to substitute his own value judgments for those of the acting individual, it is vain to pass judgment on other people's aims and volitions. No man is qualified to declare what would make another man happier or less discontented. The critic either tells us what he believes he would aim at if he were in the place of his fellow; or, in dictatorial arrogance blithely disposing of his fellow's will and aspirations, declares what condition of this other man would better suit himself, the critic.
Anyway, moving on, it's not surprising that a non-empirical school of philosophy ends up losing some ground to actual empirical studies. Their models still are valid and work for a number of important cases in the world today.
Those markets, however, fail the normal definition because the purchaser doesn't really know what he's buying.
The normal definition is an asymptotic ideal. No one ever has perfect knowledge aside from certain contrived games. But we can still consider which characteristics other markets share with that ideal, and to what degree. And guess what? Markets that are pretty close behave similarly to that particular ideal.
What people frequently don't get is that despite the irrationality of markets, they behave more rationally than their participants. It's a case where mobs act smarter than most of the participants. Markets aren't entirely rational, but they are more rational than most of the alternatives for group organization or resource allocation.
There are plenty of good psychology experiments/case studies that produce a lot of really useful information and are repeatable (albeit over a very long period of time). The problem is there are also a lot of complete and utter ass psychology experiments. It is really really hard to produce a good study that provides useful results in soft sciences, and in cases of psychology, they take a very long time and sometimes a lot of money to complete. Yes, they have to account for a lot of variables and exclude them via statistical analysis, but the ones that do it right do it exceptionally well.
I used to think negatively of those types of studies until I actually took the time to read one while helping my girlfriend with a paper. I was amazed at the level of detail and the amount of effort they took to isolate the results into meaningful data.
There's a basic foundation that's roughly agreed upon, delineated by rules and best practices. Once those are mastered, then coding becomes an art form. And as art it can be subjective, defy description and all apparent rules of logic, and yet work incredibly well. If there's one thing I've learned on /. over the years from reading all the arguments between coders, it is that there is more than one way to become a master of one's craft (where coding is concerned) and that coding becomes Art. Which I personally think is cool.
Here's to hot beer, cold women, and Glaswegian kisses for all.
No. The rational man theory does NOT work on the stock market. It doesn't even work well for those sections that are computer driven, because the models are always based on incorrect presumptions.
OTOH, I will agree that it OFTEN works on the stock market. This is a far different statement. But much of the stock market is driven by gambling fever, often played with "other people's money". (And in that sense, since the player doesn't risk much, I suppose you could call it rational from his point of view. Even then I doubt it, though.)
Occam's Razor is merely a pithy statement of the principle of parsimony. It is not a law in any sense, and it "rules out" nothing. It merely suggests that the simpler explanation is more likely to be correct.
Indeed, Occam's Razor is a principle of philosophy, not of science.
When we look at physics, certainly the most rigorous of the sciences, we find that the simpler explanation is often shown to be incorrect over the long term. No "simple" explanation of "how things move", for example, would have come up with quantum mechanics, or relativity, or chaos theory.
This suggests that Occam's Razor is largely worthless as an intellectual tool.
OTOH, I will agree that it OFTEN works on the stock market. This is a far different statement.
And there we go. It's not a far different statement. I didn't claim the rational market model perfectly modeled the stock market.
But much of the stock market is driven by gambling fever, often played with "other people's money".
While this observation isn't entirely irrelevant, it remains that traders who come to markets to gamble, lose money to traders who don't. And it's still a lot better than many of the economic alternatives, such as having the above gamblers in control of a central planning bureau or rent seeking.