Psychology's Replication Battle
An anonymous reader sends this excerpt from Slate:
Psychologists are up in arms over, of all things, the editorial process that led to the recent publication of a special issue of the journal Social Psychology. This may seem like a classic case of ivory tower navel gazing, but its impact extends far beyond academia. ... Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable. ... Given the stakes involved and its centrality to the scientific method, it may seem perplexing that replication is the exception rather than the rule. The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate.
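The claim that surprising findings are statistically less likely to be accurate follows from Bayes' rule: the less plausible a hypothesis was before the study, the larger the share of its "significant" results that are false positives. A minimal Python sketch, assuming a 5% significance threshold and 80% power (typical textbook values, not figures from the Slate piece); the priors are purely illustrative:

```python
def ppv(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Share of 'significant' findings that reflect a real effect (Bayes' rule).

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed chance that a real effect reaches p < alpha
    alpha: chance that a null effect reaches p < alpha anyway
    """
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative priors: a coin-flip hypothesis down to a 1-in-100 long shot.
    for prior in (0.5, 0.1, 0.01):
        print(f"prior={prior:<5} PPV={ppv(prior):.2f}")
```

Under these assumed numbers, a 1-in-100 long shot that reaches p < 0.05 is a real effect only about 14% of the time, versus about 94% for a coin-flip hypothesis.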
Good luck with that.
Perhaps they need some therapy :-)
Software engineering has a similar problem. Things that are objective to measure, such as code volume (lines of code), are often only part of the picture. The psychology of developers (perception, etc.), especially during maintenance, plays a big role, but is difficult and expensive to measure objectively.
Thus, arguments break out about whether to focus on parsimony or on "grokkability". Some will also argue that if your developers can't read parsimony-friendly code, they should be fired and replaced with those who can. This gets into tricky staffing issues, as sometimes a developer is valued for their people skills or domain (industry) knowledge even if they are not so adept at "clever" code.
Thus, the "my code style can beat up your style" fights involve both easy-to-measure "solid" metrics and very difficult-to-measure factors about staffing, side knowledge, people skills, corporate politics, economics, etc.
Table-ized A.I.
Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate.
Because it's always wishful thinking and the 'findings' are always BS. About time it's called out for the non-science nonsense that it is.
That's a surprise.
"The average reporter we talk to is 27 years old......They literally know nothing." - Ben Rhodes
it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable
Duh. That's because an experiment that is not replicable has *no* value.
The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings ...
This applies to all science, not just psychology.
Once is an anomaly
Twice is a coincidence
Three times is a pattern
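Taking the rule of thumb at face value: with the usual (assumed) 5% significance threshold and fully independent studies, the chance that a nonexistent effect shows up as significant three times running is 0.05^3, about 1 in 8,000. A tiny Python sketch of that arithmetic; independence is an assumption that real follow-up studies often violate:

```python
def chance_all_significant(p_each: float, k: int) -> float:
    """Probability that k independent studies all come out significant,
    if each one does so with probability p_each."""
    return p_each ** k


if __name__ == "__main__":
    alpha = 0.05  # assumed false-positive rate per study when there is no effect
    power = 0.5   # assumed chance per study of detecting a real effect
    for k in (1, 2, 3):
        print(k, chance_all_significant(alpha, k), chance_all_significant(power, k))
```

The same arithmetic cuts the other way: if a real effect is detected only half the time per study (the power figure quoted further down the thread), all three studies succeed just 0.5^3 = 12.5% of the time.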
Falling into the 'cult' category
... the military have constantly funded large-scale psy-ops, which include information warfare, trend setting, viewpoint shifting, etc., aimed at the people
I think that too many "studies" set out to prove a hypothesis instead of testing one. The drive to prove something puts bias into the study and skews the outcome. No one wants to be proven wrong. This is especially important when the measurements are subjective, as in many psychology studies.
No, and it shouldn't carry the same "science" label to start with. Make it "social studies" or whatever. To call it science is to try to put it on the same level as real science, where the processes are completely different on numerous levels. It's an insult to real science. For example, when a scientist builds a collider to find a particle and finds one, he puts up the results so they can be verified by peers, and if the collective brainpower finds an error and shoots the result down, the process is still considered a success. Meanwhile, soft "scientists" are not verified by peers; separate studies have to point out that the results are not even replicable, and then people bitch about it and defend their research and its funding.
"Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable."
No, those of us who oppose the funding of this crap recognise that if you cannot replicate your "study" then it is not an experiment. If what you are doing cannot be proved (one way or the other) by experiment then IT IS NOT SCIENCE. I don't really care what it gets called, and some of it may even be valuable for some values of valuable; however, the amount of dross produced by social researchers who try to call themselves scientists is truly extraordinary and a plague on our world.
"The first thing to do when you find yourself in a hole is stop digging."
Replicating scientific results (or failing to) is a good thing.
Being rude about it, as was apparently the case here, is plain old asshattery.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Honestly, does anyone even care? I mean, we should care, but we don't.
There are criminals, violent ones, getting away with some "counseling," while a lot of people spend the better part of their lives institutionalized or so drugged up you could mistake them for brain-dead, because, you know, "treating" a drooling idiot is easier (cheaper) than treating someone who needs constant attention.
Software engineering has a similar problem. Things that are objective to measure, ...
I see all too often opinion being expressed as fact - even by CS professors when I was in school.
The psychological climate clearly calls for a shift to climate psychology.
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
Dan Ariely, Daniel Kahneman, and a few others have done extensive work showing the limitations of how we think and how we actually perform economic activity.
The failure of the rational market economists is that they only study large, very well organized markets dominated by professionals and now mostly run by computers - the financial markets (because there's a shit load of publicly available data). So of course, the market looks completely rational. They then extrapolated their findings to everything else - even to people buying that new house that they just "fell in love with".
Sorry, real life is messy.
1 - Some replicable tests are a good idea
Some people see Aliens at Roswell when they are there at night and take drugs.
This is a replicable experiment - is it because they have taken drugs or because Aliens are sometimes there?
Generally (sadly), if you have a randomised double-blind controlled experiment that controls for the likely deciding factors, you can decide whether or not it is more likely because people take drugs (happily, you cannot be sure about the presence or absence of aliens).
2 - Some replicable tests are a bad idea
Do the really expensive cancer|baby-saving|Alzheimer's etc. drugs we use really help?
This is also a replicable experiment.
Give some people the drug and some a placebo.
Not too ethical, even if you disclose that there might be a placebo.
3 - Some things cannot be replicated
Was it right to have QE? Did we have the right amount of QE?
This is not replicable.
You don't get to re-run an economy for the last 6 years - all you can do is watch and measure and argue about causation afterwards.
In the scope of psychology, you get a mix of all 3 experiment types. All these questions are very good questions.
What troubles me is that there will be a growing tendency to not attempt to answer the hard ones.
Only one or two though.
... then it ain't science. End of story.
Tom? Is that you?
Do you know where I can get a copy of Dianetics? I've heard it's da bomb!
Social Psych has been called a "soft" science for a reason. Not because it is easy, but because it's not a "do this and this happens" discipline. When I was in college (back before the web), Social Psych was not even considered a "science" by the environmental science majors. It was considered a "rocks for jocks" type group of classes for people who didn't understand the scientific method. I have a degree in cultural anthropology with a minor in linguistics, and even the forensic anthropology majors considered social psych to be a joke, predominantly because there was no real analysis. FAs at least had historical trending, disease propagation models, statistical excavation, linguistic drift, shard analysis, and other "tools" to cross-check their work. True, someone with a limited knowledge of FA can still make just as many errors as a social psych PhD holder... Just read the first few pages of "Clan of the Cave Bear."
Psychologists have largely relied on inferential statistics as tools for inference. Analysis of variance, t-tests, correlation, and regression are used to determine whether results are, or are not, "statistically significant." Too often the focus has been on the inference -- significant or not -- rather than on the descriptive data -- means, regression coefficients etc.
The problem is that tests of statistical significance can tell us only that the tested relationship is, or is not, plausibly due to random fluctuation or chance. For example, we can say that a correlation found to be significant is unlikely to be zero. In this usage significant does not mean "important"; it means not random. Binary decisions about whether apparent relationships in my data are random or real do not provide much of a foundation for a developing science. Finding relationships that are not due to chance is a very small step toward real understanding.
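To make "significant does not mean important" concrete, here is a rough Python sketch of the standard t-test for a correlation coefficient. It assumes a normal approximation to the t distribution (fine for large samples only), and r = 0.05 is an arbitrary illustrative value that explains a quarter of one percent of the variance:

```python
import math


def corr_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for the null hypothesis rho = 0.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r**2) and treats t as standard
    normal, an approximation that is reasonable only for large n.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))


if __name__ == "__main__":
    r = 0.05  # tiny correlation: r**2 = 0.0025, i.e. 0.25% of variance explained
    for n in (100, 1_000, 10_000):
        print(f"n={n:>6}  p ~ {corr_p_value(r, n):.2g}")
```

With enough data the same trivial correlation goes from nowhere near significant (n = 100) to highly significant (n = 10,000) without becoming any more important.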
Further, data that look random can easily be produced by weak manipulations, poor measurement tools, and any number of experimental glitches. As a result, non-significant results are easy to dismiss, and without statistically significant results, publication in a good journal is unlikely. It is likewise easy to discount later failures to replicate (studies obtaining non-significant findings) as due to problems with the replication study. Therefore, the replication study doesn't get published.
An additional problem is the challenge of obtaining adequate sample sizes to ensure the statistical power needed to assess replicability -- the vast majority of published studies are not supported by grant funds. We've known for 6 decades that even studies published in top journals are chronically underpowered -- the probability of a perfectly executed replication study finding a key result to be significant is usually in the range of .5 (ouch!).
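The "coin-flip" replication odds are easy to check. The sketch below assumes a simplified two-group experiment with a known standard deviation of 1 (a z-test rather than the t-test a real study would use) and picks an effect size of d = 0.62 with 20 subjects per group because that combination gives power of about 0.5; those numbers are illustrative assumptions, not taken from any particular study. It computes power analytically and checks it by simulating many perfectly executed replications:

```python
import math
import random
from statistics import NormalDist, fmean

Z = NormalDist()  # standard normal distribution


def analytic_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided, two-sample z-test with known SD = 1.

    d is the true difference between group means in SD units.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected z-statistic
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)


def simulated_power(d: float, n_per_group: int, alpha: float = 0.05,
                    reps: int = 5_000, seed: int = 1) -> float:
    """Fraction of simulated, perfectly executed replications with p < alpha."""
    rng = random.Random(seed)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se = math.sqrt(2 / n_per_group)  # SE of a difference in means when SD = 1
    hits = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    d, n = 0.62, 20  # illustrative effect size and group size (assumptions)
    print(f"analytic power  ~ {analytic_power(d, n):.2f}")
    print(f"simulated power ~ {simulated_power(d, n):.2f}")
```

Under the same assumptions, roughly 41 subjects per group (about double) would be needed to push power to the conventional 0.8, which is where the cost problem described above comes in.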
I think that the attention that these problems have gotten in many of the field's top journals may be embarrassing for the field, but it is a necessary and positive step toward a better science.
http://neurotheory.columbia.ed...
"It's a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty--a kind of leaning over backwards. For example, if you're doing an experiment, you should report everything that you think might make it invalid--not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked--to make sure the other fellow can tell they have been eliminated."
In the search for positive results, and the p-hacking to get there, they're failing to demonstrate scientific integrity.
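p-hacking covers many practices; one of the simplest is measuring several outcomes and reporting whichever one crosses p < 0.05. A minimal Python sketch of why that inflates false positives, assuming the outcomes are independent and there is no real effect at all (real outcomes are usually correlated, which typically makes the inflation smaller, though it rarely disappears):

```python
import random
from statistics import NormalDist


def false_positive_rate(n_outcomes: int, alpha: float = 0.05,
                        reps: int = 20_000, seed: int = 7) -> float:
    """Share of no-effect 'studies' in which at least one of n_outcomes
    reaches p < alpha (two-sided).

    Each outcome's z-statistic is drawn as an independent standard normal,
    which is its distribution when there is no real effect.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        # "Report whichever outcome worked": one hit is enough.
        if any(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(n_outcomes)):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for k in (1, 5, 20):
        exact = 1 - (1 - 0.05) ** k  # closed form for independent tests
        print(f"{k:>2} outcomes: simulated {false_positive_rate(k):.3f}, exact {exact:.3f}")
```

With 20 outcomes to choose from, a study with no real effect has roughly a 64% chance of producing at least one "significant" result.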
This is what we get for putting up with the global warming hysterics... Science, which is by definition "repeatable experiments", suddenly gets redefined so that repeatable experiments don't matter.
Ick!
To replicate an experiment, you take the description of the conditions, tasks, environment, fixed independent and dependent variables, analytical method and results provided by the original experimenter in the (peer-reviewed) paper they published.
If you can show the same results, with the same statistical significance, then it's reasonable to assume that the experiment shows a valid scientific phenomenon.
If you can't then one of the two experiments got it wrong and more work is needed.
The basic problem with social experiments that are based on the judgement, feelings, or anything else that the studied group merely says it would or would not do, thinks, feels, or otherwise emotes is that they are completely subjective. Asking people how sad, happy, or angry something makes them feel and rating that feeling - or the difference from previous values - has no scientific merit, as none of the terms used has any hard scientific definition and none of the participants have had their feelings "calibrated".
It's little different from a scientist (a proper one) measuring electric voltage by sticking their tongue across two electrodes, or measuring distance by eyeballing it. The level of accuracy and standardisation the social "sciences" have at present puts them on a par with 17th- and 18th-century chemical research: phlogiston and "fixed air" (CO2).
As for being able to determine which variables are being measured - or even what all the variables in their experiments are - the social scientists have yet to discover their subject's version of fire.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Have a journal, call it Debunker's Weekly if you want, that is divided evenly between papers on replication and papers showing negative correlation at the start. Pay authors a nominal amount, according to the thoroughness of the work as judged by referees. Provide the journal free to university libraries. Submit summaries of major stories to Slashdot, The Guardian, various skeptical societies and other places likely to raise the extreme ire of dodgy researchers. In fact, the more ire, the better.
The journal doesn't have to last long. Just long enough to force bad researchers to improve or quit, force regular journals to publish a wider range of findings to avoid humiliation, and to correct dangerously erroneous beliefs. Since there must be a stockpile of unpublished papers of this sort, you should probably be able to get six or seven bumper editions out before anyone notices the dates, and maybe another two before the journal is sued into oblivion for defamation.
That would be plenty to make some major course corrections and to "out" a few frauds.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Fixed typo.
Agreed on study size, which is why social scientists look at meta-studies of hundreds of studies performed over as much as a decade, to eliminate the noise and other transient junk.
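The pooling step behind a meta-analysis can be sketched as an inverse-variance weighted average (the "fixed-effect" model), in which more precise studies count for more. The effect sizes and standard errors below are made-up numbers for illustration, and a fixed-effect model assumes every study estimates the same true effect; real meta-analyses often use random-effects models instead:

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical effect sizes and SEs from five small studies (made-up numbers).
    est = [0.60, 0.10, -0.05, 0.35, 0.20]
    se = [0.30, 0.25, 0.28, 0.32, 0.26]
    pooled, pooled_se = fixed_effect_pool(est, se)
    print(f"pooled = {pooled:.3f}, SE = {pooled_se:.3f}")
```

Pooling shrinks random noise, since the pooled standard error is smaller than any single study's, but it cannot remove systematic bias: if non-significant studies never got published, the pooled estimate inherits that bias.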
What they really need to do, though, is examine more hypotheses. You need 7-10 additional hypotheses, not including the null hypothesis, that are orthogonal to each other and to the hypothesis being tested. This would allow you to subdivide the problem space in binary fashion, not only showing what something isn't but also showing whether the models being examined are founded on sound principles.
If you think Psychology has a replication problem, get a load of Economics.
When it comes to "hard" sciences, Economics is basically remote viewing with a political agenda.
You are welcome on my lawn.
To replicate an experiment, you take the description of the conditions, tasks, environment, fixed independent and dependent variables, analytical method and results provided by the original experimenter in the (peer-reviewed) paper they published. If you can show the same results, with the same statistical significance, then it's reasonable to assume that the experiment shows a valid scientific phenomenon.
If you can't then one of the two experiments got it wrong and more work is needed.
Actually it would be at least one of the two "got it wrong".
Everybody wants to be a psychiatrist and psychologist. Fu#knutbook's social experiments are no exception. What a flawed science.
Without replication, science doesn't build on previous results. It just thrashes around. Psychology (and theology) are like that. They change, but don't improve much.
There's a practical problem. Without repeatable scientific results, a technology cannot be built based on the science. "Science is prediction, not explanation." - Fred Hoyle.
Buy any book about abreaction therapy. That's where L. Ron Hubbard cribbed the idea for Dianetics anyway.
[To repeat an experiment about international macroeconomics] You could run a simulation.
So how would you go about falsifying the accuracy of the simulation's model?
'Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences"'
No, they're not; they're absolutely right. Psychology and the entire DSM are just a pseudoscientific cult.
As I read it, the initial flap was over the people and journal involved in the replication lying to get the cooperation of the original researcher.
They promised to give her an opportunity to review their results and publish a comment on them. They secured her cooperation, getting detailed descriptions of the methodology - far beyond what was in the publication - copies of the original film, and the like. Then, when they got differing results, they denied her the perceived-as-promised opportunity to examine their results in advance and publish a comment with them. They also published comments slamming her work, in terms like "epic fail".
The failed replication of her work might be a problem for her, career-wise. But massive ridicule is a much bigger one. So she cried "foul". This - along with similar acts by other replicators - is what brought support for her from other academics.
It is useful that the flap is also bringing to light other, very serious and systematic, problems with the replicability of attempts at performing actual science (or going through the motions) in social fields, creating a search for a measure of the actual reliability level of "social science" results, and exposing estimates of that to a broader audience. But let's not confuse opposition to unethical behavior by certain replicators for opposition to replication in general. (No doubt there is some of both. So let's keep the distinction in mind when evaluating the comments and actions of individuals in the social science fields.)
Meanwhile, it looks like "social science" results are far less reliable than political decision-makers had thought, and this flap will give ammunition to those opposing them when they try to legislate harebrained and oppressive schemes and foist them on the ruled classes. So some good is already coming out of it. B-)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
My bullshit detector went off when I saw this came from Slate.
They just troll for eyeballs, looking for the worst BS to make a story from.
The main article (and in this case a *real* article - not just a few single-sentence paragraphs that don't fill a screen) uses as its primary example a study of *40 undergraduates* that is used to draw the author's 'conclusion.' Seriously!?!
Maybe if these soft science people had had to take a few hard science courses, they would have known that drawing any conclusions from a study of 40 people (let alone self-selecting undergraduates, probably from the same university) might be a cause for concern about your results. The author then wonders why her study can't be replicated by others - really?!? I'm sure that if I took a random sample of 40 undergraduates from my local university I could probably conclude that everyone in the U.S. is white - can I have my doctorate now?!
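To put a number on how little 40 people can tell you, here is a Python sketch of the textbook normal-approximation (Wald) margin of error for a proportion. It is a generic illustration, not a reanalysis of the study in the article:

```python
import math


def proportion_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) 95% CI for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


if __name__ == "__main__":
    for n in (40, 400, 4_000):
        print(f"n={n:>5}: 50% +/- {100 * proportion_margin(0.5, n):.1f} points")
```

At n = 40 the margin around a 50% result is about ±15.5 percentage points; a hundredfold larger sample cuts it to about ±1.5. None of this fixes self-selection: the formula covers only random sampling error, and a biased sample stays biased however large it gets.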
This article reminds me of a Slashdot post about a year back where a government agency actively surveyed tens of thousands of people about their employment and compiled statistics from the responses. The Slashdot article itself was about a feminist taking issue with the gender diversity reported by the government agency (she thought the actual numbers were much lower than the report gave). What was her evidence? She asked her readers to send in the male/female distribution for their particular workplace. A government survey of tens of thousands of randomly selected people versus a survey of a hundred or so self-selected people who read this feminist's blog; I certainly know which one I would give more credence to.
While I think there is plenty of good research that can be done in the areas of the 'soft' sciences, it is still necessary to do proper science, with proper sample sizes from a properly randomized selection of subjects, analysed with proper statistical tools, and above all with results that can be reproduced by others in the field. This particular study of undergraduates just seems to be a case of a researcher publishing as fact/theory results that wouldn't qualify as preliminary if she were doing real scientific research.
The real problem is that this lackadaisical approach to science is finding its way into the hard sciences. In order to get funding, you need to produce results; not just results, but sexy results that you can have splashed all over the media to gain notoriety. I'm sure most of us remember the 'publish by press conference' fiasco that was cold fusion - surprise, other researchers couldn't replicate their results either.
We can only hope this is a teachable moment; remember that rigorous research, experimentation, analysis, and reproducibility are the hallmarks of the scientific method.
If it cannot be replicated, it is not Science, period. Predictability of outcomes is the only purpose of Science. Fekking off in a "sciency way" isn't Science, any more than armchair quarterbacking is Quarterbacking, or Psychology is Science.
Additionally, attempting to bend the meaning of words in order to legitimize your personal hopes, faiths, and dreams defeats the purpose of communication, which is a shared understanding of abstract concepts. I am already quite unwilling to attempt to communicate with most of you, and this nonsense isn't helping your cause with myself, or, I suspect, other truly intelligent people.
There are plenty of good psychology experiments/case studies that produce a lot of really useful information and are repeatable (albeit over a very long period of time). The problem is there are also a lot of complete and utter ass psychology experiments. It is really, really hard to produce a good study that provides useful results in the soft sciences, and in the case of psychology, such studies take a very long time and sometimes a lot of money to complete. Yes, they have to account for a lot of variables and exclude them via statistical analysis, but the ones that do it right do it exceptionally well.
I used to think negatively of those types of studies until I actually took the time to read one while helping my girlfriend with a paper. I was amazed at the level of detail and the amount of effort they took to isolate the results into meaningful data.
A) Conclusions reached in most published research are wrong.
B) Social science is an oxymoron.
A friend of my father's is a journalist in the IDF (Israel Defense Forces). These last few months, he's been sending reports from his time spent on the front lines during the recent Hamas/Israel conflict. Here is one of his many accounts. I will probably post the first ones later on, but this was particularly poignant.
https://m.facebook.com/notes/darion-scard/a-jawbreaker-for-a-toothache/10152186415126956/?ref=bookmark
There's a basic foundation that's roughly agreed upon, delineated by rules and best practices. Once those are mastered, then coding becomes an art form. And as art it can be subjective, defy description and all apparent rules of logic, and yet work incredibly well. If there's one thing I've learned on /. over the years from reading all the arguments between coders, it is that there is more than one way to become a master of one's craft (where coding is considered) and that coding becomes Art. Which I personally think is cool.
Here's to hot beer, cold women, and Glaswegian kisses for all.