Psychology's Replication Battle
An anonymous reader sends this excerpt from Slate:
Psychologists are up in arms over, of all things, the editorial process that led to the recent publication of a special issue of the journal Social Psychology. This may seem like a classic case of ivory tower navel gazing, but its impact extends far beyond academia. ... Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable. ... Given the stakes involved and its centrality to the scientific method, it may seem perplexing that replication is the exception rather than the rule. The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate.
Perhaps they need some therapy :-)
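The excerpt's closing claim, that surprising findings are statistically less likely to be accurate, follows from how prior plausibility interacts with significance testing. A minimal sketch, assuming a conventional alpha of 0.05 and statistical power of 0.8 (illustrative values, not figures from the article), computes the probability that a "significant" result reflects a real effect for hypotheses of decreasing prior plausibility:

```python
def positive_predictive_value(prior: float, power: float = 0.8, alpha: float = 0.05) -> float:
    """Probability that a statistically significant result reflects a true effect.

    prior: assumed fraction of tested hypotheses that are actually true
    power: assumed probability of detecting a true effect
    alpha: assumed false-positive rate when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# A "surprising" hypothesis is one with a low prior probability of being true.
for prior in (0.5, 0.1, 0.01):
    print(f"prior={prior:.2f}  PPV={positive_predictive_value(prior):.3f}")
```

With these assumed numbers, a long-shot hypothesis (prior 0.01) that comes back significant is more likely to be a false positive than a real effect.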
Software engineering has a similar problem. Things that are objective to measure, such as code volume (lines of code), are often only part of the picture. The psychology of developers (perception, etc.), especially during maintenance, plays a big role, but is difficult and expensive to measure objectively.
Thus, arguments break out about whether to focus on parsimony or on "grokkability". Some will also argue that if your developers can't read parsimony-friendly code, they should be fired and replaced with those who can. This gets into tricky staffing issues, because sometimes a developer is valued for their people skills or domain (industry) knowledge even if they are not so adept at "clever" code.
Thus, the "my code style can beat up your style" fights involve both easy-to-measure "solid" metrics and very difficult-to-measure factors about staffing, side knowledge, people skills, corporate politics, economics, etc.
Table-ized A.I.
That's a surprise.
"The average reporter we talk to is 27 years old......They literally know nothing." - Ben Rhodes
it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable
Duh. That's because an experiment that is not replicable has *no* value.
The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings ...
This applies to all science, not just psychology.
When psychologists stop producing so many studies with obvious bias, subjective terminology, and subjective conclusions, and stop arbitrarily drawing conclusions from data flawed for those reasons, maybe the field could be taken seriously. Obviously, replication is needed, too.
But so many people are fooled by it. Want a study that says video games cause people to be aggressive? There's a psychology study for you, but there's also one for your opponents. And all of them are bad science.
Falling into the 'cult' category
No, and it shouldn't carry the same "science" label to start with. Make it "social studies" or whatever. Calling it science tries to put it on the same level as real science, where the processes are completely different on numerous levels. It's an insult to real science. For example, when a scientist builds a collider to find a particle and finds one, he publishes the results so they can be verified by peers, and if the collective brainpower finds an error and shoots it down, the process is still considered a success. Meanwhile, soft "scientists" are not verified by peers; separate studies have to point out that the results are not even replicable, and people bitch about it while defending their research and its funding.
Here's an analogy. You plant a dozen tulips in your garden and observe how well they grow when you do X. Now you claim all plants will grow like that when you do X. The claim is way too broad. Even if you had a dozen identical tulips and grew them in the Himalayas while doing X, you'd get different results.
"Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable."
No, those of us who oppose the funding of this crap recognise that if you cannot replicate your "study" then it is not an experiment. If what you are doing cannot be proved (one way or the other) by experiment, then IT IS NOT SCIENCE. I don't really care what it gets called, and some of it may even be valuable for some values of valuable; however, the amount of dross produced by social researchers who try to call themselves scientists is truly extraordinary and a plague on our world.
"The first thing to do when you find yourself in a hole is stop digging."
Yup, like the recent one about men not being able to 'be alone with their own thoughts'.
That same data can also read 'Men, more willing to put up with pain' or 'Men, more curious and want to know what they may experience'
You have 5 Moderator Points!
Which Helpless Linux zealot/MS basher do you want to mod down today?
No, the asshat is not saying that if you cannot get the same results it's not science (in fact, quite the opposite), but rather that if you cannot demonstrate that the experiment itself is replicable, then it is not science. The article's contention that, in the social sciences, this lack of replicability may just be a reality up with which we must put IS the reason why, whatever you want to call it, it is not science.
Define 'replicate'.
On the contrary, increasing the sample size to big data sizes of say 2 billion subjects would definitely fix that bias problem.
Not at all. For example, try extrapolating behavior from 2 billion young men to older women. You can have huge sample sizes and yet still have sample bias simply because you've excluded an important category (such as the people you actually wanted to study).
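To make the reply's point concrete, here is a small simulation under made-up assumptions: two hypothetical groups with behaviour rates of 70% and 20%, split evenly in the population, and a sampling frame that reaches only the first group. The names and rates are illustrative, not data from any study.

```python
import random

RATE_A = 0.7   # hypothetical rate in the group the sample reaches
RATE_B = 0.2   # hypothetical rate in the group the sample excludes
SHARE_A = 0.5  # hypothetical share of group A in the whole population

# The quantity we actually want to estimate: the population-wide rate.
true_rate = SHARE_A * RATE_A + (1 - SHARE_A) * RATE_B


def sample_only_group_a(n: int, rng: random.Random) -> float:
    """Estimate the rate from n subjects drawn only from group A (a biased frame)."""
    return sum(rng.random() < RATE_A for _ in range(n)) / n


rng = random.Random(42)
for n in (100, 10_000, 1_000_000):
    estimate = sample_only_group_a(n, rng)
    print(f"n={n:>9,}  estimate={estimate:.3f}  true={true_rate:.3f}")
```

The estimate gets less noisy as n grows, but it settles near 0.70 rather than the population's 0.45: a bigger sample cures random error, not a biased sampling frame.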
To replicate an experiment, you take the description of the conditions, tasks, environment, fixed independent and dependent variables, analytical method, and results that the original experimenter provided in their (peer-reviewed) paper, and you run the experiment again.
If you can show the same results, with the same statistical significance, then it's reasonable to assume that the experiment shows a valid scientific phenomenon.
If you can't, then one of the two experiments got it wrong and more work is needed.
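One crude, hypothetical way to encode the "same results, same statistical significance" test described above: run a large-sample z test for the difference in means in each study and call it a replication if both are significant at the chosen alpha with the effect pointing the same way. This is a simplified sketch (the normal approximation assumes reasonably large samples), not an accepted standard; real replication assessments also compare effect sizes and confidence intervals. The function names and simulated data are illustrative only.

```python
import math
import random
from statistics import mean, stdev


def z_test(treatment: list[float], control: list[float]) -> tuple[float, float]:
    """Large-sample z statistic and two-sided p-value for a difference in means."""
    se = math.sqrt(stdev(treatment) ** 2 / len(treatment) + stdev(control) ** 2 / len(control))
    z = (mean(treatment) - mean(control)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under a standard normal
    return z, p


def replicates(original, replication, alpha: float = 0.05) -> bool:
    """Hypothetical criterion: both studies significant, effects in the same direction."""
    z1, p1 = z_test(*original)
    z2, p2 = z_test(*replication)
    return p1 < alpha and p2 < alpha and (z1 > 0) == (z2 > 0)


# Usage with simulated data: an assumed true effect of 1.0 standard deviation.
rng = random.Random(7)


def simulate_study(effect: float, n: int = 200):
    treatment = [rng.gauss(effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return treatment, control


print(replicates(simulate_study(1.0), simulate_study(1.0)))
```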
The basic problem with social experiments is that those based on the judgement, feelings, or anything else the studied group merely says it would or would not do, thinks, feels, or otherwise emotes are completely subjective. Asking people how sad, happy, or angry something makes them feel and rating that feeling (or its difference from previous values) has no scientific merit, because none of the terms used has a hard, scientific definition and none of the participants have had their feelings "calibrated".
It's little different from a (proper) scientist measuring electric voltage by sticking their tongue across two electrodes, or measuring distance by eyeballing it. The level of accuracy and standardisation the social "sciences" have at present puts them on a par with the chemical research of the 17th and 18th centuries: phlogiston and "fixed air" (CO2).
As for being able to determine which variables are being measured, or even what all the variables in their experiments are, social scientists have yet to discover their subject's version of fire.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Have a journal, call it Debunker's Weekly if you want, that is divided evenly between papers on replication and papers showing negative correlation at the start. Pay authors a nominal amount, according to the thoroughness of the work as judged by referees. Provide the journal free to university libraries. Submit summaries of major stories to Slashdot, The Guardian, various skeptical societies, and other places likely to raise the extreme ire of dodgy researchers. In fact, the more ire, the better.
The journal doesn't have to last long. Just long enough to force bad researchers to improve or quit, force regular journals to publish a wider range of findings to avoid humiliation, and to correct dangerously erroneous beliefs. Since there must be a stockpile of unpublished papers of this sort, you should probably be able to get six or seven bumper editions out before anyone notices the dates, and maybe another two before the journal is sued into oblivion for defamation.
That would be plenty to make some major course corrections and to "out" a few frauds.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I sampled a random bit sequence just the other day. I can now assure you that a random bit stream is all ones! all friggin' ones I tell you!
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
There are plenty of good psychology experiments and case studies that produce a lot of really useful information and are repeatable (albeit over a very long period of time). The problem is that there are also a lot of complete and utter ass psychology experiments. It is really, really hard to produce a good study with useful results in the soft sciences, and in psychology such studies take a very long time and sometimes a lot of money to complete. Yes, they have to account for a lot of variables and exclude them via statistical analysis, but the ones that do it right do it exceptionally well.
I used to think negatively of those types of studies until I actually took the time to read one while helping my girlfriend with a paper. I was amazed at the level of detail and the amount of effort that went into isolating the results into meaningful data.