Slashdot Mirror


Psychology's Replication Battle

An anonymous reader sends this excerpt from Slate: "Psychologists are up in arms over, of all things, the editorial process that led to the recent publication of a special issue of the journal Social Psychology. This may seem like a classic case of ivory tower navel gazing, but its impact extends far beyond academia. ... Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable. ...Given the stakes involved and its centrality to the scientific method, it may seem perplexing that replication is the exception rather than the rule. The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings that say that things are unrelated or that a theory is not supported. The more surprising the positive finding, the better, even though surprising findings are statistically less likely to be accurate."
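
The excerpt's last claim, that surprising findings are statistically less likely to be accurate, can be illustrated with a small Bayes-style calculation. This is only a sketch: the function name, the 80% power, and the 5% false-positive rate are assumptions chosen for illustration, not figures from the Slate article. A "surprising" hypothesis is modelled as one with a low prior probability of being true.

```python
def positive_predictive_value(prior, power=0.8, alpha=0.05):
    """Chance that a statistically significant result reflects a real effect.

    prior: assumed probability the hypothesis is true before the study
    power: assumed probability of detecting a real effect
    alpha: assumed probability of a false positive when there is no effect
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# The less plausible (more surprising) the hypothesis, the lower the PPV.
for prior in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(prior)
    print(f"prior={prior:.2f} -> PPV={ppv:.3f}")
```

Under these assumed numbers, a significant result for a long-shot hypothesis is more likely to be a false positive than a true one, which is one way to read the excerpt's point.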

31 of 172 comments

  1. Easy to measure versus important by Tablizer · · Score: 3, Insightful

    Psychologists are up in arms

    Perhaps they need some therapy :-)

    a fundamental mistake: They assume that valuable science is limited to the "hard sciences."

    Software engineering has a similar problem. Things that are objective to measure, such as code volume (lines of code) are often only part of the picture. The psychology of developers (perception, etc.), especially during maintenance, plays a big role, but is difficult and expensive to objectively measure.

    Thus, arguments break out about whether to focus on parsimony or on "grokkability". Some will also argue that if your developers can't read parsimony-friendly code, they should be fired and replaced with those who can. This gets into tricky staffing issues as sometimes a developer is valued for their people skills or domain (industry) knowledge even if they are not so adept at "clever" code.

    Thus, the "my code style can beat up your style" fights involve both easy-to-measure "solid" metrics and very difficult-to-measure factors about staffing, side knowledge, people skills, corporate politics, economics, etc.

    1. Re:Easy to measure versus important by Intrepid+imaginaut · · Score: 2

      Completely different situation. In programming, the discussions are about how to optimise the processes involved; the problem with psychology is that they aren't sure if they're working on computers or breakfast cereal boxes with a few rectangles drawn on them. The main value that psychologists bring to the table today is to fulfill the role of that good friend who isn't afraid to lay out a few home truths. Of course if you already have such a friend, the need to attend a psychologist is naturally obviated...

      So, I'm just going to leave this here.

    2. Re:Easy to measure versus important by phantomfive · · Score: 2

      The fascinating thing to me is that sometimes programmers with drastically different coding styles (say, a Lisp macro/functional style compared to an object-oriented small-objects-everywhere style), who would argue vehemently about how the other side is wrong, can still both write incredibly good code. That is, the code will get the job done, be readable, and be flexible.

      Because drastically different styles can end up with good code, I see that as a sign that we as programmers haven't figured out the elements that actually comprise good code. Some programmers do it, but they aren't able to vocalize it, and focus on syntax, etc.

      --
      "First they came for the slanderers and i said nothing."
    3. Re:Easy to measure versus important by tomhath · · Score: 3, Funny

      should both validate the idea

      Over the years we've heard that a good Waterfall process was the magic bullet with Data Flow Diagrams documenting everything before a line of code is written... No wait, it's Object Oriented Analysis/Design that will save the day...but no, that didn't work either - but Service Oriented Architecture is the way to go. The latest fad is whatever book sold well recently; none of it is based on any metrics or real science.

  2. "less likely to be accurate" by Vinegar+Joe · · Score: 3, Funny

    That's a surprise.

    --
    "The average reporter we talk to is 27 years old......They literally know nothing." - Ben Rhodes
  3. WTF? by Oidhche · · Score: 5, Insightful

    it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable

    Duh. That's because an experiment that is not replicable has *no* value.

    1. Re:WTF? by justthinkit · · Score: 2

      there *are* experiments that are non-replicable, but still valuable.

      I missed your examples. Could you repeat them?

      --
      I come here for the love
    2. Re:WTF? by thesandtiger · · Score: 2, Interesting

      There's different levels of replication.

      In physics, you can generally replicate an experiment very precisely if you've got a handle on the factors that went into that experiment - control the environment, etc. You can have an almost perfect replication. Yay, science!

      In social psychology research you can't ever even approach that same level of control over the environment the experiment takes place in. The subject will be different - even if it's the same subject used in the first experiment, because people change over time/exposure. The interviewer will be different because people change over time. The dynamic between interviewer and subject will be different. The history of the subject will be different as will the history of the interviewer as will the place the interview is taking place, etc. etc. etc.

      The best such research can do is to find that there is a tendency for x to happen in y circumstances, though it might not always be the case.

      And, actually, there is a fair amount of basic replication that goes on in many psychological studies; when I was in the field working on studies we would routinely include certain basic measures that had been used in tens of thousands of studies before and compare anticipated vs. actual outcomes.

      But even a study that doesn't get replicated has some value: a lack of easy replication under mostly similar circumstances indicates that whatever factor the original experimenters felt was contributing to the main reported effect probably isn't as strong as hypothesized, and it cuts off a (probably) blind alley.

      --
      Since I can't tell them apart, I treat all ACs as the same person.
    3. Re:WTF? by Oligonicella · · Score: 4, Informative

      Recording supernovae

      Not an experiment.

      Dissecting passenger pigeons

      Not an experiment.

      Studying the medical complications of Thalidomide babies

      You got one.

      Any scientific analysis of an event which occurred once may not be directly replicable.

      Actually the analysis can be replicated ad nauseam.

    4. Re:WTF? by sexconker · · Score: 3, Interesting

      None of those are experiments. Experiments test hypotheses. You have to specifically DO something to test your claim and NOT do other things for control for it to be an experiment.

    5. Re:WTF? by sexconker · · Score: 2

      History is useful.

    6. Re:WTF? by HiThere · · Score: 2

      ??? Did you notice that the guy first mentioning "thought experiment" claimed to be a physicist? Moral high ground? Please tell me what "moral high ground" was involved in Einstein's famous "elevator" thought experiment.

      I will grant that there are those who misuse the term, but give him the credit for properly using it.

      OTOH, "thought experiments" in the area of psychology are, in my experience, so poorly done that they neither demonstrate nor validly support any argument. Some of them do point in interesting directions, but what people believe they will do in a situation is often very different from what they would actually do, and that renders them of at best questionable value, even when well designed.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    7. Re:WTF? by HiThere · · Score: 2

      There are valid definitions of "experiment" for which those are experiments.

      E.g., theories are often only checkable by conditions around a supernova. This means you have a theory and a prediction. You won't be able to prove everything about the theory by observing a single supernova, but you may be able to disprove it. And in science you can never prove a theory correct, you can only fail to disprove it.

      FWIW, the Higgs boson has been a terrific disappointment because it didn't prove any theories wrong. There's still hope, but it's getting smaller. This is an especial disappointment because we know our current theories are wrong, or at least incomplete, but we don't know where to look for how to change things. Everything we try seems to come out as the theories predict. Perhaps the Higgs will show SOME unexpected behavior. Perhaps we'll have to depend on gravity waves. (Ugh. If you think the Higgs was hard to measure...) Maybe the answer will lie in terms of "cosmic connections" (which is sort of like entanglement, but with posterior measurement rather than prior sharing of a state).

      But guess what....Every Higgs particle measurement is a separate non-repeatable experiment. We can't control the environment well enough to make them repeatable. Worse, so far they've all had to be done on the same (not replicable) equipment. This is clearly not optimal, but you deal with what you've got.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    8. Re:WTF? by ultranova · · Score: 3, Insightful

      You have to specifically DO something to test your claim and NOT do other things for control for it to be an experiment.

      But in that case the word "experiment" has been defined so narrowly it's no longer the sole validator of scientific theory. For example, General Relativity predicted that light would be affected by the Sun's gravitational field, which was later observed during a solar eclipse, which is a naturally occurring event.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    9. Re:WTF? by Eythian · · Score: 2

      Just to add to what you're saying, thought experiments can be perfectly valid in the physical sciences. Galileo had a great one determining that differently weighted things falling will fall at the same speed (all other things being equal).

      If you assume that a light cannon ball will fall slower than a heavy one when you drop them, and then you tie them together, it stands to reason that they must fall at a speed in the middle of what they will each fall at. But tying them together makes them effectively one heavier object, so it should fall faster than either.

      Given these both cannot be true, everything must fall at the same speed.

      This is a nice example (to me) of a thought experiment that can provide useful results.

  4. Not Just Psychology by jamesl · · Score: 3, Insightful

    The reasons why are varied, but most come down to the perverse incentives driving research. Scientific journals typically view "positive" findings that announce a novel relationship or support a theoretical claim as more interesting than "negative" findings ...

    This applies to all science, not just psychology.

  5. Re:Freud's problem too by Anonymous Coward · · Score: 5, Insightful

    When psychologists stop producing so many studies with obvious bias, subjective terminology, subjective conclusions, and stop arbitrarily coming to conclusions based on data flawed for those reasons, maybe it could be taken seriously. Obviously, replication is needed, too.

    But so many people are fooled by it. Want a study that says video games cause people to be aggressive? There's a psychology study for you, but there's also one for your opponents. And all of them are bad science.

  6. So, it's not a science, it's a religion by Anonymous Coward · · Score: 3, Insightful

    Falling into the 'cult' category

  7. "Social science can be just as valuable" by Anonymous Coward · · Score: 2, Insightful

    No, and it shouldn't carry the same "science" label to start with. Make it "social studies" or whatever. To call it science, one tries to put it on the same level as real science, where the processes are completely different on numerous levels. It's an insult to real science. For example, when a scientist builds a collider to find a particle, and he finds one, he puts up the results so they can be verified by peers, and if the collective brainpower finds an error and puts it down, the process is considered a success. In the meantime soft "scientists" will not be verified by peers and separate studies will have to point out the results are not even replicable, and people will bitch about and defend their research and the funding of their research.

  8. Re:Wrong premice by martin-boundary · · Score: 2

    The other problem is sample size. Psychology sample sizes are *way* too small. In a world of 8 billion people today, anything you find out in a psychological experiment that involves at most a few hundred subjects, often fewer, cannot have anything universal to say. The samples are just too small.

    Here's an analogy. You plant a dozen tulips in your garden, and observe how well they grow when you do X. Now you claim all plants will grow like that when you do X. The claim is way too broad. Even if you had a dozen identical tulips, and you grew them in the Himalayas while doing X, you'd have different results.

  9. Who writes this crap by awol · · Score: 5, Insightful

    "Those who oppose funding for behavioral science make a fundamental mistake: They assume that valuable science is limited to the "hard sciences." Social science can be just as valuable, but it's difficult to demonstrate that an experiment is valuable when you can't even demonstrate that it's replicable."

    No, those of us that oppose the funding of this crap recognise that if you cannot replicate your "study" then it is not an experiment. If what you are doing cannot be proved (one way or the other) by experiment then IT IS NOT SCIENCE. I don't really care what it gets called and some of it may even be valuable for some values of valuable; however, the amount of dross that is produced by social researchers that try and call themselves scientists is truly extraordinary and a plague on our world.

    --
    "The first thing to do when you find yourself in a hole is stop digging."
    1. Re:Who writes this crap by Intrepid+imaginaut · · Score: 5, Insightful

      The above comment is precisely why these "social sciences" need to be delegitimised and rubber-roomed until they can figure out the meaning of the phrase "scientific method". Grant them no authority in deciding government policy, massively defund them in academia, get them out of the courtrooms, and generally pillory them for the witchdoctors they are.

      If you have to ask why, you're part of the problem.

    2. Re:Who writes this crap by Anonymous Coward · · Score: 2, Insightful

      Here's my challenge to individuals such as yourself who denigrate psychological science:

      How would *you* study behavior?

      It's very easy to dismiss behavioral sciences when you're not trying to study behavior. It's a very complex, difficult topic. E.g., how do you define depression? How do you define psychosis? How do you determine whether or not early childhood interventions actually have an effect on adult outcomes?

      Maybe you would argue that behavior shouldn't be approached scientifically, but that's a cop-out and leaving human experience to philosophers.

      I'm sick of ignorant arm-chair narcissists denigrating psychology when they don't have the balls to admit they have no clue how to approach the subject because it's too hard for them to understand.

      I'm sorry for sounding harsh, but then so are the critical comments here.

      And no, neuroscience is not psychology. There's an extremely fuzzy boundary, and they overlap tremendously, but they're not the same. To find the neural substrates of depression, you have to be able to measure depression. So you either study behavior or you don't.

      Yes, there's a replication crisis in psychology, but it's the same in all of science--it's everywhere in the biomedical sciences (e.g., everyone here knows of these studies, such as the big scandal over stem cell research that was all fake). And you don't hear physics being called a sham because of all the kooks publishing their poorly thought-out theories and studies on arXiv.org.

      Get over yourself and start trying to solve the problems you belittle.

  10. Re:Freud's problem too by sjwt · · Score: 5, Insightful

    Yup, like the recent one about men not being able to 'be alone with their own thoughts'...

    That same data can also read 'Men, more willing to put up with pain' or 'Men, more curious and want to know what they may experience'

    --
    You have 5 Moderator Points!
    Which Helpless Linux zealot/MS basher do you want to mod down today?
  11. Re:replication = good by awol · · Score: 2

    No the asshat is not saying that if you cannot get the same results it's not science (in fact the exact opposite), but rather that if you cannot demonstrate that the experiment itself is replicable then it is not science. The contention in the article that in social sciences this lack of replication of experiment may just be a reality up with which we must put IS the reason why whatever you want to call it, it is not science.

    --
    "The first thing to do when you find yourself in a hole is stop digging."
  12. Re:If you can't replicate it... by Mister+Liberty · · Score: 2

    Define 'replicate'.

  13. Re:Wrong premice by khallow · · Score: 2

    On the contrary, increasing the sample size to big data sizes of say 2 billion subjects would definitely fix that bias problem.

    Not at all. For example, try extrapolating behavior from 2 billion young men to older women. You can have huge sample sizes and yet still have sample bias simply because you've excluded an important category (such as the people you actually wanted to study).
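
    The point about sample bias surviving a huge sample can be sketched with a toy simulation. Everything here is hypothetical: the two behaviour rates are made-up numbers, and the sample sizes are scaled down from billions so the script runs quickly.

```python
import random


def sample_mean(rng, true_rate, n):
    """Fraction of n simulated subjects who show a behaviour,
    where each subject shows it with probability true_rate."""
    return sum(rng.random() < true_rate for _ in range(n)) / n


rng = random.Random(0)   # fixed seed so the run is repeatable

RATE_YOUNG_MEN = 0.70    # hypothetical rate in the group that was sampled
RATE_OLDER_WOMEN = 0.30  # hypothetical rate in the group we care about

# Very large sample, but drawn entirely from the wrong group.
big_biased = sample_mean(rng, RATE_YOUNG_MEN, 200_000)
# Small sample, but drawn from the group we actually want to study.
small_targeted = sample_mean(rng, RATE_OLDER_WOMEN, 200)

print(f"large biased sample estimate:   {big_biased:.3f}")
print(f"small targeted sample estimate: {small_targeted:.3f}")
print(f"assumed true rate, older women: {RATE_OLDER_WOMEN:.3f}")
```

    In this sketch the large sample gives a very precise estimate of the wrong quantity, while the small targeted sample is noisier but centred on the right one.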

  14. Re:Define 'replicate' by petes_PoV · · Score: 2

    To replicate an experiment, you take the description of the conditions, tasks, environment, fixed independent and dependent variables, analytical method and results provided by the original experimenter in the (peer-reviewed) paper they published.
    If you can show the same results, with the same statistical significance, then it's reasonable to assume that the experiment shows a valid scientific phenomenon.

    If you can't then one of the two experiments got it wrong and more work is needed.

    The basic problem with social experiments is that anything based on the judgement, feelings, or whatever else the studied group merely says it would or would not do, think, feel, or otherwise emote is completely subjective. Asking people how sad, happy, or angry something makes them feel and rating that feeling (or the difference from previous values) has no scientific merit, as none of the terms used have any hard, scientific definition and none of the participants have had their feelings "calibrated".

    It's little different from a scientist (a proper one) measuring electric voltage by sticking their tongue across two electrodes, or measuring distance by eyeballing it. The level of accuracy and standardisation the social "sciences" have at present puts them on a par with chemical research: phlogiston, fixed air (CO2) in the 17th century.

    As for being able to determine which variables are being measured - or even what all the variables are in their experiments, the social scientists have yet to discover their subject's version of fire.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
  15. Simple solution by jd · · Score: 2

    Have a journal, call it Debunker's Weekly if you want, that is divided evenly between papers on replication and papers showing negative correlation at the start. Pay authors a nominal amount, according to the thoroughness of the work as judged by referees. Provide the journal free to University libraries. Submit summaries of major stories to Slashdot, The Guardian, various Skeptical societies and other places likely to raise the extreme ire of dodgy researchers. In fact, the more ire, the better.

    The journal doesn't have to last long. Just long enough to force bad researchers to improve or quit, force regular journals to publish a wider range of findings to avoid humiliation, and to correct dangerously erroneous beliefs. Since there must be a stockpile of unpublished papers of this sort, you should probably be able to get six or seven bumper editions out before anyone notices the dates, and maybe another two before the journal is sued into oblivion for defamation.

    That would be plenty to make some major course corrections and to "out" a few frauds.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  16. Re:Old saying by Zero__Kelvin · · Score: 2

    "Three times is a pattern"

    I sampled a random bit sequence just the other day. I can now assure you that a random bit stream is all ones! all friggin' ones I tell you!
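
    For the record, the arithmetic behind the joke: assuming a fair, independent bit stream, three ones in a row turn up one time in eight. The helper name below is made up for illustration.

```python
from fractions import Fraction


def prob_all_ones(n, p_one=Fraction(1, 2)):
    """Probability that n independent bits are all 1,
    assuming each bit is 1 with probability p_one."""
    return p_one ** n


# "Three times is a pattern" happens by chance fairly often.
for n in (3, 10, 20):
    print(f"{n} ones in a row: {prob_all_ones(n)}")
```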

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  17. The problem with soft science experiments by PPalmgren · · Score: 3, Interesting

    There are plenty of good psychology experiments/case studies that produce a lot of really useful information and are repeatable (albeit over a very long period of time). The problem is there are also a lot of complete and utter ass psychology experiments. It is really really hard to produce a good study that provides useful results in soft sciences, and in cases of psychology, they take a very long time and sometimes a lot of money to complete. Yes, they have to account for a lot of variables and exclude them via statistical analysis, but the ones that do it right do it exceptionally well.

    I used to think negatively on those types of studies until I actually took the time to read one while helping my girlfriend with a paper. I was amazed at the level of detail and the amount of effort they took to isolate the results into meaningful data.