Research Data: Share Early, Share Often
Shipud writes "Holland was recently in the news when a psychology professor at Tilburg University was found to have committed large-scale fraud over several years. Now, another Dutch psychologist is suggesting a way to avert this sort of problem, namely by 'sharing early and sharing often,' since fraud may start with small indiscretions due to career-related pressure to publish. In Wicherts' study, he requested raw data from the authors of some 49 papers. He found that the authors' reluctance to share data was associated with 'more errors in the reporting of statistical results and with relatively weaker evidence (against the null hypothesis). The documented errors are arguably the tip of the iceberg of potential errors and biases in statistical analyses and the reporting of statistical results. It is rather disconcerting that roughly 50% of published papers in psychology contain reporting errors and that the unwillingness to share data was most pronounced when the errors concerned statistical significance.'"
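For readers wondering what a "reporting error" looks like in practice, one simple check is to recompute the p-value from the reported test statistic and compare it with the p-value printed in the paper. The sketch below is only an illustration under stated assumptions, not necessarily the procedure the study used: it handles a two-sided z test (chosen so that the standard library is enough), and the function names are made up for this example.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_consistent(z, reported_p, decimals=2):
    """True if the reported p matches the recomputed p after rounding."""
    return round(two_sided_p_from_z(z), decimals) == round(reported_p, decimals)


def significance_flips(z, reported_p, alpha=0.05):
    """True if reported and recomputed p fall on opposite sides of alpha."""
    return (reported_p < alpha) != (two_sided_p_from_z(z) < alpha)


# Hypothetical reported results (z statistic, reported p); not from any real paper.
for z, p in [(2.5, 0.01), (1.8, 0.04)]:
    print(z, p, is_consistent(z, p), significance_flips(z, p))
```

The second helper targets the case the summary singles out: errors where the recomputed p-value lands on the other side of the significance threshold from the reported one.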
What did you expect?
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
"It is rather disconcerting that roughly 50% of published papers in psychology contain reporting errors"
How many errors are present in this statement? Just saying.
"Trust me I'm a scientist" isn't good enough anymore?
EOM
It'll need money to store and make available and staff to manage.
So who is going to accept an increase in taxes to allow this to happen?
...most people who do it are downright bad at it. That they might take more time and care to be good at it without the perpetual axe of publish-publish-publish and grant funding hanging over their heads is another issue altogether.
It is very difficult to make a man understand something when his job depends on not understanding it. If psychology research were made to adhere to any kind of stringent scientific standard, there would be no psychology research.
Don't believe anything that hasn't been verified by an independent group of researchers.
I do research in textual web mining and from time to time I have other researchers ask me for my collections which I spider myself from copyrighted web sources. While my work is purely academic, I am covered by fair use. But since US intellectual property laws are obtuse and overbearing (imho), I cannot take the risk of sharing my collections with others for fear of running afoul of copyright law (since I can't control what is done with the collection once it is out of my hands and how do I know they would use it in a manner consistent with fair use). So it may be more than an unwillingness out of statistical fudging and more an unwillingness to become a target of copyright lawyers.
I called it a mighty Sperm Whale, she called it Finding Nemo.
One reason scientists don't share is that if the data gets out early and gets around (damn slutty data), other scientists might steal/copy/scoop/whatever the data. Unless there is a great way to prevent this, the suggestion proposed here will never go anywhere.
As much as I am skeptical of the feedback methods for their models, they seem pretty open with the data now.
The IPCC doesn't know about this. Or does this only apply to the "soft sciences"?
Dog is my co-pilot.
The so-called "4th-stage clinical trial" is to study patients after the drug is released to the public. There may be thousands of times more patients than in the first three stages. But it can cost eight figures to finish stage 3.
Some probes like Mars Rovers, Cassini, SOHO post their data on the web within days. Others like Kepler and ESA-Express have posted very little of their data. The tradition is for Principal Investigators to embargo the data for one year.
Psychologist's statistical study suggesting that psychologists have possible psychological issues with sharing their psychological studies... perhaps this warrants a further psychological study of said psychologists?
I8-D
Einstein was unable to find a teaching post, and was working in a patent office when he published his annus mirabilis papers. Things have changed over the years though. John Dewey discovered a century ago how children best learned - let the child direct his own learning, and have an adult to facilitate this. This, of course, is not how children are taught. Things nowadays are very test-heavy, and becoming even more so, not as a means to help students in seeing what their deficiencies are, but as a punishment system - and the teachers, and the administrators are under the same punishment system. The carrot of reward is very vague and ill-defined and far-off. It is a system designed to try to squelch the curiosity of those handful of students who had been curious and wanted to learn. Businesses want to get into the education gravy train, and all this charter school stuff is being embraced by both parties, which isn't surprising if you look at the funding behind it.
At the university, the financial incentives are all aligned so that publishing is a necessity. If one does not publish, one does not get tenure, and then all those years of work were for naught as the academic career is over. And what gets published? An average series of experiments done by the scientific method would usually lead to either inconclusive data and results, or just wind up in a dead end. And what journal wants to publish those results after months of work? One of the most popular PhD comics is this one. It seems fairly obvious to me - the more financial incentives are tied to getting published, the more bogus studies are going to be published. As far as the idea of honesty, integrity or whatever, these things will gradually subside for most people when they come into conflict with keeping a roof over one's head and food on the table.
...most people who do it are downright bad at it. That they might take more time and care to be good at it without the perpetual axe of publish-publish-publish and grant funding hanging over their heads is another issue altogether.
I agree and I can think of something to illustrate your point.
I was listening to a This American Life episode a few weeks back and there was a story done on two people -- one a music professor and the other a respected oncologist -- who were investigating a long defunct theory that certain electromagnetic wavelengths can kill cancer cells and only cancer cells leaving healthy cells completely fine. When left to run the test, the music professor failed to maintain the control correctly and many other things. But after being corrected by the respected researcher they started getting positive sets of preliminary results. The respected researcher requested that the music professor not share this with anyone and not to attach his name to it just yet.
Well, the music professor did not follow this advice because he was so excited about the preliminary results and had, I guess, sort of felt like the respected researcher had short changed him and suppressed him. What the music professor wanted to do was blow the lid off this thing with possibly flawed data and sent it to other oncologists with the original researcher's name attached to it -- possibly misrepresenting it as flawed data. Now I can see why a researcher might fly off the handle when data is released extremely early. They were having problems recreating their own findings (with sham-control) which caused the original researcher to want to keep this very much out of the public's eye. You might claim he was just trying to save himself embarrassment but there's nothing embarrassing about finding out your hypothesis is wrong in science, I just think the best researchers avoid these "failures" and the subsequent investment of resources into them.
I think that scientists figure out how to create the most data and separate the wheat from the chaff in a very lengthy (think decades) process, whereas the first sign of a breakthrough might cause more inexperienced researchers to show the world. And the reason, as you mentioned, is probably the immediate funding they can get with it. But I think it badly neuters scientific news, the reward system and even the direction that research takes. But to release and share early on and often might just make everyone look bad when the whole background of the data is unknown to someone who receives it.
My work here is dung.
Ultimately, everyone agrees that open sharing of research data funded by the taxpayers would be A Good Thing(TM). The problem is: how do you persuade people to actually do it? It's much like how advanced safety features on cars, free college tuition, and taxes on big banks sound like great ideas until you look at what they will actually cost to implement. That's not just "cost" in terms of money for infrastructure development, data storage, and support, but in terms of persuading an entire culture to change their workflow.
In our lab, we already spend an extraordinary amount of time on administrative tasks only indirectly related to our research. Adding in a mandatory data sharing task and fielding questions from random people who wanted to use it would be a serious additional chore. Then there's the embarrassment aspect... we actually had a project a couple months ago where there was another group doing an experiment that we wanted to do, and they had software already written. So we thought, "great, we'll just ask them for the code". So we fired off an email... and after a couple weeks we finally got a reply to the effect of "this is actually my first program, and I don't feel comfortable sharing it." So we had to spend 2-3 months writing our own version to do exactly the same thing.
In my field this won't work - you share either at conferences or via a paper. Most researchers also only share at conferences when they have the corresponding manuscript more or less accepted. It's just too risky and expensive to get scooped.
The NSF is now requiring this as part of grant applications. You have to have a data management plan that includes the public deposit of both the data and results from grant funded work. Other funding orgs are following suit.
This is a fairly major project at the university I work for, both from the in-process data management perspective (keeping field researchers from storing their only copies on thumbdrives and laptops) and from the long-term repository perspective for holding the data when the grant is completed (that's what I'm involved with).
Storage is cheap. Convincing university administrators to pay for keeping it accessible is another problem, but the NSF position is helping.
If Star Trek had the internet: Captain, we've received an IM from the romulans. "Surrender or be destroyed. LOL. o.O"
Surely people aren't just going to turn over the means to get themselves charged with fraud out of the goodness of their hearts. Somehow this has to be made mandatory by the institutions or the publications in which they hope to present their work (as suggested in the second linked article; and as I understand some of the top medical journals do nowadays).
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Regardless of where the data was from and what agreements covered its release, the stated purpose of the people in the emails was to keep the data out of the hands of their opponents regardless of open freedom of information requests, and Phil Jones himself said he would be "hiding behind" the agreements in order to keep the data from being released. There was quite literally a conspiracy not only to avoid normal scientific sharing of information, but legal freedom of information requests.
As scientists, they should be discredited and shamed.
As public employees, they should be fired or put in jail.
Having something to hide. In some cases it is error or bias. What other attributes are "something to hide"? And why didn't the researchers disclose them? What didn't they know, and when did they not know it?
Gently reply
Dude, no. Ask for raw data, then ask again if it's really, really, raw.
As much as I am skeptical of the feedback methods for their models, they seem pretty open with the data now.
I thought one of the issues during the tempest was that some original sensor data was not maintained and had been lost, and only interpreted or adjusted data was currently available?
Wow, very convincing, though I wish it was shorter. I've thought for a while that most of the shrinks were in it to make $, not cure people. CBT may be legitimate, but most 'therapy' is BS.
I do research in textual web mining and from time to time I have other researchers ask me for my collections which I spider myself from copyrighted web sources. While my work is purely academic, I am covered by fair use. But since US intellectual property laws are obtuse and overbearing (imho), I cannot take the risk of sharing my collections with others for fear of running afoul of copyright law (since I can't control what is done with the collection once it is out of my hands and how do I know they would use it in a manner consistent with fair use). So it may be more than an unwillingness out of statistical fudging and more an unwillingness to become a target of copyright lawyers.
Why would that be an issue? The onus would be on the people you share the data with to keep it in the fair use domain. An analogy would be a professor quoting some copyrighted text in a syllabus and then saying she couldn't give a copy of the syllabus to another professor (or student) because she can't control what they do with it.
There is a difference between copying a brief excerpt in a fair use context and copying the complete copyrighted work. The key point is that the fellow researchers want the complete data set, complete copies of copyrighted works. The original researcher is correct to fear legal consequences and regrettably should consult an attorney before sharing such a data set. Alternatively, the original researcher should have logged the URLs where the original data was found and provided these URLs to fellow researchers, so they could harvest their own copies. Admittedly some content may have been taken down or changed.
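The URL-logging idea could be sketched roughly as follows. This is a minimal illustration, assuming the spider already has each page's bytes in hand; the function names, the CSV columns, and the example URL are hypothetical. It records where each page came from plus a checksum, so someone re-harvesting later can tell whether the content has changed, without the manifest itself containing any of the copyrighted text.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

FIELDS = ["url", "fetched_at", "sha256", "bytes"]


def manifest_row(url, content):
    """Describe one fetched page without storing its text."""
    return {
        "url": url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
        "bytes": len(content),
    }


def write_manifest(rows, out):
    """Write the manifest rows as CSV to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical page; a real crawler would pass the bytes it downloaded.
rows = [manifest_row("https://example.com/article-1", b"<html>example</html>")]
buffer = io.StringIO()
write_manifest(rows, buffer)
print(buffer.getvalue())
```

Whether sharing such a manifest actually avoids the legal concern is a question for the attorney, not for the code.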
Sounds like this guy has been reading a little classic from the good Dr. Szasz.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
This does show that the pressure to overstate certainty of the results is more common in academia than is otherwise claimed. This is not limited to psychology. Human beings respond to incentives. And lack of requirement to publish data acts as an incentive to overstate certainty.
Any guest worker system is indistinguishable from indentured servitude.
In other words made up bullshit, with no cohesive theory tying it all together. So made up data is the least of their concerns...
well that is at least 50% made of iron...
Everybody Knows that Psychology is about as real a science as Astrology anyway - one only needs to look at the farce that is the DSM - the Psychology Bible - to appreciate that
If it's the median, then it means what the summary is implying.
If it's the mean, it could just be that the ones who do commit fraud are skewing the sample. And that if you subtract them, the error rate of the people who don't share is no different than the error rate of those who do share.
Not saying it's one or the other. Just pointing out that in this particular case, it's a very important distinction.
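A toy illustration of why the distinction matters, using made-up error counts per paper (these numbers are hypothetical and not from the study). The two groups are identical except for one extreme paper in the non-sharing group:

```python
from statistics import mean, median

# Hypothetical reporting-error counts per paper (invented for illustration).
sharers = [0, 1, 1, 2, 1, 0, 2, 1]
non_sharers = [0, 1, 1, 2, 1, 0, 2, 25]  # same as above, but one extreme paper


def summarize(counts):
    """Return both summary statistics so they can be compared side by side."""
    return {"mean": mean(counts), "median": median(counts)}


print("sharers:    ", summarize(sharers))
print("non-sharers:", summarize(non_sharers))
```

With these numbers the medians are equal while the non-sharers' mean is four times higher, so a single outlier is enough to produce a difference in means with no difference in the typical paper.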
That is a gross misrepresentation. Until fairly recently, even the claim that it has gotten warmer was based on extremely sloppy work by climatologists who lacked expertise in statistics (the ASA itself criticized that work). The fact that the statistical work has been cleaned up now doesn't change the fact that the original work was sloppy and criticism of it was valid. And what has been shown is just that it has been getting a little warmer on average, over some large regions, nothing else.
The second gross misrepresentation is the implication that the observation of man-made warming has automatic policy implications, like the proposed reductions in CO2 emissions. In fact, the "consensus of scientists" is absolutely clear: there is no plausible scenario in which human CO2 emissions make our planet uninhabitable. Even the IPCC doesn't believe that humans will be able to melt the polar ice caps under the worst case scenario. Even their worst case forecasts amount to only a tiny fraction of what humanity has experienced over the last 20000 years anyway. Sea level rise, islands flooding, expansion of equatorial deserts, balanced by increasing habitability of northern zones, are climate changes humanity has not only coped with but thrived under.
Greenpeace estimates a cost of $156 billion for a 1m sea level rise for the US (multiply that by 4-5 for the whole world). But the IPCC predicts only a 60cm rise over a hundred years if nothing changes (and even that requires some assumptions of unobserved feedback mechanisms). The cost of sea level rise is therefore negligible even according to Greenpeace's and the IPCC's own data and predictions. The other consequences of climate change are likely even less costly. That's the so-called "climate catastrophe" we are supposedly facing.
Well, fortunately we live in a democracy in the US, and "believe me, we are the experts" is not good enough to get people to spend trillions of dollars to save a billion or so every year. That kind of technocratic b.s. may work in the EU but not in the US. Here, either the "experts" present their work in a plausible and reproducible way, or nothing's gonna happen. Handwaving, iffy statistics, and gigantic computational climate models aren't going to cut it. And insulting people isn't gonna work either.
McIntyre hasn't managed it in 12 years...
Why is this? Because denialists don't WANT to find out the answers, they only want to claim AGW is wrong. The Muir enquiry got the data and did the PCA analysis on the data and got a hockey stick in two weeks, start to finish, because they were investigating whether the complaint you just made about "unable to get the data" was valid. They were skeptical and worked to find out themselves.
You're a denialist and don't want to find out if you're right or wrong: you MUST BE RIGHT.
Wow, this thread is so full of primitive backwards prejudice, it's not even funny. Every time I see computer/physics/math nerds talk about psychology, this outdated beaten old horse comes up and gets parroted around.
Psychology was not based on anything in the past. That time is long gone and over! Psychology went from the top down, and neurology from the bottom up. Neurology failed. Hard. Because it deliberately considered the big picture taboo. And psychology failed, because it didn't have any underpinnings.
Nowadays, they merged, and now psychology is built on neurology, which is built on chemistry and physics.
But you emotionally and socially incompetent losers still babble about the same old shit, despite not a single one of you knowing anything about it.
I recommend checking out schema therapy, and how easily it translates to neurology. Basically it's nothing more than:
1. Put the person in an environment that allows feedback loops, and introduce the type of neural input that causes problems to the person. Make it as similar as possible.
2. Keep the person inside that loop through therapeutical nurturing (basically the same thing a mother would do to a small child that is in pain). This makes the mental pain that is triggered by step 1 bearable without any meds whatsoever.
3. Do this until the weak (=repressed) neural links are triggered, resulting in the person remembering the past (even if from very early childhood!) intensively, allowing the grown-up mind to process it. This means the old links are strengthened (re-learning), and the person understands the reasons for his problems again.
4. Now with that knowledge, a simple training of the correct associations will fix even the most deep lying (in terms of the superposition of information in neural nets) mental problems. For this introduce strong positive input together with a very close simulation of the situation that caused the mental problem. Repeat until the neurons store the correct/wanted view of the world. (Beware, as this can easily be abused.)
5. If this takes too long, LSD can be used to accelerate the learning / plasticity. But be warned, because if the mental state and input are even slightly wrong, this causes serious unwanted changes in character and loss of old information. This step is only for the daring with no other hope.
This method works so well that it even works on simulated neural nets, animals, etc! It is neurologically extremely solid, and proven to work. I can testify to this from having worked with it and tried it myself. The great thing is how generic it is. Everything that is not a purely chemical or genetic problem can be fixed with it in very short times. (The more intense, and the more painful, the quicker. The only problem is that your therapist might not have the energy/skill to take that much pain from you.)
The only problem is that your psychotherapist around the corner knows nothing about it, and still lives in his old learned pseudo-science from the 60s or even Freud. (Listen to the kind of terms he uses. If he uses words that fit into neurology, he's modern. If he uses words that Freud would use and that just aren't based on anything, stay the fuck away.)
[Hell, some of them still insist on strictly not even hugs, and don't understand that as long as the patient talks, he cannot focus and hence cannot create that feedback loop! EPIC FAIL!]