Major Scientific Journal Publisher Requires Public Access To Data

Posted by Soulskill on Tuesday February 25, 2014 @09:35AM from the open-all-day-every-day dept.

An anonymous reader writes "PLOS — the Public Library of Science — is one of the most prolific publishers of research papers in the world. 'Open access' is one of their mantras, and they've been working to push the academic publishing system into a state where research isn't locked behind paywalls and subscription services. To that end, they've announced a new policy for all of their journals: 'authors must make all data publicly available, without restriction, immediately upon publication of the article.' The data must be available within the article itself, in the supplementary information, or within a stable, public repository. This is good news for replicating experiments, building on past results, and science in general."

7 of 136 comments

Good policy by MtnDeusExMachina · 2014-02-25 09:40 · Score: 5, Interesting

It would be nice to see this result in pressure on other publishers to require similar access to data backing the papers in their journals.
Does it really say ALL data? by RogueWarrior65 · 2014-02-25 09:49 · Score: 5, Insightful

And not just the data that was cherry-picked to support the hypothesis?
U.S. funding agencies too by PvtVoid · 2014-02-25 09:53 · Score: 4, Informative

Actually, the Obama administration has mandated open data for all federally supported research. Good news indeed.
Bad news for ecologists--new license needed by Bueller_007 · 2014-02-25 10:10 · Score: 4, Insightful

This is bad news for ecologists and others with long-term data sets. Some of these data sets require decades of time and millions of dollars to produce, and the primary investigators want to use the data they've generated for multiple projects. Current data licensing for PLOS ONE (and--as far as I know-- all others who insist on complete data archiving) means that when you publish your data set, it is out there for anyone to use for free for any purpose that they wish; not just for verification of the paper in question. There are plenty of scientists out there who poach free online data sets and mine them for additional findings.
Requiring full accessibility of data makes many people reluctant to publish in such a journal, because it means giving away the data they were planning on using for future publications. A scientist's publication list is linked not only to their job opportunities and their pay grade, but also to the funding that they can get for future grants. And of course those grants are linked to continuing the funding of the long-term project that produced the data in the first place.
What is needed is a new licensing model for published data that says "anyone is free to use these data to replicate the results of the current study; however, they CANNOT be used as a basis for new analyses without written consent of the primary investigator of this paper or until [XX] years after publication." Journals would also need to agree that they would not accept any publications based on data that were used without consent.
It seems to me that this arrangement would satisfy the need to get data out into the public domain while respecting the scientists who produced it in the first place.
1. Re:Bad news for ecologists--new license needed by JanneM · 2014-02-25 10:29 · Score: 4, Insightful
  
  On the other hand, if I don't have your data I can't check your results. If you want to keep your data secret for a decade, you really should plan to not publish anything relying on it for that time either. Release all the papers when you release the data.
  Also, who gets to decide when a study is a replication and when it is a new result? Few replication attempts are doing exactly the same thing as the original paper, for good reason. If you want to see if it holds up you want to use different analysis or similar anyway. And "use" data? What if another group produces their own data and compares with yours? Is that "using" the data? What if they compare your published results? Is that using it?
  A partial solution, I think, is for a group such as yours to pre-plan the data use already when collecting it. So you decide from start to publish a subset of that data early and publish papers based on that. Then publish another subset for further results and so on.
  But what we really need is for data to be fully citeable. A way to publish the data as a research result by itself - perhaps the data, together with a paper describing it (but not any analysis). Anyone is free to use the data for their own research, but will of course cite you when they do. A good, serious data set can probably rack up more citations than just about any paper out there. That will give the producers the scientific credit they deserve.
  
  --
  Trust the Computer. The Computer is your friend.
Practicalities by Roger W Moore · 2014-02-25 10:17 · Score: 5, Interesting

Open data is a great idea but it is not always practical. Particle physics experiments generate petabytes of extremely complex, hard to understand data. Making this publicly accessible is extremely expensive and ultimately useless: unless you understand the innards of the detector and how it responds to particles, and spend the time to really understand the complex analysis and reconstruction code, there is nothing useful that you can do with the data. In fact one of the previous experiments I worked on went to great trouble to put their data online in a heavily processed and far easier to understand format in the hope that theorists or interested members of the public would look at the data. IIRC they got about 10 hits on the site per year and 1 access to the data.

So I agree with the principle that the public should be able to access all our data but for experiments with massive, complex datasets there needs to be a serious discussion about whether this is practical given the expense and complexity of the data involved. Do we best serve the public interest if we spend 25% of our research funding on making the data available to a handful of people outside the experiments with the time, skills and interest to access it given that this loss in funds would significantly hamper the rate of progress?

Personally I would regard data as something akin to a museum collection. Museums typically own far more than they can sensibly display to the public and so they select the most interesting items and display these for all to see. Perhaps we should take the same approach with scientific data. Treat it as a collection of which only the most interesting selections are displayed to/accessible by the public even though the entire collection is under public ownership.
1. Re: Practicalities by Obfuscant · 2014-02-25 11:19 · Score: 4, Informative
  
  "Whether or not you make the data publicly available, you have to store and make it privately available,"
  I have boxes and boxes of mag tapes with data on them from past experiments. That's privately available. It will never be publicly available.
  
  "putting in public access is a matter of creating a read-only user and opening a firewall port."
  It is clear that you have never done such a thing yourself. There is a bit more to it than what you claim. I've been doing it for more than twenty years, keeping much of the data we have publicly available (but not all -- tapes are not easily made public that way), and there is a lot more to dealing with a public presence than just "a read-only user and a firewall port".
  
  "The sad thing is that most scientists don't actually store their data properly, it sits on removable hard drives, cd or an older variant of portable media"
  
  And now you point out the biggest issue with public access to data: the cost of keeping it online 24/7 so the "public" can maybe sometime come look at the data. Removable hard drives are perfectly good for storing old data, and they cost a lot less than an online RAID system. For that data, that is storing it "properly".
  If you want properly managed, publicly open data for every experiment, be prepared to pay more for the research. And THEN be prepared to pay more for the archivist who has to keep those systems online for you after the grants run out. And by "you", I'm referring to you as the public.
  Researchers get X amount of dollars to do an experiment. Once that grant runs out there is no more money for maintenance of the online archive, if there was money for that in the first place. For twenty-two years our online access has been done using stolen time and equipment not yet retired. When the next grant runs out, the very good question will be who is going to be maintaining the existing systems that were paid for under those grants. Do they just stop?