Major Scientific Journal Publisher Requires Public Access To Data

Posted by Soulskill on Tuesday February 25, 2014 @09:35AM from the open-all-day-every-day dept.

An anonymous reader writes "PLOS — the Public Library of Science — is one of the most prolific publishers of research papers in the world. 'Open access' is one of their mantras, and they've been working to push the academic publishing system into a state where research isn't locked behind paywalls and subscription services. To that end, they've announced a new policy for all of their journals: 'authors must make all data publicly available, without restriction, immediately upon publication of the article.' The data must be available within the article itself, in the supplementary information, or within a stable, public repository. This is good news for replicating experiments, building on past results, and science in general."

136 comments

Good policy by MtnDeusExMachina · 2014-02-25 09:40 · Score: 5, Interesting

It would be nice to see this result in pressure on other publishers to require similar access to data backing the papers in their journals.
1. Re:Good policy by Pseudonym · 2014-02-25 11:32 · Score: 2, Interesting
  
  You know who needs to introduce this rule? The ACM.
  I'm fed up with so-called scientific papers with results based on proprietary software. It doesn't even have to be open source, though that would clearly be good for peer review. If I can't (given appropriate hardware and other appropriate caveats) run your software, I can't replicate your results. If I can't replicate your results, it's not science.
  
  --
  sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
2. Re:Good policy by kimvette · 2014-02-25 14:37 · Score: 1
  
  I'd say that if they want the data to be publicly accessible without restriction, they should make the published journals publicly accessible without restriction.
  
  --
  The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
Fantastic. by jpellino · 2014-02-25 09:40 · Score: 2

Will cut a lot of nonsense out of reading stuff into the results.

--
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
1. Re:Fantastic. by Anonymous Coward · 2014-02-25 10:56 · Score: 0
  
  Yes and no, depending on specific circumstances. It would be very nice to have access to the complete next gen sequencing data "in the raw" to be able to validate the findings of a paper independently, however just having access to the raw data does not mean that anybody can reconstruct the same information contained in 10-40 billion nucleotides split into 75-100 base pair reads. You need the proprietary software and a massively parallel mainframe to process this "data". A sequencing run takes 18-24 hours to collect raw reads, 2 weeks to 2 months to assemble and polish the results, depending on the availability of an existing scaffold to build on.
  In my case, if I collect a 4-wavelength X-ray dataset on one protein crystal, that's close to 12 Gb of gzipped images. How would PLoS like these "made freely available"?
  The intention is good, the policy is idiotic.
2. Re:Fantastic. by Anonymous Coward · 2014-02-25 14:04 · Score: 0
  
  I think they'd first check to see if your own servers can be considered a stable public repository.
3. Re:Fantastic. by turkeyfish · 2014-02-25 18:57 · Score: 1
  
  Yes, but increase the amount of reading necessary for each paper by orders of magnitude.
4. Re:Fantastic. by turkeyfish · 2014-02-25 19:37 · Score: 1
  
  And to make it worse, someone then discovers that many of the original specimens from such a study were not saved and hence the identifications used in the study cannot be duplicated, making the study worthless, because no one can be sure which organism or lineage the sequences were actually collected from.
The actual result by eexaa · 2014-02-25 09:44 · Score: 1

Tables with improvement percentage readings will be less excellent.
Does it really say ALL data? by RogueWarrior65 · 2014-02-25 09:49 · Score: 5, Insightful

And not just the data that was cherry-picked to support the hypothesis?
1. Re:Does it really say ALL data? by gzuckier · 2014-02-26 07:48 · Score: 1
  
  And not just the data that was cherry-picked to support the hypothesis?
  Well that's kind of a good point, whether or not the OP was sarcastic. Every research project generates reams of data that gets flushed for various reasons. Often the original impulse for the research was a vague wiggle in the data from another, uncontrolled experiment. Where do you draw the line? For an example, I point to all the otherwise apparently rational people who point to every single reported atmospheric CO2 concentration from the earliest days of CO2 measurement as evidence that the current CO2 level is not historically very high; apparently they'd rather believe that the level fluctuated by a factor of 10X on a daily basis back then (but oddly, doesn't fluctuate any more) than that some of the earliest measurements were in error. E. g. http://rps3.com/Files/AGW/Engr... page 19.
  
  --
  Star Trek transporters are just 3d printers.
Not such good news for getting paid. by Kaz+Kylheku · 2014-02-25 09:52 · Score: 2

Public results? Anyone can take your work and use it for something profitable, while you scrape for grants to continue.
1. Re:Not such good news for getting paid. by NotDrWho · 2014-02-25 09:59 · Score: 1
  
  Anyone can take your work and use it for something profitable
  And patent it.
  
  --
  SJW's don't eliminate discrimination. They just expropriate it for themselves.
2. Re:Not such good news for getting paid. by Anonymous Coward · 2014-02-25 12:29 · Score: 0
  
  That was the mythical ethic. That Science was supposedly based upon openness, according to the indoctrination we all received in public school. There's no reason that publicly funded research shouldn't be that open for scrutiny and as the basis for anyone's efforts. Additionally, if it can be shown that private research uses publicly developed IP, it and its derivations should all be free from restriction.
  Capitalism is broken, starting with the patent system and its preference for large overly powerful corporate interests that tend to concentrate power and seek monopolies. We need to require broader distribution of wealth and faster development of technology in order to foster healthy, sustainable human systems.
3. Re:Not such good news for getting paid. by Anonymous Coward · 2014-02-25 19:53 · Score: 0
  
  Anyone can take your work and use it for something profitable
  And patent it.
  Patenting it is going to be a bit harder, since to patent something it needs to be novel. And this means, among other things, that the patent submission has to be done (even by you) before the submission to the journal.
4. Re:Not such good news for getting paid. by Anonymous Coward · 2014-02-25 23:53 · Score: 0
  
  Well, yes. And their taxes then pay for the researcher's next grant. That's how publicly-funded research works.
5. Re:Not such good news for getting paid. by Anonymous Coward · 2014-02-26 00:05 · Score: 0
  
  Mark this as ironically funny.
  How dare the public demand that the data from publicly funded science research be made available for all.
U.S. funding agencies too by PvtVoid · 2014-02-25 09:53 · Score: 4, Informative

Actually, the Obama administration has mandated open data for all federally supported research. Good news indeed.
1. Re:U.S. funding agencies too by Anonymous Coward · 2014-02-26 08:24 · Score: 0
  
  Access to data and publications for government supported research has always been a right arising under the 9th Amendment, as part of the right of public oversight over government. It's nice to see the government finally recognizing this.
Yes! by AndyKron · 2014-02-25 09:54 · Score: 2

Awesome. Simply awesome
Global warming advocates won't like that. by Anonymous Coward · 2014-02-25 09:55 · Score: 0

They have resisted showing data for years.
I hope this helps, though the warmistas have their own favorite journals...not PLOS.
1. Re:Global warming advocates won't like that. by Anonymous Coward · 2014-02-25 10:24 · Score: 0
  
  Maybe they'll just publish the shrinking volumes of just about every glacier in the world.
2. Re:Global warming advocates won't like that. by PRMan · 2014-02-25 10:52 · Score: 1
  
  You mean the ice that's so thick that they had to shut down the crab season 3 years in a row in Alaska and the fact that you can't even take boats through Northern Canada anymore?
  
  --
  Peter predicted that you would "deliberately forget" creation 2000 years ago...
3. Re:Global warming advocates won't like that. by Anonymous Coward · 2014-02-25 20:27 · Score: 0
  
  You do understand that Global warming is a Global phenomenon, right? And that shifts may occur, year by year, in where the cold and warm bits are, due to jet stream variation, so that, say, the arctic (and most of Europe) is much warmer than it usually is while, at the same time, there is a huge cold front over the continental US?
And in open formats? by Anonymous Coward · 2014-02-25 10:02 · Score: 1

It would be nice also if journals got on the bandwagon and accepted open formats (OpenDocument) instead of proprietary file formats like .doc and not fully open formats like .docx.
1. Re:And in open formats? by K.+S.+Kyosuke · 2014-02-25 10:13 · Score: 1
  
  OpenDocument
  Haha, good joke!
  
  --
  Ezekiel 23:20
good and bad by eli+pabst · 2014-02-25 10:04 · Score: 3, Interesting

Will be interesting to see how this is balanced with patient privacy, in particular with the increasing numbers of human genomes being sequenced. I know a large proportion of the samples I work with in the lab have restrictions on how the data can be used/shared due to the wording of the informed consent forms. Many would certainly not allow public release of their genome sequence, so publishing in PloS (or any other journal with this policy) would be impossible. So while I think the underlying principle is good, I think an unintended consequence might be less privacy for patients wanting to participate in research (or fewer patients electing to participate at all).
1. Re:good and bad by canowhoopass.com · 2014-02-25 10:13 · Score: 3, Informative
  
  The linked blog specifically mentions patient privacy as an allowable exception. They also have exceptions for private third party data, and endangered species data. I suspect they want to keep the GPS locations for white rhinos hidden.
2. Re:good and bad by LourensV · 2014-02-25 10:20 · Score: 1
  
  I work with data collected by others, and those others are typically rather protective of their data for commercial reasons. I can use them for scientific purposes, but I'm not allowed to publish them in raw form. For most of these data there are no alternatives. I'd much rather publish everything of course, but that's impossible in this case, so I wonder if that means that I can't publish in PLOS any more now?
  Just to be clear, I applaud this move, we should be publishing the data, plus software and such, where possible. Anyone happen to have a spare couple of tens of millions of euro lying around? That would probably free the data I'm using...
You hit the nail on the head by Anonymous Coward · 2014-02-25 10:09 · Score: 0

This may have severe repercussions for how patient samples are collected. Especially in this day and age with so many privacy concerns left and right.
Bad news for ecologists--new license needed by Bueller_007 · 2014-02-25 10:10 · Score: 4, Insightful

This is bad news for ecologists and others with long-term data sets. Some of these data sets require decades of time and millions of dollars to produce, and the primary investigators want to use the data they've generated for multiple projects. Current data licensing for PLOS ONE (and--as far as I know-- all others who insist on complete data archiving) means that when you publish your data set, it is out there for anyone to use for free for any purpose that they wish; not just for verification of the paper in question. There are plenty of scientists out there who poach free online data sets and mine them for additional findings.
Requiring full accessibility of data makes many people reluctant to publish in such a journal, because it means giving away the data they were planning on using for future publications. A scientist's publication list is linked not only to their job opportunities and their pay grade, but also to the funding that they can get for future grants. And of course those grants are linked to continuing the funding of the long-term project that produced the data in the first place.
What is needed is a new licensing model for published data that says "anyone is free to use these data to replicate the results of the current study, however it CANNOT be used as a basis for new analyses without written consent of the primary investigator of this paper or until [XX] years after publication." Journals would also need to agree that they would not accept any publications based on data that was used without consent.
It seems to me that this arrangement would satisfy the need to get data out into the public domain while respecting the scientists who produced it in the first place.
1. Re:Bad news for ecologists--new license needed by JanneM · 2014-02-25 10:29 · Score: 4, Insightful
  
  On the other hand, if I don't have your data I can't check your results. If you want to keep your data secret for a decade, you really should plan to not publish anything relying on it for that time either. Release all the papers when you release the data.
  Also, who gets to decide when a study is a replication and when it is a new result? Few replication attempts are doing exactly the same thing as the original paper, for good reason. If you want to see if it holds up you want to use different analysis or similar anyway. And "use" data? What if another group produces their own data and compares with yours? Is that "using" the data? What if they compare your published results? Is that using it?
  A partial solution, I think, is for a group such as yours to pre-plan the data use already when collecting it. So you decide from start to publish a subset of that data early and publish papers based on that. Then publish another subset for further results and so on.
  But what we really need is for data to be fully citeable. A way to publish the data as a research result by itself - perhaps the data, together with a paper describing it (but not any analysis). Anyone is free to use the data for their own research, but will of course cite you when they do. A good, serious data set can probably rack up more citations than just about any paper out there. That will give the producers the scientific credit they deserve.
  
  --
  Trust the Computer. The Computer is your friend.
2. Re:Bad news for ecologists--new license needed by Anonymous Coward · 2014-02-25 10:48 · Score: 0
  
  ...
  What is needed is a new licensing model for published data that says "anyone is free to use these data to replicate the results of the current study, however it CANNOT be used as a basis for new analyses without written consent of the primary investigator of this paper or until [XX] years after publication." Journals would also need to agree that they would not accept any publications based on data that was used without consent.
  It seems to me that this arrangement would satisfy the need to get data out into the public domain while respecting the scientists who produced it in the first place.
  Oh yeah, that'll work.
  Because scientists never plagiarize nor steal data. After all, they're scientists
  :-/
3. Re:Bad news for ecologists--new license needed by Anonymous Coward · 2014-02-25 11:06 · Score: 0
  
  "Release all the papers when you release all the data" is not realistic.
  I'm not going to collect data for 40 years without publishing something along the way. I won't be able to get the funding if no papers are coming out of the project over that time period.
4. Re:Bad news for ecologists--new license needed by Anonymous Coward · 2014-02-25 11:09 · Score: 0
  
  It wouldn't be difficult to force a publisher to issue a retraction for using what would amount to stolen data under that licensing agreement.
5. Re:Bad news for ecologists--new license needed by Arker · 2014-02-25 11:16 · Score: 2
  
  "What is needed is a new licensing model for published data that says "anyone is free to use these data to replicate the results of the current study, however it CANNOT be used as a basis for new analyses without written consent of the primary investigator of this paper or until [XX] years after publication." " I could not disagree more. What is needed here is to deal with the real problem - the issues that force working scientists into a position where doing good science (publishing your data) can harm your career. Slapping a band-aid on a symptom without addressing the fundamental malfunction here is guaranteed to make things worse, not better.
  
  --
  =-=-=-=-=-=-=-=-=-=-=-=-=-=-
  Friends don't let friends enable ecmascript.
6. Re:Bad news for ecologists--new license needed by Crispy+Critters · 2014-02-25 11:33 · Score: 2
  
  "There are plenty of scientists out there who poach free online data sets and mine them for additional findings."
  Right. This leads to a two-class system where the scientists that collect the data (and understand the techniques and limitations) are treated as technicians while those that perform high-level analysis of others' results get the publications. This can lead to unsound, unproductive science in many cases. Those who understand the details are not motivated, and the superficial understanding of those that write the publications leads to errors.
7. Re:Bad news for ecologists--new license needed by the+gnat · 2014-02-25 11:36 · Score: 3, Interesting
  
  Some of these data sets require decades of time and millions of dollars to produce, and the primary investigators want to use the data they've generated for multiple projects. . . There are plenty of scientists out there who poach free online data sets and mine them for additional findings.
  I work in a field (structural biology) that had this debate back when I was still in grade school: the issue was whether journals should require deposition of the molecular coordinates in a public database, or later, should these data be released immediately on publication, or could the authors keep them private for a limited time. The responses at the time were very instructive: one of the foremost proponents of data sharing was accused of trying to "destroy crystallography as we know it", to which his response was yes, of course, but how was that a bad thing? Skipping to the punchline: nearly every journal now requires release of coordinates and underlying experimental data immediately upon publication, and since then the field has grown exponentially and there have been at least six Nobel prizes awarded for crystallography (at least one of which went to an early opponent of data sharing). The top-tier journals (Science, Nature) average about a paper per week reporting a new structure. Not only did the predicted dire consequences never happen, the availability of a large collection of protein structures has actually accelerated the field by making it easier to solve related structures (and easier to test new methods), and facilitated the emergence of protein structure prediction and design as a major field in its own right.
  The question I'm worried about: what form do the data need to take? Curating and archiving derived data (coordinates and structure factors) is already handled by the Protein Data Bank, but the raw images are a few orders of magnitude larger, and there is no public database available. Most experimental labs simply do not have the resources to make these data easily available. (The exceptions are a few structural genomics initiatives with dedicated computing support, but those are going away soon.)
8. Re:Bad news for ecologists--new license needed by Bueller_007 · 2014-02-25 11:36 · Score: 2
  
  Release all the papers when you release the data.
  Not going to happen. You need to publish during the data collection period in order to continue getting the funding you need for data collection.
  
  Few replication attempts are doing exactly the same thing as the original paper, for good reason.
  Right, but replication of the experiment is the EXACT reason that we're making the data available. If you want to use the data for something else, that's fine, but if it's data that the original author is still using, then you should contact them about it first.
  
  A partial solution, I think, is for a group such as yours to pre-plan the data use already when collecting it. So you decide from start to publish a subset of that data early and publish papers based on that. Then publish another subset for further results and so on.
  Again, this is not realistic in the overwhelming majority of cases. One of the benefits of long-term studies is the unexpected findings. Imagine that I've been collecting data on a population of lemmings over the last 20 years. It seems to me that the lemmings have been getting smaller since I first started capturing them, so one day I decide to regress body size on year and I discover that the lemmings have indeed been shrinking, and I can show that it is probably linked to changes in vegetation driven by climate change. I shouldn't have to give away my entire 20-year data set (which I had been collecting for a different purpose) for anybody to use for any purpose in order for me to get this one study out in a timely fashion.
  Besides, many researchers are already dealing with data sets that are >50 years old, and your "plan to release the data before you start collecting the data" suggestion is moot for those people with inherited data sets.
  
  But what we really need is for data to be fully citeable.
  Getting your data cited is not NEARLY the same as publishing. Not even close. To get academic positions, pay increases, grants, etc., you need authorship. No one really cares about how often your paper or your data has been cited. That info isn't even on your CV or your grant applications, so no one will even have a rough idea unless it's a particularly preeminent paper.
9. Re:Bad news for ecologists--new license needed by the+gnat · 2014-02-25 11:41 · Score: 2
  
  This leads to a two-class system where the scientists that collect the data (and understand the techniques and limitations) are treated as technicians while those that perform high-level analysis of others' results get the publications.
  Maybe in some fields, but in genomics and molecular biology, the result tends to be exactly the opposite: the experimentalists (and their collaborators) get top-tier publications, while the unaffiliated bioinformaticists mostly publish in specialty journals.
10. Re:Bad news for ecologists--new license needed by Crispy+Critters · 2014-02-25 11:46 · Score: 1
  
  Good to hear. Unfortunately, it does happen in other fields. (Should have said "can lead...")
11. Re:Bad news for ecologists--new license needed by oldhack · 2014-02-25 11:55 · Score: 1
  
  This is preposterous. Unless you self-funded your work, you don't own the data. The people who give out grants don't intend it for you to spend for your own benefit.
  
  --
  Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
12. Re:Bad news for ecologists--new license needed by Anonymous Coward · 2014-02-25 12:34 · Score: 0
  
  How is it bad? Ecologists who hoard data facilitate the demise of the environment they study. Only by establishing the makeup of systems, in order to avoid the problem of Shifting Baselines, do ecologists, who are all grant funded, establish known baselines which can be used to foster protection and inform public debate.
  Where's the down side?
13. Re:Bad news for ecologists--new license needed by Michael+Woodhams · 2014-02-25 12:52 · Score: 2
  
  There are plenty of scientists out there who poach free online data sets and mine them for additional findings.
  And this is a good thing, despite your word "poach". Analyses which would not have occurred to the original experimenters get done, and we get more science for our money. For many big data projects (e.g. the human genome project, astronomical sky surveys), giving 'poaching' opportunities is the primary purpose of the project.
  A former boss of mine once, when reviewing a paper, sent a response which was something like this:
  "This paper should absolutely be published. The analysis is completely wrong, but it is a wonderful data set, and somebody will quickly publish a correct analysis once the data is available."
  Now I need to stop wasting time on /. and return to my work in hand, which, as it happens, is 'poaching' data from
  Ingman, M., H. Kaessmann, S. Paabo, and U. Gyllensten. 2000.
  Mitochondrial genome variation and the origin of modern humans. Nature 408:708--713.
  
  --
  Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
14. Re:Bad news for ecologists--new license needed by g01d4 · 2014-02-25 13:27 · Score: 1
  
  There are plenty of scientists out there who poach free online data sets and mine them for additional findings.
  I think the additional findings are part of what science is all about. How do scientists 'poach' something that's free? Did you think waiting many decades for the Dead Sea Scroll results was acceptable?
  If data is that expensive to collect, then its collection and publication should rank as an end in itself.
15. Re:Bad news for ecologists--new license needed by buswolley · 2014-02-25 14:45 · Score: 1
  
  hear hear!!!
  
  --
  A Good Troll is better than a Bad Human.
16. Re:Bad news for ecologists--new license needed by turkeyfish · 2014-02-25 19:12 · Score: 1
  
  This is a tall order, since scientists are held to a much higher standard than capitalists, and consequently always at a disadvantage. Scientists are expected to give away the product of their labor for free for all to use as they wish, but others are permitted to extract all the profits they may be able to get from the scientist's work, without any of the funds flowing directly to the scientist, who generated the data in the first place. One might ask why government contractors aren't likewise expected to turn all their profits and records over to the public, since their profits are derived entirely from public money?
  Perhaps scientists wouldn't be so squeamish about releasing ALL of their data just to publish a single paper, if they can be guaranteed a minimum of 50% of all profits that may derive from their work. My guess is that GOP politicians would immediately object to this as limiting the religious freedom of capitalists to worship money as they see fit. I freely admit, however, that this is just a hunch, based entirely on past performance.
17. Re:Bad news for ecologists--new license needed by turkeyfish · 2014-02-25 19:24 · Score: 1
  
  "Where's the down side?"
  Well one area of concern is how the data are used in litigation. Take a particular ecological or molecular study, any that you might think of. Say the data is curated and made available via PLOS or some other archive or entity. Now a good lawyer notices that the data are incomplete, since they do not cite the repositories of any voucher materials that would permit the reidentification of any of the species in the study. None were saved, because it was too costly. Without the vouchers, such studies are essentially useless since a case could be made that the original identifications are suspect or the original tissues were contaminated by the genes of other species not correctly identified. A good corporate lawyer will have an easy time showing any environmental studies are indefensible and incomplete and in no time, there are no environmental studies or laws that can pass a rigorous voucher test. Why weren't vouchers saved? For many of the same reasons most data is not archived for posterity and freely available: the cost in time and effort that is simply unavailable.
  Often museums and scientists would love to save the material, but can't afford to do so, since they have become no longer vast collections of well curated and intensively studied materials, but expensive headaches that the public isn't really interested in since they are more fascinated with youtube.com. Perhaps, we will all just be able to watch as humanity bends over and kisses its arse good bye.
18. Re:Bad news for ecologists--new license needed by oldhack · 2014-02-25 23:14 · Score: 1
  
  Other people give you money for research. You want the profit, fund your own work then.
  
  --
  Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
19. Re:Bad news for ecologists--new license needed by Anonymous Coward · 2014-02-26 02:41 · Score: 0
  
  There are plenty of scientists out there who poach free online data sets and mine them for additional findings.
  This is exactly what should be happening! If a dataset took decades of time and millions of dollars to produce, we damn well want more than a single person analysing it for interesting results.
  The real fix we need is for a scientist's performance to be judged by the useful datasets they've produced, as well as their publications.
20. Re:Bad news for ecologists--new license needed by __aagmrb7289 · 2014-02-26 03:11 · Score: 1
  
  This may seem naive, but are you seriously telling us, the Slashdot crowd, that you shouldn't have to release your data because *gasp* someone might do SCIENCE with it? And your suggestion is to impose a copyright/licensing scheme on it? I'm a bit surprised I'm the only person commenting on this. I do see your "continuing funding, job opportunities and pay grade" - but if everyone is doing it, then PERHAPS things might change?
21. Re:Bad news for ecologists--new license needed by hsu · 2014-02-26 05:49 · Score: 1
  
  The whole point of science is creating new information, and digging further into raw data, correlating it with other data, and finding relationships and further information is very important. Monopolizing the data for one party will reduce the amount of research that would benefit from that particular data set, as no one beyond the original creators of the data set will be allowed to work on it. Keeping it to yourself might bring the short-sighted benefit of getting another grant, but it is definitely against the public interest. If the grant system benefits people who keep their data secret, the system is completely broken.
22. Re:Bad news for ecologists--new license needed by ToddInSF · 2014-02-26 16:30 · Score: 1
  
  But he's a secret scientist, paid by public taxes. Hey, our politicians find excuses to avoid transparency, why shouldn't "science" do it too ?
  
  People have no idea how much fraud goes on in the scientific community, it's nothing new, some people are ready to DO something about it now though, before they receive too much scrutiny. Which would be *really* bad for "science" as it stands now, because people in the US think of scientists the same way they think about medical people.
  
  They put them on a mostly undeserved pedestal.
23. Re:Bad news for ecologists--new license needed by turkeyfish · 2014-02-26 18:46 · Score: 1
  
  You missed my point entirely. Many scientists work for the public and it is the public who pays, yet it is typically a handful of capitalists that profit. Why should scientists be expected to share, but not those capitalists who get to profit from the work of others?
  Perhaps the solution is for scientists to simply patent and copyright everything themselves. Now that there is electronic publishing, except for reviews is there really the need to pay publishers 7 figure salaries just to gather up the work of others, copyright it, and distribute it with no tangible benefit to the scientists that their work wouldn't gain beyond what they would have received if they published it themselves? Scientists are already doing the reviewing and the editing.
Practicalities by Roger+W+Moore · 2014-02-25 10:17 · Score: 5, Interesting

Open data is a great idea but it is not always practical. Particle physics experiments generate petabytes of extremely complex, hard to understand data. Making this publicly accessible is extremely expensive and ultimately useless since, unless you understand the innards of the detector and how it responds to particles and spend the time to really understand the complex analysis and reconstruction code there is nothing useful that you can do with the data. In fact one of the previous experiments I worked on went to great trouble to put their data online in a heavily processed and far easier to understand format in the hope that theorists or interested members of the public would look at the data. IIRC they got about 10 hits on the site per year and 1 access to the data.

So I agree with the principle that the public should be able to access all our data but for experiments with massive, complex datasets there needs to be a serious discussion about whether this is practical given the expense and complexity of the data involved. Do we best serve the public interest if we spend 25% of our research funding on making the data available to a handful of people outside the experiments with the time, skills and interest to access it given that this loss in funds would significantly hamper the rate of progress?

Personally I would regard data as something akin to a museum collection. Museums typically own far more than they can sensibly display to the public and so they select the most interesting items and display these for all to see. Perhaps we should take the same approach with scientific data. Treat it as a collection of which only the most interesting selections are displayed to/accessible by the public even though the entire collection is under public ownership.
1. Re:Practicalities by NatasRevol · 2014-02-25 10:22 · Score: 0
  
  So, are you worried that everyone is going to download petabytes of data? To where, their desktops?
  Shit, that's the monthly volume of third world countries these days.
  
  --
  There are two types of people in the world: Those who crave closure
2. Re: Practicalities by guruevi · 2014-02-25 10:27 · Score: 1
  
  Unlike a museum, data doesn't require anyone to physically interact in order for it to be available. Whether or not you make the data publicly available, you have to store and make it privately available, putting in public access is a matter of creating a read-only user and opening a firewall port.
  The sad thing is that most scientists don't actually store their data properly, it sits on removable hard drives, cd or an older variant of portable media (zip drive, tape) until it's forgotten about, lost, thrown out or irretrievably degraded. I would bet you that the majority of studies of even the last 3 years would not be able to present their data if asked about; maybe you'll get lucky and find an old, undocumented algorithm for MATLAB on MacOS 9 or so which they used to interpret the data but which is hopelessly useless these days.
  
  --
  Custom electronics and digital signage for your business: www.evcircuits.com
3. Re:Practicalities by RDW · 2014-02-25 10:47 · Score: 3, Informative
  
  There could be significant issues with biomedical data, too. For example, the policy gives the example of 'next-generation sequence reads' (raw genomic sequence data), but it's hard to make this truly anonymous (as legally and ethically it may have to be). For example, some researchers have identified named individuals from public sequence data with associated metadata: http://www.ncbi.nlm.nih.gov/pu...
4. Re:Practicalities by Anonymous Coward · 2014-02-25 11:02 · Score: 3, Insightful
  
  Uploading and hosting it in the first place to meet such a requirement would be an extremely difficult & costly endeavor.
  Perhaps the compromise is to include a clause that requires the author to permit others to obtain a copy and/or access the data, but only if the receiver of the data pays for the cost to transfer/access the data. This is similar to state open records access laws, where you must pay for things like the cost to make copies of documents. So in the above case, satisfying the "must permit access" clause might be as simple as permitting the researcher to come to the facility and access the data from a terminal and browse or whatever it is they do to explore/analyze the data that results from these experiments, thus no costly copying of data is required.
  If that isn't agreeable or feasible for the author/institution, then perhaps such research would simply be more appropriately published in a different journal that isn't as focused on openness and verifiability.
5. Re:Practicalities by Jane+Q.+Public · 2014-02-25 11:16 · Score: 1, Insightful
  
  Well, but.
  
  I think there's an arguable line to draw between "the entire body of data available", and the statistical sampling data that your typical paper is based on, or the specific data about a newly discovered phenomenon, for example.
  
  Exactly where that line is, I don't claim to know. But it behooves us to be reasonable, and not draw UNreasonable fixed lines in the sand.
  
  My personal opinion is: petabytes or not, if the research is publicly funded then the data belongs to the public, and must be made available in some fashion. That's a somewhat different subject than publishing a paper, but it's a related idea.
6. Re:Practicalities by Crispy+Critters · 2014-02-25 11:18 · Score: 3, Insightful
  
  "petabytes of extremely complex, hard to understand data"
  The point seems to be missed by a lot of people. RAW DATA IS USELESS. You can make available a thousand traces of voltage vs. time on your detector pins, but that is of no value whatsoever to anyone. The interpretation of these depends on the exact parameters describing the experimental equipment and procedure. How much information would someone require to replicate CERN from scratch?
  Some (maybe most, but not all) published research results can be thought of as a layering of interpretations. Something like detector output is converted to light intensity which is converted to frequency spectra and the integrated amplitudes of the peaks are calculated and are fit to a model and the parameters fit giving you a result that the amplitude of a certain emission scales with temperature squared. Which of these layers is of any value to anyone? Should the sequence of 2-byte values that comes out of the digitizer be made public?
  It is not possible to make a general statement about which layer of interpretation is the right one to be made public. Higher levels, closer to the final results, are more likely to be reusable by other researchers. However, higher levels of interpretation provide the least information for someone attempting to confirm that the total analysis is valid.
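  The layering described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the linear calibration (`gain`, `offset`), and the synthetic traces are all hypothetical, not taken from any real experiment. It shows how each layer throws away detail that someone checking the analysis would need.

```python
import struct

def decode_counts(raw: bytes) -> list[int]:
    """Layer 0 -> 1: unpack little-endian unsigned 2-byte digitizer values."""
    n = len(raw) // 2
    return list(struct.unpack(f"<{n}H", raw[: 2 * n]))

def to_intensity(counts, gain=0.5, offset=100):
    """Layer 1 -> 2: apply a hypothetical linear detector calibration."""
    return [gain * (c - offset) for c in counts]

def peak_amplitude(intensity):
    """Layer 2 -> 3: reduce a whole trace to one number (here, a plain sum)."""
    return sum(intensity)

def fit_quadratic_scale(temps, amps):
    """Layer 3 -> 4: least-squares fit of amp = k * T**2 through the origin."""
    num = sum(a * t**2 for t, a in zip(temps, amps))
    den = sum(t**4 for t in temps)
    return num / den

# Synthetic raw traces (made up for illustration): four samples per temperature.
temps = [1.0, 2.0, 3.0]
raw_traces = []
for t in temps:
    counts = [100 + int(t * t)] * 4
    raw_traces.append(struct.pack("<4H", *counts))

amps = [peak_amplitude(to_intensity(decode_counts(r))) for r in raw_traces]
k = fit_quadratic_scale(temps, amps)
print("fitted scale factor k =", k)
```

  Publishing only `k` is the most reusable form and the least checkable one; publishing `raw_traces` is the reverse, and it is worthless without `gain` and `offset`.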
7. Re: Practicalities by Obfuscant · 2014-02-25 11:19 · Score: 4, Informative
  
  Whether or not you make the data publicly available, you have to store and make it privately available,
  I have boxes and boxes of mag tapes with data on it from past experiments. That's privately available. It will never be publicly available.
  
  putting in public access is a matter of creating a read-only user and opening a firewall port.
  It is clear that you have never done such a thing yourself. There is a bit more to it than what you claim. I've been doing it for more than twenty years, keeping a public availability to much of the data we have (but not all -- tapes are not easily made public that way), and there is a lot more to dealing with a public presence than just "a read-only user and a firewall port".
  
  The sad thing is that most scientists don't actually store their data properly, it sits on removable hard drives, cd or an older variant of portable media
  
  And now you point out the biggest issue with public access to data: the cost of making it online 24/7 so the "public" can maybe sometime come look at the data. Removable hard drives are perfectly good for storing old data, and they cost a lot less than an online raid system. For that data, that is storing it "properly".
  If you want properly managed, publicly open data for every experiment, be prepared to pay more for the research. And THEN be prepared to pay more for the archivist who has to keep those systems online for you after the grants run out. And by "you", I'm referring to you as the public.
  Researchers get X amount of dollars to do an experiment. Once that grant runs out there is no more money for maintenance of the online archive, if there was money for that in the first place. For twenty two years our online access has been done using stolen time and equipment not yet retired. When the next grant runs out, the very good question will be who is going to be maintaining the existing systems that were paid for under those grants. Do they just stop?
8. Re:Practicalities by Obfuscant · 2014-02-25 11:25 · Score: 1
  
  My personal opinion is: petabytes or not, if the research is publicly funded then the data belongs to the public, and must be made available in some fashion.
  The public is currently not paying for this access. Do you want to massively increase the research funding system in the US (or whatever country) to pay for long-term management of all publicly-funded data? Or do you expect to get it for free?
  Your desire to access any and all data that was created using public money means that every research grant would need to be extended from the current length (one to three years for many of them) into decades. Someone has to pay for the system administrator, the network access, the electricity, the replacement compute/server hardware, the maintenance contracts, etc. Are you willing? Are you willing to forgo your free access when the funding agencies don't pay? I can tell you, I MIGHT work for free to keep some of the systems I created running, but I wouldn't work for free to maintain the access to the public for that data.
9. Re:Practicalities by Anonymous Coward · 2014-02-25 11:40 · Score: 0
  
  Your idea of practicality has nothing to do with open access, it's a justification for keeping a lid on it. It's merely a justification for the proprietary nature of business. You agree with the principle but prefer the prerogative which also makes it easy to promote illusion of success without proof of the opportunity to investigate the reality of the assertions made by researchers. In the case of publicly funded research, all the advantage accrues to those who receive grants, and it precludes anyone else from scrutinizing 'results' or leveraging that which has been accomplished at public expense.
  It's a furtherance of the Dole model for privatizing and leveraging public funding, institutionalizing the for-profit-model at universities which have become increasingly mercenary in their activity and firewalling the rest of the world from so as to protect US corporate advantage from competition.
10. Re:Practicalities by Sentrion · 2014-02-25 11:40 · Score: 0
  
  But if I have to spend $100k on lobbying before I get public funding, I don't want to have to share the results with freeloaders who didn't pony up the lobbying cash and didn't put the manpower into the research. The rest of society benefits from the public funds after they have bought my product. Take Google, for instance.
11. Re:Practicalities by Pseudonym · 2014-02-25 11:41 · Score: 2
  
  There's precedent for this. In many biology experiments, the "raw data" is an actual organism, like a colony of bacteria or something. There are scientific protocols for accessing that "data", but you have to be able to prove that you are an institution that can handle it. Even if the public "owns" it, technically speaking, no reputable scientist is going to send an e. coli sample to just anyone.
  So I think we all understand that, in practice, we mean different things by "public access". Sometimes that means that anyone should be able to download the data, and sometimes that means that anyone should be allowed to go there and examine it for themselves.
  
  --
  sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
12. Re:Practicalities by aurizon · 2014-02-25 11:52 · Score: 1
  
  A lot of people ignore the collateral functions of the so-called 'peer review' system administered by the publisher.
  The publication must be read by someone who knows the subject passably. If his first pass finds it acceptable, he must then select from a number of true experts in these matters (the peers or equals to the writer of the paper). He works for a living as a competent editor for that area of research. The peers he chooses are sent a copy of the paper to review and criticize; if it is not acceptable, the comments are passed back to the author for him to respond. After his responses to fix the flaws, it goes back to the panel and so on until rejected or published. The review mechanism is needed to avoid total BS being published. The publishers have created this niche and profit by it - some say excessively, and I agree. So some way must be found to pay for this. Page fees are the initial solution - the author pays a fee, and this is spread among the experts involved.
  As for an Archive, in the USA, the Library of Congress can do this, as long as a proper indexing method is used so that the paper does not become a needle in a haystack. It should be google indexed. Perhaps Google will fund this via ads, because all the biological supply houses will place biological ads, and the same with all the other disciplines.
  In fact, this could become a gold mine for Google and at the same time serve PLOS and the research community very well. Large data bases of terabytes of particle data would not be stored, the publisher would grant access to those who wanted to download it (a precious few will want terabytes of particle data).
  So why not someone who has a pipeline to google give them a whistle, they might leap at the chance. It is a natural fit.
13. Re: Practicalities by Anonymous Coward · 2014-02-25 12:06 · Score: 0
  
  If it was free yes, but since it's not and funding does not last forever it will go on like this for a very long time. We're not talking about GB or TB it's PB and that much is not cheap to save. If someone like Bill Gates wants to step up and sponsor it then sure, but it's not something most people can even come close to affording. In fact 1PB costs more than most homes for just 3 years of storage.
14. Re:Practicalities by Anonymous Coward · 2014-02-25 12:26 · Score: 0
  
  With respect to particle physics, you can download the datasets that are processed into published papers from the LHC already. CERN requires this of all LHC experiments and hosts that data for them. Therefore they would be ok under PLOS's requirements.
15. Re:Practicalities by Jane+Q.+Public · 2014-02-25 12:33 · Score: 1
  
  "The public is currently not paying for this access."
  I know it isn't. That was an aside, slightly off-topic, I admit.
  
  "Your desire to access any and all data that was created using public money means that every research grant would need to be extended from the current length (one to three years for many of them) into decades."
  Not if such a program were to affect only future research. After all: ex post facto laws are forbidden in the United States.
  
  "Someone has to pay for the system administrator, the network access, the electricity, the replacement compute/server hardware, the maintenance contracts, etc. Are you willing? "
  I am aware that it would cost somewhat more. But it is arguable that the benefit lost to society is worth far more.
  
  "Are you willing to forgo your free access when the funding agencies don't pay?"
  If they don't pay, then it wasn't publicly funded, was it?
  
  "I can tell you, I MIGHT work for free to keep some of the systems I created running, but I wouldn't work for free to maintain the access to the public for that data."
  If you are profiting on my dime, then yeah. Cough it up, bud.
  
  I didn't say the researchers should pay for it. The public (meaning of course government at some level) would be responsible for maintaining publicly-accessible archives of publicly-funded research.
16. Re:Practicalities by Jane+Q.+Public · 2014-02-25 12:40 · Score: 1
  
  "A lot of people ignore the collateral functions of the so-called 'peer review' system administered by the publisher."
  I don't see this as a stumbling block, though. There are already public-access peer-reviewed journals. They may have a way to go yet but I expect them to get better and their number to expand in the near future.
17. Re:Practicalities by aurizon · 2014-02-25 12:48 · Score: 1
  
  Too many badly reviewed articles are published by them.
18. Re:Practicalities by Anonymous Coward · 2014-02-25 13:09 · Score: 0
  
  So, are you worried that everyone is going to download petabytes of data?
  
  Yes. If they somehow put up a single file with a petabyte of data, without a doubt Timothy would manage to inadvertently link to it on the front page of slashdot.
19. Re:Practicalities by Jane+Q.+Public · 2014-02-25 13:24 · Score: 2
  
  "But if I have to spend $100k on lobbying before I get public funding, I don't want to have to share the results with freeloaders who didn't pony up the lobbying cash and didn't put the manpower into the research."
  You are describing exactly why the current system is broken.
  
  First off, if the research is worthwhile you shouldn't have to spend $100,000 to lobby for it. And I would argue that is an unethical practice: what about the little guy who is doing promising research but doesn't have the funds to lobby?
  
  Second: quite frankly I don't give a flying fuck how much you spent to get the grant. Public money is public money. If I'm paying for it, it belongs to me. Period. And I don't care even a little if you don't like that.
  
  "The rest of society benefits from the public funds after they have bought my product."
  Then go pay to get a patent on your own, and leave public funds out of it. Why should the public pay so that you can profit? Independent inventors do it all the time without public funding. What makes you so special?
  
  "Take Google, for instance."
  Is Google doing publicly-funded research? That's news to me. If so, I object very strongly.
  
  I suspect you are being sarcastic here. If you're not, I simply disagree with you. Very much.
20. Re:Practicalities by Goldsmith · 2014-02-25 13:45 · Score: 1
  
  We are paying for that access.
  I've been a government employee overseeing research grants. Nearly every single one of them has a clause built in that the data is to be organized and shared with the government and the government has unlimited rights to that data, including all publications. Almost all of them have to have a data management plan and have to describe how the grantee will ensure access to the data.
  Almost every single PI simply says "We will follow a standard data management plan." or some other nonsense. The government guys sign off on this, and that's that, there's no enforcement.
  When you buy or build equipment on a government grant, you sometimes have a choice to hang on to it or return it to the government at the end of the grant. By agreeing to be the custodian of that equipment, you agree to maintain it, free of charge, for the government. By law, no one gets ownership of free equipment from the government. The government is absolutely terrible at enforcing this.
  These legal documents researchers sign with the government have meaning. Read your contracts. That was the first thing I told my PIs. I don't think any of them did.
21. Re:Practicalities by Jane+Q.+Public · 2014-02-25 14:11 · Score: 1
  
  "Too many badly reviewed articles are published by them."
  Well, that's a pretty broad statement and I haven't seen any evidence. In any case, I repeat:
  
  "They may have a way to go yet but I expect them to get better"
22. Re:Practicalities by wealthychef · 2014-02-25 15:08 · Score: 1
  
  How hard would it be to grant exceptions to the policy? It's a good policy, no reason it can't be flexible too.
  
  --
  Currently hooked on AMP
23. Re:Practicalities by Immerman · 2014-02-25 15:17 · Score: 1
  
  What? The organism is not the data - the data is all the measurements you took of that organism and all the situations you subjected them to in order to reach the conclusions that you are publishing.
  
  --
  --- Most topics have many sides worth arguing, allow me to take one opposite you.
24. Re:Practicalities by aurizon · 2014-02-25 15:22 · Score: 1
  
  I had not seen peerj, it looks better than some of the others, and their $99 fee is encouraging, even if optimistic - what happens when the work load gets large, which can happen if they attract many authors. There are other journals of easy access and low editorial standard, which is the 'them' I referred to. By the use of a pool of reviewers peerj has a shot at kicking the established journals to the curb = good. In so doing peerj will improve the ecology and hopefully the lower grade journals will smarten up and improve or go away.
  I am sure the established journals will fight back, with deep pockets - they have literally billions, and may even fully match peerj and other competent free journals for five or ten years to starve them of good papers. Will they do that? When they see the buzzards circling overhead, they will find a motive.
  I am very much in favor of journals like peerj, and I have seen the harm the expensive journals in the third and even the second world have done to deprive their scholars of the books and paper they need. I am happy to see that the modern use of the internet and scanners has spread all expensive journals and books to all these less wealthy countries via scanners and e-mails. This is good.
  And while I am on that, the MIT free online university and others like the Khan academy need open source texts for free, because the journal publishers also have another empire, usually in cahoots with profs, to publish course books for $200 or more, and to make last year's book obsolete and worthless, so a new book is needed.
  Course books are needed for all college years and disciplines, fully open source, update online, also free.
  Will it happen? Here in Toronto the University DEMANDS each freshman buy all his course books, and provide a receipt, or they are not admitted to school. The prof gets a kickback and the college bookstore gets a kickback. Ever see how badly the students are victimized?
  That is why I say the entire crooked system needs to change.
  That means recognized degrees from MIT/Khan/Et al, which means an accreditation system needs to evolve, and be paid for. This will start to chip away at these monopolies.
  This will be a war, without bullets, on economic grounds. Google can become the friend of all here.
25. Re: Practicalities by Anonymous Coward · 2014-02-25 15:29 · Score: 0
  
  As a molecular biologist I generate lots of sequence data and no matter what the journals choose to do, nearly all sequence data in the world is publicly available through GenBank. In addition, DNA alignments and phylogenies are frequently required to be published in an online database (eg, TreeBase). So, for many fields the journals already require this and/or you simply have to make the sequences available anyway for reviewers to be able to check your data. The real issue with this is that many scientists never finalize database submissions as you get the submission ID number without completely finishing the submission info to release it beyond yourself and the reviewers of your paper. I am slightly ashamed to admit that I have two submissions in TreeBase right now that the public cannot access because it is so obnoxious to fill out all the necessary information for public release that I have not taken the time to do it.
  Now, one thing I think I must point out is that the "public" that we (scientists) are referring to is largely other scientists. Frankly, very few people in the world are interested in my sequences and would have sufficient knowledge/desire to do anything with them. The knowledge part isn't that big of a problem; anyone motivated can teach themselves whatever they want, using free resources. The desire part is the problem. It doesn't matter that we are talking about a limited "public". You must realize that other scientists need to be able to get your data to build up knowledge in the area, providing an important "whole public" service. Eg, what if the person that found that antibiotics could be purified from fungi to kill bacteria only reported this fact but not the species of fungi? Everyone would need to reinvent the wheel to increase our knowledge of antibiotics and would not have a starting point to begin testing.
  ~Molecular Fungal Systematist
26. Re: Practicalities by guruevi · 2014-02-25 15:54 · Score: 2
  
  I actually do this for a living; Having data available for projects does require it to be on large data systems which are properly backed up etc. Heck, any halfway decent staged system (Sun used to make really good ones) will allow you to read tapes as if it were a regular network share. The problem will be (which is inevitable) that your PI is going to ask for the data 3 years after they left the institute and your tapes will be unreadable (either because they degrade or because you can't find a reader and associated busses and software)
  The mag tapes in boxes problem we fixed years ago by simply putting everything on spinning rust with ZFS. As capacity increases (we're 3 generations in now - 750GB, 2TB and now 4TB drives), the old stuff simply takes up a diminishing percentage of any expansion we put in. Individual data sets from ~10 years ago were 100MB, now they're close to 2GB, those 100MB sets aren't even a noticeable portion today whereas back in the day they filled up the entire *gasp* 3TB array.
  I do understand the grant issues, most of those grants will actually mandate a 20 year or-so archival period but never have the money for it. I've figured out that future grants will simply pay for today's "large amount" of data storage in a small overhead because 10 years from now, 2TB of storage for a study will be like today's 100MB for a study.
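  A back-of-the-envelope version of that arithmetic, using the sizes quoted above (decimal units assumed). The 24-drive pool is a hypothetical figure chosen only for illustration, not a description of any actual system.

```python
MB, GB, TB = 10**6, 10**9, 10**12

old_dataset = 100 * MB   # typical data set ~10 years ago (from the comment)
new_dataset = 2 * GB     # typical data set today (from the comment)
old_array = 3 * TB       # the old array that used to be full
drive_generations = [750 * GB, 2 * TB, 4 * TB]

# How much a single data set has grown.
growth = new_dataset / old_dataset

# How many old-style data sets it took to fill the old array.
old_sets_per_array = old_array // old_dataset

# Share of one drive taken by one old data set, per drive generation.
shares = [old_dataset / drive for drive in drive_generations]

# Hypothetical pool of 24 current-generation drives (an assumption).
pool_drives = 24
pool_capacity = pool_drives * drive_generations[-1]
legacy_share = old_array / pool_capacity

print(f"data set growth: {growth:.0f}x")
print(f"old data sets per old array: {old_sets_per_array}")
print(f"entire old array as share of new pool: {legacy_share:.2%}")
```

  Under those assumptions the whole legacy archive is a few percent of the new pool, which is the point: keeping old data online gets relatively cheaper with each hardware refresh, although somebody still has to pay for the refresh.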
  
  --
  Custom electronics and digital signage for your business: www.evcircuits.com
27. Re:Practicalities by Roger+W+Moore · 2014-02-25 16:39 · Score: 1
  
  So, are you worried that everyone is going to download petabytes of data?
  No, I am worried about the cost of setting up an incredibly expensive system which can serve petabytes of data to the world and then having it sit there almost unused while the hundreds of graduate students and postdocs the money could have funded move on into careers in banking instead of going on to make a major scientific breakthrough which might benefit all society.
28. Re:Practicalities by Pseudonym · 2014-02-25 17:35 · Score: 1
  
  The idea is not mine; I'm actually paraphrasing Richard Lenski.
  
  --
  sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
29. Re:Practicalities by turkeyfish · 2014-02-25 17:57 · Score: 1
  
  Good point. However, even for data that only comes to gigabytes, all such data and the resources necessary to set up and maintain such repositories is going to cost a lot of money. Journals can demand it, but it's not clear that authors will be able to pay to put it in the form that journals might like to see. There is also the question of archival costs. Any organization that accumulates such data is going to require a revenue stream to pay for it. This could well be yet another cost that needs to be given consideration, especially now that the cost of just conducting experiments and collecting and analyzing data is already extraordinarily difficult to come by as it is. Adding to these costs may well actually impede research, even though the motives are laudable. However, to the extent that such data can be archived and made available electronically all of science will benefit. PLOS doesn't really begin to address these issues. It's an old issue, that museum curators are all too familiar with, but as always still awaiting funds to actually address it properly. Just having such a good idea, isn't going to make it feasible until someone actually starts addressing the financial aspects of the issue in a realistic way, especially since the problem only gets bigger and bigger with time since data accumulates.
30. Re: Practicalities by turkeyfish · 2014-02-25 18:16 · Score: 1
  
  Asking that ALL data be saved is a very big requirement, especially for the molecular community. The sequences often find their way into GenBank, or at least those that are brought together from pieces of other data do, though those pieces themselves seldom get into GenBank. To make matters worse the specimens from which the sequences are made are seldom saved and archived, so that it is often next to impossible to actually verify that the sequences in GenBank are actually from the species that are thought to have been sequenced. I know this is a major problem, since much of my time is spent trying to track down the source of such tissues so that the specimens, should they still exist, which they seldom do, can actually have their identities confirmed. In principle, if the original specimens are saved, this will greatly benefit the scientific community, since they are in fact the ultimate voucher that makes the data valuable in the first place and the published sequences are useless without it. However, vouchering of specimens is even more costly than data, which is evidently why the molecular community has done such a poor job of it for many species. The problem is large because there is no easy way to define the limits of what is meant by data.
31. Re:Practicalities by turkeyfish · 2014-02-25 18:55 · Score: 1
  
  "What? The organism is not the data - the data is all the measurements you took of that organism and all the situations you subjected them to in order to reach the conclusions that you are publishing."
  You simply don't understand and have a very naive view of biology and the complexity of life on planet Earth. If you don't have voucher material available to confirm the identity of the organisms under study, then there are no definite subsequent statements that one can make about any of the measures, observations, etc extracted from that species, since it may well be that the study at hand is actually based on another species entirely or a mixture of closely related species that have not been properly identified.
  For many species, this is generally not regarded as a serious issue and great pains and expense have gone into establishing particular strains or lineages for purposes of experimentation so this issue can be set aside and assumed to be answered. However, for a great many more, it is always a very serious issue, since few organisms actually come with the correct scientific identification neatly printed on their backs for all to make an easy identification. Just ask yourself, of the 30,000 species of fishes, for example, how many do you think the average scientist can readily identify? Now ask that of coleopterans or even larger or more obscure taxa. In reality, the voucher is the data, since it is the only way one can with any certainty reproduce a biological study. Once you have the identity of the species involved determined and confirmed, then you can go about studying various measurable properties. However, without that critical piece, the rest is conjecture. One needs to recognize that thousands of papers have been published in which the organisms in question have been misidentified. The only way one can be sure, is to have saved voucher material.
32. Re:Practicalities by Anonymous Coward · 2014-02-25 21:57 · Score: 0
  
  Ideally, the researchers should make public the rawest of the raw data, and all the scripts and software that convert the data through layers of interpretation to produce the final result. That way, anyone can check to see if there was a bug in the program that converts the raw voltage traces to light intensities, and rerun the entire pipeline to determine whether this has any effect on the final result.
  Of course, doing the research in this totally reproducible way and making it all publicly available would be impractically difficult. But it's worth keeping in mind as an ideal towards which to aspire.
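  To make the idea concrete, here is a minimal Python sketch of what "raw data plus every script" might look like for the voltage-to-intensity case. Everything in it is assumed for illustration: the calibration constants, the function names, and the made-up voltage trace are hypothetical, not taken from any real instrument or paper. The point is only that anyone holding the raw values and the code can rerun each layer and see whether a changed calibration moves the final number.

```python
import hashlib
import json

# Hypothetical calibration constants; a real instrument would supply its own.
GAIN = 2.5    # intensity units per volt (assumed)
OFFSET = 0.1  # volts of dark signal to subtract (assumed)


def checksum(values):
    """Fingerprint the raw data so a reader can confirm they hold the same input."""
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def volts_to_intensity(volts, gain, offset):
    """Layer 1: convert raw voltage samples to light intensities."""
    return [gain * (v - offset) for v in volts]


def mean_intensity(intensities):
    """Layer 2: reduce the intensities to the single reported number."""
    return sum(intensities) / len(intensities)


def run_pipeline(raw_volts, **overrides):
    """Run every layer from raw data to final result and record what was used."""
    calibration = {"gain": GAIN, "offset": OFFSET, **overrides}
    intensities = volts_to_intensity(raw_volts, **calibration)
    return {
        "raw_sha256": checksum(raw_volts),
        "calibration": calibration,
        "result": mean_intensity(intensities),
    }


raw = [0.1, 0.5, 0.9]  # made-up voltage trace
published = run_pipeline(raw)
# Rerun with a different offset, as if a bug in the conversion had been fixed.
corrected = run_pipeline(raw, offset=0.0)
print(published["result"], corrected["result"])
```

  Recording the checksum and the calibration next to the result is a design choice for the sketch: it lets a reader tell whether a different answer comes from different input or from different processing.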
33. Re: Practicalities by Bongo · 2014-02-25 22:37 · Score: 1
  
  Thanks, I've been wondering about this problem for a while. I'd seen ZFS as the technical part, but didn't know what to do about the "no money" part.
34. Re:Practicalities by StripedCow · 2014-02-25 22:49 · Score: 1
  
  Making this publicly accessible is extremely expensive and ultimately useless since, unless you understand the innards of the detector and how it responds to particles and spend the time to really understand the complex analysis and reconstruction code there is nothing useful that you can do with the data.
  If the format/origin of the data is not understood by your readers, perhaps your publications need more work.
  
  --
  If Pandora's box is destined to be opened, *I* want to be the one to open it.
35. Re: Practicalities by Anonymous Coward · 2014-02-25 23:30 · Score: 0
  
  At *least* make it mandatory for the 'soft' sciences like psychology and biology. There's a lot of misinformation and manipulation of data in these fields - and the data sets are smaller and more easily scrutinized.
36. Re:Practicalities by Anonymous Coward · 2014-02-26 00:49 · Score: 0
  
  "We are paying for that access."
  No you are not.
  You're paying for the results.
  If you want access to the data as well, then you'll need to pony up for it to be made available, EVEN IF YOU NEVER WANT IT YOURSELF.
  The only way PLOS can manage this is if they are the ones with the data and make it available. I doubt whether they'd accept that, since it's fucking expensive.
37. Re:Practicalities by martin-boundary · 2014-02-26 01:35 · Score: 1
  
  It is not possible to make a general statement about which layer of interpretation is the right one to be made public. Higher levels, closer to the final results, are more likely to be reusable by other researchers. However, higher levels of interpretation provide the least information for someone attempting to confirm that the total analysis is valid.
  
  You're wrong. It is perfectly clear what needs to be published openly: whatever is necessary for someone to confirm that the total analysis is valid.
  That is the fundamental principle required for scientific progress. The fact that this statement is not specific enough to prescribe exactly what needs to be done in every scientific experiment is not a flaw. If in doubt, err on the side of caution, ie publish more than is strictly necessary to confirm that the total analysis is valid.
  Yes, this may sometimes be costly, but so what? Some experiments are costly, and that often causes scientists to think through variations that show the same effect in a cheaper way. Such calculations can and should factor in the cost of making the necessary data available. They are not orthogonal to the science per se, because ensuring repeatability and verifiability is at the heart of the scientific method.
  So when designing an experiment, think about what it will take not just to convince the journal's referees, but also to prove repeatability against anyone who is willing to properly test it. Then implement the experiment and provide the proof, and publish the summary in a journal.
38. Re:Practicalities by Immerman · 2014-02-26 03:08 · Score: 1
  
  Ah. Good point. Fortunately DNA sequencing is getting cheaper by the day - that's about as unambiguous an identification as you can get, and can't reproduce on its own.
  
  --
  --- Most topics have many sides worth arguing, allow me to take one opposite you.
39. Re:Practicalities by NatasRevol · 2014-02-26 03:56 · Score: 1
  
  Why would a website front end be incredibly expensive? It doesn't need to be highly available with gb of bandwidth.
  Hell, pay a grad student to build it.
  
  --
  There are two types of people in the world: Those who crave closure
40. Re:Practicalities by Crispy+Critters · 2014-02-26 04:55 · Score: 1
  
  "You're wrong. It is perfectly clear what needs to be published openly: whatever is necessary for someone to confirm that the total analysis is valid."
  This is not what is under discussion. To confirm the total analysis, you need access to all the raw bits, all the calibration data underlying the analysis, all the computer codes used, copies of any written information in logs and lab books, and all the laboratory equipment as it was at the time the data was collected. Plus, you need to have all the knowledge that is in the researcher's head. And all of this tells you absolutely nothing about the validity of the research--the real question is whether the technique applied is a correct way to measure the phenomena.
  "That is the fundamental principle required for scientific progress."
  No it isn't. The fundamental principles are that results can be reproduced and that results can be used to make predictions.
  If you demand all this, the question is whether governments are going to increase their research budgets by a factor of 10 or simply eliminate all publicly-funded research.
41. Re: Practicalities by Obfuscant · 2014-02-26 05:54 · Score: 1
  
  I actually do this for a living; Having data available for projects does require it to be on large data systems which are properly backed up etc.
  So do I, and for more than twenty years. If you do, you'd know it is quite a bit more than just a hole in a firewall and a read-only login. It requires an organization that the public can navigate and understand and actually find things. That's different than the organization that the local users need since local users get a much larger view of the data and need it in faster and more direct ways. I.e., local users see a lot of files, public users see links on a web page.
  If your "public" access is just a "read-only" user that gets to wander about looking for the files he needs, then you're doing the public part wrong. And you're creating security issues for everyone because one of the first vectors for attack is almost always "get access to the system".
  
  Heck, any halfway decent staged system (Sun used to make really good ones) will allow you to read tapes as if it were a regular network share.
  Citation required. I have a thousand VHS tapes that it would be really nice to have available "as if [they] were a network share." I know of no such hardware for that format. I know of no such free hardware for any tape format, and since the grants that paid to collect the data didn't include robot tape systems at all it would have to be free for there to be any hope of this data being made publicly available. That's ignoring the money it would take to pay me to make it so.
  
  The mag tapes in boxes problem we fixed years ago by simply putting everything on spinning rust with ZFS.
  Someone pays for the time of the person who does that. "Graduate students". No, they're busy using the data to do something, not spending a year of their life digitizing video tapes.
  
  I do understand the grant issues, most of those grants will actually mandate a 20 year or-so archival period but never have the money for it.
  
  I have never seen a grant with such a mandate. Were there to be such, we'd never have to sneak it in under other grants, we'd just tell them -- "you mandated this, you pay for it."
  
  I've figured out that future grants will simply pay for today's "large amount" of data storage in a small overhead
  
  In other words, you'll fail any strict audit because you'll be doing work under one grant that is actually supporting one that has run out. And if your PI is more interested in keeping today's program manager happy by doing work on today's data than in managing last year's or last decade's, you'll be doing that instead.
  Members of the public who yearn for "free data" because they think they've paid for it need to realize that no, they really haven't paid for access to the data, only for someone to collect it in a research program. There's more to it than just flipping a couple of bits to grant the public such access, and that's going to cost money. They need to know that.
42. Re:Practicalities by aestrivex · 2014-02-26 05:55 · Score: 1
  
  But if a researcher -- or an interested enthusiast -- contacted you and asked questions and then wanted to see the data, you would give it to them, right? I strongly applaud the spirit if not the implementation of the idea. Science is supposed to be publicly verifiable evidence that any interested party could reproduce. And that is what it is, as long as you have the grant money and access to materials such as particle accelerators or MRI magnets. Now I realize that this type of equipment is expensive to manage and isn't easy to give everyone their fair shot at having time to use, but doing as much as possible to combat this is the way to help foster open science, no matter how esoteric. Put another way, it gives people the choice to be hands on with science, and fail in order to learn something. Which is something that all scientists have to do, they just pay inordinate amounts of money to do it at present. That is what the policy should amount to: If someone wants access to the MRI dicoms I used in my PLoS ONE paper, I should be legally required to oblige as part of the agreement to publish in their journal. Much as I am legally obliged to provide the source code of my GPL program to someone who asks, but it is ok if I wait until they ask if it is too big to host online.
43. Re: Practicalities by Anonymous Coward · 2014-02-26 06:17 · Score: 0
  
  Whether or not you make the data publicly available, you have to store and make it privately available,
  I have boxes and boxes of mag tapes with data on it from past experiments. That's privately available. It will never be publicly available.
  
  putting in public access is a matter of creating a read-only user and opening a firewall port.
  It is clear that you have never done such a thing yourself. There is a bit more to it than what you claim. I've been doing it for more than twenty years, keeping a public availability to much of the data we have (but not all -- tapes are not easily made public that way), and there is a lot more to dealing with a public presence than just "a read-only user and a firewall port".
  
  The sad thing is that most scientists don't actually store their data properly, it sits on removable hard drives, cd or an older variant of portable media
  And now you point out the biggest issue with public access to data: the cost of making it online 24/7 so the "public" can maybe sometime come look at the data. Removable hard drives are perfectly good for storing old data, and they cost a lot less than an online raid system. For that data, that is storing it "properly".
  If you want properly managed, publicly open data for every experiment, be prepared to pay more for the research. And THEN be prepared to pay more for the archivist who has to keep those systems online for you after the grants run out. And by "you", I'm referring to you as the public.
  Researchers get X amount of dollars to do an experiment. Once that grant runs out there is no more money for maintenance of the online archive, if there was money for that in the first place. For twenty two years our online access has been done using stolen time and equipment not yet retired. When the next grant runs out, the very good question will be who is going to be maintaining the existing systems that were paid for under those grants. Do they just stop?
  You work in science, you should be 100% on board with public access. What is wrong with you? Life is hard for everyone.
44. Re:Practicalities by Obfuscant · 2014-02-26 10:41 · Score: 1
  
  Not if such a program were to affect only future research.
  
  I don't know what would be magic about future research that would allow a three-year grant to pay for extended, stable, long-term public access to the data that is collected under that grant. If you want someone to provide the data to you someone needs to pay for the systems and people it will require to store and distribute it. That would require a source of funding for the long-term. That would mean the three-year grant would need to be twenty years long or more, even if it is just paying for maintenance and upkeep in years 4 through 20+.
  
  I am aware that it would cost somewhat more. But it is arguable that the benefit lost to society is worth far more.
  "Benefit to society" doesn't put food on my table or clothes on my back or gas in my car. If you want access to my raw data someone needs to pay for someone to maintain it. That might not be me, but it will be someone. I mean, when that disk fails in my RAID6 array, someone has to physically replace it, and the replacement costs money (even if it is a warranty service exchange) and the person who replaces it costs money. If you pay nothing, you get two free disk failures. The third one means the data goes away.
  
  If you are profiting on my dime, then yeah. Cough it up, bud.
  I work for a living. That's not "profiting". Cough up the money to pay me to maintain your access to my data if you want it, because you're paying nothing for it right now.
  
  I didn't say the researchers should pay for it. The public (meaning of course government at some level) would be responsible for maintaining publicly-accessible archives of publicly-funded research.
  And yet you just accused me of "profiting" from this and I should "cough up" your data. That's a direct demand that researchers pay for your access. Yes, maybe government should pay for your access, and I'd be happy to have a lifetime job just maintaining all this data so you could come look at it, if you ever do. But that's not what I'm paid to do, and government isn't paying for it.
45. Re:Practicalities by Obfuscant · 2014-02-26 13:06 · Score: 1
  
  If the format/origin of the data is not understood by your readers, perhaps your publications need more work.
  Sadly, "the public" is a much larger superset of people than "your readers". And journals are not the way to teach people all the various formats and origins of data, so even considering a limited subset of "your readers" your statement is wrong.
46. Re:Practicalities by turkeyfish · 2014-02-26 18:35 · Score: 1
  
  More rapid sequencing and hopefully much less expensive sequencing will greatly improve our knowledge, but the reality is that species identification will always be an issue since there are so many similar species often difficult to tell apart. So care must be taken with the identifications to ensure that the correct name is being attached to the sequences generated. The need for vouchers will be with us for a very long time to come and this may be a good thing, since it will shift the focus from simply obtaining and describing sequences or simply describing morphology toward understanding the functional, ecological, physiological, and evolutionary relationships between the different kinds of data that can be used for identification.
47. Re:Practicalities by Immerman · 2014-02-27 03:08 · Score: 1
  
  I would suggest that we are discovering that the entire concept of species is itself rather poorly defined. It's clear enough once two organisms have diverged too far to permit reproduction, but that doesn't really address the transitional phases or "intermediate" species (A+B or B+C can reproduce, but not A+C). And that traditional heuristic completely falls down in the face of asexuals who can exchange genes with other organisms that are barely related at all.
  At some point I think we're going to have to admit that, useful as it is, "species", like Newton's Laws, is a fundamentally flawed concept spawned from incomplete knowledge.
  
  --
  --- Most topics have many sides worth arguing, allow me to take one opposite you.
48. Re:Practicalities by Roger+W+Moore · 2014-02-27 09:59 · Score: 1
  
  Why would a website front end be incredibly expensive?
  It wouldn't so long as all you expected was a simple file system with data files but without some explanation of the data format, where to find the associated calibration database, geometry database etc. it will be of no use to anyone. So you will need to hire someone to nicely format the data, write documentation on where to find the calibration and geometry databases, etc. etc. This is before you even start to look at the cost of storing the hundreds of petabytes of data - you are looking at about $5 million for the disks alone for 100PB data. Add in RAID arrays and extra disks for the redundancy plus the power to run it all and you are looking at tens to hundreds of millions of dollars plus salaries of the staff to run and maintain it all...just to make data available that perhaps a handful of the public will look at each year.
  
  The data may belong to the public and we may have the means to make it available to them but is this the best way to spend public money? Even if the money came in addition to the normal research grants there are better things to spend it on than this.
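  The disk figure above is easy to sanity-check. The short Python sketch below takes only the two numbers quoted in the comment ($5 million, 100PB) and works out the implied price per terabyte. The redundancy multiplier of 1.5 is a purely hypothetical placeholder for RAID parity and spare disks, not a figure from the thread, and power and staff are left out entirely.

```python
# Figures quoted in the comment above.
DISK_COST_USD = 5_000_000  # "about $5 million for the disks alone"
CAPACITY_PB = 100          # "for 100PB data"

TB_PER_PB = 1000           # decimal units, as disk vendors count them


def cost_per_tb(total_usd, capacity_pb):
    """Implied price of raw disk per terabyte."""
    return total_usd / (capacity_pb * TB_PER_PB)


def total_with_overhead(total_usd, redundancy_factor):
    """Scale the raw disk cost by an assumed redundancy multiplier."""
    return total_usd * redundancy_factor


per_tb = cost_per_tb(DISK_COST_USD, CAPACITY_PB)
print(f"raw disk: ${per_tb:.2f} per TB")
# 1.5 is an assumed multiplier for parity and spare disks, not a quoted figure.
print(f"with redundancy: ${total_with_overhead(DISK_COST_USD, 1.5):,.0f}")
```

  The quoted numbers work out to $50 per TB of raw disk, so on the comment's own account the jump to tens or hundreds of millions has to come from redundancy, power, and staff rather than from the disks themselves.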
49. Re: Practicalities by guruevi · 2014-02-28 11:38 · Score: 1
  
  The "read only user" was hyperbole but it's very close to a technical solution. To "open your data" all you need is a system that you can point to and will resolve externally. Usually, that link will be a very specific data set which is included in the paper and which will be available. How you organize it internally doesn't matter, as long as you can point to say an HTTP page with all the data in read only. There are no major security issues there because the data should already be open, it doesn't matter if someone can read all of it, that's the purpose of research data after all. There is at least one system at several institutions I know of that basically does this although it's not fit (yet) for large data sets - you upload a data set, at some point the PI says "make this public" and there you go. Instead of re-uploading to another system, you could have an internal link to the data. Most institutions already have this, it's a matter of making it usable and accessible.
  I was talking about data - tape robots have been around for ages for pretty much any type of tape. Poorly chosen archival solutions will never be made accessible but we've been able to do data archival in an organized fashion for at least 20 years now and there is no reason to put current data on VHS. As far as tape -> disk, again, any decent archival solution has to make sure their data can be read. Instead of reading it and verifying it and then re-storing the tapes (and yes, when I started, loading tapes in and out of the robot was a job that took me a full day), simply copy the stuff off once and for all. If you really need to have everything from the VHS tapes, there are solutions for it, not free but that's the cost of bad decisions.
  As far as requirements for grants, NSF: provide reliable digital preservation, access, integration, and analysis capabilities for science and/or engineering data over a decades-long timeline. Most grants will actually require something along those lines but won't have an associated budget for it. We're IT guys, we maintain lots of stuff from decades ago that has long since run out of funds. It's the cost of IT, part of it will be funded by separate grants to maintain that data, part of it will be funded through established trusts, part of it will be funded by "upgrades" which are costed under a different grant. Or did you really think that 1TB of data costs $1500/year to maintain?
  
  --
  Custom electronics and digital signage for your business: www.evcircuits.com
This is not new at all by umafuckit · 2014-02-25 10:18 · Score: 2

Standard policy. Nature have been doing this for some time. They state: authors are required to make materials, data and associated protocols promptly available to readers without undue qualifications. So have Cell Press and Science. I stopped searching at this point, but I'm sure other major journals do the same thing.

--
soylentnews.org
1. Re:This is not new at all by umafuckit · 2014-02-25 10:20 · Score: 1
  
  Ok, sorry, I see they want the data deposited upon publication.
  
  --
  soylentnews.org
Beta-blocker by Anonymous Coward · 2014-02-25 10:30 · Score: 0

I am getting a migraine. can someone tell me how to beta-block?
1. Re:Beta-blocker by tepples · 2014-02-25 11:14 · Score: 1
  
  You could try looking for articles about beta blockers in PLOS ONE.
  (Or if you meant that beta, try this link.)
Prolific publishing by hubie · 2014-02-25 10:53 · Score: 2

one of the most prolific publishers of research papers in the world.
Their journals aren't in my field (they are all bio journals), so I have not heard of them, but is it true that they are that big? Their web site wasn't much help in terms of information on subscriptions or article numbers, or I simply missed it. Can anyone familiar with them provide any input?
Their data policy might work for the biosciences, but good luck requiring all the many TB of raw data from a particle physics experiment to be put up somewhere. And in some instances, like that one, the raw data will most likely be useless without knowing what it all means, what the detectors were, what the detector responses are, etc. etc. etc. For experiments where it takes man-months or man-years to collect and process the data, making it all available in raw format will largely be a waste of time.
In general, at least for experiments done in the lab that use specialized equipment, raw data will not be very useful if you don't understand what you're collecting or aren't familiar with the equipment. You can end up with situations like that guy who took the Mars rover images and kept zooming in until he saw a life form.
1. Re:Prolific publishing by the+gnat · 2014-02-25 11:38 · Score: 1
  
  is it true that they are that big? Their web site [plos.org] wasn't much help in terms of information on subscriptions or article numbers, or I simply missed it. Can anyone familiar with them provide any input?
  "In 2013, PLOS ONE published 31,500 papers."
HIPAA by Anonymous Coward · 2014-02-25 10:57 · Score: 0

I guess they don't want any more publications from medicine. There is no way to truly, fully anonymize patient data. This is why the data is rarely provided, or locked behind a "prove you're a researcher" wall, or only a small subset given decade(s) later such that it would be much harder to trace.
1. Re:HIPAA by sexconker · 2014-02-25 11:10 · Score: 1
  
  I guess they don't want any more publications from medicine. There is no way to truly, fully anonymize patient data. This is why the data is rarely provided, or locked behind a "prove you're a researcher" wall, or only a small subset given decade(s) later such that it would be much harder to trace.
  WTF is this horseshit? Anonymize patients by removing all name, address, etc. info. Just keep the relevant metrics for the study.
  HIPAA does not allow a "researchers can access your private, personal data, lol" exception, so there's no fucking change from how shit runs currently.
2. Re:HIPAA by Crispy+Critters · 2014-02-25 11:44 · Score: 2
  
  Unfortunately, it has been shown already that the few details relevant to medical studies can often be used to uniquely identify individuals even after name and address are removed. "Yaniv Erlich shows how research participants can be identified from 'anonymous' DNA" http://www.nature.com/news/pri...
  Same will be true for various kinds of employment data and census data.
3. Re:HIPAA by Anonymous Coward · 2014-02-26 03:58 · Score: 0
  
  Any geographical subdivisions smaller than a State are considered identifiers for HIPAA purposes. Those may be very relevant for a data set depending on the study.
  For dates, any reference for even a year is considered an identifier for anyone over 89 years of age.
  There are lots of things that might be relevant to a biomedical data set that could not be made public. Anonymizing is harder than you think.
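  As a rough illustration of the two rules mentioned above, here is a Python sketch that strips a record before release. It is not a full HIPAA implementation: the field names, the `"90+"` label, and the sample record are all hypothetical, and it covers only direct identifiers, geography finer than a state, and ages or years for people over 89. As other posts in this thread point out, what remains can still be enough to single someone out.

```python
DIRECT_IDENTIFIERS = {"name", "address"}         # dropped outright
SUB_STATE_GEOGRAPHY = {"city", "county", "zip"}  # anything finer than a state
YEAR_FIELDS = {"birth_year", "visit_year"}       # years that could reveal age
AGE_CAP = 89                                     # ages above this are pooled


def deidentify(record):
    """Return a copy of record with the fields discussed above removed or coarsened."""
    over_cap = record.get("age", 0) > AGE_CAP
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS or field in SUB_STATE_GEOGRAPHY:
            continue
        if over_cap and field in YEAR_FIELDS:
            continue  # even a bare year counts as an identifier at this age
        if over_cap and field == "age":
            value = "90+"  # pool all ages above the cap into one category
        cleaned[field] = value
    return cleaned


patient = {
    "name": "Jane Doe",
    "county": "Example County",
    "state": "OR",
    "age": 93,
    "visit_year": 2013,
    "blood_pressure": 128,
}
print(deidentify(patient))
```

  Note how much the study loses here: a county-level analysis, or anything tied to the year of a visit by an elderly patient, is no longer possible from the public copy.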
4. Re:HIPAA by sexconker · 2014-02-26 04:52 · Score: 1
  
  Any geographical subdivisions smaller than a State are considered identifiers for HIPAA purposes. Those may be very relevant for a data set depending on the study.
  For dates, any reference for even a year is considered an identifier for anyone over 89 years of age.
  There are lots of things that might be relevant to a biomedical data set that could not be made public. Anonymizing is harder than you think.
  Releasing that data to random fucking researchers is still a HIPAA violation.
  There's no NEW issue that comes up when data has to be public. The data has to be sanitized or released in accordance with HIPAA BEFORE it gets to researchers and journals.
That Wraps It Up by sexconker · 2014-02-25 11:07 · Score: 0

Well, that really wraps it up for the global warming crowd.
If their source data has to be publicly accessible, it'll be laughed off the stage before their "studies" get any traction.
1. Re:That Wraps It Up by Microlith · 2014-02-25 11:51 · Score: 1
  
  Yup because IT'S A CONSPIRACY!
  Right? That's what Exxon Mobil and Fox News tell me...
2. Re:That Wraps It Up by turkeyfish · 2014-02-26 18:51 · Score: 1
  
  Hardly, when you consider that the results are typically posted every day online and in the newspapers. It's not as if world temperature data is being kept secret. Are you suggesting that scientists are hiding temperature data from the public? Surely you must be joking.
3. Re:That Wraps It Up by Anonymous Coward · 2014-02-27 04:50 · Score: 0
  
  Wow, what a poor argument. You think the scientists are just sitting around and get their data from the weather channel?
Re:This is not new - but few comply by Anonymous Coward · 2014-02-25 11:18 · Score: 1

And many scientists that get published in these high profile journals are scofflaws when it comes to sharing... It's been covered many times but compliance is near zero.
Good idea but... by Anonymous Coward · 2014-02-25 11:28 · Score: 0

I'm worried about the wording around ALL DATA. In many experiments ALL DATA could easily be interpreted as the entire data set, running into many terabytes or even petabytes. Making this much data publicly available could be prohibitively expensive for many papers.
Conflicts with privacy rights by Antique+Geekmeister · 2014-02-25 14:27 · Score: 1

There is a great deal of science, and public policy, that would benefit from public exposure. But medical and sociological research benefits from the privacy of the subjects, who then feel more free to be truthful. The same is true of political survey data, and "anonymizing" it can be a lengthy, expensive, and uncertain process, especially when coupled with various metadata that is being collected with the experiments or in parallel with them. It can also be very expensive to make public, even without privacy issues, because transforming it from obsolete media and making it available for public download often takes real engineering time. Long term science projects can span decades, and the first sets of data are often on obsolete media.
Overall, it seems an excellent policy, but exceptions will have to be made.
Re:This is not new - but few comply by umafuckit · 2014-02-25 14:46 · Score: 1

I'd agree with that. I once tried, very politely, to get data from authors of an NPG paper. They stalled and it became awkward. In the end I gave up because my interest was purely motivated by curiosity and I didn't want to make an enemy (even if the person in question was in a different field). Glad I backed off now as I've ended up moving into that field...

--
soylentnews.org
Oh the irony by Roger+W+Moore · 2014-02-25 16:24 · Score: 1

In the case of publicly funded research, all the advantage accrues to those who receive grants
Really? That's a rather ironic argument given that you are posting it on the web which was something invented and developed at CERN using publicly funded research money.

Your idea of practicality has nothing to do with open access, it's a justification for keeping a lid on it.
So why are you also not complaining that museums with publicly owned collections are not displaying every single item they own? Do you want them to stop researching collections and making acquisitions in the public interest and instead spend money on building thousands of square metres of new display space so every item they own can be displayed?

The public may own the data but there is a cost to making that data publicly available. My own experience has shown that even when that cost is met the public actually have almost no interest in looking at that data. I have absolutely zero objections to making all the data publicly accessible provided someone is going to pay for all the network bandwidth, servers, system administration, disk and tape storage, network connections etc. needed to access the data. However as a member of the public I would question whether that is a sensible way to spend all the money required to provide that access and argue that that money would be better going on research. After all that additional money going on data access corresponds to fewer postdocs and graduate students working on the experiment which, unless the data is wildly popular, probably means fewer people using it not more.
1. Re:Oh the irony by turkeyfish · 2014-02-25 18:38 · Score: 1
  
  Excellent point. Given the modern GOP, who are reluctant to even spend money on people who are starving, it's hard to imagine that they are going to be forthcoming with the hundreds of millions of dollars necessary to maintain "all relevant data" in archives and repositories that are available online in electronic form. Having lived through the era of Proxmire's "Golden Fleece Awards", it is totally predictable how politicians would howl at having to fund all sorts of projects that they could mischaracterize out of context as an excuse to cut science budgets further. In the current climate we would probably see legislation calling for the execution of scientists who somehow "mishandle" data. Look at the grief Michael Mann was put through for no good reason. I certainly wish it were true that politicians would see the value and benefit of funding the archival of data, but judging from the behavior of this GOP congress toward scientific research and its funding, such thinking is pure fantasy.
unless... by l3v1 · 2014-02-25 19:18 · Score: 1

"This is good news for replicating experiments, building on past results, and science in general."

It is, unless the data can't be made "publicly available, without restriction" (very important emph. added), in which case you can't publish there. Yes, there are others, but demanding dropping all restrictions in all cases is simply an approach blind to reality. Also, if they demand so, they must provide free storage, which in some cases could range to multiple gb of data - and you won't want to pay for indefinite storage of large datasets, for certain.

Also, I wish to repeat my hatred towards the kind of open access publication methods most (if not all) major sci outlets use, namely charging the author many thousands of USD/EUR for publication, costs which most grants don't cover (e.g. my institute mandates open access publications, but of course they don't provide the financial resources to do so). This in turn shifts the focus, since now it's in the best interest of a publisher to accept as many as they can (keep the money flowing), instead of accepting the best ones and get the money from interested readers (and yes, if it's good, they come). Of course politician-scientists like the publicity they get from folks for trying to 'set science free'. I just wish they'd do a bit more thinking, they are scientists after all (or so they claim to be).

--
I am putting myself to the fullest possible use, which is all I can think that any conscious entity can ever hope to do.
http://xkcd.com/1321/ by mitzampt · 2014-02-25 21:34 · Score: 1

Yeah, it's getting colder outside on the global scale. Just look out the window every winter. It's all the proof I need. This winter the snow excess here was a football field-size snowflake. Those damn alarmists don't know what they're saying, let's just wait and see how wrong they are.

--
uhm...
PLOS should host it by Antonovich · 2014-02-25 21:40 · Score: 1

There would seem to be a relatively easy solution to this problem - make the raw data available from the article itself, or at least as an attachment. If that requires petabytes of storage, then presumably PLOS will provide the necessary infrastructure. That way they can ensure that as long as the article is being offered, all data used is also available. Does that sound unreasonable considering their requirement?
They'll not get much medical publication then by Anonymous Coward · 2014-02-25 23:08 · Score: 0

Because agribusiness, Biotech and Medical is Big Bux, they don't want free and open access to their data. They won't let people send to PLOS.
RIP PLOS by wanax · 2014-02-25 23:09 · Score: 2

It goes way beyond just genes and patient data. First, there's the issue of regulation. In most biology/psychology related fields, there's a raft of regulations from funding sources, institutional review boards, the Dept. of Agriculture (which oversees animal facilities) and IACUCs, for example, that make it impossible to comply with this requirement, and will continue to do so for a long time. No study currently being conducted using animal facilities can meet this criterion, because many records related to animal facilities (including the all important experimental protocol) must remain confidential by statute (with the attestation of compliance from the IRB and IACUC). Likewise in the case of (any) human research, you'll have to get a protocol past the IRB for protecting subject anonymity, and given the likelihood of inadvertent identity disclosure that will be extremely difficult to do.
Second, there's a deep flaw in how the policy is written and how it conceives of data. To wit, the policy defines: "Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances."
Now for starters, there's a loophole big enough to drive several trucks through: In many experimental contexts material necessary for complete understanding of the 'raw data' are not in digital form, but rather in say, lab notebooks. Which leads to the broader issue: what most researchers would be actually interested in seeing publicly disclosed is the 'data set' which is not 'raw data', but data that's processed into a useful, compact form that's suitable for statistical analysis.
However, in many experiments all of the material necessary to understand the 'raw data' (which I'll define here as the measured result of an assay in a very general sense) is distributed between lab notebooks, digital data collection, calibration and compliance records in facilities archives and several levels of processing often using proprietary and very expensive software. Even if all of those things could be published (see above), the 'raw data' would be mostly worthless because of the vast amount of time and effort required in many cases to turn the 'raw data' into the 'data set'.
The third problem of course, which has been addressed in several places already on this thread is that there's no money in grants to fund the required repositories.
I think at some level this policy is a noble idea, but it's been implemented in a terrible way, and obviously written by people in fields that already have functioning, funded public databases. Either people from many fields are going to stop publishing in PLOS, or they'll drive the truck through the loopholes and it'll be just as toothless as Science and Nature's sharing requirements.
If they really wanted to effectively push for greater transparency, what they should be pushing at the moment is simultaneous publication of the 'data set', which would let fields that don't have standardized databases in place design standards that would allow their creation.
Finally: Forcing Researchers to Standard Data by fygment · 2014-02-26 01:34 · Score: 1

Rather than publishing on proprietary data of uncertain characteristics, this will essentially force researchers to use common, known, and available data sets. A smattering of what's available and reputable:
http://www.itl.nist.gov/div898...
http://www.keypress.com/x2814....
http://lib.stat.cmu.edu/DASL/
http://www.statsci.org/dataset...
http://data.gc.ca/eng/facts-an...
http://library.med.cornell.edu...

--
"Consensus" in science is _always_ a political construct.
Re: ... I'm willing to be convinced either way by fygment · 2014-02-26 01:38 · Score: 1

We won't know the result BUT yeah, finally researchers will have to really provide transparency on their work.
That works both ways though. Now Exxon et al will also have to show their justifications with hard numbers whose origins are clearly replicable.

--
"Consensus" in science is _always_ a political construct.
Healthcare by alzaid.saud · 2014-02-26 07:00 · Score: 1

This would be revolutionary if applied to healthcare. It would mean that datasets could be recycled and meta-analysed for rare tumors, rare cancers etc. It would also mean that drug companies will have to behave. Problems which may lead to panic, such as how confidential the data would be, are often addressed by institutional review boards, which vet the ethics of any study prior to its initiation at most institutions in the Western Hemisphere. Based on personal experience dealing with studies involving patient data, things like filing numbers or ID codes are rarely used, and neither is complete genetic data (pragmatically, a study with the complete genetic code for 500 patients would be prohibitively expensive). Overall I hope this becomes a trend similar to that of open access, which they have championed in the past.
Another open source journal that looks good by aurizon · 2014-02-26 12:15 · Score: 1

http://elife.elifesciences.org...