How Open Source Could Benefit Academic Research

Just as long as a central body maintains quality by eksith · 2013-01-29 12:14 · Score: 1

Open Source by default has the benefit of many eyes checking for errors and contributing ideas, but things get sour when too many people commit.

Too many chefs etc... etc...

A failure to be first to publish results in a loss of potential peer recognition which in turn results in a significant impact on future funding and employment opportunities.

I think this will change when there's an open record of who came up with an idea first. Wouldn't it be quite a bit harder to say "I came up with that" if we don't know about your ideas until after your paper is published?

It is so specialised that it is unlikely to attract significant contributions. Furthermore, where the work builds on existing open source solutions, it is likely to be seen as a "code dump" (a significant chunk of work rather than a set of incremental changes working towards a defined goal). Most community projects are resistant to code dumps because they are usually hard to review and difficult to maintain.

This is a false premise, IMO. By default, all changes are incremental. Dumps happen when there's poor coordination between the parties involved and no one's really sure of what they're working on.

The "quick hack" nature of academic research software further contributes to the lack of maintainability and reusability.

There needs to be oversight by competent, impartial people who can pinpoint conflicts early, look for logical problems (this is harder when you're too involved in your own research/programming/souffle) and most importantly, let ideas through that actually contribute to the overall understanding of the subject.

This doesn't apply to code per se, but really any research: If the global warming controversy is any hint, research is prime real estate for astroturfing. Empirically observed fact is no match for the perceived reality of the ignorant when special interests are involved. Open source in research wouldn't just be beneficial to the programming aspect; it would also ensure we're not walking into a wall with critical thought.

--
If computers were people, I'd be a misanthrope.

Shameless Plug by adam.rankin · 2013-01-29 12:18 · Score: 1

Shameless plug: http://perk.cs.queensu.ca/software We do exactly this. Our software is open source for anyone to use/test/fix. We do use SVN to maintain some control over the code that is committed, but overall it works quite well. We have just launched some projects on github; it's a new experiment and we're interested to see how it turns out.

Sometimes publishing code loses you papers by WillAffleckUW · 2013-01-29 12:24 · Score: 2

Sometimes, when you publish the code you used to develop new Biochemistry or Genetics solutions, you find that other scientists in other countries use your code to reverse engineer what you are working on - your results, if you will - to eliminate dead ends and publish a paper on what you invested years finding a solution for, but before you submit your paper that they "effectively" stole.

We had that happen when we deposited ligand results a few times, until we learned to stop submitting such things until AFTER we were approved for print.

This is one reason for hesitancy that I can agree to. Just because I wrote code, doesn't mean I want you to have it, if I haven't published the end result.

After it's in print, you're welcome to have the code. Not before.

--
-- Tigger warning: This post may contain tiggers! --

Re:Sometimes publishing code loses you papers by icebike · 2013-01-29 13:01 · Score: 1

Then too, there is a whole mash of what passes for "code" that is written, which is specific to which particular machines you have in your lab, and how you have to extract data from those machines, followed by a lot more code that is off-the-cuff stuff to check some wild idea the researchers thought up over lunch.
In one lab we worked for there were several types of "software" being developed. One was automated data extraction from machines, for some of which we had to sign NDAs to gain access to their internals; another area was the cataloging of results; and a third was actual data manipulation.
For cataloging, we simply used an off the shelf database system. But even the full list of elements in the database revealed more than the researchers were willing to share, so that had to remain under NDA as well.
The actual data manipulation was more often than not simply database extractions (primary data reduction), feeding somewhat less monstrous spreadsheets ("hunch" analysis, and "what-if" queries), finally filtered into a few specialized runs thru off the shelf statistical packages (for the rigorous analysis).
The only thing that made it into their publication was the input & output of the standard stat package, plus the data sets used therein.
What else was there that would be of interest to any other researcher?
It was all pretty site specific, device specific, field specific (virology) etc. It's amazing how little code was actually written; it was mostly lashing packages together. We ended up licensing some of the machine code back to the equipment suppliers. They now "give" *cough* it away when you buy their equipment.

--
Sig Battery depleted. Reverting to safe mode.
Re:Sometimes publishing code loses you papers by fermion · 2013-01-29 14:01 · Score: 1

My take is the following. Software is part of the process. A good scientific paper includes everything that one needs to reproduce the procedure. Otherwise it isn't science but propaganda.
In my younger days I was involved in developing software for custom data acquisition and analysis. I can recall two instances where results were skewed and hypotheses were developed based on the skew. If the software had been looked at by more researchers, and used in more labs, the errors might have been caught more quickly.
The thing with science is that it is often really new. It is easy to think that you are on the correct path, when you are not. The only thing that keeps science honest is having other people check your work.

--
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Re:Sometimes publishing code loses you papers by WillAffleckUW · 2013-01-29 14:13 · Score: 1

Not arguing with you about publishing.
What I am describing is the release of either data or software BEFORE actual publication.
The timing is what is in question, not the (eventual) publication.
In certain genetic studies, we find that publishing inheritance trees identifies specific living people too precisely, so we may have to withhold parts of the tree; otherwise we would be publishing data we are not authorized to publish about the people who inherited the genetic disease trait. Qualified researchers who sign an NDA for the IRB can see it, but publication - once in the wild - means everyone can figure it out using public records data.
Submission of biochemical structures too early, prior to publication, can also result in similar things.
Same goes for code. If it's too specific, the very naming and methods mean that someone else could rush data to publication before the originator of the methods, code, and data was able to be published. This would be bad - as it incentivizes cheating - where, PRIOR to Publication, one avoids the pitfalls that would make things take years, so that one can do work in months that one did not in fact create. AFTER it is published, this is fine, especially as a check, but not BEFORE.

--
-- Tigger warning: This post may contain tiggers! --
Re:Sometimes publishing code loses you papers by the+gnat · 2013-01-29 14:20 · Score: 2

Submission of biochemical structures too early, prior to publication, can also result in similar things.
I can also confirm that this happens occasionally. Biologists are also reluctant to describe unpublished results at meetings unless the article is already accepted and scheduled for publication - it's a real shame, but I understand why they're reluctant. However, there is no requirement that they reveal any data before publishing; it's not like every crystal structure automatically goes into the PDB before they even get a chance to write (which really would be catastrophic).
Re:Sometimes publishing code loses you papers by RDW · 2013-01-30 02:58 · Score: 1

Sometimes, when you publish the code you used to develop new Biochemistry or Genetics solutions, you find that other scientists in other countries use your code to reverse engineer what you are working on - your results, if you will - to eliminate dead ends and publish a paper on what you invested years finding a solution for, but before you submit your paper that they "effectively" stole.
Fair enough, though sometimes getting out of the habit of 'releasing early, releasing often' can put academic developers on a slippery slope that ends with them closing the source. We use a well known (and excellent) suite of genomics software called GATK, originally MIT-licensed. Last year, the developers announced they were switching to a hybrid license, where the latest (unpublished) tools would only be available under closed source terms. The core (now 'lite') package would remain Open Source, and supposedly the new stuff would migrate to it over time as papers were published, etc. Now this has been retconned as an 'interim solution', and in all future versions the Open license will only apply to a basic framework with most of the useful stuff stripped out. Quite a few members of the genomics community are rather upset about the license changes, especially as there's a strong Open Source tradition in this field (a typical GATK data processing pipeline will depend on major components written by other developers that remain Open Source):
http://biomickwatson.wordpress.com/2013/01/28/gatk-why-it-matters/
http://blastedbio.blogspot.co.uk/2013/01/free-for-non-commercial-academic-use.html
Re:Sometimes publishing code loses you papers by nmr_andrew · 2013-01-30 07:08 · Score: 1

Unfortunately, what you say is only sort of true.
In order to publish a paper that includes a structure in any reasonably reputable journal, you have to provide the accession code (PDB ID) along with your manuscript. The only way you get that code is to deposit the structure in the PDB. You may also have to deposit diffraction or NMR data, depending on how the structure was solved.
When you submit the structure to PDB, you can indeed elect to have the coordinates held for publication. However, there is a maximum delay of one year. Normally not a problem, but in the most recent paper I'm an author on, the reviewers decided they wanted significant additional biological data that took longer than that. So although the paper itself just came out in the last week, the structure has been available at the PDB for over 6 months.
Re:Sometimes publishing code loses you papers by lbbros · 2013-01-30 08:34 · Score: 1

Oh yes, the GATK debacle. A pity, because it's such a useful tool; however, since the license change, I'm phasing it out of my work.

--
A CC-licensed illustrated horror novel
Re: Sometimes publishing code loses you papers by nmr_andrew · 2013-01-31 06:37 · Score: 1

Fair enough, they're not automatically released. You can either let RCSB release the structure, or you can withdraw the entry and start the process over again. In the last few years, the 1 year deadline has become somewhat firm. Direct quote from the email I received ~6 weeks before the one year deadline:
"We request you to confirm release or withdrawal of your entry..."

They should open resource their research too by CuteSteveJobs · 2013-01-29 12:27 · Score: 1

The discoveries, algorithms and parameters generated by publicly-funded research are locked behind the paywalls of for-profit publishers. Those publishers won't publish an article unless the academic SURRENDERS THE COPYRIGHT OF THEIR RESEARCH PAPER TO THEM FOR FREE. The only reason these publishers have survived is because academics want their research published in the most prestigious (read 'expensive') journal they can find. Academics could benefit from 'open-sourcing' their research too.

"Academic publishers charge vast fees to access research paid for by us."
http://www.guardian.co.uk/commentisfree/2011/aug/29/academic-publishers-murdoch-socialist

"Academic papers are hidden from the public."
http://www.badscience.net/2011/09/academic-papers-are-hidden-from-the-public-heres-some-direct-action/

Re:They should open resource their research too by Obfuscant · 2013-01-29 12:58 · Score: 1

The discoveries, algorithms and parameters generated by publicly-funded research are locked behind the paywalls of for-profit publishers.
Publishing the results has very little to do with the reason the code isn't open source. The copyright for code and data doesn't transfer to the publisher.
The most likely reason that code is not open-source or reusable is that it has been written by a graduate student to process a specific data set for a specific purpose. The grad student has little reason and no time to deal with creating an open-source project where others may make demands on his limited time to add/fix/change the code to make it usable by others. He may give it away, but once it forks this way it's a stepchild and unsupported.
There are open source projects in academia, but most of those aren't managed by grad students, and paying a professor to manage an open source software project isn't usually part of any grant. Sometimes there is money for technical support, but not always.
Re:They should open resource their research too by codegen · 2013-01-29 12:59 · Score: 1

The only reason these publishers have survived is because academics want their research published in the most prestigious (read 'expensive') journal they can find.
Not most expensive, most referenced. Your career as an academic is largely based on how many people reference your work. Your ability to recruit grad students and attract research funding (to pay those students) depends on how many papers you publish and how many people cite your work. Some funding agencies are a bit more lenient, allowing the referees to assess the paper quality directly, but others have strict rankings of publishing venues and of how much a paper in each venue counts for. Until you break that cycle and allow the grant referees to evaluate the research directly, the publishing target will be the venues that get you the most hits.

--
Atlas stands on the earth and carries the celestial sphere on his shoulders.
Re:They should open resource their research too by blueg3 · 2013-01-29 13:04 · Score: 1

The discoveries, algorithms and parameters...
No, only the reports about them. Not only are the general facts uncopyrightable, a paper is not the same as the substance of the research. It's a report on the research. It's still important, but that's not controlled by publishers. (For that matter, algorithms and parameters are frequently not published in papers. Regardless, you can generally get them from the researchers themselves by asking, unless it's still an active area of research for them.)
Even so, access to papers is frequently more a theoretical problem than a real one, depending on discipline. Google Scholar exists. A large fraction of papers are available online as preprints. Search for the paper. Get preprint. Hooray!
Re:They should open resource their research too by joe_frisch · 2013-01-29 16:55 · Score: 1

The government funds labs partially based on the number of publications they publish in "high impact" (almost always non-public) journals. Write your congressman (really!) that publicly funded research needs to be freely available, and that government needs to stop funding science based on publications in non-free sources.
Of course in that case you need to suggest a different metric for scientific success to allow the government to allocate limited funds between labs. This I think is the big sticking point.
Re:They should open resource their research too by iamwahoo2 · 2013-01-29 17:26 · Score: 1

You hit the nail on the head and I think what you wrote applies just as much to any researcher as it does graduate students. Journals are for the most part filled with papers of academics because institutions incentivize academics for the publication of papers through degrees and tenure and what-not. These institutions generally fail to incentivize the publication of a more complete set of data, code, and other useful things, so they are a low priority.
Re:They should open resource their research too by ceoyoyo · 2013-01-30 12:59 · Score: 1

Actually, not that many ask for transfer of copyright. And most academics completely ignore that anyway. Plus, if you really want to know, you have only to go to a library. Yes, I know it means leaving the basement.

many already do this by call+-151 · 2013-01-29 12:30 · Score: 2

There are many open-source research software efforts already, and of course it would be good to see this become more widespread. These range from small-scale individual researcher one-off efforts to broad multi-institution efforts that are well-maintained over years. The software that I develop in the course of my mathematical research is available freely from our webpages, with intermittent downloads. And I still get inquiries about using it, to which I just say that it's all on our webpages already.

One barrier to broader efforts in the US is that science agencies (at least the National Science Foundation) generally support research proper, rather than development of tools. Oddly, I am much more likely to get a grant to work out research that perhaps 20 to 50 people may be interested in than I am to get a grant to develop research tools that may be useful in furthering research to a few hundred researchers. Nevertheless, it is increasingly common for universities and funding agencies to expect data and software from research to be freely available. Many people drag their feet on these requirements as they are worried that some other researchers will use their tools to scoop them, but I think these instances are very rare.

--
It's psychosomatic. You need a lobotomy. I'll get a saw.

Re:many already do this by jpeaton · 2013-01-29 23:43 · Score: 1

This is very true. And it's not just in software. I work with a specific (hardware) science tool. Developing this tool, and making its capabilities available to more researchers, could boost research for hundreds of researchers worldwide. I develop this in between those projects which lead to "RESULTS", by which we mean papers. This is because my funding agency will not fund a project to develop a tool for science, only "true science", or something technological which *might* lead to something that could be sold in the millions.

I am a scientist who has made "code" by brillow · 2013-01-29 12:41 · Score: 5, Insightful

The software I have written for my odd specialized purposes is similar to the software my colleagues write: It's spaghetti code written with custom libraries which are not better than common ones and it has no documentation at all.

We could open-source it, but then you'd just bitch about how poorly it's constructed.

We don't have time to open-source our code. Heck, I've had people ask to use software I've made and I've regretted giving it to them because I then am obligated to explain to them how to use it.

Re:I am a scientist who has made "code" by pswPhD · 2013-01-29 13:01 · Score: 1

I've done research in Chemistry, and have heavily used open source quantum chemistry codes (mainly NWChem and Quantum Espresso). I am grateful to these guys for releasing these codes.
MY code, on the other hand, is not available. Most of it is a bundle of scripts designed for one process that would make all the programmers on /. cry. The rest is in Fortran, which compiles using one compiler on one machine, and segfaults everywhere else. It's not open source because no one in their right mind would want it.
That said, if you want the raw data, all anyone needs to do is ask.
Re:I am a scientist who has made "code" by the+gnat · 2013-01-29 13:07 · Score: 3, Insightful

I've had people ask to use software I've made and I've regretted giving it to them because I then am obligated to explain to them how to use it.
As someone who writes academic software specifically for distribution, I can confirm that this is a gigantic time suck, and one which the funding agencies generally do not support. We are judged both on scientific innovation and publication record, and on whether our tools are adopted by the community - but the latter frequently interferes with the former. I basically wake up to an inbox full of bug reports and feature requests every morning, and I have to find time to deal with these in addition to all of the actual science I'm supposed to be working on. Despite being an obvious sign of success (people actually use our software!), it's become so discouraging that it helped drive out one of my (very competent) ex-coworkers.
Re:I am a scientist who has made "code" by VortexCortex · 2013-01-29 13:19 · Score: 1, Interesting

I basically wake up to an inbox full of bug reports and feature requests every morning, and I have to find time to deal with these in addition to all of the actual science I'm supposed to be working on. Despite being an obvious sign of success (people actually use our software!), it's become so discouraging that it helped drive out one of my (very competent) ex-coworkers.
You're doing it wrong then. Just because you release source doesn't mean you have to maintain it. If you don't maintain it, and it's important enough, then someone else will. Typically I find that people who are in your position cling too tightly to the reins. If you love it, set it free. Check up on it from time to time. Hell, even if it's forked and you want to add a feature you need to the code, you have two options: a) modify the onsite version you keep and push out the source, letting the forkers figure out how to merge that feature if they like, or b) adopt the latest version of the forked code and make your changes there.
For fuck's sake people, you make it sound like simple resource management is a form of rocket science.
Re:I am a scientist who has made "code" by the+gnat · 2013-01-29 13:58 · Score: 2

You're doing it wrong then. Just because you release source doesn't mean you have to maintain it.
When I say "users", I do not mean "other programmers", I mean scientists who generally don't know a fucking thing about programming, except maybe rudimentary FORTRAN (which is not what I use), and are busy with their own research which does not leave them any time to fix other people's software. They are utterly incompetent to maintain our code for us, and the only people besides us who are qualified are our competitors, who are either too busy with their own projects, or wouldn't pour water on us if we were on fire. Who else are the users going to send email to when something breaks, if not us?
In fact much of our code really is open-source and available on the web, so anyone who wanted to fix it would be welcome to. In practice we have a few external developers whom we work with, who have been very valuable - but the bulk of user support has to be done by us. Your response indicates that you've never had to support a non-technical user, because if you did, you'd realize what a clusterfuck it is.
Re:I am a scientist who has made "code" by anom · 2013-01-29 14:42 · Score: 1

I'm a PhD student and this is completely true; mod parent up.
A small portion of the time, someone writes a tool with the intention of writing a tool for community use, and that can sometimes end well.
Other times, someone writes something and it ends up becoming popular, and is usually hacked upon and hacked upon when it should just be rewritten from scratch with the intention of being publicly consumed.
In any event, it is not often that academia will pay for either of the two above items; academia simply isn't about writing software.
Re:I am a scientist who has made "code" by joe_frisch · 2013-01-29 15:52 · Score: 1

I'm a scientist at a big national lab (SLAC). We do open source / collaboratively write some code. There is a real-time distributed control system "EPICS" that is developed and maintained by multiple labs. There are programs like LIAR, ELEGANT, GENESIS, etc that are widely used for accelerator design optimization. For widely used programs like these, it is worth the (very large) effort to support them and make them usable. Even with this effort though, I dare anyone to get EPICS running without help from someone who has already done it. (All the components are free to download from Argonne National Lab, and source is available.)
I've written a ton of "code", mostly Matlab scripts that are used for everything from optimizing the operation of the accelerator to electronics design. Anyone who wants a copy can have one - but it won't do them any good. This sort of technical code simply can't be used by anyone except the original writer, or someone that writer has trained. The effort required to make this code general purpose and well documented enough for others to use is larger than the original writing of the code.
So, I think the solution we have (at least at the big labs) works. Most code written by researchers is not worth the effort to make it general use, though it is often available to anyone who really wants it. The general purpose code IS open source, collaboratively written.
Re:I am a scientist who has made "code" by c0lo · 2013-01-29 17:23 · Score: 1

The software I have written for my odd specialized purposes is similar to the software my colleagues write: It's spaghetti code written with custom libraries which are not better than common ones and it has no documentation at all.
Yes, I know the feeling of "source code like the underwear: if it's dirty, better not show it to anybody".

--
Questions raise, answers kill. Raise questions to stay alive.
Re:I am a scientist who has made "code" by serviscope_minor · 2013-01-29 21:32 · Score: 1

Typically I find that people who are in your position cling too tightly to the reins. If you love it, set it free.
For fuck's sake people, you make it sound like simple resource management is a form of rocket science.
I'm in a similar position to the GP and I have to say, you have no idea what you're talking about here.
I have released a number of things. One (most popular) is a C library requiring zero configuration and consisting of two functions. I've also released a decent sized library. The latter is also used a decent amount.
It takes a lot of work. Hands free simply isn't an option if you want people to use your stuff. The majority of scientists, even those working in pretty computery fields are not hackers or mega-coders. They are scientists foremost and generally know as much coding as they need to do their job and no more.
You get a lot of questions about basic stuff (like how to compile it at all), lots of questions about really basic stuff (how to set up their own code to link against it), a small number of bug reports, the odd patch (quite often something like a patch to fix an obscure install bug from a group sysadmin) and rarely, oh so rarely, a new feature. The latter annoys me actually. I know that several people have implemented a feature (the same one!!) I wanted but never had time for and never contributed it back because "it wasn't ready yet" or "it's really rough" or "it's not a general solution" or "my code isn't good enough", despite begging from me and my claims to the contrary.
If you "set it free" then it basically sits there and nothing happens.
The project is on github for free forking and I hand out commit access like candy and readily accept patches. And still no one contributes much.
If you don't champion your own project, cajole others into using it, aggressively fix bugs, help total muppets and beg users for fixes they have made then it will go nowhere, fail to compile on new systems and rapidly fall into obscurity. IOW if you don't put in the time, you may as well not bother releasing the code.
The reason I do it is basically because I wanted to promote my work and I actually enjoy it. The experience has proven useful to me, but I already had some commercial background so I was able to do it moderately efficiently too. And it's still not clear if it was "worth" the time from a time investment point of view.

--
SJW n. One who posts facts.
Re:I am a scientist who has made "code" by Ginger+Unicorn · 2013-01-29 23:14 · Score: 1

One solution that leaps to mind is that someone other than yourself is given the responsibility of supporting the users and maintaining the software.
If money is required to enable this, couldn't the users or the institutes that benefit from this software be convinced to chip in and fund the support?

--
(1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
Re:I am a scientist who has made "code" by nmr_andrew · 2013-01-30 07:18 · Score: 1

100% correct. I've released software I've written, as have others in labs I've been in. I've never been in a CS lab, and none of us are/were professional coders. Once it's out there, it's out there, but as you say, there's no obligation to support. I've answered the occasional emailed question, had semi-useful feedback, but that's about it. Generally, this is all software that was written to solve one specific problem (or type of problem). I try to write clean code with useful comments and even will put together a brief how-to guide to release with it. But:
All software I've ever been involved with is distributed with a "license" that says you're free to use it, but you're pretty much on your own. The only requirement is to acknowledge where the software came from if you use it in a publication, and we'd like to hear about any improvements or changes you make, although that's more a suggestion than a requirement.

Re:Bad quote by smallfries · 2013-01-29 12:49 · Score: 1

Maybe it is not as simple as pick zero / one / two. The purposes of writing software for research and engineering software for reuse are so different that it doesn't make sense to try and compare them. Going back to the summary:

What many academic researchers fail to understand is that this specialization problem is not unique to research projects. Most software developers will seek to provide an adequate solution to their specific problem, as quickly as possible. They don't seek to build a perfect, all-purpose, tool set that can be reused in every conceivable circumstance.

No. What the author of the article fails to understand is that software is not the point of research - it is a side-effect, and I say that as someone whose field is CS. We do not write software in academia because we want the software - we simply want the data about its behaviour that we can get from it. It doesn't matter if business / hobbyists / academics have in common an approach that builds software for the least effort. In the first two cases the software is being written because there is a need for it to be used. In the latter case it simply needs to exist in some form long enough for some data to be collected and then it is obsolete. This difference in purpose is so vast that it renders the rest of the argument in the article not even wrong.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php

This is happening in statistics... by djmurdoch · 2013-01-29 12:53 · Score: 2

The Journal of Statistical Software is an electronic journal that publishes software. It tends to publish R packages because that's where the development is mostly happening these days, but it will publish any language. The refereeing process checks that the software works as well as that it is a good contribution. It has a reasonable reputation, far above the junk journals on Beall's list (Google it if you don't know what that is), though not as high as the better mathematical journals in the area. The R Journal has a similar goal, but it's newer, and the reputation isn't there yet.

I review grants, and I give a lot more credit to software published somewhere like JSS or the R Journal than to software available on someone's web site.

So some academics do get credit for this.

it's already out there by thegreatemu · 2013-01-29 12:53 · Score: 1

at least in the particle physics community, practically all anyone uses is open-source code. The most common are GEANT4 for simulating particles interacting with matter, and ROOT which handles data analysis. Both are maintained by dedicated people at CERN.

As to more specialized code, any time I've ever asked someone about their analysis, no matter what institution or relation (or lack of) to me, they've always been happy to share their code source with me. Usually with many caveats about quality, but it's there. The problem for us has always been knowing who to ask, so a dedicated central repository could be interesting.

Maybe a model like the arxiv.org could work. Almost everyone these days puts preprints of upcoming papers on the arxiv. Since there's no review system, you also get lots of garbage from crazies, but it's generally not hard to weed out if you know at least a little about the subject matter of your search, and trivial if you know the relevant big names in your field. In the same vein, a huge code repository where anyone could upload their junky scripts, tagged by name and subject/function/whatever, might work better than it would seem at first glance.

Re:it's already out there by serviscope_minor · 2013-01-29 21:45 · Score: 1

they've always been happy to share their code source with me. Usually with many caveats about quality, but it's there.
I've been on the other side of that.
But when I say, "sure, but it was written when I was a student in 2003, against libraries that will not compile on a modern OS/compiler (the newer libraries are incompatible), it's spaghetti code due to being research code, you'll need your data converted to this hard to convert to custom format and there are better, simpler algorithms out there now for stages 1, 3 and 4 of the system (it has 4 stages).", I don't get any takers.
It's my code and I wouldn't bother to resurrect it at this point.

--
SJW n. One who posts facts.

I blame the Bayh-Dole act by the+gnat · 2013-01-29 13:02 · Score: 2

Most academics are under tremendous pressure to keep anything of potential commercial value closed; releasing code as open-source generally requires permission from above. (In fact, I know of one professor of biology who had to fight to get a line in his contract explicitly allowing him to open-source everything.) And it's not like most of them need encouragement; none of us are getting rich off NIH grants (well, most of us aren't) and we effectively hit a salary ceiling early in our careers, so the prospect of a few thousand dollars extra in licensing revenue is more than most can resist. In several cases that I'm aware of, the licensing money is used to support research activities - sometimes enough to pay for an entire employee, or pay for meetings that wouldn't happen otherwise. Note that in many cases the code itself is still available, just not under a license that allows distribution, which usually makes it difficult or impossible for anyone who wants to build on your work to do so.

Of course it's not always this simple - junior researchers have very little control, so many of us end up releasing code under proprietary licenses when we'd much rather open-source everything. I also know of many cases where paranoia and competitiveness, rather than avarice, are at fault - in these cases, the code itself is hidden and the software released as binary-only (which as far as I'm concerned should be unacceptable for anything published in a peer-reviewed journal, regardless of the license used). Regardless, there are simply too many incentives to retain full control.

This is a completely idiotic situation, of course, and it has been holding back science for years - I know of multiple cases where university researchers were effectively doing R&D for private companies (not always willingly!) with very little in return. I've also seen researchers prevent widespread adoption of their work (and hamper their career advancement) because of tight-fisted behavior. One asshole even charges other academics to obtain his software, with the result that some people avoid using it altogether. Frankly, since I have to deal with this bullshit on a near-daily basis, as far as I'm concerned a repeal of the Bayh-Dole act (and its equivalents in Europe), at least where software is concerned, would be a huge leap forward for academic computational research. The bonus I get from licensing fees is simply not worth the trouble and missed opportunities.

Re:I blame the Bayh-Dole act by guacamole · 2013-01-29 13:52 · Score: 1

In many fields, such as social sciences, most of the "code" is simply a bunch of MATLAB, Stata, or GNU R scripts, with virtually no commercial value. I'd be surprised if the universities had any issues with releasing this. In fact, the faculty who want to simply post the code on their web page. The issue is that most people do not post either their code or data, and I know why. Most of the time, the code is just terrible. I know of smart tenure track faculty who rely on MATLAB to compute anything for their papers, and yet their code looks like they haven't consulted a manual for a long time. I can imagine how many published papers in fields like economics or public policy would require a correction addendum once people were able to download, read and find bugs in the code.
Re:I blame the Bayh-Dole act by terec · 2013-01-29 15:55 · Score: 1

Most academics are under tremendous pressure to keep anything of potential commercial value closed;
Nonsense. There is tons of academic open source. And Bayh-Dole doesn't apply outside the US.
The real reasons for not publishing source in academia are much simpler: it's ugly, people don't want to spend time supporting messy code, or they think you haven't finished publishing yet. Generally, if you ask, you get the code.
Re:I blame the Bayh-Dole act by the+gnat · 2013-01-29 18:32 · Score: 1

There is tons of academic open source.
And there is just as much that is either closed-source, or not redistributable. It depends on the institution and the researchers involved - ultimately the professors have the most say in this; grad students and postdocs will do whatever they're told.
And Bayh-Dole doesn't apply outside the US.
Most other nations which can afford to fund basic research have similar provisions - I know this because all of our competitors outside the US have licenses which are equally restrictive (sometimes more so).
I should clarify that my comments primarily refer to what I'd call "computational scientists", where developing software really is a primary goal. I'm not talking about some statistics professor and his collection of godawful MATLAB scripts; in some cases these are large software packages with worldwide user bases. In my field it is very tempting to try to extract money from pharmaceutical companies, and some people do very well this way.
Re:I blame the Bayh-Dole act by lbbros · 2013-01-29 18:50 · Score: 1

Actually, no. Code can be the matter of a paper - and by releasing it, you may break the "novelty" aspect and never publish anything.
I have a bunch of software I've been very willing to set free (it has already even GPL3 headers!) but I can't, because it might be publishable one day.
And so, it'll keep on being hidden...

--
A CC-licensed illustrated horror novel
Re:I blame the Bayh-Dole act by dkf · 2013-01-29 21:26 · Score: 1

I have a bunch of software I've been very willing to set free (it has already even GPL3 headers!) but I can't, because it might be publishable one day.
And so, it'll keep on being hidden...
Sounds like you need to publish more often rather than whinging about whether it "might be publishable one day" on slashdot. There's no point in sitting on stuff so long that it becomes irrelevant, and code that's just cowering in a dark corner of your disk might as well not exist at all.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:I blame the Bayh-Dole act by terec · 2013-01-29 22:50 · Score: 1

Most other nations which can afford to fund basic research have similar provisions - I know this because all of our competitors outside the US have licenses which are equally restrictive (sometimes more so).
You're engaging in circular reasoning: you conclude that other nations must have similar laws because they have similar licenses, but that's your hypothesis, namely that similar laws lead to similar licenses.

It depends on the institution and the researchers involved - ultimately the professors have the most say in this
First you say "most academics are under tremendous pressure to keep anything of potential commercial value closed" and now this.
In fact, it *is* largely up to the professors, anywhere. Even in the US, any academic who has created "computational codes" that are widely used in the community would likely have enough power to push through open sourcing it, no matter what rights the institution has. Because if it's really important code, he can just say "OK, you keep the code and enjoy it; I and my grad students found a company and I'm off to another institution".
Professors choose not to open source for a wide variety of reasons, commercialization only being one of many.
Re:I blame the Bayh-Dole act by the+gnat · 2013-01-30 03:50 · Score: 1

Code can be the matter of a paper - and by releasing it, you may break the "novelty" aspect and never publish anything.
This has not been my experience - we release new features almost as soon as the code is written, sometimes years in advance of publishing anything, and we have never had a problem getting the eventual articles accepted. If anything it is beneficial to release early and often, because then we get credit for having come up with the idea first. Many of our competitors do the same. Again, it may depend on the field, but putting source code online is generally not counted as prior publication by journals.
Re:I blame the Bayh-Dole act by lbbros · 2013-01-30 07:24 · Score: 1

I had already tried - and got rejected. It's not that it "sits in the dark corners of my disk" - I use it regularly (daily), but I'd love to spread it around (also more eyeballs around etc etc).
It doesn't help that I'm the only one doing this in my institution.
And to answer other replies, I had a *huge* flame with a Detroit professor because he wanted to keep other things closed - luckily I won that battle and the stuff went out as LGPL.

--
A CC-licensed illustrated horror novel
Re:I blame the Bayh-Dole act by jafac · 2013-01-30 11:56 · Score: 1

In MIDDLE SCHOOL, I took a science class.
Science is composed of:
Create a Hypothesis.
Write a Procedure.
Record Data.
Test the Hypothesis.
Other scientists independently reproduce based on your experiment.
So - if software (part of the procedure) is released closed source. . . then how in hell are other scientists supposed to reproduce the work?
This goes against the most basic principles of science.

--

These are my friends, See how they glisten. See this one shine, how he smiles in the light.

Re:Bad quote by Stewie241 · 2013-01-29 14:29 · Score: 2

So for clarification, I think you missed the point of what the GP was trying to say. The statement in the summary as written suggests that highly specialized software solutions are commonly engineered for reuse.

Based on the context of the summary, it should probably say either:
1. These highly specialized software solutions are not engineered for reuse.

or

2. These highly specialized software solutions are rarely engineered for reuse.

Most research in computer science is available by godrik · 2013-01-29 14:37 · Score: 1

Citeseer and Google Scholar contain a large number of freely accessible scientific papers. Many journals have open access policies. Many researchers publish their results on arxiv before sending them anywhere else. IEEE and ACM let their members access papers (IEEE policy at http://www.ieee.org/publications_standards/publications/subscriptions/prod/mdl/mdl_overview.html . ACM's policy at https://campus.acm.org/public/qj/profqj/qjprof_control.cfm?form_type=Professional . SIAM's policy http://www.siam.org/membership/individual/benefits.php ). So ok, it is not free, but that's not really expensive either if you are actually interested. Most researchers publish preprints on their website. If they don't, drop them an email and they'll send you a preprint (if I had not put it on my website, I would send a preprint.)

Assuming you could not find it. And the author is a jerk. And you don't want to pay for it. You can still stop by a university library where you will be able to download it using the university subscription or photocopy it if the library has a paper edition.

Finally, we are not looking to send our papers to the most expensive journal. To the most prestigious certainly, but the price has nothing to do with it. Arguably, one of the most prestigious journals in CS is ACM Computing Surveys. It is an ACM journal, so all ACM members can read it online for the price of their subscription. Hardly the most expensive journal.

That being said, I'd rather we only publish in open-access journals and we ditch the publishers. But that's not realistically going to happen anytime soon.

Why? by nbsr · 2013-01-29 14:45 · Score: 1

Why would researchers publish their code? They have only one target - to get their *papers* published in reputable venues. More often than not, such venues are closed and paywalled, so it is not surprising that they do not enforce (in fact they discourage it, to say the least) opening up bits of research.

Some researchers would be happy to publish their code anyway (as a matter of principles, or to promote themselves through non-academia channels) but at best they would be frowned upon by their superiors for mis-allocating their resources. At worst, they would be accused of undermining team efforts (by disclosing too much information or exposing inconvenient assumptions to competing researchers) or risking legal conflicts with publishers (copyright).

As earlier mentioned, the code written as a part of research is often poor. This is caused by the same underlying mechanism - getting as many papers published with as little work as possible. That is not (only) about procrastination. The effort put into making the code better is better spent on work on another project.

As usual, "you get what you test for". In case of publicly funded academic projects this means "plenty of good enough papers and nothing more".

Re:Why? by godrik · 2013-01-29 15:01 · Score: 1

Actually, the copyright issues you are mentioning are important. I frequently end up mashing together code found god knows where. When we finally wanted to publish our code, we had to go through the various files and decide which ones we can publish and which ones we can not publish because of copyright. We use them internally (and probably we actually should not, but nobody cares about that). But putting it on your website opens potential lawsuits.
We ended up scrapping auxiliary functions and reimplementing a part of I/O that was relevant to us. It took a few days from a grad student. In the end, the time spent was worth it. But the code is just in a working state, as long as you do not blow on it too much. Proper documentation and making the code robust would certainly have taken weeks. We do not have time for that. It is not our goal, code is often a by-product.
Re:Why? by prefec2 · 2013-01-30 01:46 · Score: 1

Why would researchers publish their code? They have only one target - to get their *papers* published in reputable venues. More often than not, such venues are closed and paywalled, so it is not surprising that they do not enforce (in fact they discourage it, to say the least) opening up bits of research.
Well conference cost is normally paid by the university or institute. Therefore, this is a bad excuse. And in addition, if you have "proven" something and published it in a paper, how could I reproduce your study without your data and software? It would be nearly impossible. Therefore, publish it. A good starting point is "e-science".

Some researchers would be happy to publish their code anyway (as a matter of principles, or to promote themselves through non-academia channels) but at best they would be frowned upon by their superiors for mis-allocating their resources. At worst, they would be accused of undermining team efforts (by disclosing too much information or exposing inconvenient assumptions to competing researchers) or risking legal conflicts with publishers (copyright).
What field are you working in? In CS code publishing is almost mandatory, otherwise no one will believe you. In geo sciences and marine research (as far as I can see), publishing research data is required. Publishing of the methodology is required. And publishing the tools is encouraged (see e-science), however, not everyone is able to do so for various technical reasons, e.g., not all steps in data selection and calculation were documented properly, or a lot of the work is done by hand. But releasing MATLAB or Fortran code is becoming more popular. The only ones who did not share their code were the big-simulation guys.

As earlier mentioned, the code written as a part of research is often poor. This is caused by the same underlying mechanism - getting as many papers published with as little work as possible. That is not (only) about procrastination. The effort put into making the code better is better spent on work on another project.
True. Code of scientists is not that well written. However, if they would release it, it could improve. Right now they are all reinventing the wheel all the time.

As usual, "you get what you test for". In case of publicly funded academic projects this means "plenty of good enough papers and nothing more".
This is presently changing with the e-science (and other) initiatives and direction of thought.

Re:Bad quote by godrik · 2013-01-29 14:53 · Score: 2

"What the author of the article fails to understand is that software is not the point of research - it is a side-effect, and I say that as someone whose field is CS."

(disclaimer: I am working as a postdoc for some US university)

The article in general is clueless. You are of course right. Researchers don't care about their code. I want to know if a design works, if an algorithm works or if it does not. That's why I end up writing code. Once my report/paper/thesis/grant application is written I do not care about the software anymore.

I'd love to produce proper software. But most researchers do not have a clue how to make good software. Software engineering is not our job. We typically do not know how to do *really* good software. That type of skill is not commonly found in grad students. You'll need a postdoc or a professor to do it well. PhD time is valuable, it *is* worth a lot of money. None of the money that comes from grants pays for software development. Even if it did, my career would certainly advance more if I did research instead of software. (With occasional exceptions like "This is the holy grail. We need it done well.")
The only other option would be to pay a software engineer. Grants typically do not cover that. Some do, but most don't.
The final option would be to get somebody else to cover the software development cost. That can happen, but that's very rare. You'll need to find a company that needs the edge the software will bring, that actually wants to work with academia and that is ok with publishing the source code (so potentially losing the edge the project brings them). That can happen, but do not count on it.

Finally, even assuming there is a useful software framework close to something I am interested in, what will be the investment cost for me to get into that software? Recently I was looking at Android programming for adding a calendar type. That stuff is ridiculously complicated with dozens of concepts and objects and all. And I am talking about a freaking calendar. All-encompassing software tends toward an overly engineered design. If it takes me more time to get into the software than to get my job done, why should I use it?

oh my by terec · 2013-01-29 15:51 · Score: 1

People should perhaps have a look at where open source actually started. In any case, there are reasons not to publish source that aren't nefarious: you haven't written up all the papers yet and don't want to get scooped, you don't want to spend a lot of time answering questions about it, etc. I think most academics really have these tradeoffs under control.

Depends on the Field by Roger+W+Moore · 2013-01-29 15:55 · Score: 2

None of the money that comes from grants pays for software development. Even if it did, my career would certainly advance more if I did research instead of software.

This depends on the field. In particle physics, where we have massive computational challenges, grants can specifically fund software development. In fact when I was a grad student there were even permanent positions called physics programmers, and software development certainly can be very good for your career as long as it is combined with physics analysis - at least it has not hurt me so far. As for "needing a postdoc or a professor to do it well" I very much beg to differ - and I say that as a professor! Programming skills vary considerably at all levels, but good grad students, while lacking experience, can be a step or two ahead of their older colleagues in terms of modern programming savvy; the latter are sometimes prone to the FORTRAN++ coding style!

Once my report/paper/thesis/grant application is written I do not care about the software anymore.

Again this varies by field. Monte Carlo simulators for particle physics have a life well beyond any one project and in fact can be projects in themselves. In fact you are reading this page using a software technology developed at CERN to assist particle physics research - the world wide web. So even if you don't care about it anymore, sometimes software developed for research can be amazingly useful outside that research.

research uses a lot of open source by Khashishi · 2013-01-29 16:22 · Score: 1

Not sure what the intent of this article is, since academic research already uses a lot of open source software, far beyond use in industry. Knowing how to navigate a posix system is practically a requirement. Researchers also produce a lot of open source software. In my experience, software mostly falls into two categories: quick, hacked together scripts for analyzing data in a specific way, and complex simulations. The quick scripts generally aren't shared because it would take just as long to explain it to someone else as to rewrite it, and making a manual is simply a waste of time. But the quick scripts are written in a high level language which promotes the sharing of snippets of code, like math functions, commonly used analysis, plotting routines... The workstations at a lab are networked together and typically these little snippets get shared around the work group, and seem to find their way to other groups through collaborations and stuff.

Simulation codes are usually written in FORTRAN and are always distributed in source code form, because workstations have diverse architectures and typically a user will have to modify the program to fit his or her needs. Nobody cares about licenses and such, though you should probably include the code author in your coauthor list of a paper you publish using the code.

The small fish do share. by pigwiggle · 2013-01-29 16:52 · Score: 2

"Sharing can’t hurt the small fish. Almost nobody sets out to beat Daniel Lemire at some conference next year. I have no pursuer. And guess what? You probably don’t. But if you do, you are probably doing quite well already, so stop worrying. Yes, yes, they will give you a grant even if you don’t actively sabotage your competitors. Relax already!"

The big fish (and I've worked for them) don't, and it's likely they got that way by protecting their turf. Science is cutthroat.

--
46 & 2

Re:Bad quote by smallfries · 2013-01-29 20:23 · Score: 1

Yes - posts written in the middle of the night are not entirely coherent :)

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php

There is no incentive by muecksteiner · 2013-01-29 20:24 · Score: 2

This guy, who wrote an extremely useful and powerful piece of OSS software that is widely used in the graphics community, said it very well in his blog:

http://meshlabstuff.blogspot.com/2010/03/assessing-open-source-software-as.html/

Basically, you are an idiot if you invest any time at all in such things. Papers are all that count. OSS software? You wrote something that hundreds of other researchers depend on for their daily work? Get lost, that professorship goes to someone else. Someone else who was a Real Man, and wrote Papers! Lots of them!

Here's hoping. by mutube · 2013-01-29 22:55 · Score: 1

I've developed quite a bit of code in the process of my PhD that I'm in the process of open-sourcing on github (backed with a website here I've developed with open-access scientific protocols - no code there yet; getting clearance).

As others have mentioned the big anti- to this is the problem of publication. If I put my software up there free to use, there is nothing to stop someone else swooping in and using it to pre-empt the results I've spent time writing the software to accomplish (I'm helped slightly by working an obscure angle on an equally obscure field). Further, opening software up to outside contributions opens all sorts of issues with authorship, credit, etc. As it stands I can publish a paper on the software with my name on it - but if I had 20 or so contributors are they all going to want their names on there? All solvable problems - there is typically a threshold of contribution for acknowledgement; and the fact of contribution is preserved right there in the git log. But not something most people want to spend time thinking about.

There is also a slight over-enthusiasm for patents - the first reaction I get to showing off my software is "you should patent it!" on the idea that I would get stinking rich. That's unrealistic for most software, but open-sourcing it immediately scuppers that as a future possibility. When you're funded via various grant organisations it can get more complex (everyone has to agree). I'm lucky enough to be funded by the Wellcome Trust who are pro open-access - and I'm hoping that will translate to pro open-source too.

--
Python coder | PyQt Applications | Writer

What we do by prefec2 · 2013-01-30 01:36 · Score: 1

We release all of our code (if not written for a company) used in our projects, or created during research. However, we are a software engineering group, and computer scientists more often open source their work. In recent years it has become mandatory to do so, as otherwise your claims are not backed. If you publish results and do not provide the means to reproduce the results, you are a blabber.

But most of the code produced in research is lousy, as it is just used to prove something, not to actually be used. If you want to produce better quality code and documentation, the artifacts must be used by others and you have to start to incorporate agile concepts to develop the tools.

Re:Bad quote by gerddie · 2013-01-30 11:06 · Score: 1

"What the author of the article fails to understand is that software is not the point of research - it is a side-effect, and I say that as someone whose field is CS."

(disclaimer: I am working as a postdoc for some US university)

The article in general is clueless. You are of course right. Researchers don't care about their code. I want to know if a design works, if an algorithm works or if it does not. That's why I end up writing code. Once my report/paper/thesis/grant application is written I do not care about the software anymore.

Well, there's always the CRAPL license that was made for exactly this kind of source code release, and IMNSHO publishing the source code with the paper should be a must, because it's only science if it is reproducible. I work in image processing and more often than not, papers are missing parameters, the description of the implementation is ambiguous, and as a result just reproducing the result of such a paper is impossible without contacting the authors. (The data used is yet another story.) I do not care if the code is production ready or if I would have to rewrite it from scratch, if I could at least have a look at the tweaks that are not in the paper because the authors didn't deem them important enough and the reviewers didn't note that the published algorithms are not really reproducible - or worse, the reviewers told the authors that "these are standard filters, so there is no need to publish the parameters".

I don't know what academic world you guys are in.. by Gideon+Fubar · 2013-01-30 11:16 · Score: 1

...but writing OSS software for education and academia and distributing it to other universities around the country and world has been my job for the last few years.

--
http://www.xkcd.com/354/

Re:Bad quote by godrik · 2013-01-30 11:59 · Score: 1

I completely agree with you. And I try to publish code as often as I can. Though, I do not believe the original article is about getting the code out.

I think the original article is about getting the code into a shape where it can be reused and built upon in the same way open source software is. Any code released under CRAPL is probably not in a shape where it can reasonably be built upon. Most of the code I published is not in very good shape.

To All Grantmakers On Copyright & Post-Scarcit by Paul+Fernhout · 2013-01-30 15:21 · Score: 1

http://www.pdfernhout.net/open-letter-to-grantmakers-and-donors-on-copyright-policy.html
"Summary: Foundations, other grantmaking agencies handling public tax-exempt dollars, and charitable donors need to consider the implications for their grantmaking or donation policies if they use a now obsolete charitable model of subsidizing proprietary publishing and proprietary research. In order to improve the effectiveness and collaborativeness of the non-profit sector overall, it is suggested these grantmaking organizations and donors move to requiring grantees to make any resulting copyrighted digital materials freely available on the internet, including free licenses granting the right for others to make and redistribute new derivative works without further permission. It is also suggested patents resulting from charitably subsidized research also be made freely available for general use. The alternative of allowing charitable dollars to result in proprietary copyrights and proprietary patents is corrupting the non-profit sector as it results in a conflict of interest between a non-profit's primary mission of helping humanity through freely sharing knowledge (made possible at little cost by the internet) and a desire to maximize short term revenues through charging licensing fees for access to patents and copyrights. In essence, with the change of publishing and communication economics made possible by the wide spread use of the internet, tax-exempt non-profits have become, perhaps unwittingly, caught up in a new form of "self-dealing", and it is up to donors and grantmakers (and eventually lawmakers) to prevent this by requiring free licensing of results as a condition of their grants and donations."

--
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
