How Open Source Could Benefit Academic Research
dp619 writes "Ross Gardler, of Apache Fame, has written a guest post on the Outercurve Foundation blog advocating that universities accelerate the research process through a collaborative sharing and development of research software while examining reasons why many have been reluctant to publish their source code. Quoting: 'These highly specialized software solutions are not rarely engineered for reuse. They are often hacks to answer a specific question quickly. ... What many academic researchers fail to understand is that this specialization problem is not unique to research projects. Most software developers will seek to provide an adequate solution to their specific problem, as quickly as possible. They don't seek to build a perfect, all-purpose, tool set that can be reused in every conceivable circumstance. They simply solve the problem at hand and move on to the next one. The difference is that open source developers will do this incremental problem solving using shared code. They will share that code in incremental steps rather than wait until they've built the complete system they need but is too specific for others to use. Other people will reuse and improve on the initial solution, perhaps generalizing it a little in the process. There is no need to share the details of why one needs a 'green widget' nor is there any reason to prevent someone modifying it so it can be either a 'green widget' or a 'blue widget.'"
Sometimes, when you publish the code you used to develop new biochemistry or genetics solutions, you find that scientists in other countries use your code to reverse engineer what you are working on - your results, if you will - to eliminate dead ends and publish a paper on the solution you invested years in finding, before you can submit the paper that they "effectively" stole.
We had that happen when we deposited ligand results a few times, until we learned to stop submitting such things until AFTER we were approved for print.
This is one reason for hesitancy that I can agree with. Just because I wrote code doesn't mean I want you to have it if I haven't published the end result.
After it's in print, you're welcome to have the code. Not before.
There are many open-source research software efforts already, and of course it would be good to see this become more widespread. These range from small-scale individual researcher one-off efforts to broad multi-institution efforts that are well-maintained over years. The software that I develop in the course of my mathematical research is available freely from our webpages, with intermittent downloads. And I still get inquiries about using it, to which I just say that it's all on our webpages already.
One barrier to broader efforts in the US is that science agencies (at least the National Science Foundation) generally support research proper, rather than development of tools. Oddly, I am much more likely to get a grant to work out research that perhaps 20 to 50 people may be interested in than I am to get a grant to develop research tools that may be useful to a few hundred researchers in furthering their research. Nevertheless, it is increasingly common that universities and funding agencies expect data and software from research to be freely available. Many people drag their feet on these requirements as they are worried that some other researchers will use their tools to scoop them, but I think these instances are very rare.
The software I have written for my odd specialized purposes is similar to the software my colleagues write: It's spaghetti code written with custom libraries which are not better than common ones and it has no documentation at all.
We could open-source it, but then you'd just bitch about how poorly it's constructed.
We don't have time to open-source our code. Heck, I've had people ask to use software I've made and I've regretted giving it to them because I then am obligated to explain to them how to use it.
The Journal of Statistical Software is an electronic journal that publishes software. It tends to publish R packages because that's where the development is mostly happening these days, but it will publish any language. The refereeing process checks that the software works as well as that it is a good contribution. It has a reasonable reputation, far above the junk journals on Beall's list (Google it if you don't know what that is), though not as high as the better mathematical journals in the area. The R Journal has a similar goal, but it's newer, and the reputation isn't there yet.
I review grants, and I give a lot more credit to software published somewhere like JSS or the R Journal than to software available on someone's web site.
So some academics do get credit for this.
Most academics are under tremendous pressure to keep anything of potential commercial value closed; releasing code as open-source generally requires permission from above. (In fact, I know of one professor of biology who had to fight to get a line in his contract explicitly allowing him to open-source everything.) And it's not like most of them need encouragement; none of us are getting rich off NIH grants (well, most of us aren't) and we effectively hit a salary ceiling early in our careers, so the prospect of a few thousand dollars extra in licensing revenue is more than most can resist. In several cases that I'm aware of, the licensing money is used to support research activities - sometimes enough to pay for an entire employee, or pay for meetings that wouldn't happen otherwise. Note that in many cases the code itself is still available, just not under a license that allows distribution, which usually makes it difficult or impossible for anyone who wants to build on your work to do so.
Of course it's not always this simple - junior researchers have very little control, so many of us end up releasing code under proprietary licenses when we'd much rather open-source everything. I also know of many cases where paranoia and competitiveness, rather than avarice, are at fault - in these cases, the code itself is hidden and the software released as binary-only (which as far as I'm concerned should be unacceptable for anything published in a peer-reviewed journal, regardless of the license used). Regardless, there are simply too many incentives to retain full control.
This is a completely idiotic situation, of course, and it has been holding back science for years - I know of multiple cases where university researchers were effectively doing R&D for private companies (not always willingly!) with very little in return. I've also seen researchers prevent widespread adoption of their work (and hamper their career advancement) because of tight-fisted behavior. One asshole even charges other academics to obtain his software, with the result that some people avoid using it altogether. Frankly, since I have to deal with this bullshit on a near-daily basis, as far as I'm concerned a repeal of the Bayh-Dole Act (and its equivalents in Europe), at least where software is concerned, would be a huge leap forward for academic computational research. The bonus I get from licensing fees is simply not worth the trouble and missed opportunities.
So for clarification, I think you missed the point of what the GP was trying to say. The statement in the summary as written suggests that highly specialized software solutions are commonly engineered for reuse.
Based on the context of the summary, it should probably say either:
1. These highly specialized software solutions are not engineered for reuse.
or
2. These highly specialized software solutions are rarely engineered for reuse.
"What the author of the article fails to understand is that software is not the point of research - it is a side-effect, and I say that as someone whose field is CS."
(disclaimer: I am working as a postdoc for some US university)
The article in general is clueless. You are of course right. Researchers don't care about their code. I want to know if a design works, if an algorithm works or if it does not. That's why I end up writing code. Once my report/paper/thesis/grant application is written I do not care about the software anymore.
I'd love to produce proper software. But most researchers do not have a clue how to make good software. Software engineering is not our job. We typically do not know how to write *really* good software. Those skills are not commonly found in grad students. You'll need a postdoc or a professor to do it well. PhD time is valuable, it *is* worth a lot of money. None of the money that comes from grants pays for software development. Even if it did, my career would certainly advance more if I did research instead of software. (With the occasional exception like "This is the holy grail. We need it done well.")
The only other option would be to pay a software engineer. Grants typically do not cover that. Some do, but most don't.
The final option would be to get somebody else to cover the software development cost. That can happen, but that's very rare. You'll need to find a company that needs the edge the software will bring, that actually wants to work with academia and that is ok publishing the source code (so potentially losing the edge the project brings them). That can happen, but do not count on it.
Finally, even assuming there is a useful software framework close to something I am interested in, what will be the investment cost for me to get into that software? Recently I was looking at Android programming for adding a calendar type. That stuff is ridiculously complicated with dozens of concepts and objects and all. And I am talking about a freaking calendar. All-encompassing software tends toward an overly engineered design. If it takes me more time to get into the software than to get my job done, why should I use it?
None of the money that comes from grants pays for software development. Even if it did, my career would certainly advance more if I did research instead of software.
This depends on the field. In particle physics, where we have massive computational challenges, grants can specifically fund software development. In fact, when I was a grad student there were even permanent positions called physics programmers, and software development certainly can be very good for your career as long as it is combined with physics analysis - at least it has not hurt me so far. As for "needing a postdoc or a professor to do it well", I very much beg to differ - and I say that as a professor! Programming skills vary considerably at all levels, but good grad students, while lacking experience, can be a step or two ahead in modern programming savvy of their older colleagues, who are sometimes prone to the FORTRAN++ coding style!
Once my report/paper/thesis/grant application is written I do not care about the software anymore.
Again, this varies by field. Monte Carlo simulators for particle physics have a life well beyond any one project and in fact can be projects in themselves. In fact, you are reading this page using a software technology developed at CERN to assist particle physics research - the World Wide Web. So even if you don't care about it anymore, software developed for research can sometimes be amazingly useful outside that research.
"Sharing can’t hurt the small fish. Almost nobody sets out to beat Daniel Lemire at some conference next year. I have no pursuer. And guess what? You probably don’t. But if you do, you are probably doing quite well already, so stop worrying. Yes, yes, they will give you a grant even if you don’t actively sabotage your competitors. Relax already!"
The big fish (and I've worked for them) don't, and it's likely they got that way by protecting their turf. Science is cutthroat.
This guy, who wrote an extremely useful and powerful piece of OSS software that is widely used in the graphics community, said it very well in his blog:
http://meshlabstuff.blogspot.com/2010/03/assessing-open-source-software-as.html/
Basically, you are an idiot if you invest any time at all in such things. Papers are all that count. OSS software? You wrote something that hundreds of other researchers depend on for their daily work? Get lost, that professorship goes to someone else. Someone else who was a Real Man, and wrote Papers! Lots of them!