Slashdot Mirror


How Open Source Could Benefit Academic Research

dp619 writes "Ross Gardler, of Apache fame, has written a guest post on the Outercurve Foundation blog advocating that universities accelerate the research process through collaborative sharing and development of research software, while examining reasons why many have been reluctant to publish their source code. Quoting: 'These highly specialized software solutions are not rarely engineered for reuse. They are often hacks to answer a specific question quickly. ... What many academic researchers fail to understand is that this specialization problem is not unique to research projects. Most software developers will seek to provide an adequate solution to their specific problem, as quickly as possible. They don't seek to build a perfect, all-purpose, tool set that can be reused in every conceivable circumstance. They simply solve the problem at hand and move on to the next one. The difference is that open source developers will do this incremental problem solving using shared code. They will share that code in incremental steps rather than wait until they've built the complete system they need but is too specific for others to use. Other people will reuse and improve on the initial solution, perhaps generalizing it a little in the process. There is no need to share the details of why one needs a 'green widget' nor is there any reason to prevent someone modifying it so it can be either a 'green widget' or a 'blue widget.'"

13 of 84 comments

  1. Sometimes publishing code loses you papers by WillAffleckUW · · Score: 2

    Sometimes, when you publish the code you used to develop new biochemistry or genetics solutions, you find that other scientists in other countries use your code to reverse engineer what you are working on - your results, if you will - to eliminate dead ends and publish a paper on what you invested years finding a solution for, before you can submit your own paper on the results that they "effectively" stole.

    We had that happen when we deposited ligand results a few times, until we learned to stop submitting such things until AFTER we were approved for print.

    This is one reason for hesitancy that I can agree with. Just because I wrote code doesn't mean I want you to have it, if I haven't published the end result.

    After it's in print, you're welcome to have the code. Not before.

    --
    -- Tigger warning: This post may contain tiggers! --
    1. Re:Sometimes publishing code loses you papers by the+gnat · · Score: 2

      Submission of biochemical structures too early, prior to publication, can also result in similar things.

      I can also confirm that this happens occasionally. Biologists are also reluctant to describe unpublished results at meetings unless the article is already accepted and scheduled for publication - it's a real shame, but I understand why they're reluctant. However, there is no requirement that they reveal any data before publishing; it's not like every crystal structure automatically goes into the PDB before they even get a chance to write (which really would be catastrophic).

  2. many already do this by call+-151 · · Score: 2

    There are many open-source research software efforts already, and of course it would be good to see this become more widespread. These range from small-scale individual researcher one-off efforts to broad multi-institution efforts that are well-maintained over years. The software that I develop in the course of my mathematical research is available freely from our webpages, with intermittent downloads. And I still get inquiries about using it, to which I just say that it's all on our webpages already.

    One barrier to broader efforts in the US is that science agencies (at least the National Science Foundation) generally support research proper, rather than development of tools. Oddly, I am much more likely to get a grant to work out research that perhaps 20 to 50 people may be interested in than I am to get a grant to develop research tools that may be useful in furthering research for a few hundred researchers. Nevertheless, it is increasingly common for universities and funding agencies to expect data and software from research to be freely available. Many people drag their feet on these requirements because they are worried that other researchers will use their tools to scoop them, but I think these instances are very rare.

    --
    It's psychosomatic. You need a lobotomy. I'll get a saw.
  3. I am a scientist who has made "code" by brillow · · Score: 5, Insightful

    The software I have written for my odd specialized purposes is similar to the software my colleagues write: It's spaghetti code written with custom libraries which are not better than common ones and it has no documentation at all.

    We could open-source it, but then you'd just bitch about how poorly it's constructed.

    We don't have time to open-source our code. Heck, I've had people ask to use software I've made and I've regretted giving it to them because I then am obligated to explain to them how to use it.

    1. Re:I am a scientist who has made "code" by the+gnat · · Score: 3, Insightful

      I've had people ask to use software I've made and I've regretted giving it to them because I then am obligated to explain to them how to use it.

      As someone who writes academic software specifically for distribution, I can confirm that this is a gigantic time suck, and one which the funding agencies generally do not support. We are judged both on scientific innovation and publication record, and on whether our tools are adopted by the community - but the latter frequently interferes with the former. I basically wake up to an inbox full of bug reports and feature requests every morning, and I have to find time to deal with these in addition to all of the actual science I'm supposed to be working on. Despite being an obvious sign of success (people actually use our software!), it's become so discouraging that it helped drive out one of my (very competent) ex-coworkers.

    2. Re:I am a scientist who has made "code" by the+gnat · · Score: 2

      You're doing it wrong then. Just because you release source doesn't mean you have to maintain it.

      When I say "users", I do not mean "other programmers", I mean scientists who generally don't know a fucking thing about programming, except maybe rudimentary FORTRAN (which is not what I use), and are busy with their own research which does not leave them any time to fix other people's software. They are utterly incompetent to maintain our code for us, and the only people besides us who are qualified are our competitors, who are either too busy with their own projects, or wouldn't pour water on us if we were on fire. Who else are the users going to send email to when something breaks, if not us?

      In fact much of our code really is open-source and available on the web, so anyone who wanted to fix it would be welcome to. In practice we have a few external developers whom we work with, who have been very valuable - but the bulk of user support has to be done by us. Your response indicates that you've never had to support a non-technical user, because if you did, you'd realize what a clusterfuck it is.

  4. This is happening in statistics... by djmurdoch · · Score: 2

    The Journal of Statistical Software is an electronic journal that publishes software. It tends to publish R packages because that's where the development is mostly happening these days, but it will publish any language. The refereeing process checks that the software works as well as that it is a good contribution. It has a reasonable reputation, far above the junk journals on Beall's list (Google it if you don't know what that is), though not as high as the better mathematical journals in the area. The R Journal has a similar goal, but it's newer, and the reputation isn't there yet.

    I review grants, and I give a lot more credit to software published somewhere like JSS or the R Journal than to software available on someone's web site.

    So some academics do get credit for this.

  5. I blame the Bayh-Dole act by the+gnat · · Score: 2

    Most academics are under tremendous pressure to keep anything of potential commercial value closed; releasing code as open-source generally requires permission from above. (In fact, I know of one professor of biology who had to fight to get a line in his contract explicitly allowing him to open-source everything.) And it's not like most of them need encouragement; none of us are getting rich off NIH grants (well, most of us aren't) and we effectively hit a salary ceiling early in our careers, so the prospect of a few thousand dollars extra in licensing revenue is more than most can resist. In several cases that I'm aware of, the licensing money is used to support research activities - sometimes enough to pay for an entire employee, or pay for meetings that wouldn't happen otherwise. Note that in many cases the code itself is still available, just not under a license that allows distribution, which usually makes it difficult or impossible for anyone who wants to build on your work to do so.

    Of course it's not always this simple - junior researchers have very little control, so many of us end up releasing code under proprietary licenses when we'd much rather open-source everything. I also know of many cases where paranoia and competitiveness, rather than avarice, are at fault - in these cases, the code itself is hidden and the software released as binary-only (which as far as I'm concerned should be unacceptable for anything published in a peer-reviewed journal, regardless of the license used). Regardless, there are simply too many incentives to retain full control.

    This is a completely idiotic situation, of course, and it has been holding back science for years - I know of multiple cases where university researchers were effectively doing R&D for private companies (not always willingly!) with very little in return. I've also seen researchers prevent widespread adoption of their work (and hamper their career advancement) because of tight-fisted behavior. One asshole even charges other academics to obtain his software, with the result that some people avoid using it altogether. Frankly, since I have to deal with this bullshit on a near-daily basis, as far as I'm concerned a repeal of the Bayh-Dole act (and its equivalents in Europe), at least where software is concerned, would be a huge leap forward for academic computational research. The bonus I get from licensing fees is simply not worth the trouble and missed opportunities.

  6. Re:Bad quote by Stewie241 · · Score: 2

    So for clarification, I think you missed the point of what the GP was trying to say. The statement in the summary as written suggests that highly specialized software solutions are commonly engineered for reuse.

    Based on the context of the summary, it should probably say either:
    1. These highly specialized software solutions are not engineered for reuse.

    or

    2. These highly specialized software solutions are rarely engineered for reuse.

  7. Re:Bad quote by godrik · · Score: 2

    "What the author of the article fails to understand is that software is not the point of research - it is a side-effect, and I say that as someone whose field is CS."

    (disclaimer: I am working as a postdoc for some US university)

    The article in general is clueless. You are of course right. Researchers don't care about their code. I want to know whether a design works, whether an algorithm works or not. That's why I end up writing code. Once my report/paper/thesis/grant application is written I do not care about the software anymore.

    I'd love to produce proper software. But most researchers do not have a clue how to make good software. Software engineering is not our job. We typically do not know how to do *really* good software. That type of skill is not commonly found in grad students. You'll need a postdoc or a professor to do it well. PhD time is valuable, it *is* worth a lot of money. None of the money that comes from grants pays for software development. Even if it did, my career would certainly advance more if I did research instead of software. (With occasional exceptions like "This is the holy grail. We need it done well.")
    The only other option would be to pay a software engineer. Grants typically do not cover that. Some do, but most don't.
    The final option would be to get somebody else to cover the software development cost. That can happen, but that's very rare. You'll need to find a company that needs the edge the software will bring, that actually wants to work with academia and that is ok with publishing the source code (so potentially losing the edge the project brings them.) That can happen, but do not count on it.

    Finally, even assuming there is a useful software framework close to something I am interested in, what will be the investment cost for me to get into that software? Recently I was looking at Android programming for adding a calendar type. That stuff is ridiculously complicated with dozens of concepts and objects and all. And I am talking about a freaking calendar. All-encompassing software tends toward an overly engineered design. If it takes me more time to get into the software than to get my job done, why should I use it?

  8. Depends on the Field by Roger+W+Moore · · Score: 2

    None of the money that comes from grants pays for software development. Even if it did, my career would certainly advance more if I did research instead of software.

    This depends on the field. In particle physics, where we have massive computational challenges, grants can specifically fund software development. In fact when I was a grad student there were even permanent positions called physics programmers, and software development certainly can be very good for your career as long as it is combined with physics analysis - at least it has not hurt me so far. As for "needing a postdoc or a professor to do it well" I very much beg to differ - and I say that as a professor! Programming skills vary considerably at all levels, but good grad students, while lacking experience, can be a step or two ahead of their older colleagues in terms of modern programming savvy, since those colleagues are sometimes prone to the FORTRAN++ coding style!

    Once my report/paper/thesis/grant application is written I do not care about the software anymore.

    Again this varies by field. Monte Carlo simulators for particle physics have a life well beyond any one project and in fact can be projects in themselves. In fact you are reading this page using a software technology developed at CERN to assist particle physics research - the World Wide Web. So even if you don't care about it anymore, sometimes software developed for research can be amazingly useful outside that research.

  9. The small fish do share. by pigwiggle · · Score: 2

    "Sharing can’t hurt the small fish. Almost nobody sets out to beat Daniel Lemire at some conference next year. I have no pursuer. And guess what? You probably don’t. But if you do, you are probably doing quite well already, so stop worrying. Yes, yes, they will give you a grant even if you don’t actively sabotage your competitors. Relax already!"

    The big fish (and I've worked for them) don't, and it's likely they got that way by protecting their turf. Science is cutthroat.

    --
    46 & 2
  10. There is no incentive by muecksteiner · · Score: 2

    This guy, who wrote an extremely useful and powerful piece of OSS software that is widely used in the graphics community, said it very well in his blog:

    http://meshlabstuff.blogspot.com/2010/03/assessing-open-source-software-as.html/

    Basically, you are an idiot if you invest any time at all in such things. Papers are all that count. OSS software? You wrote something that hundreds of other researchers depend on for their daily work? Get lost, that professorship goes to someone else. Someone else who was a Real Man, and wrote Papers! Lots of them!