Slashdot Mirror


Mozilla Plan Seeks To Debug Scientific Code

ananyo writes "An offshoot of Mozilla is aiming to discover whether a review process could improve the quality of researcher-built software that is used in myriad fields today, ranging from ecology and biology to social science. In an experiment being run by the Mozilla Science Lab, software engineers have reviewed selected pieces of code from published papers in computational biology. The reviewers looked at snippets of code up to 200 lines long that were included in the papers and written in widely used programming languages, such as R, Python and Perl. The Mozilla engineers have discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups — including whether to permit disclosure of the results. But some researchers say that having software reviewers looking over their shoulder might backfire. 'One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code,' says biostatistician Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. 'We need to get more code out there, not improve how it looks.'"

28 of 115 comments

  1. Wrong objective. by smart_ass · · Score: 5, Insightful

    I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.

    Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?

    Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.

    But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.

    --
    Ouch ... did I just say that.
    1. Re:Wrong objective. by Anonymous Coward · · Score: 4, Informative

      The problem is most papers do not publish the code, only the results. This causes dozens of problems: if you want to run their code on a different instance you can't, if you want to run it on different hardware you can't, if you want to compare it with yours you only sort of can since you have to either reimplement their code or run yours on a different environment than theirs, which makes comparisons difficult. Oh, and it makes verifying the results even worse, but it isn't like many people try to verify anything.

      On the one hand catching bugs can help find a conclusion was wrong sooner than it would happen otherwise. On the other hand it may make it less likely that authors will put their code out there. Anyhow, I think it's a good idea and worth a shot. Who knows, maybe it'll end up helping a lot.

    2. Re:Wrong objective. by dcollins · · Score: 4, Insightful

      Yeah, it seems like the real objective should be to get more code read and verified as part of the scientific process. (Just "getting more code out there" and expecting it to go unread would be pretty empty.)

      One problem is that the publish-or-perish process has gotten sufficiently corrupt that many results are irreproducible, PhD students are warned against trying to reproduce results, and everyone involved has lost the expectation that their work will be experimentally double-checked.

      --
      We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
    3. Re: Wrong objective. by icebike · · Score: 3, Insightful

      Well, running the ORIGINAL author's code isn't that important.

      What's important is the analysis that the code was supposed to do.

      Describing that in mathematical terms and letting anyone try to replicate the research is better than handing the original code forward. That's just passing another potential source of error forward.

      Most of the (few) research projects I've been called to help with coding on are strictly package runners. Only one had anything approaching custom software, and it was a mess.

      --
      Sig Battery depleted. Reverting to safe mode.
    4. Re: Wrong objective. by ralphbecket · · Score: 4, Insightful

      I have to disagree. Before I go to a heap of effort reproducing your experiment, I want to check that the analysis you ran was the one you described in your paper. After I've convinced myself that you haven't made a mistake here, I may then go and try your experiment on new data, hopefully thereby confirming or invalidating your claims. Indeed, by giving me access to your code you can't then claim that I have misunderstood you if I do obtain an invalidating result.

    5. Re:Wrong objective. by mwvdlee · · Score: 4, Insightful

      I think that's exactly the opposite of the point the GP was trying to make.

      If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
      If it looks like old COBOL strung together with GO TO's and it works, it's okay.
      If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially so if the research results are based on the output of the code.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    6. Re:Wrong objective. by Anonymous Coward · · Score: 4, Interesting

      As a PhD student I am actively encouraged to reproduce results, mostly this has been possible but I know of at least one paper which has been withdrawn because my supervisor queried their results after we failed to reproduce them (I'll be charitable and say it was an honest mistake on their part).

      I guess whether you are encouraged to check others' work depends on your university and subject, but in certain areas it does happen.

    7. Re: Wrong objective. by old+man+moss · · Score: 5, Interesting

      Yes, totally agree, as someone who has tried to reproduce other people's results (in the field of image processing) with mixed success. It can be incredibly time consuming trying to compare techniques which appear to be described accurately in journals, but omit "minor" details of implementation which actually turn out to be critical. I have also had results of my own which seemed odd and were ultimately due to coding errors which inadvertently improved the result. Given the opportunity, I would have published all my academic code.

      --
      rt
    8. Re:Wrong objective. by Macchendra · · Score: 3, Informative

      It is easier to find bugs in code where all of the objects, variables, methods, etc. are named according to their actual purpose. It is easier for other researchers to integrate their own ideas if the code is self documenting. It is easier to integrate with other software if the interfaces are cleanly defined. It is easier to verify the results of intermediate steps if there is proper encapsulation. Also, proper encapsulation reduces the chances of unintended side-effects when data is modified outside of scope.
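
      A minimal Python sketch of the side-effect point, assuming a made-up `normalized` helper and invented readings (neither comes from any reviewed paper): the function returns a new list rather than modifying the caller's data in place.

```python
def normalized(readings):
    """Return readings scaled to a maximum of 1.0 without touching the input."""
    peak = max(readings)
    # Build a new list, so the caller's raw data stays exactly as it was.
    return [r / peak for r in readings]

# Hypothetical raw measurements, invented for illustration.
raw = [2.0, 4.0, 8.0]
scaled = normalized(raw)
```

      Because `raw` is left untouched, a later analysis step that still needs the original values cannot be silently fed the rescaled ones.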

    9. Re:Wrong objective. by MiniMike · · Score: 2

      All of which are great if code is to be maintained, which this type of code rarely is.

      Not always true, probably not by a long shot. I'm maintaining code written over a span of time beginning in the 1980's (not by me) and last updated yesterday (and again as soon as I'm done here...). Some written very well, some quite the opposite. Not often is scientific code used for just one project, if it's of any significant utility.

    10. Re:Wrong objective. by swillden · · Score: 2

      All of which are great if code is to be maintained, which this type of code rarely is.

      Or if it is re-used, which is one of the potential benefits of publishing it alongside the paper.

      Also, since the purpose of research papers is to transmit ideas, clear, readable code serves readers much better than functional but opaque code... and that assumes the code is actually functional. Ugly code tends to be buggier, precisely because it's harder to understand.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    11. Re:Wrong objective. by ebno-10db · · Score: 3, Insightful

      If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
      If it looks like old COBOL strung together with GO TO's and it works, it's okay.
      If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially so if the research results are based on the output of the code.

      None of the above. It's scientific code. It looks like bad Fortran (or even worse, FORTRAN) from 20 years ago, which is ok, since Fortran 90 is fine for number crunching.

      In all seriousness, my experience is that "Ph.D. types" (for want of a better term) write some of the most amateurish code I've ever seen. I've worked with people whose knowledge and ability I can only envy, and who are anything but ivory tower types, but write code like it was BASIC from a kindergartener (ok, today's kindergarteners probably write better code than in my day). Silly things like magic numbers instead of properly defined constants (and used in multiple places no less!), cut-and-paste instead of creating functions, hideous control structures for even simple things. Ironically, this is despite the fact that number crunching code generally has a simple code structure and simple data structures. I think bad code is part of the culture or something. The downside is that it makes it more likely to have bugs, and very difficult to modify.

      Realistically, this is because they're judged on their results and not their code. To many people here, the code is the end product, but to others it's a means to an end. Better scrutiny of it though would lead to more reliable results. It should be mandatory to release the entire program within, say, 1 year of publication. As for it being obfuscated, intentionally or otherwise, I don't think there's much you can do about that.
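
      To make the magic-number and cut-and-paste complaint concrete, here is a small Python sketch. The constants, the function name and the sample absorbance values are all assumptions invented for illustration; only the Beer-Lambert relation itself is standard.

```python
# Hypothetical example: converting absorbance readings to concentrations.
# The coefficient and path length are assumed values, named once instead of
# being repeated as bare numbers throughout the script.
EXTINCTION_COEFF_PER_M_CM = 6220.0   # assumed molar extinction coefficient
PATH_LENGTH_CM = 1.0                 # assumed cuvette path length

def concentration_molar(absorbance):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return absorbance / (EXTINCTION_COEFF_PER_M_CM * PATH_LENGTH_CM)

# One function reused for every group, rather than the formula pasted twice.
control = [concentration_molar(a) for a in (0.311, 0.622)]
treated = [concentration_molar(a) for a in (0.933, 1.244)]
```

      If the coefficient turns out to be wrong, there is exactly one line to fix, and a reviewer can see at a glance which value was used.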

  2. Hell Yes! by Garridan · · Score: 5, Insightful

    Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it. Maybe mathematicians are weird like that -- I face stigma for using a computer, so anything I can do to make it look more trustworthy is awesome.

    1. Re:Hell Yes! by JanneM · · Score: 5, Insightful

      Problem is, at least in this trial they're reviewing already published code, when it's too late to gain much benefit from the review on the part of the original writer. A research project is normally time-limited after all; by the time the paper and data are public, the project is often done and people have moved on.

      There's nobody with the time or inclination to, for instance, create and release a new improved version of the code at that point. And unless there are errors which lead to truly significant changes in the analysis, nobody would be willing to publish any kind of amended analysis either.

      --
      Trust the Computer. The Computer is your friend.
    2. Re:Hell Yes! by PsyberS · · Score: 2

      Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it.

      Believe it or not, some computer science programming language conferences are doing *just that*.

      http://cs.brown.edu/~sk/Memos/Conference-Artifact-Evaluation/
      http://ecoop13-aec.cs.brown.edu/
      http://splashcon.org/2013/cfp/665

  3. Don't forget spreadsheets by MrEricSir · · Score: 4, Informative

    As we've seen recently, bad decisions can be made from errors in spreadsheets. We need these published so they can be double-checked as well.

    --
    There's no -1 for "I don't get it."
    1. Re:Don't forget spreadsheets by VortexCortex · · Score: 2

      bad decisions can be made from errors in spreadsheets.

      Oh, If only you knew...

      We need these published so they can be double-checked as well.

      Well, I wouldn't go so far as publishing my findings, but now I always double-check spread sheets when I'm not sure if it is or isn't a ladyboy.

  4. The Horror by Vegemite · · Score: 3, Interesting

    You must be joking. Many scientific papers out there have results based on prototype or proof of concept software written by naive grad students for their advisors. These are largely uncommented hacks with few, if any, sanity checks. To sell these prototypes commercially, I have had to clean up after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.

  5. If we knew what we were doing... by dargaud · · Score: 2

    ...it wouldn't be called research, now would it? Seriously, many scientific projects start with a vague idea and no funds. You do a table experiment, connect it to a 15 year old computer, then grow from there. In some projects I got no more than a quarter page of specifications for what ended up as 30 thousand lines of code. Yes I write scientific code, and no it's not always pretty and refactored and all that. Also there's never any money.

    --
    Non-Linux Penguins ?
  6. Egoless programming by Anonymous Coward · · Score: 2, Interesting

    Back in the late 70s middle ages of comp sci...
    There was this thing called "egoless programming" being taught. The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense by it.

    Yeah, it's a child of the 60s kind of thing, but it does work.

    This is a huge challenge in the biomedical research field, because to be successful, you need personality traits like a strong ego (yes, *I* am brilliant, and my idea is the best, and you should fund it, and not that other bozo).

  7. researcher vs. software developer by Anonymous Coward · · Score: 5, Informative

    People doing scientific research and software developers are really doing very different things when they write code. For software developers or software engineers, the code is the end goal. They are building a product that they are going to give to others. It should be intuitive to use, robust, produce clear error messages, and be free of bugs and crashes. The code is the product. For someone doing scientific or engineering research, the end goal is testing an idea, or running an experiment. The code is a means to an end, not the end itself; it needs only to support the researcher, it only needs to run once, and it only needs to be bug free in the cases that are being explored. The product is a graph or chart or sentence describing the results that is put into a paper that gets published; the code itself is just a tool.

    When I got my Ph.D. in the 1990s, I didn't understand this, and it brought me a lot of grief when I went to a research lab and interacted with software developers and managers, who didn't understand this either. The grief comes about because of the different approaches used during the development of each type of code. Software developers describe their process variously as a waterfall model, agile development model, etc. These processes describe a roadmap, with milestones, and a set of activities that visualize the project at its end, and lead towards robust software development. The process a researcher uses is related to the scientific method: based on the question, they formulate a hypothesis, create an experiment, test it, observe the results, and then ask more questions. They do not always know how things will turn out, and they build their path as they go along. Very often, the equivalent "roadmap" in a researcher's mind is incomplete and is developed during the process, because this is part of what is being explored.

    In my organization, this creates tremendous conflict between software developers, who want a careful, process driven model to produce robust code, and researchers, who are seeking to answer more basic questions and explore unknown territory in a way that has a great deal of uncertainty and cannot always easily deliver the specific milestones and clarity into the schedule that are often desired.

    It is worse when the research results in a useful algorithm; of course, the researcher often wants to make it available to the world so that others can use it. This is more of a grey area; if the researcher knows how to do software engineering, they may go through the process to create a more robust product, but this takes effort and time. The fact that Mozilla wants to help debug scientific code is a very good thing; it often needs more serious debugging and re-architecting than other software that is openly available.

    I wish more people understood this difference.

  8. Looking over the shoulder by glennrrr · · Score: 2

    I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.
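
    The same trap exists in Python, which makes for a quick illustration (a minimal sketch; the variable names are made up). There, as in C, `^` is bitwise XOR; Python spells exponentiation `**`, while C has no power operator and uses `pow()` from `<math.h>`.

```python
# In Python, as in C, ^ is bitwise XOR; exponentiation is spelled **.
xor_result = 2 ^ 10      # 0b0010 XOR 0b1010 == 0b1000
power_result = 2 ** 10   # two raised to the tenth power

print(xor_result)    # 8
print(power_result)  # 1024
```

    Both lines run without any warning, which is why this kind of mistake tends to surface only when someone checks the numbers by hand.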

    1. Re:Looking over the shoulder by biodata · · Score: 2

      this^1000

      --
      Korma: Good
    2. Re:Looking over the shoulder by tlhIngan · · Score: 2

      I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.

      Scientists and researchers generally write lousy code. If you think TheDailyWTF is bad, you haven't seen researcher code.

      Generally write-only, lots of copy-pasta going on, variables that *might* make sense (and probably declared globally) and if you're really lucky, lack of subroutines or functions.

      Hell, the code itself may only compile on one specific machine - the researcher's - due to hidden dependencies, version issues, etc. And may even involve a lot of convoluted mechanisms involving formatting the input, sending it through one program, then taking the output, reformatting it (manually, of course) and shoving it through a second program with a different script, etc.

      Code is generally secondary to the actual research at hand - it's done to facilitate the analysis by being quicker.

      If you're really lucky, the researcher would've actually done an analysis manually to verify the program(s) actually work properly.

  9. any review may find off-by-one, etc. by raymorris · · Score: 2

    Having ANY second programmer look at the code may well find off-by-one or fence post errors and the like.
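
    As a small illustration of the kind of fence-post error a second reader tends to spot, here is a Python sketch of a moving average. The function and the sample values are hypothetical, not taken from any reviewed code.

```python
def moving_average(values, width):
    """Return the mean of every full window of `width` consecutive values."""
    # n values hold n - width + 1 full windows; writing range(n - width)
    # is the classic fence-post slip that silently drops the last window.
    n_windows = len(values) - width + 1
    return [sum(values[i:i + width]) / width for i in range(n_windows)]

# Hypothetical samples, invented for illustration.
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
averages = moving_average(samples, 3)
```

    Five samples with a window of three give three averages; the off-by-one version would return only two and still look plausible.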

  10. Re:Good intentions, bad implementation by Anonymous Coward · · Score: 2, Insightful

    This is a logical fallacy that many 'smart' people fall into. I am smart (in this case usually PhD's or people on their way to it) so this XYZ thing should be no sweat. They seem to forget that they spent 10-15 years becoming very good at whatever they do. Becoming a master of it. Yet somehow they also believe they can use this mastery on other things. In some very narrow cases you can do this. But many times you can not. Or even worse assuming no one else can understand what you are doing or they will 'get it wrong'.

    When the right thing to do is find another master in that other field. Even that is dangerous. You will also see many out there who then follow in the footsteps of these 'know it all' masters. Yelling the word 'science' at anyone who disagrees. Disagreeing is not because they think you are wrong (maybe you are), but because they do not understand.

    In this case writing code is *easy*, writing good code takes work. Even those who are masters at it make mistakes. We call them bugs. Even when you are good at it you still work at making it correct, even if you do it just because you have 'been there'. There are whole books out there on anti-patterns, patterns, development style, code philosophy, etc. From my POV it usually takes someone about 2 years to become somewhat 'ok' at programming. Somewhere in the 5-10 year mark they become masters. And that is only if they do it every day.

  11. The Other Edge of the Sword by fygment · · Score: 4, Interesting

    Roger Peng's comment shows a typical, superficial understanding of programming. Ironically, he would be the first to condemn a computer scientist/coder who ventured into biostatistics with a superficial knowledge of biology. I believe he would feel that anyone can program, but not anyone can do biostatistics. And I deeply disagree. Tools have been provided so that _any_ scientist can code. That does not mean that they understand coding or computer science.

    I have personally experienced that especially in the softer sciences like biology, economics, meteorology, etc., the scientists have absolutely no desire to learn any computer science: coding methodology, testing, complexity, algorithms, etc. The result is kludgy, inefficient code heavily dependent on pre-packaged modules, that produces results that are often a guess; the code produces results but with a lack of any understanding of what the various packaged routines are doing or whether they are appropriate for the task. For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong. It is the same as someone approaching engineering without some understanding of thermodynamics and as a result wasting their time trying to construct a perpetual motion machine.
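
    One way the PCA pitfall can show up is with unscaled variables. The Python sketch below is a hand-rolled two-variable PCA using only the standard library, not the API of any particular package; the measurements, units and function names are assumptions invented for illustration.

```python
import math
import statistics

def first_component(xs, ys):
    """Return the unit vector of the first principal component of 2-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(xs)
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)
    return math.cos(theta), math.sin(theta)

def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical measurements: body mass in grams, wing length in metres.
mass_g = [1200.0, 1500.0, 900.0, 1800.0, 1350.0, 1050.0]
wing_m = [0.21, 0.25, 0.19, 0.27, 0.22, 0.20]

raw_pc1 = first_component(mass_g, wing_m)
scaled_pc1 = first_component(standardize(mass_g), standardize(wing_m))
```

    On the raw data the first component is almost entirely the mass axis, simply because grams produce far larger numbers than metres. After standardizing, both variables get equal weight. Either result plots fine, so nothing in the output warns you which one you got.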

    --
    "Consensus" in science is _always_ a political construct.
    1. Re:The Other Edge of the Sword by umafuckit · · Score: 3, Informative

      For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong.

      I'm a biologist who learned enough computational stats to get by and I do see what you mean. Initially I did do stuff like that, but over time I put in the effort to learn what's going on and now I hope I make these sorts of dumb mistakes a lot less often! However this is not so much a coding problem, but a stats problem. People in the "soft sciences" don't just have problems with more advanced stuff such as PCA, ICA, clustering, etc, but even simple stats. For example, it's very common to see ANOVA performed on data that would be much better suited to regression analysis. The concept of fitting a line or curve and extracting meaning from the coefficients is rather foreign to a lot of biologists, who are more comfortable with a table full of p-values. Indeed, there is a general fixation on p-values, despite the fact that these are not well understood. There is a tendency to hide raw data (since biological data are often noisy). There is also a tendency to use analyses such as PCA or hierarchical clustering simply to produce fancy plots to blind reviewers; these plots often add no insight (or the insight they might add is not explored).
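
      For what "extracting meaning from the coefficients" can look like, here is a minimal least-squares line fit in plain Python. The dose-response numbers and the `fit_line` helper are made up for illustration.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical dose-response data: dose in mg, response in arbitrary units.
dose = [0.0, 1.0, 2.0, 3.0, 4.0]
response = [1.1, 2.9, 5.2, 7.1, 8.7]

slope, intercept = fit_line(dose, response)
```

      The slope is the estimated change in response per extra mg of dose, and the intercept is the estimated response at zero dose. A table of per-group p-values does not give you either number directly.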