Slashdot Mirror


Mozilla Plan Seeks To Debug Scientific Code

ananyo writes "An offshoot of Mozilla is aiming to discover whether a review process could improve the quality of researcher-built software that is used in myriad fields today, ranging from ecology and biology to social science. In an experiment being run by the Mozilla Science Lab, software engineers have reviewed selected pieces of code from published papers in computational biology. The reviewers looked at snippets of code up to 200 lines long that were included in the papers and written in widely used programming languages, such as R, Python and Perl. The Mozilla engineers have discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups — including whether to permit disclosure of the results. But some researchers say that having software reviewers looking over their shoulder might backfire. 'One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code,' says biostatistician Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. 'We need to get more code out there, not improve how it looks.'"

75 of 115 comments

  1. Wrong objective. by smart_ass · · Score: 5, Insightful

    I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.

    Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?

    Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.

    But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.

    --
    Ouch ... did I just say that.
    1. Re:Wrong objective. by cheater512 · · Score: 1

      Exactly. If the code they are writing looks like bad PHP from 10 years ago then it needs to be exposed.

      What is needed is more *good quality* code being published.

    2. Re:Wrong objective. by Anonymous Coward · · Score: 4, Informative

      The problem is most papers do not publish the code, only the results. This causes dozens of problems: if you want to run their code on a different instance you can't, if you want to run it on different hardware you can't, if you want to compare it with yours you only sort of can since you have to either reimplement their code or run yours in a different environment than theirs, which makes comparisons difficult. Oh, and it makes verifying the results even worse, but it isn't like many people try to verify anything.

      On the one hand catching bugs can help find a conclusion was wrong sooner than it would happen otherwise. On the other hand it may make it less likely that authors will put their code out there. Anyhow, I think it's a good idea and worth a shot. Who knows, maybe it'll end up helping a lot.

    3. Re:Wrong objective. by dcollins · · Score: 4, Insightful

      Yeah, it seems like the real objective should be to get more code read and verified as part of the scientific process. (Just "getting more code out there" and expecting it to go unread would be pretty empty.)

      One problem is that the publish-or-perish process has gotten sufficiently corrupt that many results are irreproducible, PhD students are warned against trying to reproduce results, and everyone involved has lost the expectation that their work will be experimentally double-checked.

      --
      We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
    4. Re: Wrong objective. by icebike · · Score: 3, Insightful

      Well, running the ORIGINAL author's code isn't that important.

      What's important is the analysis that the code was supposed to do.

      Describing that in mathematical terms and letting anyone try to replicate the research is better than handing the original code forward. That's just passing another potential source of error forward.

      Most of the (few) research projects I've been called to help with coding on are strictly package runners. Only one had anything approaching custom software, and it was a mess.

      --
      Sig Battery depleted. Reverting to safe mode.
    5. Re: Wrong objective. by ralphbecket · · Score: 4, Insightful

      I have to disagree. Before I go to a heap of effort reproducing your experiment, I want to check that the analysis you ran was the one you described in your paper. After I've convinced myself that you haven't made a mistake here, I may then go and try your experiment on new data, hopefully thereby confirming or invalidating your claims. Indeed, by giving me access to your code you can't then claim that I have misunderstood you if I do obtain an invalidating result.

    6. Re:Wrong objective. by mwvdlee · · Score: 4, Insightful

      I think that's exactly the opposite of the point the GP was trying to make.

      If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
      If it looks like old COBOL strung together with GO TO's and it works, it's okay.
      If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially so if the research results are based on the output of the code.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    7. Re:Wrong objective. by Anonymous Coward · · Score: 4, Interesting

      As a PhD student I am actively encouraged to reproduce results. Mostly this has been possible, but I know of at least one paper which has been withdrawn because my supervisor queried their results after we failed to reproduce them (I'll be charitable and say it was an honest mistake on their part).

      I guess whether you are encouraged to check others' work depends on your university and subject, but in certain areas it does happen.

    8. Re:Wrong objective. by K.+S.+Kyosuke · · Score: 1

      Not to mention that the idea of not publishing code is at stark odds with the goal of scientific publication, which is reproducibility: as things depend more and more on the processing SW, papers and datasets aren't enough; you need the code that was used to generate the results, otherwise it's irreproducible.

      --
      Ezekiel 23:20
    9. Re: Wrong objective. by old+man+moss · · Score: 5, Interesting

      Yes, totally agree, as someone who has tried to reproduce other people's results (in the field of image processing) with mixed success. It can be incredibly time consuming trying to compare techniques which appear to be described accurately in journals, but omit "minor" details of implementation which actually turn out to be critical. I have also had results of my own which seemed odd and were ultimately due to coding errors which inadvertently improved the result. Given the opportunity, I would have published all my academic code.

      --
      rt
    10. Re:Wrong objective. by Macchendra · · Score: 3, Informative

      It is easier to find bugs in code where all of the objects, variables, methods, etc. are named according to their actual purpose. It is easier for other researchers to integrate their own ideas if the code is self documenting. It is easier to integrate with other software if the interfaces are cleanly defined. It is easier to verify the results of intermediate steps if there is proper encapsulation. Also, proper encapsulation reduces the chances of unintended side-effects when data is modified outside of scope.

    11. Re:Wrong objective. by mwvdlee · · Score: 1

      All of which are great if code is to be maintained, which this type of code rarely is.
      None of which affects whether the code actually works.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    12. Re:Wrong objective. by Macchendra · · Score: 1

      Making bugs visible does affect whether the code actually works. So does making the components testable.

    13. Re:Wrong objective. by MiniMike · · Score: 2

      All of which are great if code is to be maintained, which this type of code rarely is.

      Not always true, probably not by a long shot. I'm maintaining code written over a span of time beginning in the 1980's (not by me) and last updated yesterday (and again as soon as I'm done here...). Some written very well, some quite the opposite. Not often is scientific code used for just one project, if it's of any significant utility.

    14. Re:Wrong objective. by swillden · · Score: 2

      All of which are great if code is to be maintained, which this type of code rarely is.

      Or if it is re-used, which is one of the potential benefits of publishing it alongside the paper.

      Also, since the purpose of research papers is to transmit ideas, clear, readable code serves readers much better than functional but opaque code... and that assumes the code is actually functional. Ugly code tends to be buggier, precisely because it's harder to understand.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    15. Re:Wrong objective. by __aaltlg1547 · · Score: 1

      I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.

      Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?

      Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.

      But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.

      That's one of two worthy objectives. The other is to make the code more suitable for use by other researchers.

    16. Re: Wrong objective. by __aaltlg1547 · · Score: 1

      You had the opportunity. You could have put your code and notes on how to use it in the appendix to your papers.

    17. Re:Wrong objective. by ebno-10db · · Score: 3, Insightful

      If it looks like bad PHP from 10 years ago but contains no bugs, then that is completely okay.
      If it looks like old COBOL strung together with GO TO's and it works, it's okay.
      If it looks like perfect C++ code but contains bugs, the bugs need to be exposed, especially so if the research results are based on the output of the code.

      None of the above. It's scientific code. It looks like bad Fortran (or even worse, FORTRAN) from 20 years ago, which is ok, since Fortran 90 is fine for number crunching.

      In all seriousness, my experience is that "Ph.D. types" (for want of a better term) write some of the most amateurish code I've ever seen. I've worked with people whose knowledge and ability I can only envy, and who are anything but ivory tower types, but write code like it was BASIC from a kindergartener (ok, today's kindergarteners probably write better code than in my day). Silly things like magic numbers instead of properly defined constants (and used in multiple places no less!), cut-and-paste instead of creating functions, hideous control structures for even simple things. Ironically, this is despite the fact that number crunching code generally has a simple code structure and simple data structures. I think bad code is part of the culture or something. The downside is that it makes it more likely to have bugs, and very difficult to modify.

      Realistically, this is because they're judged on their results and not their code. To many people here, the code is the end product, but to others it's a means to an end. Better scrutiny of it though would lead to more reliable results. It should be mandatory to release the entire program within, say, 1 year of publication. As for it being obfuscated, intentionally or otherwise, I don't think there's much you can do about that.
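To make the complaint about magic numbers and cut-and-paste concrete, here is a minimal Python sketch. The constant, the function name and the sample temperatures are all hypothetical, invented purely for illustration; none of them come from code discussed in this thread.

```python
# Hypothetical example: convert temperatures from Celsius to Kelvin.

# Amateurish style: the same magic number pasted in two places.
sample_a_kelvin = 21.5 + 273.15
sample_b_kelvin = 37.0 + 273.15

# Cleaner style: one named constant and one small function.
CELSIUS_TO_KELVIN_OFFSET = 273.15  # offset between the two scales


def celsius_to_kelvin(celsius):
    """Return the temperature in kelvin for a value in degrees Celsius."""
    return celsius + CELSIUS_TO_KELVIN_OFFSET


# Both styles give the same numbers; only the second is easy to review and change.
assert celsius_to_kelvin(21.5) == sample_a_kelvin
assert celsius_to_kelvin(37.0) == sample_b_kelvin
```

If the offset ever needed correcting, the second style has exactly one place to fix, which is the point being made above about bugs and modifiability.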

    18. Re: Wrong objective. by Impy+the+Impiuos+Imp · · Score: 1

      I agree. Code is math, and thus part of the experiment and analysis, and is not just an interpretation. "Duplicate it yourself" stands against the very idea of review and reproduction.

      While there is tremendous utility in an independent reconstruction of an algorithm (I have numerous times built a separate chunk of code to calculate something in a completely different way, to test against the real algorithm/code; in practice they debug each other), the actual code needs to be there for review.
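The "two implementations debug each other" idea can be sketched roughly as follows. This is only an illustration under assumed details: the quantity (sample variance), the two function names and the data are hypothetical choices, not anything from the poster's actual work.

```python
import math


def variance_two_pass(values):
    """Sample variance the textbook way: mean first, then squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_welford(values):
    """Sample variance in one pass using Welford's running update."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return m2 / (count - 1)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
a = variance_two_pass(data)
b = variance_welford(data)
# If the two routines disagree, at least one of them has a bug.
assert math.isclose(a, b, rel_tol=1e-9), (a, b)
```

Agreement does not prove both are right (they could share a misunderstanding of the formula), which is why the published code still needs to be available for review.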

      They may have a desire to keep it secret for exclusivity reasons of one type or another (fame, future additional research, money), but that can't justify secrecy in normal publication.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    19. Re:Wrong objective. by __aaltlg1547 · · Score: 1

      Ph.D. dissertations require original research. However, assigned classwork for doctoral and master's students would be improved if it involved replication and re-analysis of recent research in the field to study methods of data collection and analysis. This would make replication and reexamination of recent research a routine part of academia. The benefits for the students would be seeing how other researchers do their work, practice at methods of analysis, and occasionally the satisfaction of showing that the original work was wrong. It would also demonstrate that if they publish bad work, there's a likelihood that it will be discovered by other researchers who will refute their findings.

    20. Re:Wrong objective. by dcollins · · Score: 1

      That is such a great idea. Wish it would happen.

      --
      We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
  2. Hell Yes! by Garridan · · Score: 5, Insightful

    Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it. Maybe mathematicians are weird like that -- I face stigma for using a computer, so anything I can do to make it look more trustworthy is awesome.

    1. Re:Hell Yes! by JanneM · · Score: 5, Insightful

      Problem is, at least in this trial they're reviewing already published code, when it's too late to gain much benefit from the review on the part of the original writer. A research project is normally time-limited after all; by the time the paper and data are public, the project is often done and people have moved on.

      There's nobody with the time or inclination to, for instance, create and release a new improved version of the code at that point. And unless there are errors which lead to truly significant changes in the analysis, nobody would be willing to publish any kind of amended analysis either.

      --
      Trust the Computer. The Computer is your friend.
    2. Re:Hell Yes! by Anonymous Coward · · Score: 1

      There is a reason that models have to be validated. If you choose validation cases well, a code that passes them will almost certainly be a good model. Beyond that, you do the best you really can, and that's that.
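As a rough sketch of what validating against well-chosen cases can look like, the snippet below checks a small numerical routine against integrals whose exact values are known analytically. The routine (a trapezoid-rule integrator), the cases and the tolerance are assumptions picked for illustration, not a description of anyone's real model.

```python
import math


def trapezoid(f, a, b, steps):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Validation cases: integrals whose exact values are known analytically.
validation_cases = [
    (math.sin, 0.0, math.pi, 2.0),     # integral of sin over [0, pi] is 2
    (lambda x: x * x, 0.0, 3.0, 9.0),  # integral of x^2 over [0, 3] is 9
]

for f, a, b, exact in validation_cases:
    approx = trapezoid(f, a, b, steps=10_000)
    assert abs(approx - exact) < 1e-6, (approx, exact)
```

Passing such cases raises confidence in the code only for inputs that resemble the cases, so the choice of cases matters as much as the checks themselves.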

      Otherwise, here, I've got 40k lines of code here, anyone want to check it over for me? This is free of charge, right?

    3. Re:Hell Yes! by PsyberS · · Score: 2

      Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it.

      Believe it or not, some computer science programming language conferences are doing *just that*.

      http://cs.brown.edu/~sk/Memos/Conference-Artifact-Evaluation/
      http://ecoop13-aec.cs.brown.edu/
      http://splashcon.org/2013/cfp/665

  3. What is Mozilla? by phantomfive · · Score: 1

    When did Mozilla get a Science Lab? Here I always thought that all the Mozilla Foundation made was a decent browser, and now I find they have a science lab. What other things does Mozilla do?

    --
    "First they came for the slanderers and i said nothing."
    1. Re:What is Mozilla? by Anonymous Coward · · Score: 1

      A tiddlywinks ballroom, two vending machines and a build-a-squirrel online project. Apparently they have made some attempt at an internet browser too.

    2. Re:What is Mozilla? by sg_oneill · · Score: 1

      Mozilla is a bit like Apache: it's a broad tent of vaguely related projects; it's not just Firefox.

      --
      Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
    3. Re:What is Mozilla? by jones_supa · · Score: 1

      And of course let's not forget Thunderbird. A very good e-mail client in my opinion.

    4. Re:What is Mozilla? by jopsen · · Score: 1

      Mozilla also does Webmaker, education and let's not forget Firefox OS...

    5. Re:What is Mozilla? by ebno-10db · · Score: 1

      they fracking suck at writing clean bug free code, and suck just as much at reviewing it

      Then how come the browser I'm using right now works pretty well?

  4. Software architecture by Anonymous Coward · · Score: 1

    The overall structure of most of the code in HEP [1] is nasty. It's too late for the likes of ROOT [2]: input of software engineers at the early stages of code design could be very useful.

    1. https://en.wikipedia.org/wiki/Particle_physics
    2. https://en.wikipedia.org/wiki/Root.cern

  5. Don't forget spreadsheets by MrEricSir · · Score: 4, Informative

    As we've seen recently, bad decisions can be made from errors in spreadsheets. We need these published so they can be double-checked as well.

    --
    There's no -1 for "I don't get it."
    1. Re:Don't forget spreadsheets by Anonymous Coward · · Score: 1

      They should be publishing their code because the basic precept behind peer-reviewed publishing is that results can be reproduced. Most of the time they are not, but computational scientists need to be constantly reminded that they are performing experiments: not publishing the code is exactly the same as a synthetic chemist not including an experimental section (the procedure for the synthesis).

    2. Re:Don't forget spreadsheets by VortexCortex · · Score: 2

      bad decisions can be made from errors in spreadsheets.

      Oh, If only you knew...

      We need these published so they can be double-checked as well.

      Well, I wouldn't go so far as publishing my findings, but now I always double-check spread sheets when I'm not sure if it is or isn't a ladyboy.

    3. Re:Don't forget spreadsheets by swillden · · Score: 1

      If I made some approximation or used an algorithm that may fall apart in some limits, that is worth mentioning.

      Uh, huh. And what if you don't realize that your code has subtle failings that may have significantly altered your results? Anyone trying to reproduce your results but doing it right will fail, but be unable to explain why their results differed. Without your code peer review of your work is both harder and less valuable.

      Unless deterring review is the researcher's intent, of course.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    4. Re:Don't forget spreadsheets by dkf · · Score: 1

      I can write that I convolved two functions, but you don't need to see the code that I used to do the convolution.

      So you used a standard library for doing the convolution, cited that library correctly, and showed how you called the library? That would be very good academic programming and paper-writing too. Of course, the flip side also holds: if you don't show your methods properly, or don't cite others' work that you use or reference, you're a bad academic. If you do it all yourself when much of it isn't your research focus, you're just wasting your time (and encouraging others to ignore you).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    5. Re:Don't forget spreadsheets by ebno-10db · · Score: 1

      As we've seen recently, bad decisions can be made from errors in spreadsheets.

      For that problem, let's just get rid of spreadsheets (at least as they're implemented in most programs). Copy-and-paste is the standard way to do the same computation in several places. How much further could you get from good practice? Reviewing the "code" requires peering at every cell. Etc., etc., etc. Lastly, the people who use them are often idiots who have no idea what they're doing. At least if you made them use a programming language, they'd never get it to run. That way they couldn't pretend that they made meaningful calculations.

  6. they do Seamonkey, a better browser than Firefox by raymorris · · Score: 1

    What else do they do, you ask? They support Seamonkey, Firefox's older brother. Firefox began as a stripped-down, lightweight, minimalist version of Seamonkey. Though Firefox is no longer lightweight, Seamonkey is still more capable in some respects. The suite includes an email client and WYSIWYG editor, but I just like the browser.

    While Firefox is controlled by the Mozilla Foundation, Seamonkey is community driven now, with hosting and other support from the foundation.

  7. Re:Provide a tool then BUTT OUT by phantomfive · · Score: 1

    Believe it or not, there actually are at least some scientists in the Mozilla Science Lab. Crazy, right?

    --
    "First they came for the slanderers and i said nothing."
  8. Get used to it by flyingfsck · · Score: 1

    If you want to code, then you got to get used to code reviews. It is the only way to improve quality and a scientist that doesn't want to improve quality should not be a scientist.

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
    1. Re:Get used to it by VortexCortex · · Score: 1

      Correction: a scientist that doesn't want to improve source quality shouldn't be a codemonkey...

    2. Re:Get used to it by BitZtream · · Score: 1

      Correction: a scientist that doesn't want to improve source quality isn't a scientist.

      Some can argue that they don't have time or budget to do so, but flat out not wanting to is a failure of the process itself. It's not someone you want to trust to make predictions on data.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  9. The Horror by Vegemite · · Score: 3, Interesting

    You must be joking. Many scientific papers out there have results based on prototype or proof of concept software written by naive grad students for their advisors. These are largely uncommented hacks with few, if any, sanity checks. To sell these prototypes commercially, I have had to clean up after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.

    1. Re:The Horror by umafuckit · · Score: 1

      I have had to clean up after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.

      Of course it's a lot easier and quicker to re-write someone's code when you already know what you're aiming at.

  10. If we knew what we were doing... by dargaud · · Score: 2

    ...it wouldn't be called research, now would it? Seriously, many scientific projects start with a vague idea and no funds. You do a table experiment, connect it to a 15 year old computer, then grow from there. In some projects I got no more than a quarter page of specifications for what ended up as 30 thousand lines of code. Yes I write scientific code, and no it's not always pretty and refactored and all that. Also there's never any money.

    --
    Non-Linux Penguins ?
  11. Too heavy mozilla drives mac users to chrome by hereshalkidiki · · Score: 1

    I've been a faithful Mozilla user for years. However on MAC due to the slowness of the browser and the high RAM consumption I permanently switched to Chrome. So maybe they should run an experiment on how to keep their MAC users, because until now they've been great at that. When I went to buy VPN from http://vpnarea.com I was surprised to find out that they had an extension for Chrome but not for Mozilla.

    --
    http://vpnarea.com
    1. Re:Too heavy mozilla drives mac users to chrome by smash · · Score: 1

      Good enough Safari had me ditch both Mozilla AND Chrome. I've had no real issue with Safari since 4.0... certainly nothing big enough to justify installing another browser to secure and maintain.

      --
      I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
  12. I agree, but FYI: by TheSeatOfMyPants · · Score: 1

    MAC (all-caps) - Media Access Control, as in the hexadecimal MAC address used to identify individual pieces of hardware on a network
    Mac - marketing name for the longstanding "Macintosh" line of computers by Apple

    I've used Firefox since it first came out, but it's so damned bloated with unneeded 'extras' that I only stick with it because it's the one browser that allows extensions like AdBlock Plus to block outgoing server requests, not just hide the results. I had defected over to Opera for several months, but when they decided to become a Chrome clone, I gave up on it altogether.

    --
    Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
    1. Re:I agree, but FYI: by _merlin · · Score: 1

      I've used Firefox since it first came out, but it's so damned bloated with unneeded 'extras' that I only stick with it because it's the one browser that allows extensions like AdBlock Plus to block outgoing server requests, not just hide the results.

      FWIW Safari allows extensions to block the requests before they're made as well, although the exact mechanism may be different.

  13. Absolutely necessary ... by Anonymous Coward · · Score: 1

    Most of my colleagues at the university are terrible coders and I am often not even sure how much I trust their results. Even if it does scare people, there has to be more awareness about code review in the scientific field than there is today.

  14. Good intentions, bad implementation by OneSmartFellow · · Score: 1

    Having seen some code written by an esteemed Bio-Chemist, I agree that experienced programmers should be reviewing their code, but then, you'd expect a true scientist to have an expert review his stuff anyway.

    My experience was a real eye opener. Between the buffer overruns, and logic holes, I am amazed the crap ran at all. The fact that it compiled was a bit of a mystery until I realized that it was possible to ignore compile errors.

    1. Re:Good intentions, bad implementation by Anonymous Coward · · Score: 2, Insightful

      This is a logical fallacy that many 'smart' people fall into. I am smart (in this case usually PhD's or people on their way to it) so this XYZ thing should be no sweat. They seem to forget that they spent 10-15 years becoming very good at whatever they do. Becoming a master of it. Yet somehow they also believe they can use this mastery on other things. In some very narrow cases you can do this. But many times you can not. Or even worse assuming no one else can understand what you are doing or they will 'get it wrong'.

      When the right thing to do is find another master in that other field. Even that is dangerous. You will also see many out there who then follow in the footsteps of these 'know it all' masters. Yelling the word 'science' at anyone who disagrees. Disagreeing is not because they think you are wrong (maybe you are), but because they do not understand.

      In this case writing code is *easy*, writing good code takes work. Even those who are masters at it make mistakes. We call them bugs. Even when you are good at it you still work at making it correct, even if you do it just because you have 'been there'. There are whole books out there on anti-patterns, patterns, development style, code philosophy, etc. From my POV it usually takes someone about 2 years to become somewhat 'ok' at programming. Somewhere in the 5-10 year mark they become masters. And that is only if they do it every day.

  15. Not technical tho by mjwalshe · · Score: 1

    Mozilla would appear to be mostly commercial programmers, so not sure that having them look at the code would give any value.

  16. Oh dear by mjwalshe · · Score: 1

    You obviously haven't worked with people who are world leaders in their field; they are not going to take advice from some commercial web dev on code.

    Though back in the day I did make one guy's code a bit more user friendly (his original comment was "I don't need any prompts to remind me what I need to type"), as we had scaled to 1:1 models and one single run of the rig could cost £20k in materials.

  17. Egoless programming by Anonymous Coward · · Score: 2, Interesting

    Back in the late 70s middle ages of comp sci...
    There was this thing called "egoless programming" being taught. The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense by it.

    Yeah, it's a child of the 60s kind of thing, but it does work.

    This is a huge challenge in the biomedical research field, because to be successful, you need personality traits like a strong ego (yes, *I* am brilliant, and my idea is the best, and you should fund it, and not that other bozo).

    1. Re:Egoless programming by John+Allsup · · Score: 1

      That modern research rewards egoism is one of the most dangerous, worrying and disillusioning features of modern research.  The best thinkers are sure to be suffocated in the face of masses of intellectual university graduates chasing research money and the dream of being regarded as one of those 'best thinkers'.

      --
      John_Chalisque
    2. Re:Egoless programming by ebno-10db · · Score: 1

      The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense by it.

      Wusses and namby-pambies. I take the opposite approach. Three or more bugs found in your code results in summary execution, with your corpse hung from the flagpole as a reminder to others.

  18. We need more code out there by John+Allsup · · Score: 1

    and to improve how it looks, and lose the shame that we instinctively feel in the face of criticism.  No-one codes perfectly, so there is always room for useful criticism and progress, and we need to get that awareness of coding issues out as well, not just code alone.

    --
    John_Chalisque
  19. researcher vs. software developer by Anonymous Coward · · Score: 5, Informative

    People doing scientific research and software developers are really doing very different things when they write code. For software developers or software engineers, the code is the end goal. They are building a product that they are going to give to others. It should be intuitive to use, robust, produce clear error messages, and be free of bugs and crashes. The code is the product. For someone doing scientific or engineering research, the end goal is testing an idea, or running an experiment. The code is a means to an end, not the end itself; it needs only to support the researcher, it only needs to run once, and it only needs to be bug free in the cases that are being explored. The product is a graph or chart or sentence describing the results that is put into a paper that gets published; the code itself is just a tool.

    When I got my Ph.D. in the 1990s, I didn't understand this, and it brought me a lot of grief when I went to a research lab and interacted with software developers and managers, who didn't understand this either. The grief comes about because of the different approaches used during the development of each type of code. Software developers describe their process variously as a waterfall model, agile development model, etc. These processes describe a roadmap, with milestones, and a set of activities that visualize the project at its end, and lead towards robust software development. The process a researcher uses is related to the scientific method: based on the question, they formulate a hypothesis, create an experiment, test it, observe the results, and then ask more questions. They do not always know how things will turn out, and they build their path as they go along. Very often, the equivalent "roadmap" in a researcher's mind is incomplete and is developed during the process, because this is part of what is being explored.

    In my organization, this creates tremendous conflict between software developers, who want a careful, process-driven model to produce robust code, and researchers, who are seeking to answer more basic questions and explore unknown territory in a way that has a great deal of uncertainty and cannot always deliver the specific milestones and schedule clarity that are often desired.

    It is worse when the research results in a useful algorithm; of course, the researcher often wants to make it available to the world so that others can use it. This is more of a grey area; if the researcher knows how to do software engineering, they may go through the process to create a more robust product, but this takes effort and time. The fact that Mozilla wants to help debug scientific code is a very good thing; it often needs more serious debugging and re-architecting than other software that is openly available.

    I wish more people understood this difference.

  20. Looking over the shoulder by glennrrr · · Score: 2

    I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.

    1. Re:Looking over the shoulder by biodata · · Score: 2

      this^1000

      --
      Korma: Good
    2. Re:Looking over the shoulder by ebno-10db · · Score: 1

      In all fairness that's an easy mistake to make, because ^ means exponentiation in other languages. It's an historical stupidity, like the fact that log() is the natural log, not log10().
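
      The two traps mentioned above can be shown in a few lines. This is only a sketch, written in Python rather than C, on the grounds that Python uses the same conventions: ^ is bitwise XOR (exponentiation is **, or pow() in C), and math.log() is the natural log (math.log10() is base 10). The variable names are made up for illustration.

```python
import math

# ^ is bitwise XOR, not "raise to the power of".
xor_result = 2 ^ 10       # 0b0010 XOR 0b1010 = 0b1000
power_result = 2 ** 10    # exponentiation (pow(2, 10) in C)

# log() is the natural logarithm; base 10 has its own function.
natural = math.log(100.0)      # ln(100), roughly 4.605
base_ten = math.log10(100.0)   # log10(100)

print(xor_result, power_result)   # 8 1024
print(natural, base_ten)
```

      Neither line raises an error or a warning, which is why this kind of mistake can sit in analysis code unnoticed.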

    3. Re:Looking over the shoulder by __aaltlg1547 · · Score: 1

      Frequently. It's not supposed to be their main area of expertise and they often learn just enough to solve their immediate problem. And why should they learn more? So occasionally they make blunders like that, but a professional computer programmer wouldn't know what problem to code or what analysis needs to be done in the first place. That's what the scientists are good at.

    4. Re:Looking over the shoulder by tlhIngan · · Score: 2

      I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.

      Scientists and researchers generally write lousy code. If you think TheDailyWTF is bad, you haven't seen researcher code.

      Generally write-only, lots of copy-pasta going on, variables that *might* make sense (and probably declared globally) and if you're really lucky, lack of subroutines or functions.

      Hell, the code itself may only compile on one specific machine - the researcher's - due to hidden dependencies, version issues, etc. And may even involve a lot of convoluted mechanisms involving formatting the input, sending it through one program, then taking the output, reformatting it (manually, of course) and shoving it through a second program with a different script, etc.

      Code is generally secondary to the actual research at hand - it's done to facilitate the analysis by being quicker.

      If you're really lucky, the researcher would've actually done an analysis manually to verify the program(s) actually work properly.

  21. any review may find off-by-one, etc. by raymorris · · Score: 2

    Having ANY second programmer look at the code may well find off-by-one or fence post errors and the like.
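
    As a hypothetical illustration of the kind of fence post error a second reader tends to spot, here is a minimal Python sketch. The function names and the readings are invented for the example; the point is only that the buggy version runs cleanly and returns a plausible number.

```python
def window_mean_buggy(values, start, end):
    """Intended: mean of values[start] through values[end], inclusive."""
    window = values[start:end]        # bug: the slice stops before index `end`
    return sum(window) / len(window)


def window_mean(values, start, end):
    """Mean of values[start] through values[end], inclusive."""
    window = values[start:end + 1]    # include the last fence post
    return sum(window) / len(window)


readings = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
print(window_mean_buggy(readings, 1, 3))  # averages 2.0 and 3.0 only
print(window_mean(readings, 1, 3))        # averages 2.0, 3.0 and 4.0
```

    The two results differ (2.5 versus 3.0), but nothing in the output flags which one is wrong; only the docstring's stated intent does.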

  22. The Other Edge of the Sword by fygment · · Score: 4, Interesting

    Roger Peng's comment shows a typical, superficial understanding of programming. Ironically, he would be the first to condemn a computer scientist/coder who ventured into biostatistics with a superficial knowledge of biology. I believe he would feel that anyone can program, but not anyone can do biostatistics. And I deeply disagree. Tools have been provided so that _any_ scientist can code. That does not mean that they understand coding or computer science.

    I have personally experienced that especially in the softer sciences like biology, economics, meteorology, etc., the scientists have absolutely no desire to learn any computer science: coding methodology, testing, complexity, algorithms, etc. The result is kludgy, inefficient code heavily dependent on pre-packaged modules, that produces results that are often a guess; the code produces results but with a lack of any understanding of what the various packaged routines are doing or whether they are appropriate for the task. For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong. It is the same as someone approaching engineering without some understanding of thermodynamics and as a result wasting their time trying to construct a perpetual motion machine.
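
    To make the PCA example concrete, here is a rough sketch in plain Python. It assumes the missing pre-processing is standardization (centering each variable and scaling it to unit variance), which is only one possibility; the comment does not say which step was skipped. The data and function names are invented, and the two-variable case is used so the first principal component can be found in closed form from the 2x2 covariance matrix instead of calling a real PCA package.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def first_component(xs, ys):
    """Unit vector along the first principal component of two variables.

    For the covariance matrix [[a, b], [b, c]] the major axis lies at
    angle 0.5 * atan2(2b, a - c) from the first variable's axis.
    """
    a, b, c = covariance(xs, xs), covariance(xs, ys), covariance(ys, ys)
    theta = 0.5 * math.atan2(2 * b, a - c)
    return (math.cos(theta), math.sin(theta))


def standardize(xs):
    """Center to mean 0 and scale to unit sample variance."""
    mean = sum(xs) / len(xs)
    sd = math.sqrt(covariance(xs, xs))
    return [(x - mean) / sd for x in xs]


# Made-up measurements in very different units.
height_mm = [1500.0, 1600.0, 1700.0, 1800.0, 1900.0]
mass_kg = [52.0, 58.0, 71.0, 77.0, 92.0]

raw = first_component(height_mm, mass_kg)
scaled = first_component(standardize(height_mm), standardize(mass_kg))
print("raw loadings:   ", raw)
print("scaled loadings:", scaled)
```

    On the raw data the first component is almost entirely height, simply because millimetres produce the larger variance; after standardization the two variables load equally. Both outputs look like reasonable numbers, which is the point being made above.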

    --
    "Consensus" in science is _always_ a political construct.
    1. Re:The Other Edge of the Sword by umafuckit · · Score: 3, Informative

      For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong.

      I'm a biologist who learned enough computational stats to get by and I do see what you mean. Initially I did do stuff like that, but over time I put in the effort to learn what's going on and now I hope I make these sorts of dumb mistakes a lot less often! However this is not so much a coding problem, but a stats problem. People in the "soft sciences" don't just have problems with more advanced stuff such as PCA, ICA, clustering, etc, but even simple stats. For example, it's very common to see ANOVA performed on data that would be much better suited to regression analysis. The concept of fitting a line or curve and extracting meaning from the coefficients is rather foreign to a lot of biologists, who are more comfortable with a table full of p-values. Indeed, there is a general fixation on p-values, despite the fact that these are not well understood. There is a tendency to hide raw data (since biological data are often noisy). There is also a tendency to use analyses such as PCA or hierarchical clustering simply to produce fancy plots to blind reviewers; these plots often add no insight (or the insight they might add is not explored).

    2. Re:The Other Edge of the Sword by jhumkey · · Score: 1

      I would add . . . it's not just pure "research" with the superficial understanding of programming.

      I've seen personally (and "Dilbert" would seem to confirm as universal) the generalized business belief that . . .

      "programming is easy."
      "quality is easy."
      "expand-ability is easy."
      "maintainability is easy."
      "If I just had a Project Management tool to keep a death grip on delivery time . . . all those other "easy" things will just naturally fall into place."

      I keep thinking the opposite . . .

      Quality, Suitability of Purpose, Expand-ability, Maintainability, Inter-operability, the delivery of promised features . . . those are the "hard" things. If we had a true handle on those . . . we wouldn't need the death grip on deliver-ability. Because we'd deliver a reliable version of what was needed the first pass, and spend less time maintaining it over the long haul to have time for the other projects and future expansion.

      (It's been how many years now . . . and we're still not accepting the basic concepts from the Mythical Man Month?)

      jkh

      --
      No, I don't remember your name. But the memory mapped screen on a TRS80 from 1977 is from 15360 to 16383 if that helps.
  23. Re:Provide a tool then BUTT OUT by ebno-10db · · Score: 1

    Yes Mozilla. BUTT OUT!!! Your coders are not scientists. ... Scientists have enough to deal with

    Scientists have enough to deal with ... like buggy code? RTFA. It causes real problems, and I have no use for the "we're specialists, you couldn't possibly help us" attitude (often it's espoused to hide problems).

    Would you trust a chemist who didn't know the proper practices for working in a chem lab? If not, why should you trust someone doing computational chemistry problems who doesn't know how to code? It's too easy to fall for the "how hard could this be" syndrome. For example, the time Richard Feynman spent a sabbatical working in a biology lab and trashed an important experiment due to his ignorance of the proper methods (a mistake which, unlike many other people, he freely admitted to).

  24. Been doing this for 15 years by Anonymous Coward · · Score: 1

    For the brother-in-law, MD/PhD at local school - he sits on several review boards.

    The biggie is not the code, but the data set. Like to design data sets to test code rather than do code reviews.

    Have also done some code reviews when the b-in-law was not certain. And have found 'bogus' code twice.

    Another (anecdotal) point - all problems found were with life science students. NONE/ZERO/NADA problems with code done by physical sciences or engineering people. Unless you want to count some of the most ugly Python code ever seen...

  25. Babel by Warbothong · · Score: 1

    On a related note, the Babel project is getting pushed for Reproducible Research http://orgmode.org/worg/org-contrib/babel/intro.html
    It allows code to be embedded in other documents, eg. the LaTeX source of a paper, and executed during rendering.

    Also the Recomputation project is trying to archive scientific code, complete with virtual machines set up to run them http://www.recomputation.org/

  26. Do research, don't write code by Fuzzums · · Score: 1

    Researchers are good at researching. They can write some code though.
    Programmers are good at programming. They know how to write good code that is easy to maintain and adapt.

    If you're a researcher with some experience in writing code, you should ask yourself, "should I spend that much time writing code, when a programmer does a better job in less time, and the result has fewer bugs, will be reviewed and has unit tests"? Also, how much do you know about design patterns? Sure. Your code works without. Good luck with it. Also good luck with the headache in one year.

    --
    Privacy is terrorism.
    1. Re:Do research, don't write code by umafuckit · · Score: 1

      If you're a researcher with some experience in writing code, you should ask yourself, "should I spend that much time writing code, when a programmer does a better job in less time, and the result has fewer bugs, will be reviewed and has unit tests"? Also, how much do you know about design patterns? Sure. Your code works without. Good luck with it. Also good luck with the headache in one year.

      It usually doesn't work like that. The researcher does the experiments then analyses and interprets the data. If the latter process requires coding then the researcher does the coding. If a researcher gives up the coding to a programmer (who may have a bad understanding of the science) then they have lost ownership of their data. Besides, there's usually no money to pay a programmer. The only situation where a programmer is called for is in a big lab which needs one or more significant software projects created for things like complex data acquisition or controlling elaborate hardware. There you have a point: I've seen such projects undertaken by non-specialists and the results are hair-raising.

  27. It's Damn Fine Idea by LifesABeach · · Score: 1

    Nobody gives a rat's rear what some person's code looks like. Code styles are like posterior sphincter muscles, everybody has one. But how about code and conclusions that are just plain wrong? If that grad student hadn't checked, just how much more damage would go on? Try, "it wouldn't stop." I'm beginning to wonder if this couldn't be done using some kind of "blind" study?