Mozilla Plan Seeks To Debug Scientific Code
ananyo writes "An offshoot of Mozilla is aiming to discover whether a review process could improve the quality of researcher-built software that is used in myriad fields today, ranging from ecology and biology to social science. In an experiment being run by the Mozilla Science Lab, software engineers have reviewed selected pieces of code from published papers in computational biology. The reviewers looked at snippets of code up to 200 lines long that were included in the papers and written in widely used programming languages, such as R, Python and Perl. The Mozilla engineers have discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups — including whether to permit disclosure of the results. But some researchers say that having software reviewers looking over their shoulder might backfire. 'One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code,' says biostatistician Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. 'We need to get more code out there, not improve how it looks.'"
I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.
Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?
Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.
But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.
Ouch
Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it. Maybe mathematicians are weird like that -- I face stigma for using a computer, so anything I can do to make it look more trustworthy is awesome.
When did Mozilla get a Science Lab? Here I always thought that all the Mozilla Foundation made was a decent browser, and now I find they have a science lab. What other things does Mozilla do?
"First they came for the slanderers and i said nothing."
The overall structure of most of the code in HEP [1] is nasty. It's too late for the likes of ROOT [2]: input from software engineers at the early stages of code design could be very useful.
1. https://en.wikipedia.org/wiki/Particle_physics
2. https://en.wikipedia.org/wiki/Root.cern
As we've seen recently, bad decisions can be made from errors in spreadsheets. We need these published so they can be double-checked as well.
There's no -1 for "I don't get it."
What else do they do, you ask? They support Seamonkey, Firefox's older brother. Firefox began as a stripped-down, lightweight, minimalist version of Seamonkey. Though Firefox is no longer lightweight, Seamonkey is still more capable in some respects. The suite includes an email client and WYSIWYG editor, but I just like the browser.
While Firefox is controlled by the Mozilla Foundation, Seamonkey is community driven now, with hosting and other support from the foundation.
Believe it or not, there actually are at least some scientists in the Mozilla Science Lab. Crazy, right?
"First they came for the slanderers and i said nothing."
If you want to code, then you got to get used to code reviews. It is the only way to improve quality and a scientist that doesn't want to improve quality should not be a scientist.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
You must be joking. Many scientific papers out there have results based on prototype or proof-of-concept software written by naive grad students for their advisors. These are largely uncommented hacks with few, if any, sanity checks. To sell these prototypes commercially, I have had to clean up after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.
...it wouldn't be called research, now would it? Seriously, many scientific projects start with a vague idea and no funds. You do a table experiment, connect it to a 15-year-old computer, then grow from there. In some projects I got no more than a quarter page of specifications for what ended up as 30 thousand lines of code. Yes, I write scientific code, and no, it's not always pretty and refactored and all that. Also there's never any money.
Non-Linux Penguins ?
I've been a faithful Mozilla user for years. However, on MAC, due to the slowness of the browser and the high RAM consumption, I permanently switched to Chrome. So maybe they should run an experiment on how to keep their MAC users, because until now they've been great at that. When I went to buy a VPN from http://vpnarea.com I was surprised to find out that they had an extension for Chrome but not for Mozilla.
MAC (all-caps) - Media Access Control; a MAC address is a hexadecimal address used to identify individual pieces of hardware on a network
Mac - marketing name for the longstanding "Macintosh" line of computers by Apple
I've used Firefox since it first came out, but it's so damned bloated with unneeded 'extras' that I only stick with it because it's the one browser that allows extensions like AdBlock Plus to block outgoing server requests, not just hide the results. I had defected over to Opera for several months, but when they decided to become a Chrome clone, I gave up on it altogether.
Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
Most of my colleagues at the university are terrible coders and I am often not even sure how much I trust their results. Even if it does scare people, there has to be more awareness about code review in the scientific field than there is today.
Having seen some code written by an esteemed Bio-Chemist, I agree that experienced programmers should be reviewing their code, but then, you'd expect a true scientist to have an expert review his stuff anyway.
My experience was a real eye opener. Between the buffer overruns, and logic holes, I am amazed the crap ran at all. The fact that it compiled was a bit of a mystery until I realized that it was possible to ignore compile errors.
Mozilla would appear to be mostly commercial programmers, so I'm not sure that having them look at the code would give any value.
You obviously haven't worked with people who are world leaders in their field; they are not going to take advice on code from some commercial web dev.
Though back in the day I did make one guy's code a bit more user-friendly (his original comment was "I don't need any prompts to remind me what I need to type"), as we had scaled to 1:1 models and one single run of the rig could cost £20k in materials.
Back in the late 70s middle ages of comp sci...
There was this thing called "egoless programming" being taught. The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense at it.
Yeah, it's a child of the 60s kind of thing, but it does work.
This is a huge challenge in the biomedical research field, because to be successful, you need personality traits like a strong ego (yes, *I* am brilliant, and my idea is the best, and you should fund it, and not that other bozo).
and to improve how it looks, and lose the shame that we instinctively feel in the face of criticism. No-one codes perfectly, so there is always room for useful criticism and progress, and we need to get that awareness of coding issues out as well, not just code alone.
John_Chalisque
People doing scientific research and software developers are really doing very different things when they write code. For software developers or software engineers, the code is the end goal. They are building a product that they are going to give to others. It should be intuitive to use, robust, produce clear error messages, and be free of bugs and crashes. The code is the product. For someone doing scientific or engineering research, the end goal is testing an idea or running an experiment. The code is a means to an end, not the end itself; it needs only to support the researcher, it only needs to run once, and it only needs to be bug free in the cases that are being explored. The product is a graph or chart or sentence describing the results that is put into a paper that gets published; the code itself is just a tool.
When I got my Ph.D. in the 1990s, I didn't understand this, and it brought me a lot of grief when I went to a research lab and interacted with software developers and managers, who didn't understand this either. The grief comes about because of the different approaches used during the development of each type of code. Software developers describe their process variously as a waterfall model, agile development model, etc. These processes describe a roadmap, with milestones, and a set of activities that visualize the project at its end, and lead towards robust software development. The process a researcher uses is related to the scientific method: based on the question, they formulate a hypothesis, create an experiment, test it, observe the results, and then ask more questions. They do not always know how things will turn out, and they build their path as they go along. Very often, the equivalent "roadmap" in a researcher's mind is incomplete and is developed during the process, because this is part of what is being explored.
In my organization, this creates tremendous conflict between software developers, who want a careful, process-driven model to produce robust code, and researchers, who are seeking to answer more basic questions and explore unknown territory in a way that has a great deal of uncertainty and cannot always easily deliver the specific milestones and schedule clarity that are often desired.
It is worse when the research results in a useful algorithm; of course, the researcher often wants to make it available to the world so that others can use it. This is more of a grey area; if the researcher knows how to do software engineering, they may go through the process to create a more robust product, but this takes effort and time. The fact that Mozilla wants to help debug scientific code is a very good thing; it often needs more serious debugging and re-architecting than other software that is openly available.
I wish more people understood this difference.
I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.
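To make the mix-up concrete, here is a minimal sketch. It is written in Python rather than C, on the assumption that the point carries over: Python also treats `^` as bitwise XOR and uses `**` for powers (in C you would call `pow()` from `math.h`). The variable names are just for illustration.

```python
# Python, like C, treats ^ as bitwise XOR, not exponentiation.
base, exponent = 2, 3

xor_result = base ^ exponent      # 0b010 XOR 0b011 = 0b001
power_result = base ** exponent   # 2 * 2 * 2

print(xor_result)    # 1
print(power_result)  # 8
```

The nasty part is that both expressions run without complaint, so nothing flags the wrong one unless somebody checks the numbers.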
Having ANY second programmer look at the code may well find off-by-one or fence post errors and the like.
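For anyone who hasn't met a fence post error, a minimal sketch of the classic case follows. The function names are made up for the example, and it assumes the fence length is an exact multiple of the post spacing.

```python
def count_posts_buggy(length_m, spacing_m):
    # Counts the gaps between posts and forgets the post at the far end.
    return length_m // spacing_m


def count_posts(length_m, spacing_m):
    # One post per gap, plus one more to close the last section.
    return length_m // spacing_m + 1


print(count_posts_buggy(100, 10))  # 10
print(count_posts(100, 10))        # 11
```

The same slip shows up in research code as a loop that skips the last sample or a window that is one element too short, and the output still looks plausible.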
Roger Peng's comment shows a typical, superficial understanding of programming. Ironically, he would be the first to condemn a computer scientist/coder who ventured into biostatistics with a superficial knowledge of biology. I believe he would feel that anyone can program, but not anyone can do biostatistics. And I deeply disagree. Tools have been provided so that _any_ scientist can code. That does not mean that they understand coding or computer science.
I have personally experienced that especially in the softer sciences like biology, economics, meteorology, etc., the scientists have absolutely no desire to learn any computer science: coding methodology, testing, complexity, algorithms, etc. The result is kludgy, inefficient code heavily dependent on pre-packaged modules, that produces results that are often a guess; the code produces results but with a lack of any understanding of what the various packaged routines are doing or whether they are appropriate for the task. For example, someone might use the default settings on a principal component analysis package without understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong. It is the same as someone approaching engineering without some understanding of thermodynamics and as a result wasting their time trying to construct a perpetual motion machine.
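The PCA pitfall can be sketched in a few lines. This is not any particular package's API, just a hand-rolled first principal component for two variables in plain Python; the function names and the height/mass numbers are invented for illustration, and the eigenvector formula assumes the two variables have non-zero covariance.

```python
import math
import statistics


def leading_component(xs, ys):
    """Unit vector of the first principal component of 2-D data."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(xs)
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)  # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)  # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    # Largest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    lam = (a + c) / 2 + math.hypot((a - c) / 2, b)
    # (b, lam - a) is an eigenvector for lam, provided b != 0.
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def standardize(values):
    """Rescale to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Made-up measurements on wildly different scales.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
masses_g = [52000, 61000, 66000, 77000, 84000]

raw = leading_component(heights_m, masses_g)
scaled = leading_component(standardize(heights_m), standardize(masses_g))

print(raw)     # almost entirely along the mass axis
print(scaled)  # both variables contribute equally
```

On the raw data the first component is essentially "mass in grams", purely because of the units; after standardizing, both variables carry equal weight. Both runs finish cleanly, which is exactly why the mistake is easy to miss.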
"Consensus" in science is _always_ a political construct.
Yes Mozilla. BUTT OUT!!! Your coders are not scientists. ... Scientists have enough to deal with
Scientists have enough to deal with ... like buggy code? RTFA. It causes real problems, and I have no use for the "we're specialists, you couldn't possibly help us" attitude (often it's espoused to hide problems).
Would you trust a chemist who didn't know the proper practices for working in a chem lab? If not, why should you trust someone doing computational chemistry problems who doesn't know how to code? It's too easy to fall for the "how hard could this be" syndrome. For example, the time Richard Feynman spent a sabbatical working in a biology lab and trashed an important experiment due to his ignorance of the proper methods (a mistake which, unlike many other people, he freely admitted to).
For the brother-in-law, MD/PhD at local school - he sits on several review boards.
The biggie is not the code, but the data set. Like to design data sets to test code rather than do code reviews.
Have also done some code reviews when the b-in-law was not certain. And have found 'bogus' code twice.
Another (anecdotal) point - all problems found were with life science students. NONE/ZERO/NADA problems with code done by physical sciences or engineering people. Unless you want to count some of the most ugly Python code ever seen...
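The idea above of designing data sets to test code, rather than only reading the code, might look like the sketch below. `fit_line` is a hypothetical stand-in for whatever analysis routine is under review; the data are generated from a known line so the right answer is fixed before the code runs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; stands in for the code under review."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Designed data set: built from a known slope and intercept, no noise.
TRUE_SLOPE, TRUE_INTERCEPT = 3.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT for x in xs]

slope, intercept = fit_line(xs, ys)

# If the routine cannot recover the answer it was handed, it is bogus.
assert abs(slope - TRUE_SLOPE) < 1e-9
assert abs(intercept - TRUE_INTERCEPT) < 1e-9
```

A check like this says nothing about style, only about whether the numbers come out right, which is the part that matters for the paper.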
On a related note, the Babel project is getting pushed for Reproducible Research http://orgmode.org/worg/org-contrib/babel/intro.html
It allows code to be embedded in other documents, eg. the LaTeX source of a paper, and executed during rendering.
Also the Recomputation project is trying to archive scientific code, complete with virtual machines set up to run them http://www.recomputation.org/
Researchers are good at researching. They can write some code though.
Programmers are good at programming. They know how to write good code that is easy to maintain and adapt.
If you're a researcher with some experience in writing code, you should ask yourself, "should I spend that much time writing code, when a programmer would do a better job in less time, with fewer bugs, code review and unit tests"? Also, how much do you know about design patterns? Sure. Your code works without. Good luck with it. Also good luck with the headache in one year.
Privacy is terrorism.
Nobody gives a rat's rear what some person's code looks like. Code styles are like posterior sphincter muscles, everybody has one. But how about code, and conclusions, that are just plain wrong? If that grad student hadn't checked, just how much more damage would go on? Try "it wouldn't stop." I'm beginning to wonder if this couldn't be done using some kind of "blind" study?