Mozilla Plan Seeks To Debug Scientific Code
ananyo writes "An offshoot of Mozilla is aiming to discover whether a review process could improve the quality of researcher-built software that is used in myriad fields today, ranging from ecology and biology to social science. In an experiment being run by the Mozilla Science Lab, software engineers have reviewed selected pieces of code from published papers in computational biology. The reviewers looked at snippets of code up to 200 lines long that were included in the papers and written in widely used programming languages, such as R, Python and Perl. The Mozilla engineers have discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups — including whether to permit disclosure of the results. But some researchers say that having software reviewers looking over their shoulder might backfire. 'One worry I have is that, with reviews like this, scientists will be even more discouraged from publishing their code,' says biostatistician Roger Peng at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. 'We need to get more code out there, not improve how it looks.'"
Isn't it rich?
Are we a pair?
Me here at last on the ground,
You in mid-air.
Send in the codes.
Isn't it bliss?
Don't you approve?
One who keeps tearing around,
One who can't move.
Where are the codes?
Send in the codes.
Just when I'd stopped
Opening doors,
Finally knowing
The one that I wanted was yours,
Making my entrance again
With my usual flair,
Sure of my lines,
No one is there.
Don't you love farce?
My fault, I fear.
I thought that you'd want what I want -
Sorry, my dear.
And where are the codes?
Quick, send in the codes.
Don't bother, they're here.
Isn't it rich?
Isn't it queer?
Losing my timing this late
In my career?
And where are the codes?
There ought to be codes.
Well, maybe next year . . .
I don't know the actual objective ... but if the concern is "'We need to get more code out there, not improve how it looks.'" ... the objective is bad.
Shouldn't this be about catching subtle logic / calculation flaws that lead to incorrect conclusions?
Agree ... if this is about indenting and which method of commenting ... then yeah ... bad idea.
But this has the possibility of being so much more. I would see it as free editing by qualified people. Seems like a deal.
Ouch
Yes Mozilla. BUTT OUT!!! Your coders are not scientists. Provide a code review tool like Findbugs and perhaps offer to assist pre-publication, but don't start spreading your "way of doing things" which puts off your own users. Scientists have enough to deal with
Where do I sign up? If I could get a "code reviewed by third party" stamp on my papers, I'd feel a lot better about publishing the code and the results derived from it. Maybe mathematicians are weird like that -- I face stigma for using a computer, so anything I can do to make it look more trustworthy is awesome.
When did Mozilla get a Science Lab? Here I always thought that all the Mozilla Foundation did was make a decent browser, and now I find they have a science lab. What other things does Mozilla do?
"First they came for the slanderers and i said nothing."
The overall structure of most of the code in HEP [1] is nasty. It's too late for the likes of ROOT [2]: input from software engineers at the early stages of code design could be very useful.
1. https://en.wikipedia.org/wiki/Particle_physics
2. https://en.wikipedia.org/wiki/Root.cern
Mozilla barely has control of their own code base. The number of open bugs keeps increasing. Attempts to multi-thread the browser failed. The frantic release schedule results in things like the broken Firefox 23, where panels in add-ons just disappeared off screen. They have legacy code back to Netscape 1, and it's crushing them. Firefox market share is declining steadily. Not good.
See subject line. I don't know what the hell qualifies Mozilla to review scientific code. For one thing, scientific code in academic papers is proof-of-concept - it's designed to show how to implement something according to the description in the paper, not engineered for general deployment.
The bla bla need more people counterargument is bollocks, however - there are enough people in computational biology doing utterly pointless things.
Perhaps Mozilla's looking for another way to justify its on-going tax avoision status, of course.
As we've seen recently, bad decisions can be made from errors in spreadsheets. We need these published so they can be double-checked as well.
There's no -1 for "I don't get it."
What else do they do, you ask? They support Seamonkey, Firefox's older brother. Firefox began as a stripped-down, lightweight, minimalist version of Seamonkey. Though Firefox is no longer lightweight, Seamonkey is still more capable in some respects. The suite includes an email client and WYSIWYG editor, but I just like the browser.
While Firefox is controlled by the Mozilla Foundation, Seamonkey is community driven now, with hosting and other support from the foundation.
Mozilla better work on de-bloating its own code first.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Actually, wasn't the latter partially inspired by the former?
If you want to code, then you got to get used to code reviews. It is the only way to improve quality and a scientist that doesn't want to improve quality should not be a scientist.
Excuse me, but please get off my Pennisetum Clandestinum, eh!
You must be joking. Many scientific papers out there have results based on prototype or proof of concept software written by naive grad students for their advisors. These are largely uncommented hacks with little, if any, sanity checks. To sell these prototypes commercially, I have had to cleanup after some of these grads. I take great sadistic pleasure in throwing out two years of effort and rewriting it all from scratch in a couple of weeks.
...it wouldn't be called research, now would it? Seriously, many scientific projects start with a vague idea and no funds. You do a table experiment, connect it to a 15 year old computer, then grow from there. In some projects I got no more than a quarter page of specifications for what ended up as 30 thousand lines of code. Yes I write scientific code, and no it's not always pretty and refactored and all that. Also there's never any money.
Non-Linux Penguins ?
I've been a faithful Mozilla user for years. However, on MAC, due to the slowness of the browser and the high RAM consumption, I permanently switched to Chrome. So maybe they should run an experiment on how to keep their MAC users, because until now they've been great at that. When I went to buy VPN from http://vpnarea.com I was surprised to find out that they had an extension for Chrome but not for Mozilla.
MAC (all-caps) - Media Access Control, as in a MAC address: a hexadecimal address used to identify individual pieces of hardware on a network
Mac - marketing name for the longstanding "Macintosh" line of computers by Apple
I've used Firefox since it first came out, but it's so damned bloated with unneeded 'extras' that I only stick with it because it's the one browser that allows extensions like AdBlock Plus to block outgoing server requests, not just hide the results. I had defected over to Opera for several months, but when they decided to become a Chrome clone, I gave up on it altogether.
Now mostly at Usenet:comp.misc & SoylentNews.org (it's made of people!)
Peer review and all. But put your stupid egos aside and concentrate on what you're supposedly trying to achieve. That's science, buddy.
Most of my colleagues at the university are terrible coders and I am often not even sure how much I trust their results. Even if it does scare people, there has to be more awareness about code review in the scientific field than there is today.
Quality doesn't really say much. I assume they mean efficiency, readability, re-usability,... things professional coders are confronted with daily.
Roger Peng should stop bitching. Good code is written by professional coders, and we don't look down on you (in public) for writing bad code along with research papers. This is an outreach from professional coders to academics; I find it quite arrogant to warn that this "might backfire".
Having seen some code written by an esteemed Bio-Chemist, I agree that experienced programmers should be reviewing their code, but then, you'd expect a true scientist to have an expert review his stuff anyway.
My experience was a real eye opener. Between the buffer overruns, and logic holes, I am amazed the crap ran at all. The fact that it compiled was a bit of a mystery until I realized that it was possible to ignore compile errors.
Sorry, but almost no meaningful review can come of 200 line sample of some program.
This is the equivalent of saying "we want to review one paragraph of your paper."
You might find a few typos from copy/paste, but good luck catching whole-program issues.
Mozilla would appear to be mostly commercial programmers, so I'm not sure that having them look at the code would give any value.
You obviously haven't worked with people who are world leaders in their field; they are not going to take advice on code from some commercial web dev.
Though back in the day I did make one guy's code a bit more user friendly (his original comment was "I don't need any prompts to remind me what I need to type"), as we had scaled up to 1:1 models and a single run of the rig could cost £20k in materials.
Back in the late 70s middle ages of comp sci...
There was this thing called "egoless programming" being taught. The idea being that we have to inculcate in developers the idea that your code is not necessarily a reflection of your personal worth, and that it deserves to be poked at and prodded, and that you should not take personal offense by it.
Yeah, it's a child of the 60s kind of thing, but it does work.
This is a huge challenge in the biomedical research field, because to be successful, you need personality traits like a strong ego (yes, *I* am brilliant, and my idea is the best, and you should fund it, and not that other bozo).
and to improve how it looks, and lose the shame that we instinctively feel in the face of criticism. No-one codes perfectly, so there is always room for useful criticism and progress, and we need to get that awareness of coding issues out as well, not just code alone.
John_Chalisque
Faith is where it's at! Looking at "science" journals is like looking at internet pron- it's a one way ticket to H-E-double hockeysticks! You need some proper churchin'!
People doing scientific research and software developers are really doing very different things when they write code. For software developers or software engineers, the code is the end goal. They are building a product that they are going to give to others. It should be intuitive to use, robust, produce clear error messages, and be free of bugs and crashes. The code is the product. For someone doing scientific or engineering research, the end goal is testing an idea, or running an experiment. The code is a means to an end, not the end itself; it needs only to support the researcher, it only needs to run once, and it only needs to be bug free in the cases that are being explored. The product is a graph or chart or sentence describing the results that is put into a paper that gets published; the code itself is just a tool.
When I got my Ph.D. in the 1990s, I didn't understand this, and it brought me a lot of grief when I went to a research lab and interacted with software developers and managers, who didn't understand this either. The grief comes about because of the different approaches used during the development of each type of code. Software developers describe their process variously as a waterfall model, agile development model, etc. These processes describe a roadmap, with milestones, and a set of activities that visualize the project at its end, and lead towards robust software development. The process a researcher uses is related to the scientific method: based on the question, they formulate a hypothesis, create an experiment, test it, observe the results, and then ask more questions. They do not always know how things will turn out, and they build their path as they go along. Very often, the equivalent "roadmap" in a researcher's mind is incomplete and is developed during the process, because this is part of what is being explored.
In my organization, this creates tremendous conflict between software developers, who want a careful, process-driven model to produce robust code, and researchers, who are seeking to answer more basic questions and explore unknown territory in a way that has a great deal of uncertainty and cannot always easily deliver the specific milestones and clarity into the schedule that are often desired.
It is worse when the research results in a useful algorithm; of course, the researcher often wants to make it available to the world so that others can use it. This is more of a grey area; if the researcher knows how to do software engineering, they may go through the process to create a more robust product, but this takes effort and time. The fact that Mozilla wants to help debug scientific code is a very good thing; it often needs more serious debugging and re-architecting than other software that is openly available.
I wish more people understood this difference.
I remember when I was in graduate school looking over a member of my group's shoulder and realizing he thought that the ^ operator in C meant raise to the power of instead of being the bitwise XOR operator. Scientists are often pretty indifferent programmers.
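For anyone who hasn't been bitten by this one: Python uses ^ the same way C does, so the mix-up is easy to reproduce. A minimal sketch, with made-up variable names. In C itself there is no power operator at all; you would call pow() from math.h.

```python
base, exponent = 2, 3

# What the colleague wrote: ^ is bitwise XOR, so 0b10 ^ 0b11 gives 0b01.
xor_result = base ^ exponent

# What he meant: exponentiation, 2 * 2 * 2.
power_result = base ** exponent

print("2 ^ 3  =", xor_result)
print("2 ** 3 =", power_result)
```

The nasty part is that the XOR version runs without any warning and returns a plausible-looking small integer.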
...peer review is now bad for science?
Or is it a safety thing? The little researcher written code I have seen is so horrible that it shouldn't be inflicted upon anybody else. Could easily cause heart attacks and strokes in many a programmer if too much of that gets out. :-)
Having ANY second programmer look at the code may well find off-by-one or fence post errors and the like.
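To make the fence post point concrete, here is a small sketch of the kind of thing a second pair of eyes tends to catch. The function names and the toy sequence are invented for illustration and are not taken from any of the reviewed papers.

```python
def kmers_buggy(seq, k):
    """Off by one: range() stops one window early, so the last k-mer is lost."""
    return [seq[i:i + k] for i in range(len(seq) - k)]


def kmers(seq, k):
    """A sequence of length n has n - k + 1 windows of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


toy = "ATGC"
print(kmers_buggy(toy, 2))  # missing the final window
print(kmers(toy, 2))
```

Both versions run cleanly and return sensible-looking lists, which is exactly why this class of bug survives until somebody else reads the loop bounds.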
I've had the rare privilege of reviewing some DNA analysis toolkits in Java. The complete morass of logic-free debris, which was supposed to be "replaced by the new version" written by the same monkey who'd abandoned the old project due to how embarrassing all the failures were, was coupled with a complete lack of any kind of error checking, bounds checking, or milestones, so that days or weeks of analysis which failed at the very end could be pulled in and the final broken analysis step patched and re-run.
But of course, it was commercial and closed source, so the complete inability to string ATGC together in predictable order from years of data sampling was blamed on everything else, and doubtless millions of dollars of experimental research based on *bad code* was burned by companies excited to have the genetic data they wanted. Be very, very scared of what mistakes by gene sequencing companies can lead to, because a lot of their data is just plain wrong.
That's not how research works. The researcher is continuing research in the same narrow area of a specific field. That is where their knowledge and expertise is. And their new research is based on, and a continuation of, their old research. So, yes, finding problems in published code is _VERY_ important and useful. It also keeps _other_ researchers from using that as a basis for their research.
Roger Peng's comment shows a typical, superficial understanding of programming. Ironically, he would be the first to condemn a computer scientist/coder who ventured into biostatistics with a superficial knowledge of biology. I believe he would feel that anyone can program, but not anyone can do biostatistics. And I deeply disagree. Tools have been provided so that _any_ scientist can code. That does not mean that they understand coding or computer science.
I have personally experienced that especially in the softer sciences like biology, economy, meteorology, etc., the scientists have absolutely no desire to learn any computer science: coding methodology, testing, complexity, algorithms, etc. The result is kludgy, inefficient code heavily dependent on pre-packaged modules, that produces results that are often a guess; the code produces results but with a lack of any understanding of what the various packaged routines are doing or whether they are appropriate for the task. For example, someone using default settings on a principal component analysis package not understanding that the package expects the user to have pre-processed the data; the output looks fine but it is wrong. It is the same as someone approaching engineering without some understanding of thermodynamics and as a result wasting their time trying to construct a perpetual motion machine.
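A rough illustration of the PCA example, assuming the pre-processing the package expects is standardization (zero mean, unit variance per column); the comment above doesn't say which step was skipped. The data and function names are made up, and the 2-D case is done by hand with the standard library so that nothing depends on a particular package.

```python
import math
from statistics import mean, stdev


def leading_direction(xs, ys):
    """Unit vector of the first principal component of 2-D data."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Orientation of the major axis of a 2x2 symmetric covariance matrix.
    theta = 0.5 * math.atan2(2 * cov, var_x - var_y)
    return math.cos(theta), math.sin(theta)


def standardize(values):
    """Rescale a column to zero mean and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Invented measurements: same subjects, very different units.
height_mm = [1500, 1600, 1700, 1800, 1900]
weight_kg = [50, 62, 58, 71, 69]

raw = leading_direction(height_mm, weight_kg)
scaled = leading_direction(standardize(height_mm), standardize(weight_kg))

print("raw loadings:   ", raw)
print("scaled loadings:", scaled)
```

On the raw columns the first component is almost entirely "height", purely because millimetres produce bigger numbers than kilograms. After standardizing, both variables load equally. Either output looks fine on a plot, which is the whole problem.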
"Consensus" in science is _always_ a political construct.
For the brother-in-law, MD/PhD at local school - he sits on several review boards.
The biggie is not the code, but the data set. Like to design data sets to test code rather than do code reviews.
Have also done some code reviews when the b-in-law was not certain. And have found 'bogus' code twice.
Another (anecdotal) point - all problems found were with life science students. NONE/ZERO/NADA problems with code done by physical sciences or engineering people. Unless you want to count some of the most ugly Python code ever seen...
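A sketch of what "design data sets to test code" can look like in practice: feed the analysis inputs whose correct answer is known in advance, including the awkward cases. The gc_content function here is a hypothetical stand-in for whatever analysis is under test, not code from any real review.

```python
def gc_content(seq):
    """Fraction of G/C bases in a DNA string (hypothetical analysis under test)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # assumption: an empty sequence counts as 0.0, not an error
    return sum(base in "GC" for base in seq) / len(seq)


# Designed inputs paired with the answer worked out by hand.
designed_cases = [
    ("GGCC", 1.0),  # all G/C
    ("ATAT", 0.0),  # no G/C
    ("ATGC", 0.5),  # half and half
    ("atgc", 0.5),  # lowercase input
    ("", 0.0),      # empty input
]

failures = [
    (seq, expected, gc_content(seq))
    for seq, expected in designed_cases
    if gc_content(seq) != expected
]
print("failures:", failures)
```

The point of the designed set is that nobody has to read the code to find out it is bogus; a wrong answer on a four-letter input is hard to argue with.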
On a related note, the Babel project is getting pushed for Reproducible Research http://orgmode.org/worg/org-contrib/babel/intro.html
It allows code to be embedded in other documents, eg. the LaTeX source of a paper, and executed during rendering.
Also the Recomputation project is trying to archive scientific code, complete with virtual machines set up to run them http://www.recomputation.org/
Researchers are good at researching. They can write some code though.
Programmers are good at programming. They know how to write good code that is easy to maintain and adapt.
If you're a researcher with some experience in writing code, you should ask yourself, "should I spend that much time writing code, when a programmer does a better job in less time, with fewer bugs, and the code will be reviewed and have unit tests"? Also, how much do you know about design patterns? Sure. Your code works without. Good luck with it. Also good luck with the headache in one year.
Privacy is terrorism.
Sorry, but better code is better. There is no "we don't need none of your fancy book lernin' here!"
The study authors "... discussed their findings with the papers’ authors, who can now choose what, if anything, to do with the markups." What's not to like here? They made a study, published it, and provided targeted feedback. The academics can choose to learn something... or not. So who shoved a bug up Mr. Peng's *ss?
I get that academics aren't software engineers. They aren't going to produce coding works of art. No one expects that. However better code will lead to faster, better, and more reliable academic studies. Wouldn't the academics have a vested self-interest in that? Even modest improvements in code quality will pay off and Mr. Peng dismisses that with "we need more code" while failing to mention that with his attitude, that additional code will be of equally low quality. So he's promoting shoddy research processes and expecting Nobel quality results? Guess what, the world doesn't work that way.
End of soap box rant.
Nobody gives a rat's rear what some person's code looks like. Code styles are like posterior sphincter muscles: everybody has one. But how about code, and conclusions, that are just plain wrong? If that grad student hadn't checked, just how much more damage would go on? Try, "it wouldn't stop." I'm beginning to wonder if this couldn't be done using some kind of "blind" study?