Call For Scientific Research Code To Be Released
Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"
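For readers wondering how a program can lose significant figures while it runs, here is a minimal sketch of one textbook mechanism, catastrophic cancellation. It is not taken from Hatton's study; the function names `naive` and `stable` are made up for illustration. Both compute (1 - cos x) / x^2, which should approach 0.5 as x shrinks.

```python
import math

def naive(x):
    # Subtracts two nearly equal numbers (1 and cos(x)), so the leading
    # digits cancel and only rounding noise is left for small x.
    return (1.0 - math.cos(x)) / (x * x)

def stable(x):
    # Same quantity rewritten with the identity 1 - cos(x) = 2*sin(x/2)**2,
    # which avoids the subtraction entirely.
    s = math.sin(x / 2.0)
    return 2.0 * s * s / (x * x)

if __name__ == "__main__":
    for x in (1e-1, 1e-3, 1e-5, 1e-7, 1e-9):
        print(f"x={x:.0e}  naive={naive(x):.15f}  stable={stable(x):.15f}")
```

The two columns agree for the larger values of x and drift apart as x gets smaller, even though the formulas are mathematically identical.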
Particularly if the research is publicly funded.
seem to understand the very idea of scientific methods or processes, or the reasoning behind empiricism and careful management of precision.
It's a failure of education, not so much in science education, I think, as in philosophy. Formal and informal logic, epistemology and ontology, etc. People appear increasingly unable to understand why any of this matters and they essentialize the "answer" as always "true" for any given process that can be described, so science becomes an act of creativity by which one tries to create a cohesive narrative of process that arrives at the desired result. If it has no intrinsic breaks or obvious discontinuities, it must be true.
If another study that contradicts it also suffers from no breaks or discontinuities, they're both true! After all, everyone gets to decide what's true in their own heart!
STOP . AMERICA . NOW
The scientific community needs to get as far as we can from the policies of companies like Gaussian Inc., who will ban you and your institution for simply publishing any sort of comparative statistics on calculation time, accuracy, etc. from their computational chemistry software.
I can't imagine what they'd do to you if you started sorting through their code...
Hey mate, spare a sig?
One significant figure?
What? Scientists showing their work for peer-review? It's MADNESS I tell you. MADNESS!
"Why should I make the data available to you, when your aim is to try and find something wrong with it"
-Prof. Jones CRU
Yes and no. Which assertion do you think more probable:
1- "These are not the desired results. Check your code".
2- "These are the desired results. Check your code".
No conspiracy, but a conspiracy-like end result.
The Cloud - because you don't care if your apps and data are up in the air.
I'm perfectly OK with the fact that their job is science and not coding, but would they go to the satellite assembly guys and start gluing parts at random?
Non-Linux Penguins ?
As it happens, my students and I are about to release a fairly specialized code - we discussed license terms, and eventually settled on the BSD (and explicitly avoided the GPL), which requires "citation" but otherwise leaves anyone free to use it.
That said, writing a scientific code can involve a good deal of work, but the "payoff" usually comes in the form of results and conclusions, rather than the code itself. In those circumstances, there is a sound argument for delaying any code release until you have published the results you hoped to obtain when you initiated the project, even if these form a sequence of papers (rather than insisting on code release with the first published results).
Thirdly, in many cases scientists will share code with colleagues when asked politely, even if it is not in the public domain.
Fourthly, I fairly regularly spot minor errors in numerical calculations performed by other groups (either because I do have access to the source, or because I can't reproduce their results) -- in almost all cases these do not have an impact on their conclusions, so while the "error count" can be fairly high, the number of "wrong" results coming from bad code is overestimated by this accounting.
If all scientists run their results through the same analytical software, using the same code as the first researcher, they are not providing confirmation, they are merely cloning the results. That doesn't give the original results either the confidence that they've been independently validated, or that they have been refuted.
What you end up with is no-one having any confidence in the results, as they have only ever been produced in one way, and arguments that descend into a slanging match between individuals and groups with vested interests, each trying to "prove" that the same results show they are right and everyone else is wrong.
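A toy illustration of the point about cloned results, as a sketch only: `original_mean` is a hypothetical analysis routine with a deliberately planted off-by-one bug, and `independent_mean` stands in for a second group writing its own implementation. Re-running the first researcher's code agrees with itself perfectly and proves nothing; only the independent version exposes the discrepancy.

```python
import statistics

def original_mean(values):
    # Hypothetical "first researcher" code with a planted off-by-one bug:
    # the loop stops one element early, so the last sample is never added.
    total = 0.0
    for i in range(len(values) - 1):
        total += values[i]
    return total / len(values)

def independent_mean(values):
    # A separate implementation that shares no code with the original.
    return statistics.fmean(values)

data = [2.0, 4.0, 6.0, 8.0]
rerun_a = original_mean(data)          # first lab, original code
rerun_b = original_mean(data)          # second lab, same code
cross_check = independent_mean(data)   # second lab, own code

if __name__ == "__main__":
    print("re-runs agree:", rerun_a == rerun_b)
    print("independent check agrees:", rerun_a == cross_check)
```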
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
And then they fix the bug and either...
A. The results change, thus indicating that the bug was important in some way. In this case, fixing the bug not only silences the critics but also improves our understanding.
or
B. The results don't change, thus indicating that the bug, while still a bug, was not important to the final result. In this case, we've fixed a bug that the critics were using as a banner, and shown that they were mistaken about its importance. We don't get the improved understanding, but we do get a chance to politely say STFU to the more vocal/less qualified critics.
Either way looks like win/win to me.
Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
... and this is the problem. The move from direct government grants to research to "industry partnerships".
Well, (IMHO) if industry wants to make use of the resources of academic institutions, they need to understand the price: all the work becomes public property. I would go one step further, and say that one penny of public money in a project means it all becomes publicly available.
Those that want to keep their toys to themselves are free to do so, but not with public money.
Nonsense, they're not trying to produce code, they're trying to produce science. It doesn't matter how ugly the code is, or how inefficient, as long as it produces correct answers. Since software engineering "best practices" seem to change every week (and do not prove program correctness in any case), what are they supposed to do, spend huge amounts of time learning as much as a professional software engineer would? Do you do that for all the tools you use?
Does anyone have any evidence that the code is *wrong*? I.e. does it actually produce significantly wrong answers? I suspect not - this is just the latest FUD-spreading trick.
This is just typical programmer "when your tool's a hammer" mentality. Software's not the most important thing in the world, and science has better ways to verify correctness - have several independent analyses of the same thing for example, or different ways of measuring the same thing to check for consistency.
Scientists need to realize that if they're going to get public support, they really need to be very careful with their choice of wording. Like it or not, the scare mongers, and I mean scare mongers in the sense that there are people who are trying to scare folks into believing that Global Warming is some sort of wealth redistribution scheme by the socialists, are going to use any hint, real or not, that scientists are making up their findings.
Scare mongers? Let's take a look at some of these "hints" that scientists are making up their findings. From May 7, 2002:
Dozens of mountain lakes in Nepal and Bhutan are so swollen from melting glaciers that they could burst their seams in the next five years and devastate many Himalayan villages, warns a new report from the United Nations.
From January 17, 2010:
In the past few days the scientists behind the warning have admitted that it was based on a news story in the New Scientist, a popular science journal, published eight years before the IPCC's 2007 report.
It has also emerged that the New Scientist report was itself based on a short telephone interview with Syed Hasnain, a little-known Indian scientist then based at Jawaharlal Nehru University in Delhi.
Hasnain has since admitted that the claim was "speculation" and was not supported by any formal research.
Do I need to pull the quotes that claim NY and Florida will be underwater?
As for the "fear mongers" saying that GW is a socialist wealth redistribution scheme:
Some officials from the United States, Britain and Japan say foreign-aid spending can be directed at easing the risks from climate change. The United States, for example, has promoted its three-year-old Millennium Challenge Corporation as a source of financing for projects in poor countries that will foster resilience. It has just begun to consider environmental benefits of projects, officials say.
Industrialized countries bound by the Kyoto Protocol, the climate pact rejected by the Bush administration, project that hundreds of millions of dollars will soon flow via that treaty into a climate adaptation fund.
Strange. When did Rush and Hannity start writing for the NY Times?
There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program.
Back in the 1970s, a bunch of CompSci guys at the university where I was a grad student did a software study with interesting results. Much of the research computing was done on the university's mainframe, and the dominant language of course was Fortran. They instrumented the Fortran compiler so that for a couple of months, it collected data on numeric overflows, including which overflows were or weren't detected by the code. They published the results: slightly over half the Fortran jobs had undetected overflows that affected their output.
The response to this was interesting. The CS folks, as you might expect, were appalled. But among the scientific researchers, the general response was that enabling overflow checking slowed down the code measurably, so it shouldn't be done. I personally knew a lot of researchers (as one of the managers of an inter-departmental microcomputer lab that was independent of the central mainframe computer center). I asked a lot of them about this, and I was appalled to find that almost every one of them agreed that overflow checking should be turned off if it slowed down the code. The mainframe's managers reported that almost all Fortran compiles had overflow checking turned off. Pointing out that this meant that fully half of the computed results in their published papers were wrong (if they used the mainframe) didn't have any effect.
Our small cabal that ran the microprocessor lab reacted to this by silently enabling all error checking in our Fortran compiler. We even checked with the vendor to make sure that we'd set it up so that a user couldn't disable the checking. We didn't announce that we had done this; we just did it on our own authority. It was also done in a couple of other similar department-level labs that had their own computers (which was rare at the time). But the major research computer on campus was the central mainframe, and the folks running it weren't interested in dealing with the problem.
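For anyone who has not seen what an undetected overflow does to a result, here is a rough sketch. It assumes 32-bit two's-complement integer arithmetic as the example; the study described above covered "numeric overflows" generally and the actual mainframe word size is not stated. The names `add_unchecked` and `add_checked` are made up for illustration.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def add_unchecked(a, b):
    # Mimics hardware with overflow checking off: the sum silently wraps
    # around inside the 32-bit range and the program carries on.
    return (a + b + 2**31) % 2**32 - 2**31

def add_checked(a, b):
    # Mimics overflow checking on: an out-of-range sum stops the run
    # instead of flowing into the output.
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return result

if __name__ == "__main__":
    print(add_unchecked(INT32_MAX, 1))   # wraps to a large negative number
    try:
        add_checked(INT32_MAX, 1)
    except OverflowError as err:
        print("caught:", err)
```

The checked version costs one extra comparison per addition, which is the kind of slowdown the researchers were objecting to.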
It taught us a lot about how such things are done. And it gave us a healthy level of skepticism about published research data. It was a good lesson on why we have an ongoing need to duplicate research results independently before believing them.
It might be interesting to read about studies similar to this done more recently. I haven't seen any, but maybe they're out there.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.