Call For Scientific Research Code To Be Released
Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"
Particularly if the research is publicly funded.
Great!
I'm getting somewhat tired from reading articles, where there is little or no information regarding program accuracy, total running time, memory used, etc.
And in some cases, i'm actually questioning whether the proposed algorithms actually work in practical situations...
If Pandora's box is destined to be opened, *I* want to be the one to open it.
seem to understand the very idea of scientific methods or processes, or the reasoning behind empiricism and careful management of precision.
It's a failure of education, no so much in science education, I think, as in philosophy. Formal and informal logic, epistemology and ontology, etc. People appear increasingly unable to understand why any of this matters and they essentialize the "answer" as always "true" for any given process that can be described, so science becomes an act of creativity by which one tries to create a cohesive narrative of process that arrives at the desired result. If it has no intrinsic breaks or obvious discontinuities, it must be true.
If another study that contradicts it also suffers from no breaks or discontinuities, they're both true! After all, everyone gets to decide what's true in their own heart!
STOP . AMERICA . NOW
Please apply Hanlon's razor before leaping to conspiracy theories. Or Occam's razor might inform you that a conspiracy among thousands of scientists is a highly improbable occurrence; look for a solution that doesn't involve a perfect lid of secrecy among a group of (frequently) socially inept people.
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
"Why should I make the data available to you, when your aim is to try and find something wrong with it"
-Prof. Jones CRU
My bet is there is a simple explanation...namely that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care. The egocentric Slashdot-worldview strikes at the heart of logic yet again.
Nobody said conspiracy, just plain crappy code. You don't need a conspiracy if you are "trying to prove" something, your crappy code spits out what you want to see and you run with it. You just need plain old incompetence.
As it is written, the editorial is saying that if there is any error at all in a scientific computer program, the science is usually invalid. What a lot of bull hunky! If this were true, then scientific computing would be impossible, especially with regards to programs that run on Windows.
Scientists have been doing great science with software for decades. The editorial is full of it.
Not that it would be bad for scientists to make their software open source. And not that it would be bad for scientists to benefit from some extra QA.
So if scientists use MS Excel for part of their data analysis, MS should release the source code of Excel to prove that there's no bugs in it (that may favour one conclusion over another)
Soumds fair to me.
And if MS doesnt comply then all scientists have to switch to OO.org ?
If all scientists run their results through the same analytical software, using the same code as the first researcher, they are not providing confirmation, they are merely cloning the results. That doesn't give the original results either the confidence that they've been independently validated, or that they have been refuted.
What you end up with is no-one having any confidence in the results - as they have only ever been produced in one way and arguments thatt descend into a slanging match between individuals and groups of vested interests who try to "prove" that the same results show they are right and everyone else is wrong.
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
...so science becomes an act of creativity by which one tries to create a cohesive narrative of process that arrives at the desired result.
As someone who listens to Talk Radio on occasion, that sounds like you're creating a work of fiction. Rush and Hannity would have a whole week of shows based on that statement.
I would put it more like "piecing the narrative from the evidence" or "from facts" or something like that.
Scientists need to realize that if they're going to get public support, they really need to be very careful with their choice of wording. Like it or not, the scare mongers, and I mean scare mongers in the sense that there are people who are trying to scare folks into believing that Global Warming is some sort of wealth redistribution scheme by the socialists, are going to use any hint, real or not, that scientists are making up their findings.
I am suspect of the interface reference. Are they counting things where an enumeration got used as an int, or there was an implicit cast from a 32bit float to a 64bit one? From a recent TV show "A difference that makes no difference is no difference." Stepping back a bit there will be howls from OO/Functional/FSM zealots that look at a program and declare its inferior architecture, lack of maintainability etc. indicate its results are wrong. These are programs written to be run once to turn one set of data into a more understandable and concise one. A truth test set run through it is good enough, they don't need iso compliant, triply refactored, perfectly architectured code to get the right answer. I don't think any of my CS proffs would have cared about such inane drivel they barely paid attention to what language we each picked to solve the assignment in. My software engineering proff would have yelled about comment density and coding standards compliance, but I consider that a different discipline primarily applicable to widely used and/or safety critical code.
*However*
Keeping track of digit precision through a calculation isn't CS, its fundamental grade school science. That is only one step from forgetting to do unit analysis for a sanity check. If they are forgetting that, they are probably also not looking at numerical conditioning, or trying to get by with doubles when they need bignums. None of this is CS egocentrism, its stuff we learn in math and science courses.
refactor the law, its bloated, confusing and unmaintainable.
... and this is the problem. The move from direct government grants to research to "industry partnerships".
Well, (IMHO) if industry wants to make use of the resources of academic institutions, they need to understand the price: all the work becomes public property. I would go one step further, and say that one penny of public money in a project means it all becomes publicly available.
Those that want to keep their toys to themselves are free to do so, but not with public money.
As part of publication and peer review all data and providence of the data as well as any additional formula's, algorithms, and the exact code that was used to process the data should be placed online in a neutral holding area.
Neutral area needs to be independent and needs to show any updates and changes, preserving the original content in the process.
If your data and code (readable and compilable by other researchers) isn't available then peer review and reproduction of results is foolish. If you can't look in the black box then you can't trust it.
Ward
. Silence! Be thankful thy species is unpalatable! .
> This raises the question in what programming language the scientific code
> should be published.
The one it was written in. What should be published is the exact code that was compiled and run to generate the data. Think of it as similar to making the raw data available.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
> 600 lines of code in the main, no functions, no comments
Does that make it function incorrectly?
Looking pretty and being correct are orthogonal issues. Code can be well-structured but wrong, after all.
>So, while it is perfectly understandable that, say, physicists can't spend 5 years learning CS, at the very least they should be made aware that it requires trained people to write sane code and that they must hand the job to specialists, and spend their valuable time doing what the're skilled at.
And where will they get these specialists, and who will pay for them?
Add the overhead of explaining exactly what the code is supposed to do, and the fact that the specialist won't know the physics purpose of it all, and I wouldn't be suprised if there were more errors this way, not fewer. Most science code is fairly short, so all the fuss about "structured programming" (or is it OOP these days?) isn't as important.
Exactly this,
In my field (Hydrology and Statistics) writing the code in its final form is a long process of experimentation with different approaches and tests. Publishing it before one is truly done with a subject is the same as inviting other people to "scoop" you.
Also, many people seem to fail to understand the protectionism of scientists. Of course we like to build on other peoples results, and see the field grow. However, if we make this too easy (like handing them our code), they just might scoop us. Hence, we make it a little bit harder to ensure that we can feed ourselves as well.
Finally, about the errors. I have yet to see a single piece of error-free scientific code, however, the results are rigorously tested with an array of tests. The chances of these all coming up the same over a coding error is small.
The more important the research, the larger the item under study, the more rigorous the investigation should be, the more carefully the data should be checked. This isn't just for public policy reasons but for general scientific understanding reasons. If your theory is one that would change the way we understand particle physics, well then it needs to be very thoroughly verified before we say "Yes, indeed this is how particles probably work, we now need to reevaluate tons of other theories."
So something like this, both because of the public policy/economic implications and the general understanding of our climate, should be subject to extreme scrutiny. Now please note that doesn't mean saying "Look this one thing is wrong so it all goes away and you can't ever propose a similar theory again!" However it means carefully examining all the data, all the assumptions, all the models and finding all the problem with them. It means verifying everything multiple times, looking at any errors any deviations and figuring out why they are there and if they impact the result and so on.
Really, that is how science should be done period. The idea of strong empiricism is more or less trying to prove your theory wrong over and over again, and through that process becoming convinced it is the correct one. You look at your data and say "Well ok, maybe THIS could explain it instead," and test that. Or you say "Well my theory predicts if X happens Y will happen, so let's try X and if Y doesn't happen, it's wrong." You show your theory is bulletproof not by making sure it is never shot at, but by shooting at it yourself over and over and showing that nothing damages it.
However that this process is done right becomes more important the bigger the issue is. If you aren't right on a theory that relates to migratory habits of a sub species of bird in a single state, ok well that probably doesn't have a whole lot of wider implications for scientific understanding, or for the way the world is run. However if you are wrong on your theory of how the climate works, well that has a much wider impact.
Scrutiny is critical to science, it is why science works. Science is all about rejecting the ideas that because someone in authority said it, it must be true, or that a single rigged demonstration is enough to hang your hat on. It is all about testing things carefully and figuring out what works, and what doesn't.
Only if the real programmers out there promise to be nice to us scientists.
Most scientists will know a lot about, well, science... but not much about writing code or optimizing code.
Like my scripts. All correct, all working... lots of formulas... but probably a horribly inefficient way to calculate what I need. :-)
the last thing I need is someone to come to me and tell me that the outcome is correct but that my code sucks.
(And no, I am not interested in a course to learn coding - unless it's a 1-week crash course).
Comment removed based on user account deletion
It seems to me that what's important is the theory being modeled, the algorithms used to model it, and of course the data. The code itself isn't really useful for replicating an experiment, because it's just a particular - and possibly faulty - implementation of the model and as such is akin to the particular lab bench practices that might implement a standard protocol. Replicating a modeling experiment should involve using - and writing, if necessary - code that implements the model the original investigators intended to implement, but distinct from that which they actually used.
Running the same code on the same data demonstrates very little, and finding bugs in the original code tells you nothing about what results would/should have been achieved had the model been implemented correctly. But of course it's great for throwing stones and "discrediting" a result without actually adding anything constructive to the issue at hand.
I hate to break it to you but all programming is highly specialized. Climatology is in no way special in this regard.
Neither do programmers have to understand the abstract model of the program to write it or evaluate it. The vast majority of professional programmers do not understand the abstract model of the code they create. You do not have to be a high-level accountant to write corporate accounting software and you don't have to be a doctor to write medical software. Most programmers spend most of their time implementing models created by non-programmers from fields of which the programmers have no detailed knowledge.
Does that mean that programmers can't spot crappy code just because they don't understand the details of the model? No, it does not. Most software errors don't arise from the model but from sloppy practices in the management of the software project itself. An experienced programmer doesn't even have to know the language of project to see that it's creation and maintenance was incompetently handled.
You don't have to be a climatologist to know that the CRU software was utter crap that would produce sound outputs only by divine intervention. For any experienced programmer, it was immediately obvious that it was a great reeking gob of amateur coding with no structure, no plan and no standards. In my experience, most scientific software is like the CRU software. It evolves in an ad hoc manner over many years with no governing organizational structure.
Commercial software developers have created a wide range of tools and procedures to manage large, vital projects. In the main, scientist use none of these tools and most of them appear unaware they even exist much less why and when they are needed. As a result most scientific software project management is completely amateurish. If most scientific software were written for commercial applications, the developers would be sued or imprisoned for fraud.
Scientist tend to be arrogant and dismissive of the work of others especially those who work in the commercial sector. You believe that because you understand climatology that you therefore understand all the tools you are using. Well you don't. You think that because no one can understand your abstract model that therefore they cannot find significant errors in your code. Well, they can. You think we should reengineer our entire civilization based on your unquestioned and unexamined computerized ivory tower auguries.
Well, we won't.
Your just going to have to suck it up and withstand the at least the same scrutiny we give important commercial software.