Slashdot Mirror


Call For Scientific Research Code To Be Released

Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"
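The drop from six significant figures to one during a run can be illustrated with a toy example. This is a minimal sketch, not a reproduction of Hatton's analysis. It assumes single-precision (float32) arithmetic, emulated in Python by rounding every intermediate result through the standard `struct` module. The workload is a hypothetical one: adding 0.1 half a million times. The helper names (`f32`, `naive_sum_f32`, `kahan_sum_f32`, `correct_digits`) are illustrative only.

```python
import math
import struct

# Reusable packer for IEEE 754 single precision ("f" = 4-byte float).
_F32 = struct.Struct("f")


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return _F32.unpack(_F32.pack(x))[0]


def naive_sum_f32(values):
    """Left-to-right accumulation, rounding the running total to float32 after every add."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def kahan_sum_f32(values):
    """Kahan (compensated) summation with every intermediate also rounded to float32."""
    total = 0.0
    comp = 0.0  # negated low-order bits dropped by the previous addition
    for v in values:
        y = f32(v - comp)               # re-inject what was lost on the previous step
        t = f32(total + y)              # the rounded addition
        comp = f32(f32(t - total) - y)  # minus the part of y that this addition dropped
        total = t
    return total


def correct_digits(approx: float, exact: float) -> float:
    """Approximate number of leading significant decimal digits that agree."""
    if approx == exact:
        return math.inf
    return -math.log10(abs(approx - exact) / abs(exact))


if __name__ == "__main__":
    n = 500_000                # hypothetical workload size
    values = [f32(0.1)] * n    # 0.1 itself is already rounded on input
    exact = math.fsum(values)  # accurate double-precision reference sum of the same inputs
    for name, result in (("naive", naive_sum_f32(values)),
                         ("Kahan", kahan_sum_f32(values))):
        print(f"{name:5s} float32 sum: {result:.4f} "
              f"(~{correct_digits(result, exact):.1f} correct digits)")
```

With float32 emulation, the naive running total drifts well away from the reference, and fewer than three significant digits survive. The compensated sum stays within about one float32 rounding step. Kahan summation is one standard remedy, swapped in here for illustration. Neither the article nor Hatton's study is claimed to prescribe it.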

10 of 505 comments

  1. About time! by sackvillian · · Score: 5, Informative

    The scientific community needs to get as far as we can from the policies of companies like Gaussian Inc., who will ban you and your institution for simply publishing any sort of comparative statistics on calculation time, accuracy, etc. from their computational chemistry software.

    I can't imagine what they'd do to you if you started sorting through their code...

    --
    Hey mate, spare a sig?
  2. Re:This is not science. by Idiot+with+a+gun · · Score: 5, Insightful

    Irrelevant. If you can't take some trolls, maybe you shouldn't be in such a controversial topic. The accuracy of your data is far more significant than your petty emotions, especially if your data will be affecting trillions of dollars worldwide.

  3. Re:Seems reasonable by fuzzyfuzzyfungus · · Score: 5, Insightful

    The "The public deserves access to the research it pays for" position seems so self-evidently reasonable that further debate is simply unnecessary (though, unfortunately, the journal publishers have a strong financial interest in arguing the contrary, so the "debate" actually continues, against all reason). Similarly, the idea that software falls somewhere in the "methods" section and is as deserving of peer review as any other part of the research seems wholly reasonable. Again, I suspect that getting at the bits written by scientists, with the possible exception of the ones working in fields (oil geology, drug development, etc.) that also have lucrative commercial applications, will mainly be a matter of developing norms and mechanisms around releasing it. Academic scientists are judged, promoted, and respected largely according to how much (and where) they publish. Getting them to publish more probably won't be the world's hardest problem. The more awkward bit will be the fact that large amounts of modern scientific instrumentation, and some analysis packages, include giant chunks of closed-source software, but are also worth serious cash. You can absolutely forget getting a BSD/GPL release, and even a "No commercial use, all rights reserved, for review only, mine, not yours." code release will be like pulling teeth.

    On the other hand, I suspect some of this hand-wringing is little more than special pleading. "This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program." Right. I know that I definitely live in the world where all my important stuff: financial transactions, recordkeeping, product design, and so forth are carried out by zero-defect programs, delivered to me over the internet by routers with zero-defect firmware, and rendered by a variety of endpoint devices running zero-defect software on zero-defect OSes. Yup, that's exactly how it works. Outside of hyper-expensive embedded stuff, military avionics, landing gear firmware, and FDA-approved embedded medical widgets (that still manage to Therac people from time to time), zero-defect is pure fantasy. A very pleasant pure fantasy, to be sure; but still fantasy. The revelation that several million lines of code, in a mixture of Fortran and C, most likely written under time and budget constraints, isn't exactly a paragon of code quality seems utterly unsurprising, and by no means restricted to scientific areas. Code quality is definitely important, and science has to deal with the fact that software errors have the potential to make a hash of its data; but science seems to attract a whole lot more hand-wringing when its conclusions are undesirable...

  4. Not a good idea by petes_PoV · · Score: 5, Insightful
    The point about reproducible experiments is not to provide your peers with the exact same equipment you used - then they'd get (probably / hopefully) the exact same results. The idea is to provide them with enough information so that they can design their own experiments to measure the same things and then to analyze their results to confirm or disprove your conclusions.

    If all scientists run their results through the same analytical software, using the same code as the first researcher, they are not providing confirmation, they are merely cloning the results. That neither gives the original results the confidence of having been independently validated nor shows that they have been refuted.

    What you end up with is no one having any confidence in the results, as they have only ever been produced in one way, and arguments that descend into a slanging match between individuals and groups of vested interests who try to "prove" that the same results show they are right and everyone else is wrong.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
  5. Re:Seems reasonable by apoc.famine · · Score: 5, Insightful

    As someone doing a PhD in a climate related area, I can see both sides of the issue. The code I work with is freely and openly available. However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it. I look at my classmates - we're all in the same degree program, yet probably only 5% of them would really be able to understand and do anything meaningful with the code I'm using.
     
    Why? We're that specialized. Here, I'm talking 5% of people studying atmospheric and oceanic sciences being able to make use of my code without taking several years to get up to speed. What's the incentive to release it? Why bother with the effort, when the audience is soooo small?
     
    Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for "teh code is BOGUS!!!!" Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.
     
    However, I do very much believe in completely transparent discourse. My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.
     
    It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about.

    --
    Velociraptor = Distiraptor / Timeraptor
  6. Re:Seems reasonable by Sir_Sri · · Score: 5, Informative

    And it's not like the people writing this code are, or were, trained in computer science, assuming computer science even existed when they were doing the work.

    Having done an undergrad in theoretical physics, and now doing a PhD in comp sci, I will say this: the assumption in physics when I graduated in 2002 was that by second year you knew how to write code, whether they'd taught you or not. Even more recently it has still been assumed that you'll know how to write code, but they try to give you a bare minimum of training. And of course it's usually other physical scientists who do the teaching, not computer scientists, so bad (or out-of-date) information is propagated along. That completely misses the advanced topics in computer science which cover a lot more of the software-engineering sort of problems. Try explaining to a physicist how a 32- or 64-bit float can't exactly represent all of the numbers they think it can, and watch half of them have their eyes glaze over for half an hour. And then the problem is: what do you do about it?

    Then you get into a lab (uni lab). Half the software used will have been written in F77 when it was still pretty new, and someone may have hacked some modifications in here and there over the years. Some of these programs last for years, span multiple careers and so on. They aren't small investments, but they have had grubby little grad-student paws on them for a long time, in addition to incompetent professor hands.

    None of scientific computing is done particularly well: they expect people with no training in software development to do the work (assuming it was even done when software development existed), and there isn't the funding to pay people who might do it properly.

    On top of all that, it's not like you want to release your code to the public right away anyway. As a scientist you're in competition with groups around the world to publish first. You describe in your paper the science you think you implemented; someone else who wants to verify your results gets to write a new chunk of code which they think is the same science, and you compare. Giving out a scientist's code for inspection means someone else will have a working software platform to publish papers based on your work, and that's not so good for you. For all the talk of research for the public good, ultimately your own good (continuing to publish, to get paid) trumps the public need. That's a systemic problem, and when you're competing with a research group in Brazil and you're in Canada, their rules are different from yours, so you keep things close to the chest.

  7. Re:This is not science. by acoustix · · Score: 5, Insightful

    "Why should I make the data available to you, when your aim is to find something wrong with it?"

    That used to be what Science was. Of course, that was when truth was the goal.

    --
    "A plan fiendishly clever in its intricacies"- Homer Simpson
  8. Re:Seems reasonable by TheTurtlesMoves · · Score: 5, Insightful

    You're not the F***** pope. You don't get to tell people they are not worthy enough to look at your code/data. You don't like it, don't do science. But this attitude of only cooperating with a "vetted" group of people is causing far more problems than you think you are solving by doing it. You are not as smart as you think you are.

    Want to make a claim/suggestion that has very real economic and political ramifications for everyone, you provide the data/models for everyone. Otherwise, have a nice hot cup of shut the frak up.

    --
    The Grey Goo disaster happened 3 billion years ago. This rock is covered in self replicating machines!
  9. Re:Seems reasonable by bmajik · · Score: 5, Insightful

    "However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it. I look at my classmates - we're all in the same degree program, yet probably only 5% of them would really be able to understand and do anything meaningful with the code I'm using."

    I think the world is very lucky that Linus Torvalds wasn't as short-sighted and conceited as you are.

    "Why? We're that specialized. Here, I'm talking 5% of people studying atmospheric and oceanic sciences being able to make use of my code without taking several years to get up to speed. What's the incentive to release it? Why bother with the effort, when the audience is soooo small?

    "Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for 'teh code is BOGUS!!!!' Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.

    "However, I do very much believe in completely transparent discourse. My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.

    "It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about."

    Have you heard of "ivory tower"? You're it.

    Your position basically boils down to this: "unless you read all the same things I read, talked to all the same people I talked to, went to all the same schools I did... you're not qualified to talk to me".

    That is _the_ definition of monocultural isolationism, i.e. the Ivory Tower of Academia problem.

    Here's the problem: if your requirement is that anyone you consider a "peer" must have had all of the same inputs and conditioning that you had... what basis do you have for expecting them to come out the other side of that machine with an untainted point of view?

    As a specific counterpoint to your way of thinking:

    My dad is an actuary, one of the best in the world. He regularly meets with the top handful of insurance regulators in foreign governments. He manages the risk of _billions_ of dollars. The maths involved in actuarial science embarrass nearly any other branch of applied mathematics. I have an undergraduate math degree and I could only understand his problem domain in the crudest, rough bounding-box sort of fashion. Furthermore, he's been a programmer since the System/360 days.

    Yet his code, while there is a lot of it, is something I am definitely able to help him with. We talk about software engineering and specific technical problems he is having on a frequent basis.

    You don't need to be a problem domain expert in order to demonstrate value when auditing software.

    Furthermore, as a professional software tester, I happen to find that occasionally, not over-familiarizing myself with the design docs and implementation details too early allows me to ask better "reset" questions when doing design and code reviews. "Why are you doing this?" And as the developer talks me through it, they understand how shaky their assumptions are. If I had been "travelling" with them in lock step

    --
    My opinions are my own, and do not necessarily represent those of my employer.
  10. Re:Seems reasonable by bmajik · · Score: 5, Insightful

    "there are well-funded lobby groups and others with too much time on their hands looking for ANYTHING that is wrong."

    Errors are only errors if they are reported by the "right" people?

    Do you want to know how many questions Linus Torvalds has answered for me? Zero.

    I actually _have_ gotten personal responses from Theo de Raadt on some OpenBSD issues but they all have the general form of "you're not interesting, don't waste my time".

    Nevertheless, I rely on OpenBSD. The fact that Theo has neither the time nor the interest in having a deep meaningful conversation with me about his code neither changes the quality of his code nor prevents him from releasing every 6 months, on schedule.

    I don't think that there is an expectation that scientists stop doing their day jobs to do software support for people. I think there is an expectation that publicly funded research used to set public policy be easily available to all comers.

    I'm a bit frustrated by the apparent contradiction. For perhaps the first time in US history, you have armchair folks trying to do technical audits of scientific tools, research, and publications -- for free.

    I thought the "normal" problem in America is that the population is too apathetic to care and too stupid to provide any critical analysis. And yet we see this happening more and more frequently and the climate-science establishment is circling the wagons instead of celebrating the fact that there are a handful of people that for once give a damn about interesting research tools and methods.

    I must concede that there are some downsides to discussing your opinions and findings with others: When people disagree with you, it ends up taking some of your time.

    --
    My opinions are my own, and do not necessarily represent those of my employer.
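Sir_Sri's point that a 32- or 64-bit float can't exactly represent all the numbers a physicist expects can be made concrete in a few lines of Python. This is a minimal sketch using only the standard library. The helper `to_f32` is hypothetical; it emulates single precision by round-tripping through `struct`. `math.isclose` is just one common answer to "what do you do about it?", not the only one.

```python
import math
import struct
from decimal import Decimal


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Decimal(float) shows the exact binary value actually stored, with no display rounding.
print("0.1 as float64:", Decimal(0.1))
print("0.1 as float32:", Decimal(to_f32(0.1)))

# Consequence 1: algebraically equal expressions need not compare equal.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead

# Consequence 2: float32 has a 24-bit significand, so not every integer above 2**24 fits.
print(to_f32(16_777_217.0))          # 16777216.0
```

Both printed values of 0.1 carry a long tail of digits because neither precision can store 0.1 exactly. A tolerance comparison hides representation error rather than removing it, and choosing that tolerance sensibly still takes some error analysis.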