Slashdot Mirror


Call For Scientific Research Code To Be Released

Pentagram writes "Professor Ince, writing in the Guardian, has issued a call for scientists to make the code they use in the course of their research publicly available. He focuses specifically on the topical controversies in climate science, and concludes with the view that researchers who are able but unwilling to release programs they use should not be regarded as scientists. Quoting: 'There is enough evidence for us to regard a lot of scientific software with worry. For example Professor Les Hatton, an international expert in software testing resident in the Universities of Kent and Kingston, carried out an extensive analysis of several million lines of scientific code. He showed that the software had an unacceptably high level of detectable inconsistencies. For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"

505 comments

  1. Seems reasonable by NathanE · · Score: 4, Insightful

    Particularly if the research is publicly funded.

    1. Re:Seems reasonable by fuzzyfuzzyfungus · · Score: 5, Insightful

      The "The public deserves access to the research it pays for" position seems so self-evidently reasonable that further debate is simply unnecessary(though, unfortunately, the journal publishers have a strong financial interest in arguing the contrary, so the "debate" actually continues, against all reason). Similarly, the idea that software falls somewhere in the "methods" section and is as deserving of peer review as any other part of the research seems wholly reasonable. Again, I suspect that getting at the bits written by scientists, with the possible exception of the ones working in fields(oil geology, drug development, etc.) that also have lucrative commercial applications, will mainly be a matter of developing norms and mechanisms around releasing it. Academic scientists are judged, promoted, and respected largely according to how much(and where) they publish. Getting them to publish more probably won't be the world's hardest problem. The more awkward bit will be the fact that large amounts of modern scientific instrumentation, and some analysis packages, include giant chunks of closed source software; but are also worth serious cash. You can absolutely forget getting a BSD/GPL release, and even a "No commercial use, all rights reserved, for review only, mine, not yours." code release will be like pulling teeth.

      On the other hand, I suspect that some of this hand-wringing is little more than special pleading. "This is hugely worrying when you realise that just one error — just one — will usually invalidate a computer program." Right. I know that I definitely live in the world where all my important stuff: financial transactions, recordkeeping, product design, and so forth are carried out by zero-defect programs, delivered to me over the internet by routers with zero-defect firmware, and rendered by a variety of endpoint devices running zero-defect software on zero-defect OSes. Yup, that's exactly how it works. Outside of hyper-expensive embedded stuff, military avionics, landing gear firmware, and FDA approved embedded medical widgets(that still manage to Therac people from time to time), zero-defect is pure fantasy. A very pleasant pure fantasy, to be sure; but still fantasy. The revelation that several million lines of code, in a mixture of Fortran and C, most likely written under time and budget constraints, aren't exactly a paragon of code quality seems utterly unsurprising, and utterly unrestricted to scientific areas. Code quality is definitely important, and science has to deal with the fact that software errors have the potential to make a hash of its data; but science seems to attract a whole lot more hand-wringing when its conclusions are undesirable...

    2. Re:Seems reasonable by Dr_Terminus · · Score: 1

      Most, if not all work done at US universities or government institutions is required to be made public. In fact, you can already find climate models from the main US based investigators:

      http://www.ncar.ucar.edu/tools/models/
      http://www.giss.nasa.gov/tools/

    3. Re:Seems reasonable by upmufa · · Score: 1

      In the case of publicly-funded research, I'd first like to compel scientists to publish their work in a free-to-read format. Many journals do not make the articles available to the public. It seems a little wrong that public money goes to pay for research and then the results of the research -- the journal articles -- aren't available to the public.

    4. Re:Seems reasonable by kenp2002 · · Score: 0, Flamebait

      The public doesn't fund any research, you insensitive clod; the government does. It's the government's money, not yours...

      Seriously when in any debate with a politician have you heard "We need to be careful how we spend THEIR money..." Never, it is always "OUR money".

      Now go pay homage to dear leader! OBEY!

      --
      -=[ Who Is John Galt? ]=-
    5. Re:Seems reasonable by OldSoldier · · Score: 1

      I disagree to the extent that seeing the software is a red herring.

      If there is code that research institutions use again and again then, to the extent that that code can be released it should be. Many eyes on code that many users use will reduce errors. However, many eyes on code that only a few users use may have the opposite effect.

      The alternative and in some cases better solution is for the scientists to release the raw data and their processed data. That way other people running different software on the raw data may end up with different processed results.

    6. Re:Seems reasonable by apoc.famine · · Score: 5, Insightful

      As someone doing a PhD in a climate related area, I can see both sides of the issue. The code I work with is freely and openly available. However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it. I look at my classmates - we're all in the same degree program, yet probably only 5% of them would really be able to understand and do anything meaningful with the code I'm using.
       
      Why? We're that specialized. Here, I'm talking 5% of people studying atmospheric and oceanic sciences being able to make use of my code without taking several years to get up to speed. What's the incentive to release it? Why bother with the effort, when the audience is soooo small?
       
      Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for "teh code is BOGUS!!!!" Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.
       
      However, I do very much believe in completely transparent discourse. My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.
       
      It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about.

      --
      Velociraptor = Distiraptor / Timeraptor
    7. Re:Seems reasonable by Sir_Sri · · Score: 5, Informative

      And it's not like the people writing this code are, or were, trained in computer science, assuming computer science even existed when they were doing the work.

      Having done an undergrad in theoretical physics, but being in a PhD in comp sci now, I will say this: The assumption in physics when I graduated in 2002 was that by second year you knew how to write code, whether they've taught you or not. Even more recently it has still been an assumption that you'll know how to write code, but they try to give you a bare minimum of training. And of course it's usually other physical scientists who do the teaching, not computer scientists, so bad information (or out of date information or the like) is propagated along. That completely misses the advanced topics in computer science which cover a lot more of the software engineering sort of problems. Try explaining to a physicist how a 32 or 64 bit float can't exactly represent all of the numbers they think it can and watch half of them have their eyes glaze over for half an hour. And then the problem is what do you do about it?
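
      A minimal sketch of the float point above, in Python (chosen only for illustration; nothing here comes from the code under discussion). It shows that 0.1 is not stored exactly, that naive accumulation drifts, and two common responses: compensated summation and comparison with a tolerance.

```python
import math
from decimal import Decimal

# A 64-bit float cannot store 0.1 exactly; Decimal(0.1) exposes the value
# that is actually held in memory, which differs from the decimal literal.
stored = Decimal(0.1)
exact = Decimal("0.1")
is_exact = stored == exact

# Rounding error accumulates: ten additions of 0.1 do not give exactly 1.0.
naive_sum = sum([0.1] * 10)

# Two common ways to cope:
#  - math.fsum tracks the lost low-order bits while summing
#  - math.isclose compares with a tolerance instead of using ==
careful_sum = math.fsum([0.1] * 10)
close_enough = math.isclose(naive_sum, 1.0, rel_tol=1e-9)

print(stored)
print(naive_sum, careful_sum, close_enough)
```

      Whether a tolerance of 1e-9 is appropriate depends entirely on the calculation; it is an assumption here, not a recommendation.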

      Then you get into a lab (uni lab). Half the software used will have been written in F77 when it was still pretty new, and someone may have hacked some modifications in here and there over the years. Some of these programs last for years, span multiple careers and so on. They aren't small investments but have had grubby little grad student paws on them for a long time, in addition to incompetent professor hands.

      None of scientific computing is done particularly well, they expect people with no training in software development to do the work, assuming it was done when software development existed, and there isn't the funding to pay people who might do it properly.

      On top of all that it's not like you want to release your code to the public right away anyway. As a scientist you're in competition with groups around the world to publish first. You describe in your paper the science you think you implemented, someone else who wants to verify your results gets to write a new chunk of code which they think is the same science and you compare. Giving out a scientist's code for inspection means someone else will have a working software platform to publish papers based on your work, and that's not so good for you. For all the talk of research for the public good, ultimately your own good, of continuing to publish (to get paid), trumps a public need. That's a systemic problem, and when you're competing with a research group in Brazil, and you're in Canada, their rules are different than yours, and so you keep things close to the chest.

    8. Re:Seems reasonable by Troed · · Score: 3, Informative

      Your comment clearly shows you know nothing about software. I'm able to audit your source code without having a slightest clue as to what domain it's meant to be run in.

    9. Re:Seems reasonable by SirWhoopass · · Score: 1

      Good luck.

      I do research at a public university in the US. A mix of privately funded and publicly funded work. In most cases the university retains a copyright to any work I do, and will aggressively defend it. I'd gladly share my work, but I am often forbidden to do so.

      I have had "colleagues" not only in the same university, but in the same research program (different labs) flat-out deny a request to share software and insist that I must pay them for it.

    10. Re:Seems reasonable by TheTurtlesMoves · · Score: 5, Insightful

      You're not the F***** pope. You don't get to tell people they are not worthy enough to look at your code/data. You don't like it, don't do science. But this attitude of only cooperating with a "vetted" group of people is causing far more problems than you think you are solving by doing it. You are not as smart as you think you are.

      Want to make a claim/suggestion that has very real economic and political ramifications for everyone, you provide the data/models for everyone. Otherwise, have a nice hot cup of shut the frak up.

      --
      The Grey Goo disaster happened 3 billion years ago. This rock is covered in self replicating machines!
    11. Re:Seems reasonable by TheTurtlesMoves · · Score: 1

      Many of us are trying to favor open access journals so that it is free to access. It's not easy, however, since they don't get the bean counter points that you get from Nature/Science.

      --
      The Grey Goo disaster happened 3 billion years ago. This rock is covered in self replicating machines!
    12. Re:Seems reasonable by khayman80 · · Score: 1

      The existence of a bug can be established by a code auditor without a science background. However, establishing that said bug affects the scientific results does require specialized knowledge in many instances.

    13. Re:Seems reasonable by mjwalshe · · Score: 2, Insightful

      Well, up to a point; however, it's the model you have to validate. Years ago I helped write some code to model the behavior of pumps, and one of the tests we did was to run the model and compare it to real life, and also run the model in reverse to see if we got back to the same point we started from. Without knowing a ton about CS/mathematics and the modeling methods used, and without access to the original data, a non-specialist is not going to get very far.
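
      A sketch of the forward-then-reverse check described above, assuming a toy model: a unit-mass spring advanced with a velocity-Verlet step. The pump code itself is not available, so `step`, `run`, and all constants are hypothetical stand-ins. The step is time-reversible, so running the same number of steps with a negated time step should return to the starting state up to rounding error.

```python
def step(x, v, dt, k=1.0):
    """One velocity-Verlet step for a unit-mass spring (toy stand-in model)."""
    a = -k * x
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = -k * x_new
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new


def run(x, v, dt, n_steps):
    """Advance the state n_steps times; a negative dt runs the model backwards."""
    for _ in range(n_steps):
        x, v = step(x, v, dt)
    return x, v


x0, v0 = 1.0, 0.0
x_fwd, v_fwd = run(x0, v0, dt=0.01, n_steps=1000)         # forward in time
x_back, v_back = run(x_fwd, v_fwd, dt=-0.01, n_steps=1000)  # and back again

# If the round trip does not land near the start, the model or its
# implementation has a problem worth investigating.
round_trip_error = max(abs(x_back - x0), abs(v_back - v0))
print(round_trip_error)
```

      A check like this says the code is self-consistent; as the comment notes, comparing against real-life measurements is a separate test.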

    14. Re:Seems reasonable by Hado · · Score: 1

      And so I did.

    15. Re:Seems reasonable by Compholio · · Score: 1

      Again, I suspect that getting at the bits written by scientists, with the possible exception of the ones working in fields(oil geology, drug development, etc.) that also have lucrative commercial applications, will mainly be a matter of developing norms and mechanisms around releasing it.

      The problem is that most institutions will not let us release our materials because they're concerned that there might be a lucrative commercial application and we're just not aware of it yet. At my institution there's an entire chapter of the handbook related to the distribution of research materials, and several sections on the appropriate procedure for getting approval to release those materials. Unless you have tenure you have to follow those rules or you will get fired.

    16. Re:Seems reasonable by Anonymous Coward · · Score: 1, Informative

      No, universities are not required to make their work public. Congress changed the public domain requirements some time ago, so contractors who are getting federal funds can keep their work secret, patent it, and hold copyright to it. They only are required to provide whatever results the contract demands, and they hold the rights to some of those results.

      And if you're satisfied with how open climate work is, perhaps you can explain the numerous temperature adjustments which are being done. Pick five climate stations and show where the reasons for all the adjustments are published.

    17. Re:Seems reasonable by TheKidWho · · Score: 1

      What exactly are you going to do with that research paper?

    18. Re:Seems reasonable by apoc.famine · · Score: 4, Insightful

      Of all the stuff that's important in scientific computing, the code is probably one of the more minor parts. The science behind the code is drastically more important. If the code is solid and the science is crap, it's useless. Likewise, the source data that's used to initialize a model is far more important than the code. If that's bogus, the entire thing is bogus.
       
      Sure, you could audit it, and find shit that's not done properly. At the same time, you wouldn't have a damn clue what it's supposed to be doing. Suppose I'm adding a floating point to an integer. Is that a problem? Does it ruin everything? Or is it just sloppy coding that doesn't make a difference in the long run? Understanding what the code is doing is required for you to do an audit which will produce any useful results.
       
      Unless you're working under the fallacy that all code must be perfect and bug free. Nobody gives a shit if you audit software and produce a list of bugs. What's important is that you be able to quantify how important those bugs are. And you can't do that without knowing what the software is supposed to be doing. When it's something as complicated as fluid dynamics or biological systems, a code audit by a CS person is pretty much worthless.

      --
      Velociraptor = Distiraptor / Timeraptor
    19. Re:Seems reasonable by bitingduck · · Score: 1

      The more awkward bit will be the fact that large amounts of modern scientific instrumentation, and some analysis packages, include giant chunks of closed source software; but are also worth serious cash. You can absolutely forget getting a BSD/GPL release, and even a "No commercial use, all rights reserved, for review only, mine, not yours." code release will be like pulling teeth.

      There are certainly ways to deal with this, too.

      In notes (and often in the published papers if it's relevant) one records the model numbers of the instruments being used, and if it's something with different versions of firmware or software, that gets noted, too. Even with two ostensibly identical instruments, one normally notes which one is being used so that anomalies can get traced more easily.

      As far as validating instruments, even for fully calibrated instruments it's not hard to put in known test signals yourself and check the outputs. Unfortunately a lot of people who just buy instruments and plug them in don't do this, and I've seen people blindly believe really silly outputs that were clear results of operator error.

    20. Re:Seems reasonable by ianturton · · Score: 1

      Particularly if the research is publicly funded.

      Unless the public requires us to extract the maximum economic return from our research. In the UK, and to a lesser extent the USA, researchers are required to make as much money as possible from their research. I've had a lot of problems with university managers when trying to release research code under open licenses.

    21. Re:Seems reasonable by Troed · · Score: 4, Insightful

      Your argument is void. A bug is a bug. Either it affects the outcome of the program run or it doesn't - and I still don't need to know anything about what it's supposed to do to verify that. You just need to re-run the program with a specified set of inputs and check the output - also known as verified against its own test suite.

      (Yes, I'm a Software Engineer by education)
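
      A minimal sketch of "re-run the program with a specified set of inputs and check the output". The routine `mean_anomaly`, the recorded cases, and the tolerances are hypothetical stand-ins, not anything from a real research code base.

```python
import math


def mean_anomaly(values, baseline):
    """Hypothetical analysis routine standing in for the program under audit."""
    return sum(values) / len(values) - baseline


# Recorded (arguments, expected output) pairs; the numbers are illustrative only.
GOLDEN_CASES = [
    (([1.0, 2.0, 3.0], 2.0), 0.0),
    (([10.0, 20.0], 5.0), 10.0),
]


def run_regression(func, cases, rel_tol=1e-12, abs_tol=1e-12):
    """Return every case whose output differs from the recorded value."""
    failures = []
    for args, expected in cases:
        actual = func(*args)
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            failures.append((args, expected, actual))
    return failures


failures = run_regression(mean_anomaly, GOLDEN_CASES)
print("failures:", failures)
```

      Note what this does and does not establish: it detects that a change altered the output, but the recorded expected values still have to come from somewhere trusted.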

    22. Re:Seems reasonable by Pinky's Brain · · Score: 2, Interesting

      Let's assume for a moment you publish your code in the reproducible research sense; this will mean you also publish all the code necessary to compute the graphs in your papers ... at that point I can at the very least determine if what you thought was significant in the initial results as explained in your papers is still there.

    23. Re:Seems reasonable by professionalfurryele · · Score: 2, Interesting

      Having worked in academia I can attest to the very poor code quality, at least in my area. The reason is very simple: the modern research scientist is often a jack-of-all-trades. A combination IT professional, programmer, teacher, manager, recruiter, bureaucrat, hardware technician, engineer, statistician, author and mathematician as well as being an expert in the science of their own field. Any one of these disciplines would take years of experience to develop professional skills at. Most scientists simply don't have time to do that, so they wing it. I think publishing code would be a good idea as scrutiny would help quality, but a big chunk of this code is never going to be of professional quality because it isn't written by professional programmers.

    24. Re:Seems reasonable by joocemann · · Score: 1, Insightful

      Your argument is void. A bug is a bug. Either it affects the outcome of the program run or it doesn't - and I still don't need to know anything about what it's supposed to do to verify that. You just need to re-run the program with a specified set of inputs and check the output - also known as verified against its own test suite.

      (Yes, I'm a Software Engineer by education)

      You assume far too much. I don't trust an analysis of anything, by anyone, who doesn't know what they are actually looking at. In your example you can look and analyze but you don't need to understand what it is....

      I'm seeing a pretty clear parallel between your view of how the code can be analyzed and the AGW ignoramus skeptic view of AGW science as a whole. I don't trust arguments for or against AGW that aren't by people with educations to demonstrate they at least *might* know what they are talking about.

      You're basically saying you're qualified to analyze and discuss a topic you do not understand simply because you know a language. That is just B.S.

    25. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Understanding what the code is doing is required for you to do an audit which will produce any useful results.

      I will use that the next time I am dropped in the middle of a project and have not the slightest idea what is going on. Then I will start looking for another job.

      I have audited hundreds of programs over the years. For many of them I couldn't care less what they are supposed to do. Knowing that is helpful in an audit, but not necessary for finding stupid bugs (you would be surprised how easy they are to make). I have met many smart dudes over the years. Out of them I would say only a small fraction actually write good code. Yet they all thought they wrote solid code.

      Where I work you boundscheck it, lint it, static analysis it, peer review it, QA it, redo all previous steps, and hopefully actually create a formal proof that it is correct. From the way you talk you do MAYBE peer review and only if you deem the person worthy to look at your code. I would trust you to write a decent video game or non life changing software and then only with someone who is more methodical about it than you. But would not put you in charge of something where people's lives are on the line.

      Lots of little errors points at the fact you do not want to fix things. I am sorry but you come off less than impressive in my book.

    26. Re:Seems reasonable by MikeBabcock · · Score: 3, Insightful

      Both are issues. If your code is buggy, the output may also be buggy. If the code is bug-free but the algorithms buggy, the output will also be buggy.

      The whole purpose of publishing in the scientific method is repeatability. If the software itself is just re-used without someone looking at how it works or even better, writing their own for the same purpose, you're invalidating a whole portion of the method itself.

      As a vastly simplified example, I could posit that 1 + 2 = 4. I could say I ran my numbers through a program as such:

      print f(1, 2);
      f (a, b):
      print $b + $b;

      If you re-ran my numbers yourself through MY software without validating it, you'd see that I'm right. Validating what the software does and HOW it does it is very much an important part of science, and unfortunately overlooked. While in this example anyone might pick out the error, in a complex system it's quite likely most people would miss one.
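
      The same example as a runnable Python sketch. The bug from the pseudocode above is kept on purpose; `f_independent` is a hypothetical second implementation written only from the stated intent of adding two numbers. Re-running the original code reproduces the wrong answer every time, while an independent implementation exposes the disagreement.

```python
def f(a, b):
    # Deliberate bug carried over from the pseudocode: b is added to itself
    # and a is ignored.
    return b + b


def f_independent(a, b):
    # Second implementation written from the description, not from f's source.
    return a + b


original_result = f(1, 2)
rerun_result = f(1, 2)                    # "replicating" with the same software
independent_result = f_independent(1, 2)  # replicating the method instead

print(original_result == rerun_result)        # re-running cannot reveal the bug
print(original_result == independent_result)  # an independent version can
```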

      To the original argument, just because very few people would understand the software doesn't mean it doesn't need validating. Lots of peer-reviewed papers are truly understood by a very small segment of the scientific population, but they still deserve that review.

      --
      - Michael T. Babcock (Yes, I blog)
    27. Re:Seems reasonable by khayman80 · · Score: 1

      Good point. If the whole package is "turnkey" then the bug can be fixed, and the resulting graphs recomputed. If there's no change, the bug was scientifically irrelevant. If the graph changes, the bug may or may not have been scientifically important, depending on the specific conclusions drawn from the graph. Incidentally, this approach would require not only the research data (which the scientist may not be able to share) but also a computer powerful enough to run the program before you die of old age.

    28. Re:Seems reasonable by iris-n · · Score: 1

      I second that. Many times I've just emailed the paper's authors "SAUCE PLX!!!" and got the code.

      Another thing to be aware of is that most code is a piece of shit. Maybe not incorrect, but in a terrible style, unreadable and unmaintainable. Most of the time it is worth dumping someone else's code and starting anew, if you have to make modifications.

      --
      entropy happens
    29. Re:Seems reasonable by Anonymous Coward · · Score: 0

      thanks for perpetuating the stereotype that your profession is full of arrogant assholes.

      Having gone through code written by scientists at a previous employer, I think it's safe to say that most of it is of mediocre quality: sloppy coding, inefficient algorithms, poor in-code documentation, lots of bugs waiting for the 'non happy path' use cases, difficult to maintain, etc. Code audits by a CS person do a world of good for these programs.

    30. Re:Seems reasonable by bmajik · · Score: 5, Insightful

      However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it. I look at my classmates - we're all in the same degree program, yet probably only 5% of them would really be able to understand and do anything meaningful with the code I'm using.

      I think the world is very lucky that Linus Torvalds wasn't as narrow-sighted and conceited as you are.

      Why? We're that specialized. Here, I'm talking 5% of people studying atmospheric and oceanic sciences being able to make use of my code without taking several years to get up to speed. What's the incentive to release it? Why bother with the effort, when the audience is soooo small?

      Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for "teh code is BOGUS!!!!" Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.

      However, I do very much believe in completely transparent discourse. My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.

      It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about.

      Have you heard of "ivory tower"? You're it.

      Your position basically boils down to this: "unless you read all the same things I read, talked to all the same people I talked to, went to all the same schools I did... you're not qualified to talk to me".

      That is _the_ definition of monocultural isolationism.. i.e. the Ivory Tower of Academia problem.

      Here's the problem: if your requirement is that anyone you consider a "peer" must have had all of the same inputs and conditionings that you had... what basis do you have for allowing them to come out of the other side of that machine with a non-tainted point of view?

      As a specific counterpoint to your way of thinking:

      My dad is an actuary.. one of the best in the world. He regularly meets with the top handful of insurance regulators in foreign governments. He manages the risk of _billions_ of dollars. The maths involved in actuarial science embarrass nearly any other branch of applied mathematics. I have an undergraduate math degree and I could only understand his problem domain in the crudest, rough-bounding box sort of fashion. Furthermore, he's been a programmer since the System/360 days.

      Yet his code, while there is a lot of it, is something I am definitely able to help him with. We talk about software engineering and specific technical problems he is having on a frequent basis.

      You don't need to be a problem domain expert in order to demonstrate value when auditing software.

      Furthermore, as a professional software tester, I happen to find that occasionally, not over-familiarizing myself with the design docs and implementation details too early allows me to ask better "reset" questions when doing design and code reviews. "Why are you doing this?" And as the developer talks me through it, they understand how shaky their assumptions are. If I had been "travelling" with them in lock step

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    31. Re:Seems reasonable by Jaeph · · Score: 1

      Yeah, wouldn't want just any Tom, Joe, or patent clerk to look at this stuff. What could such people possibly contribute?

      If you're smart enough to understand this code, then I know you. If I don't know you, just keep your nose out of business where it doesn't belong.

      -Jeff

      --
      Please learn the difference between a dissenting opinion and a troll before you moderate.
    32. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Read it, twist the content into a series of alarming-yet-completely-invented "talking points", go public and watch the fur fly.

      Jeez, what did you think?

    33. Re:Seems reasonable by HungryHobo · · Score: 1

      I'm currently working on my final-year project and this is the most annoying crap.
      Essentially what I'm doing is reading a bunch of CS research papers in a specialised area and creating basic implementations.
      Grand.
      Some people who write papers include good explanations of how they're handling their data.
      Some go one better and provide actual source code so that it's clear as day.

      But then there's the dicks who are just trying to hide that what they've done isn't really all that complex.
      Sure, the approach they're taking may be innovative and interesting, but they try to hide the fact that what they're doing boils down to 5 to 10 lines of actual code.
      If there's 2 ways they could say something, one which is clear and concise and one which is cryptic and makes it harder to understand they'll go with the cryptic version.

      4 lines of pseudocode or half a page of obfuscated mathematical symbols where they may not even define all their variables?
      Heaven forbid they use the 4 lines of code!

      Yes, I'm pissed off at this particular aspect of the academic culture.
      It's bullshit and it's an attempt to hide what you're really doing and play up the "nobody but me is smart enough to understand what I'm doing" crap.

      Academics need to learn from the hacker culture.
      "Source or GTFO"

    34. Re:Seems reasonable by bmajik · · Score: 1

      If your goal is to do the best science possible, why not take the help where you can find it?

      Suppose that I buy your argument that the bugs don't matter, or that only you can determine if the bugs matter.

      What else doesn't matter? If you reduce the surface area of "things that matter" to "the things that only I am qualified to render an opinion on", it's kind of a circular dependency, isn't it?

      I'm sure there are chemists who, upon seeing that their scales aren't graduated finely enough or do not reliably re-zero, opine that in the context of the measurements they are taking, that it doesn't significantly affect the outcome and so is a tolerable source of error.

      But I'd like to think that any chemist, when a scale technician offers to fix her scale or provide her with a more accurate one [freely], would take advantage of that offer. Better results are always better, right?

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    35. Re:Seems reasonable by Starlet+Monroe · · Score: 2, Insightful

      This is a conundrum for me. My research is in the world of radiation physics, where results can definitely be life-changing. I absolutely respect the amount of impact small discrepancies can have on outcomes, but I also struggle to find a balance. The project I'm on right now is a retrospective analysis, so the results we report won't directly affect anyone. If policy changes are made from what we determine, the results will.

      My role is to conduct some fairly complex calculations against a data set, for which I've built some custom software and a database. The software isn't great software...it's good enough to get the job done. I validate the input...a little bit. Just enough to make sure we're using the right file. I confirm that the data I need exists in our input, but I don't do any boundary checking on it. Why should I? There's only one data file that gets analyzed, and as we collect more data, we run it again. I'll probably use this code in "production" four times in the course of the study. Are there stupid bugs that crop up if strings show up in the data instead of floats? Sure. But there won't ever be strings in the data, and the code won't ever be used after we run the data through. We don't have the budget for me to spend the time to write it "right", the way I would if it was for enterprise use. And we sure can't afford to QA it, too.

      I respect the idea that all code should go through a complete development cycle before use in production, and I think it's certainly important for that to happen in science, but I think there have to be limits. Sometimes the object is to get something done, and the difference between doing it "best" and "good enough" doesn't mean the difference between "right" and "wrong."

      --
      ++
    36. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Funny, as an undergrad in EE doing computer support for the Physics Department, I debugged a whole lot of code that the physicists wrote and mangled. You know, shit like "you get bad results when you point a double into a function that eats floats" and shit like that. So yes, I've successfully debugged millions of lines of Fortran and C without understanding the physics that's being calculated.

      -John
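
      The double-into-floats bug above is easy to reproduce without knowing any physics. The original code was Fortran and C; this is only a Python sketch that imitates the mistake by reinterpreting the bytes, and the helper name read_as_floats is made up for illustration:

```python
import struct

def read_as_floats(value):
    """Pack a 64-bit double, then misread its 8 bytes as two 32-bit floats.

    This imitates what a routine expecting float* sees when it is handed
    a pointer to a double (little-endian layout assumed here).
    """
    raw = struct.pack("<d", value)      # the caller stores a double
    return struct.unpack("<2f", raw)    # the callee reads two floats

lo, hi = read_as_floats(1.0)
print(lo, hi)  # neither value is 1.0
```

      Nothing crashes and no warning appears; the callee just computes with the wrong numbers, which is why this kind of bug can be spotted by someone who only knows the language.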

    37. Re:Seems reasonable by Anonymous Coward · · Score: 0

      I would release my code if there was one person in the world who could understand it.

      There are currently eight scientists in the world who understand in depth the problem my code is supposed to look at.

      I would release my code if there was one person in the world who could read my code.

      I created a new macro language which hooks onto a pre-existing physics simulation code whose development was started in the 1970s. So physics base->my libraries to connect to base-> my higher level macro language -> general physics macro definition -> simulation specific macro file. I feed in data composed from another source which I pre-process using 3 different codes. One to put it into an easy to manipulate format, one to actually modify the data, another to put it into the other format which the physics base needs. The data that comes out of this method is again in a format that is hard to modify so I dump that to an easy to use format, one to again modify and another to dump into another proprietary format.

      So you want me to release all this horribly documented code with no description of what order to do things in...

      I would release my code if there was one person in the world who could run my code.

      My code takes a super computer listed in the top 100 supercomputers in the world 2 days to produce data that can be analyzed.

      I would release my code if it wouldn't take a majority of my time to put a cohesive package together.

      What would you rather have? Me spending 1 month documenting my code so others can look at it or you know... be a scientist and try to make the world a better place?

    38. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Are you serious??? If your 99.9% number wasn't fabricated from your conceit that would still leave 6.9 million potential reviewers. The other problem with your statement is that computer simulations are very prone to errors. Take the expression a/(b-c). If b is anywhere near c the errors become huge. Someone with zero knowledge of thermodynamics can look through code and find problems with numerical and statistical methods. Your statement is so ignorant that I wonder if you made up your post just to make climate researchers look arrogant and stupid.
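
      To make the a/(b-c) point concrete, here is a minimal Python sketch. The numbers are made up for illustration and do not come from any real model:

```python
def ratio(a, b, c):
    """Compute a / (b - c) in ordinary double precision."""
    return a / (b - c)

def relative_change(x, y):
    """Relative difference between two results."""
    return abs(x - y) / abs(x)

a = 1.0
b = 1.0000001   # b is very close to c
c = 1.0

baseline = ratio(a, b, c)

# Nudge b by one part in 10^10, about the size of a small
# measurement or rounding error in the input.
input_error = 1e-10
perturbed = ratio(a, b * (1 + input_error), c)

output_error = relative_change(baseline, perturbed)
amplification = output_error / input_error
print(f"input error {input_error:.1e} -> output error {output_error:.1e}")
```

      With these values the relative error in the result is millions of times larger than the relative error in the input, roughly b/(b-c). Spotting a subtraction like that in a denominator takes numerical experience, not thermodynamics.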

    39. Re:Seems reasonable by mwvdlee · · Score: 1

      If recreating the results of research requires using the software, then the software should be open to scrutiny, just like any other tool and method used to produce research results already is.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    40. Re:Seems reasonable by c0d3g33k · · Score: 1

      thanks for perpetuating the stereotype that your profession is full of arrogant assholes.

      In my experience it's not a stereotype - it's true enough sometimes that exceptions are worth noting. :-) The reality is that being an arrogant asshole isn't always a liability in science, and in many contexts may be an asset. For example, nice people don't call bullshit when they see it, even if they should - that would be impolite. But politeness doesn't result in better science, while a no-bullshit, no-compromise attitude often does. Some people can turn it off at the end of the day, many can't. They go home alone, but they still do good science.

    41. Re:Seems reasonable by ottothecow · · Score: 1
      Maybe it's a bug that only pops up on certain inputs. Maybe the researcher knows this and avoids those inputs (or wrote the program without intending to go anywhere near the input range where the code fails). This seems fine to me...researcher needs a one-off set of statistics and writes some quick and dirty code that does it even if it isn't robust or even efficient.

      Releasing this code is probably bad for two reasons. If the researcher is not aware of bugs outside of the exact inputs they used, they probably aren't going to disclose them--just wait until some amateur gets a hold of the code, runs it, and claims that all global warming data is questionable because this model has a bug or produces weird output. Second, it will waste the researcher's time releasing the code and then responding to questions when people are like "lolz this code blows".

      I don't expect researchers to write great code for everything...it may be repetitive or inefficient but they can usually tell from the result (and comparing it to other models) whether or not something went wrong. I know that I write code at work (IANAClimate Researcher) that is quite sloppy or wasteful because I just want to see what the result looks like (and will never run the program again) and it therefore makes more sense to chug onwards through bugs and strange cases rather than rewrite a more robust program from the start.

      That being said, it should definitely be available as a part of the peer review process if something is really called into question.

      --
      Bottles.
    42. Re:Seems reasonable by mkosmul · · Score: 1

      Try explaining to a physicist how a 32 or 64 bit float can't exactly replicate all of the numbers they think it can and watch half of them have their eyes gloss over for half an hour.

      I have studied both physics and CS and I can hardly imagine a physics curriculum (especially one in theoretical physics) which doesn't include any course in numerical methods.

      Another thing is basic good practices, in particular those related to maintainability. My experience is that this area seems to be lacking and indeed I've seen some terrible practices, like a complex application compiled from a single FORTRAN file 1 MB in size (including comments). While FORTRAN still has its merits in many areas of computationally-intensive programming, it is a nightmare from the maintainability perspective, especially if misused as it sometimes is.
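
      For anyone who hasn't seen the float representation issue first-hand, a short Python sketch shows it. The helper as_float32 is just an illustrative name; Python's own floats are 64-bit, so the 32-bit case is imitated with the struct module:

```python
import struct
from decimal import Decimal

def as_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# The value actually stored when you write 0.1, at each precision.
stored_64 = Decimal(0.1)
stored_32 = Decimal(as_float32(0.1))

# Decimal fractions that look harmless do not add up exactly.
sum_is_exact = (0.1 + 0.2 == 0.3)

# A 32-bit float cannot even hold every integer above 2**24.
big = as_float32(2.0**24 + 1)

print(stored_64)
print(stored_32)
print(sum_is_exact, big)
```

      Neither stored value is exactly one tenth, the sum comparison fails, and 2**24 + 1 silently becomes 2**24 in 32 bits.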

    43. Re:Seems reasonable by Anonymous Coward · · Score: 0

      You seriously need to get over yourself.

    44. Re:Seems reasonable by ZombieWomble · · Score: 1
      The problem may be fixing itself, however - journal costs are getting so out of hand that many universities are cutting back on journal subscriptions, which means that open access journals are getting higher citation counts because they're all some researchers can access. While it'll be a while before they're going to displace heavies like Nature or Science, many free-to-read journals like PLoS Biology or the New Journal of Physics have higher Impact Factors than their closed brethren, meaning that at a glance they're at least as pleasing to the bean-counters as a comparable pay-to-view journal.

      The only issue remaining is the slight distastefulness of pay-to-publish models - The coupling of acceptance of a paper to income is slightly troubling, although a more robust solution is not immediately apparent.

    45. Re:Seems reasonable by mrxak · · Score: 1

      If climate scientists release their code, people will be able to control the weather!

    46. Re:Seems reasonable by Troed · · Score: 0, Offtopic

      You still don't get it - which means that you might be brilliant in your field of science but you have no business whatsoever talking about source code - which would be one of my fields :)

      You're simply not qualified, to use your own words.

    47. Re:Seems reasonable by Troed · · Score: 1

      I'm sorry, you seem to want to claim that bad code and bad testing is ok in science.

      It's not, of course.

    48. Re:Seems reasonable by mwvdlee · · Score: 1

      If somebody releases a scientific research paper on any complex topic, 99.9999% of the people in the world wouldn't get it either. The reason you publish the research and results is so your peers (the remaining 0.0001%) WILL be able to doublecheck your work.

      Handing it out to a few qualified researchers means handing it out to people you handpicked yourself (or used a bunch of rules or systems you handpicked; same difference). Any selection of this type will be subjective to some degree and thus has no place in science.

      Even somebody that totally disagrees with the fundamental principles your research is based upon should be allowed to doublecheck your work. Sadly, this means opening up research to total nutjobs, paranoid conspiracy theorists and Joe-Average-who-lives-in-a-trailer. The price of not doing so could be much higher.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    49. Re:Seems reasonable by Troed · · Score: 1

      Do you have a degree in software?

      If so, someone should've told you that you NEVER know for how long, by whom and for what your code will get used. If you have serious limitations, you document them (and thus it's no problem getting the code audited anyway).

      You know why we have a Y2K problem? Coders who couldn't imagine their code still being around decades later, used for purposes they never imagined.

    50. Re:Seems reasonable by Anonymous Coward · · Score: 1, Insightful

      You seem to have completely missed the point of the gp. Scientists are often more than willing to listen to software engineers, but no one pays for software engineers to talk to them. If a scientist isn't "researching" and is instead handling first level support on code he wrote five years ago (or risk being slated for not doing so), then they're not going to remain in post very long.

      If you add to this controversial areas of science (ie not Stephen Hawking's area but climate change) then there are well funded lobby groups and others with too much time on their hands looking for ANYTHING that is wrong. As various people have commented, codes have bugs in them. Most of them don't matter and you can check the results a posteriori to detect any critical ones.

    51. Re:Seems reasonable by Anonymous Coward · · Score: 0

      No, no no. All programs like this have test cases to validate their functionality. For example, a mass spec program I worked on would validate itself via emulated peaks. You knew the answer in the "fake" data, and just wanted to see if the program produced satisfactory results.

      This same thing is also done in climate science. They put in data (like CO2 concentrations) and calculate temperatures, heat balance, whatever and compare it with real world data. You know, the 150 years of good data and millions of years of less accurate data. If the model works for that time period, it then can be used to model the future impact of more CO2. This should be common sense?

      Scientists don't want to release their code because it also tends to look like shit. People that write this stuff are generally not software professionals. They code stuff to solve a problem with generally only a background in math, not software development. You end up with a trillion variables named i,j,k,l,m,n,o,p,i1,i2, and copy-paste code development. The code will work for specific scenarios, but will be very fragile. All CS people would do is lol at the design and say "this needs a rewrite" but they will not understand WTF the code does.

      As for your "analysis", how about quickly coding a Monte Carlo solution generator for an n-element wave function in an infinite well? Or how about modeling Fe-57 neutral, ground state?

      Here's a link *with code* on how to do it for Helium :P
            http://www.physics.buffalo.edu/phy410-505/topic5/lec-5-4.pdf
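
      The emulated-peak validation mentioned above can be sketched in a few lines of Python. This is not the mass spec program itself: find_peaks is a hypothetical stand-in for the analysis code under test, and the peak positions and widths are invented:

```python
import math

def synthetic_spectrum(centers, width=0.5, start=0.0, stop=100.0, step=0.1):
    """Build a fake spectrum: a sum of Gaussian peaks at known positions."""
    n = int(round((stop - start) / step)) + 1
    xs = [start + i * step for i in range(n)]
    ys = [sum(math.exp(-0.5 * ((x - c) / width) ** 2) for c in centers)
          for x in xs]
    return xs, ys

def find_peaks(xs, ys, threshold=0.5):
    """Stand-in for the code under test: local maxima above a threshold."""
    return [xs[i] for i in range(1, len(ys) - 1)
            if ys[i] > threshold and ys[i - 1] < ys[i] >= ys[i + 1]]

# We planted these peaks, so we know the right answer in advance.
known = [20.0, 45.5, 80.0]
xs, ys = synthetic_spectrum(known)
found = find_peaks(xs, ys)

# Pass only if every planted peak is recovered to within one grid step.
recovered = (len(found) == len(known)
             and all(abs(f - k) <= 0.1 for f, k in zip(found, known)))
print(found, recovered)
```

      A check like this only shows the program works on data resembling the fake input, which is the same fragility described above.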

    52. Re:Seems reasonable by Troed · · Score: 2, Insightful

      You're not doing science if you're not performing work that can be falsified (and replicability is a cornerstone in that).

      I'd rather have you do science.

    53. Re:Seems reasonable by mwvdlee · · Score: 1

      Nobody would care if the bugs are in a part that doesn't produce output. If it crashes on invalid input, so what? If it's slow or unstable, who cares? If a bug may introduce discrepancies in the results, even in rare situations, it's a problem.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    54. Re:Seems reasonable by khellendros1984 · · Score: 3, Informative

      It may not be possible to actually "discuss" the topic, but it's certainly possible to find bugs that may or may not influence the output of the program. And given the original input data, it's possible to remove the bugs, run the corrected program against the original input data, and see if the output is different. It would take someone with knowledge in the target topic to analyze the output data and decide if any difference is significant, but the actual check for bugs could certainly be done by anyone that "speaks" the language the program was written in.

      Even with something like a Global Warming argument, a person with a strong grasp of both English and logic might not be able to verify claims in an argument, but they can certainly analyze the argument for certain logical fallacies. Perhaps the fallacious section of the argument doesn't invalidate the argument as a whole. You can't trust this generic English-speaker to accurately make that determination, but they're certainly able to identify and remove a strawman, an ad hominem, etc.

      --
      It is pitch black. You are likely to be eaten by a grue.
    55. Re:Seems reasonable by STRICQ · · Score: 2, Informative

      Careful, you are getting dangerously close to the conceited, "Holier than thou" attitude that many climate scientists are spewing out. You really don't know what you're talking about when you say the op doesn't know what he's talking about. I'm a software engineer; finding bugs, even when you don't know what the code is doing, is a lot easier than you would think.

    56. Re:Seems reasonable by Pentagram · · Score: 1

      None of scientific computing is done particularly well, they expect people with no training in software development to do the work, assuming it was done when software development existed, and there isn't the funding to pay people who might do it properly.

      I'm not sure about "none"! I expect you think your code is pretty good :)

      However, I think quite a lot of code *is* poor. I switched from a PhD (and undergrad degree) in computer science to modelling in psychology. The code I inherited was written by a variety of people from different disciplines, none of them computer scientists/software engineers. It mostly worked as intended but it was very hard to verify how. In straightening it all out (actually, mostly rewriting it) I fixed a lot of bugs, several of which could have seriously affected previous results.

      Giving out a scientist's code for inspection means someone else will have a working software platform to publish papers based on your work, and that's not so good for you.

      Against that, there's a fair amount of kudos having other scientists use your software. As well as being referenced in their papers, you get name-recognition from other scientists just looking you up and downloading your code.

      Anyway, this isn't really a problem is it? If publishing code becomes the norm it will be the same for everyone.

    57. Re:Seems reasonable by Explodo · · Score: 2, Interesting

      You seem to assume that your code is correct. What if, by allowing others to audit it, bugs were found that significantly altered the output? Wouldn't that be something that you'd be interested in? Or, what if you spent years working on your doctoral thesis but at the last second found an error in your software that was what allowed your results to be in line with your assumptions and theory work? Would you scrap your years of work, or would you ignore it since you're freaking tired of working on it and want to be done already? Now assume that the results of your work are used to set public policy somewhere down the road...would you be honest enough to stand up and say it was fraudulent?

    58. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Particularly if the research is publicly funded.

      The public wants to pay for me to document, package, release and support my code, they are more than welcome to. However, public funding isn't a blank check and I don't see ever being paid to do it. The people paying the bills want published results, not code.

    59. Re:Seems reasonable by bmajik · · Score: 5, Insightful

      there are well funded lobby groups and others with too much time on their hands looking for ANYTHING that is wrong.

      Errors are only errors if they are reported by the "right" people?

      Do you want to know how many questions Linus Torvalds has answered for me? Zero.

      I actually _have_ gotten personal responses from Theo de Raadt on some OpenBSD issues but they all have the general form of "you're not interesting, don't waste my time".

      Nevertheless, I rely on OpenBSD. The fact that Theo has neither the time nor the interest in having a deep meaningful conversation with me about his code neither changes the quality of his code nor prevents him from releasing every 6 months, on schedule.

      I don't think that there is an expectation that scientists stop doing their day jobs to do software support for people. I think there is an expectation that publicly funded research used to set public policy be easily available to all comers.

      I'm a bit frustrated by the apparent contradiction. For the first time perhaps in history in the USA, you have armchair folks trying to do technical audits of scientific tools, research, and publications -- for free.

      I thought the "normal" problem in America is that the population is too apathetic to care and too stupid to provide any critical analysis. And yet we see this happening more and more frequently and the climate-science establishment is circling the wagons instead of celebrating the fact that there are a handful of people that for once give a damn about interesting research tools and methods.

      I must concede that there are some downsides to discussing your opinions and findings with others: When people disagree with you, it ends up taking some of your time.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    60. Re:Seems reasonable by STRICQ · · Score: 1

      Your whole post just screams, "This is why we need peer review of science based source code!"

    61. Re:Seems reasonable by mrxak · · Score: 1

      If you're the only one looking at your code, and it's so complex, how do you know it's working as you intended it to? This is the question that is at the very heart of the issue we're discussing here. It really doesn't matter what the problem is you're working on, I assume the algorithms can be described in some logical/mathematical way that somebody can turn into code without understanding what it's all about. After all, there's got to be something at the heart of your project you can share with those 8 other scientists so they know your methodology.

      What would be better, keeping your code to yourself and not knowing if there are even tiny bugs in the implementation of your algorithms, or releasing it and having interested parties examine it and tell you if your algorithms are actually converted properly to code or not? It doesn't matter if anybody can run it, or what the data is you're number-crunching on, I would think you'd want somebody to make sure you didn't type 1+2 instead of 2+3.

      As a (wo)man of science, I would expect your primary concern would be in the truth. You cannot know you're getting the truth if your code is never seen by another set of eyes. Nobody's going to be challenging your algorithms outside of the scientific community. What we're talking about is equivalent to a lab tech making sure your scales are calibrated. Trust me, no matter whatever special new language you created, there's more than a few computer scientists out there who can figure it out without you spending a month going back and documenting everything.

      That said, if scientists were shamed enough into writing good code to begin with, and documenting everything, that wouldn't be a bad thing either, for science or programming.

    62. Re:Seems reasonable by STRICQ · · Score: 1

      LOL, that's what they're doing now. Hoping we don't notice there's a difference between what they say and what is really happening around us.

    63. Re:Seems reasonable by mrxak · · Score: 1

      Whatever he wants. Since when was having information a bad thing? I'm sure a lot of scientists would be a lot happier if there was a greater understanding and curiosity about science in the world. Ultimately it will translate to more money for research, but in the shorter term, a more scientifically-conversant society can only be a good thing. Yes, there will be stuff that most of the population won't understand. But you'll use that as a basis for depriving everyone of that knowledge? There will be people who do understand it, and if it's only a single person understanding a single paper out of thousands, the entire exercise was still worth it.

    64. Re:Seems reasonable by bdwlangm · · Score: 2, Insightful

      If I find code that will cause heap corruption in your code (e.g. you wrote past the end of an array in C), then there is a bug in that code whether you do fluid simulations, or make 3D games. I worked as an undergraduate RA under some guys doing ocean modelling, and found several small bugs before I had the foggiest idea what most of the code was meant to do. Yes there will be many problems someone without your background can't find in your model, but that is not an argument for closed source science.

      A more important concern is that someone else who does have your background should have access to your code. That would be part of "peer review". Otherwise they're taking your computations on faith, with no way to reproduce.

    65. Re:Seems reasonable by Pentagram · · Score: 1

      Of all the stuff that's important in scientific computing, the code is probably one of the more minor parts. The science behind the code is drastically more important.

      Usually though the code is a method of determining whether the science is correct. If the code is not correct, it's not possible to assess the science.

      Sure, you could audit it, and find shit that's not done properly. At the same time, you wouldn't have a damn clue what it's supposed to be doing. Suppose I'm adding a floating point to an integer. Is that a problem? Does it ruin everything? Or is it just sloppy coding that doesn't make a difference in the long run? Understanding what the code is doing is required for you to do an audit which will produce any useful results.

      I disagree. Provided the data used in the experiment are also freely available, it should be relatively straightforward to fix any bug that was found and rerun the experiment. You could then easily tell whether the bug made a substantial difference.
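
      A rough Python sketch of that fix-and-rerun check, using the float-versus-integer worry as the example. Both functions and the readings are invented for illustration, and what counts as a "substantial" difference (the tolerance) is still a judgment for someone who knows the field:

```python
def mean_buggy(values):
    """Hypothetical original analysis: truncates each reading to an integer."""
    return sum(int(v) for v in values) / len(values)

def mean_fixed(values):
    """Same analysis with the truncation removed."""
    return sum(values) / len(values)

def bug_matters(data, tolerance):
    """Rerun both versions on the same data and compare the results."""
    before = mean_buggy(data)
    after = mean_fixed(data)
    return abs(before - after) > tolerance, before, after

data = [14.2, 15.9, 15.1, 14.8, 16.4]   # made-up readings
changed, before, after = bug_matters(data, tolerance=0.05)
print(changed, before, after)
```

      Here the sloppy version is off by about half a unit, so the bug matters at this tolerance; with a looser tolerance or different data it might not, and either answer is something you can show rather than argue about.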

    66. Re:Seems reasonable by c0d3g33k · · Score: 1

      On top of all that it's not like you want to release your code to the public right away anyway. As a scientist you're in competition with groups around the world to publish first. You describe in your paper the science you think you implemented, someone else who wants to verify your results gets to write a new chunk of code which they think is the same science and you compare. Giving out a scientist's code for inspection means someone else will have a working software platform to publish papers based on your work, and that's not so good for you. For all the talk of research for the public good, ultimately your own good, of continuing to publish (to get paid) trumps a public need. That's a systematic problem, and when you're competing with a research group in Brazil, and you're in Canada, their rules are different than yours, and so you keep things close to the chest.

      Bullshit. If this kind of thinking is prevalent in the field you're familiar with/in, then it's an unhealthy field and probably isn't producing good science.

      First, this "play it close to the chest so someone doesn't steal my stuff" isn't common in the scientific areas I'm familiar with (biochemistry, biophysics, chemistry, biomedical computing, bioinformatics). People may hold off on releasing info because it's not ready or because it's too new and untested, but rarely because somebody may steal it and beat them. When this behavior does happen, it's usually between bitter rivals and mostly viewed as unhealthy. In practice, preliminary results and work-in-progress see the light of day pretty early, whether in talks given as an invited speaker or as a presentation of work in progress at a conference or symposium. That's considered healthy, because it gives people a chance to talk about their work with their peers *before* it gets published so the work is that much better.

      Second, how is software different from any other piece of lab equipment, reagent, tool or method? If it can be purchased, other folks already have it or are planning to, if it's any good. If you developed it in house, you should be discussing it with people as described above, because it might not be as good as you think. The strength of science is based on the free exchange of ideas, and that includes how it's done, not just the results.

      Medieval alchemists were good at keeping secrets too. Compare how well their approach worked relative to the free exchange that characterizes the last several hundred years.

    67. Re:Seems reasonable by ottothecow · · Score: 1
      Not actual bad code that produces bad results.

      Code that is not very robust, however, seems quite common. This code is not being packaged up and sold to a customer who might try to feed a picture of their pet dog in as an input to a climate model so the code doesn't have to check to make sure. The scientist running it is making his own check on the input or the input is being checked somewhere else. Good code is always better...but let's not kid ourselves and pretend that every little program is fully commented, sanitizes inputs, and has an implementation of clippy saying "it looks like you are trying to model cloud cover with a pet dog, would you like some assistance choosing a better seed input"

      --
      Bottles.
    68. Re:Seems reasonable by mrxak · · Score: 1

      The release of raw data along with the results is a good thing, I agree. But somebody checking to make sure the scientists' algorithms are implemented correctly in an existing program is easier, and more likely to happen, than somebody independently coding their own implementation, particularly in scientific research where the computing power necessary is prohibitive.

    69. Re:Seems reasonable by Anonymous Coward · · Score: 0

      *Exactly*

      Furthermore, all algorithms are published and data can be fetched. In that case it is very bad to publish code because the other research groups are not duplicating your results from your models, but they are just running the same code on their machines - they are cloning your *arithmetic* not models. It is much better to re-write specific algorithm yourself and check if it matches the publication.

      Mission critical software is NOT necessarily code audited by 3rd parties because that is prone to errors. What happens is multiple parties write code to same specs and then check if the results agree. That's how mission critical flight software is developed. Multiple computers running software created by different vendors. Then they have to agree.

      That's how scientific method works too. Code sharing is anti-scientific method for that alone, especially with the spaghetti code that many researchers write.
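
      The write-it-twice-and-compare approach looks roughly like this in Python. It is a toy sketch: the "spec" is just sample variance, the two implementations stand in for two independent groups, and the data are invented:

```python
import math

def variance_two_pass(xs):
    """Implementation A: textbook two-pass sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def variance_welford(xs):
    """Implementation B: Welford's one-pass update, written separately from A."""
    mean, m2 = 0.0, 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

def implementations_agree(xs, rel_tol=1e-9):
    """The two versions must match to within a stated tolerance."""
    return math.isclose(variance_two_pass(xs), variance_welford(xs),
                        rel_tol=rel_tol)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
agree = implementations_agree(data)
print(agree)
```

      Agreement raises confidence in both versions, but when they disagree someone still has to read the code to find out which one is wrong.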

    70. Re:Seems reasonable by Pentagram · · Score: 2, Insightful

      You assume far too much. I don't trust an analysis of anything, by anyone, who doesn't know what they are actually looking at. In your example you can look and analyze but you don't need to understand what it is....

      If the code is freely available and so are the data used, what is stopping you rerunning the experiment with the same data if you find a bug? No analysis comes into it: if the results are significantly different, you can show that the program is running incorrectly.

      I'm seeing a pretty clear parallel between your view of how the code can be analyzed and the AGW ignoramus skeptic view of AGW science as a whole. I don't trust arguments for or against AGW that aren't by people with educations to demonstrate they at least *might* know what they are talking about.

      A mathematician could point out flaws in the calculations of climate science, a physicist could point out problems with the understanding of the physics, a chemist could point out issues with the understanding of the chemistry... you don't have to understand an entire issue to notice problems with a subset of the science. I speak as someone who accepts the majority expert view of climate change.

    71. Re:Seems reasonable by jbengt · · Score: 1

      You (and other posters in this thread) are looking at one side of it. Sure, finding coding bugs and fixing them may be necessary for a valid result. But what after that? You still won't know whether the code is valid if you can't thoroughly understand the problem space.

    72. Re:Seems reasonable by bdwlangm · · Score: 3, Insightful

      just wait until some amateur gets a hold of the code, runs it, and claims that all global warming data is questionable because this model has a bug or produces weird output

      The onus is on the researcher to demonstrate/argue that for the inputs given the code produces meaningful results. If you don't like that then stop doing research with computations? Idiots can always misrepresent you, no matter how you publish. Most of us understand that simulations are limited.

      Second, it will waste the researchers time releasing the code and then responding to questions when people are like "lolz this code blows"

      What makes you think that there will be more people trying out that code and not understanding it, than currently there are people reading the paper and not understanding it? Personally I'm not going to waste my spare time downloading complex simulations that I know nothing about and try to invalidate them.

      That being said, it should definitely be available as a part of the peer review process if something is really called into question.

      So make it available and reference it in your paper. No one's asking you to tell everyone on the planet about it.

    73. Re:Seems reasonable by EaglemanBSA · · Score: 1

      I also agree with your statement, although perhaps not the presentation. It's important that we share as much data and models as possible. If you're trying to hide your model because of some private interest, that's one thing. If your research is owned by the American people, it's not yours to censor, and you'd be surprised how intelligent other people are, and what interesting things they may glean from the knowledge you've cataloged.

      If your models truly work (and they don't represent a trade secret), what, exactly, do you expect to gain by hiding them? If your models truly work, you'd certainly want everyone to see that they're indisputable.

      --
      Quiz: True or False -- On a scale of 1 to 10, what is your middle name?
    74. Re:Seems reasonable by Pentagram · · Score: 3, Insightful

      Maybe it's a bug that only pops up on certain inputs. Maybe the researcher knows this and avoids those inputs (or wrote the program without intending to go anywhere near the input range where the code fails). This seems fine to me...researcher needs a one-off set of statistics and writes some quick and dirty code that does it even if it isn't robust or even efficient.

      Sorry, but I wouldn't trust any code that fails on certain inputs!

      I can accept code that isn't efficient, that's just not necessary. I can accept bugs in peripheral code (such as an added-on GUI) but the code that actually does the science really should be as good as the scientist can write. If it has known bugs they should be fixed before any research is published that is based on the code.

      I speak as someone who has written code for scientific research.

      Releasing this code is probably bad for two reasons. If the researcher is not aware of bugs outside of the exact inputs they used, they probably aren't going to disclose them--just wait until some amateur gets a hold of the code, runs it, and claims that all global warming data is questionable because this model has a bug or produces weird output.

      Good. That means researchers will be more careful about the code they are writing, and we can all have more confidence in the science.

      I don't expect researchers to write great code for everything...it may be repetitive or inefficient but they can usually tell from the result (and comparing it to other models) whether or not something went wrong.

      Comparing it to other models? What if they are wrong too? Perhaps that's how they verified their results. Trying to tell if the program is correct from the results is even worse. You end up fixing bugs until the code produces the result you want.

      I know that I write code at work (IANAClimate Researcher) that is quite sloppy or wasteful because I just want to see what the result looks like (and will never run the program again)

      That's exploratory programming, and is quite fair enough (in fact I think people should do it more), but you shouldn't use such code to do anything important. Throw it away and start again.

    75. Re:Seems reasonable by Fembot · · Score: 1

      I'm rather inclined to agree with your analysis here, but my reasons for not publishing the full source code to my work (other than the bits which are generally applicable) are twofold:

      1) I'm willing to share my code with anyone who's interested. I actually rather like talking to people who are interested enough to email me, though - this can foster collaborations, and both parties gain from a discussion of interests.
      2) Some chunks of my code make the whole lot totally undistributable. I use implementations of algorithms from books which state "you may not distribute electronic copies of this code". I don't have the time or inclination to work around this myself, it doesn't win publications. I have however structured my source tree such that it's easy enough for me to distribute the remainder and someone with relevant background can find replacements.

      Most of the datasets I use are *huge* and again I'm willing to share with anyone, but they need to be serious enough about it to contact me first.

      Finally, there are things in my code which aren't broken per se, but are massive hacks designed to shortcut some of the original design in order to meet some crazy deadline I forgot all about. Rather embarrassing really if I've not had time/inclination to fix it.

    76. Re:Seems reasonable by philipgar · · Score: 3, Insightful

      Actually, I'm pretty sure everyone is fairly close with the current data they're generating to prevent other groups from beating you out the door with your idea. The exceptions to this rule are when professors trust one another, and know that the other wouldn't use the information you're supplying them with to do the same research you are already working on.

      As a graduate student, you definitely don't want to share code you've developed immediately. You may spend 2 or 3 years of a PhD writing code, and get a couple papers out of it, but with the code base in place you plan on getting a handful more. More to the point, these papers become relatively easy to generate, because you spent those years developing the program that allows you to do it. Writing papers, and generating results, analyzing them etc takes time, so you can't do everything at once. Releasing your code too early means other groups can do these other experiments, and you, the grad student who spent so many years setting up the code or experiment for them, still wouldn't be able to graduate, because you have not produced enough original research, and instead only developed the tools others used to pump out results.

      As a student nears graduation, they might be more willing to release their code, as then competition is less of a concern. Someone won't pick up your code and be releasing a paper based on it in 2 or 3 months, it just takes too long to get up to speed. However, the BIGGEST impediment to releasing software in academia is the support that you have to give to your software if anyone is going to use it. You first need to audit and clean up your code, a non-trivial task. You have to supply documentation on how to use the software, another non trivial task, and then provide documentation on the basics of how it works etc. All of this stuff takes a lot of time, and doesn't tend to help a student graduate. Also, once code is released, there's an expectation that you'll be providing some level of help with questions. Granted that normally rarely happens (as the author has gone on to do other things, and hasn't touched the code in years). It just becomes a difficult thing to do.

      Phil

    77. Re:Seems reasonable by OldSoldier · · Score: 1

      The release of raw data along with the results is a good thing, I agree. But somebody checking to make sure the scientists' algorithms are implemented correctly in an existing program is easier, and more likely to happen, than somebody independently coding their own implementation, particularly in scientific research where the computing power necessary is prohibitive.

      I'm not sure what is more likely to happen. It could be as you describe.

      But scientists strike me as being more interested in data than in code. So other possibilities are:

      * Releasing the code will spur more scientists to use it w/o examining it in detail resulting in the same errors being made in other research projects.

      * Scientists will focus on the data and will find other software to analyze it and/or make said software themselves if they don't have access to the source of the original scientists' software.

      Over the long term we are in agreement, but there is some near-term danger to be aware of.

    78. Re:Seems reasonable by xiox · · Score: 1

      You're being unrealistic. As a scientist I'm paid to produce papers, not polish code. If the code does what I want it to do, and I'm satisfied that problems with it are sufficiently unlikely on the data I am putting into it, that is enough.

      If you want me to write perfect code, you should pay me to do so, and hire people for scientific research on the basis of their code. Being hired as a scientist is based on results, not on how well documented the code is or checking every possible input. I'd love to have more time to do every possible test, but I cannot do that and have a career.

      BTW, I have produced some code for others to use from my research, and it takes much longer to get it into a usable and documented state than the usual run of the mill script I write.

    79. Re:Seems reasonable by Urkki · · Score: 2, Insightful

      You assume far too much. I don't trust an analysis of anything, by anyone, who doesn't know what they are actually looking at. In your example you can look and analyze but you don't need to understand what it is....

      I'm seeing a pretty clear parallel between your view of how the code can be analyzed and the AGW ignoramus skeptic view of AGW science as a whole. I don't trust arguments for or against AGW that aren't by people with educations to demonstrate they at least *might* know what they are talking about.

      You're basically saying you're qualified to analyze and discuss a topic you do not understand simply because you know a language. That is just B.S.

      So if the scientist who wrote the computer model isn't a qualified software engineer and doesn't have intimate knowledge of the workings of processor architectures, computer languages and all that, then any results he gets using a computer program of his own making are not to be trusted?

      I think you just threw out a significant portion of latest science...

    80. Re:Seems reasonable by monoi · · Score: 1

      The fact that you think it is possible to just knock up a "specified set of outputs" for this kind of code shows how unfamiliar you are with this kind of program.

      Unfortunately, the purpose of simulation code is to determine those outputs! Yes, you could re-implement the entire algorithm in a different language. Yes, you can sometimes use analytic methods to find approximations which hold in special regions of parameter space. But traditional black or clear box testing, as if this were a business systems problem? Sorry, no dice.

      Qualifier: I am a practicing Software Engineer (outside academia) with a background in scientific simulation.

    81. Re:Seems reasonable by Xyrus · · Score: 1, Insightful

      I think the world is very lucky that Linus Torvalds wasn't as narrow-sighted and conceited as you are

      Linus Torvalds was writing an operating system, software intended for a general computing audience. Something like a climate model has a very exclusive audience and requires that the users have a deep understanding about the subject. You are comparing apples to oranges.

      Have you heard of "ivory tower"? You're it.

      Your position basically boils down to this: "unless you read all the same things I read, talked to all the same people I talked to, went to all the same schools I did... you're not qualified to talk to me".

      That is _the_ definition of monocultural isolationism.. i.e. the Ivory Tower of Academia problem.

      No, what he's saying is that unless you have an education in the subject material then you're not going to understand what is going on. Have you ever had to explain to anyone that those programming montages in certain movies really are not accurate? It's sort of like that.

      Sure, you can decipher the code if you're a programmer, but you may not know WHY they are doing the things they are doing. Naively going through a code base with an engineering ax without understanding what the code is doing is a sure way to seriously screw things up.

      As a specific counterpoint to your way of thinking...

      That's not a counterpoint. You're proving his point. You're a software engineer working hand in hand with the "expert". But that's not the issue. The problem is having every idiot who thinks they're $DEITY's gift to programming come through a scientific codebase and expecting said scientist to do tech support, which they would have to as they may be the only one who understands what they did. This is a waste of time for said scientist, as their primary job is research, not conducting Fluid Dynamics 101. And a lot of the codes written are one-time solutions for a particular bit of research.

      Seriously, if you want to invalidate some results, read the papers you want to attack and duplicate their algorithms in whatever programming language you want to use to prove that they're wrong. Most code used in scientific papers is fairly short. Plus, doing an independent implementation helps further validate the research. What's the point of using their code? It's going to give you the same answers. Write your own, then you can be sure it was done "right".

      ~X~

      --
      ~X~
    82. Re:Seems reasonable by Xyrus · · Score: 2, Insightful

      The discussion in scientific circles is constructive.

      The quasi/anti-science mad-dog drivel that makes up almost all the rest of the discussion is what they're circling the wagons about. It's like having Joe Sixpack come into your workplace screaming that your professional work is bullshit and you should be fired.

      And actually having people in power listen to him.

      ~X~

      --
      ~X~
    83. Re:Seems reasonable by cirby · · Score: 1

      However, 99.9% or more of the people in the world wouldn't be able to do a damn thing with it.

      ...which means six million or so COULD do something with it.

      "One in a thousand" doesn't mean very much when you have six billion for your sample size.

      Considering how many damaging flaws have turned up in the core research of Global Warming Science (and how much damning fakery has turned up in the leaked letters from CRU), it's safe to say that many of those "ignorant whack-jobs" are, to say the least, smarter and more perceptive than most of the folks who rabidly defend AGW...

    84. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Trouble is that you are assuming a linear system. Most science yet to be solved involves complex interacting systems, non-linear systems. Good luck checking the input against the output on those.

    85. Re:Seems reasonable by IronicToo · · Score: 1

      If 99% of the people are idiots and 1% of them ask a question you cannot immediately answer; congratulations: you just got 1% smarter and 1% is a HUGE gain in any endeavour worth earnestly chasing.

      That sounds very good until you have to deal with that 99% who are trying to discredit you and ruin your career. You should read this to get a little taste of the kind of "questions" you usually get: http://www.conservapedia.com/Conservapedia:Lenski_dialog

      The whole question is a little bit moot anyway, as a mechanism for dealing with this, and many other problems, is already used: replication of results. Published results are generally not widely accepted until they can be replicated in a different lab. This overcomes any coding errors, but more importantly, equipment errors, user errors, random chance, and even active data manipulation. I think what most people who are not actively involved in research fail to realize is what an iterative process science is. Early results are often, maybe even usually, error ridden. This can come from bad code or anything else. But as more people work on it and improve it, the errors are removed and the final product is something very close to ground truth, and the longer it is discussed the closer it gets to that goal. If you don't believe me, I would ask you where you think computers, automobiles, pain killers, vaccines, rockets, cameras, genetically modified mice, and skyscrapers come from. All of those required an incredibly detailed, and accurate, knowledge of how the working components function. Science, it works bitches.

    86. Re:Seems reasonable by HiThere · · Score: 1

      What's wrong with adding a floating point to an integer? Isn't the result a floating point? And if you store that result in an integer, then the result is a truncation to integer, isn't it?

      It's been a long time since I used FORTRAN, but that's how I remember it as working, and (in my memory) that was standard practice...except when you wanted to retain the extra precision of the floating point. (And even then there wasn't anything wrong with adding a floating point to an integer. That was, IIRC, standard in computing a moving average, e.g.)

      Now I'll admit that I'm not absolutely certain. I moved from FORTRAN to PL/1 to C...and when I got to C I was appalled at how stupid it was about conversions. But FORTRAN might be equally stupid, I suppose, and it was just the detour through PL/1 that accustomed me to automatic conversions. (But that's not the way I'd bet.)

      Note also that in Python explicit conversions are rare. Also in Smalltalk, LISP, Scheme, and, I believe, Ruby. Those are the only current languages that I'm informed enough about to have an opinion. If C requires explicit conversions to add a float to an integer, I'd be surprised, as I don't remember that as being necessary. OTOH, I haven't used C much in a long time, and I don't use floats (or doubles) often, so I wouldn't be that surprised. (Appalled would be closer than surprised.)
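
      For the Python case at least, the behaviour is easy to check. A minimal sketch (the variable names and sample values are made up for illustration): adding a float to an integer yields a float with no cast, and only storing back into an integer needs an explicit, truncating conversion.

```python
# Mixed int/float arithmetic promotes to float without an explicit cast.
total = 3 + 0.75
print(type(total).__name__, total)

# Going back to an integer needs an explicit int(), which truncates toward zero.
print(int(total), int(-total))

# A moving average that mixes an integer window size with float samples.
samples = [1.5, 2.5, 4.0]
window = 2
averages = [
    sum(samples[i:i + window]) / window
    for i in range(len(samples) - window + 1)
]
print(averages)
```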

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    87. Re:Seems reasonable by Anonymous Coward · · Score: 0

      I'm going to point out something that should be obvious about this article but apparently isn't.... the paper cited, where "the accuracy of results declined from six significant figures to one significant figure during the running of the program" is 13 years old. You can't cite a 13 year old paper and claim that these are current problems, you have to prove nothing has changed in 13 years. Honestly, this is the point where my advisor would say, "Find a reference that was published in a year starting with a 2", this is a straw man.

      As for the assumption that someone who codes C in their spare time is going to understand what the code is doing (or more importantly why, and how it affects the accuracy), no, you won't be able to spot check it and say "Look, I found the bug, global warming is a farce!".

    88. Re:Seems reasonable by Mashdar · · Score: 1

      Um... Troed you could certainly provide a purely syntactic review of the code, but the semantics would be beyond your scope of knowledge, and there would be no test cases to troubleshoot, since the system cannot be solved. And while syntactic review would be worthwhile, without having a semantic understanding you would be incapable of identifying the more elusive errors.

    89. Re:Seems reasonable by Troed · · Score: 1

      Then you shouldn't do the code, just as you shouldn't do any number of other irrelevant things for the science you're hired to produce.

      I've got patents. Quite a few actually. I haven't written a single one of the actual patent applications.

    90. Re:Seems reasonable by Mashdar · · Score: 1

      I had not previously read the Lenski Dialog. Thanks for the link! it was a good (smile inducing) read.

    91. Re:Seems reasonable by Troed · · Score: 1

      On the contrary, I'm quite familiar with it. I'm a Software Engineer, a Mechanical Engineer and I work with Research. Do we need to put caps on more words? ;)

      Your field is not magic. Your field is not special. Your field can be approached just as any other field.

    92. Re:Seems reasonable by Troed · · Score: 1

      Yes? Feel free to hire me and I'll show you how it's done.

      (To start with, you break down the system into simpler systems, until you reach testable components)

    93. Re:Seems reasonable by emanroga · · Score: 1

      Right, but your financial codes and routers have extensive ECC hard-coded into the software and hardware. The CRU code in particular had no error handling, and some errors were not easily detected in the output - like reporting overflow errors as negative numbers for an intermediate value. The real point is the reduction from six to one significant digit, which is a much better measure of the software's efficacy than saying it has errors. People who don't like this will say it will jack up the price of science because you will need professional programmers. And it will. However, I suspect the market will find a good solution that is robust and reasonably customizable, at least within specialties.
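
      How significant figures can disappear during a run is easy to demonstrate. The sketch below is not taken from the CRU code or from the study quoted in the summary; the values are chosen purely to make the effect visible in double precision.

```python
import math

big = 1e15
small = 1.234567  # seven significant figures going in

# Near 1e15 adjacent doubles are 0.125 apart, so the sum is rounded to that grid.
recovered = (big + small) - big
print(recovered)

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
compensated = math.fsum([big, small, -big])
print(compensated)

relative_error = abs(recovered - small) / small
print(relative_error)
```

      Nothing crashes and no warning is raised; the naive result simply comes back with roughly two good digits instead of seven, which is why this kind of loss is hard to spot from the output alone.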

    94. Re:Seems reasonable by Troed · · Score: 2, Insightful

      Yes - but the fact that there are classes of errors (especially those pertaining to the construction of the model) that would be hard to find without domain knowledge does not invalidate the fact that you'll be able to find other classes of errors.

      Errors as those detailed in the article.

    95. Re:Seems reasonable by wealthychef · · Score: 1

      I think you are both right, and this all goes to show that this is a very complex subject and there are no absolute right answers. I personally would favor the release of the code and data as a requirement. Mashdar, just because some idiot claims you are wrong in the newspapers doesn't mean you have to respond. In fact, they are already doing so, so no change is likely. However, it's possible you will be shown a weakness in your argument by releasing the code, and I know you would like that, as a lover of knowledge.

      --
      Currently hooked on AMP
    96. Re:Seems reasonable by JumpDrive · · Score: 1

      I think you just made his point.

    97. Re:Seems reasonable by wealthychef · · Score: 1

      So if the scientist who wrote the computer model isn't a qualified software engineer and doesn't have intimate knowledge of the workings of processor architectures, computer languages and all that, then any results he gets using a computer program of his own making are not to be trusted?

      Isn't the whole point of peer review to remove the need to rely on trust in order to promote knowledge? Hell no, I don't trust your results, and yes I want to know how you got them. The extent to which you won't tell me is the extent to which I become very suspicious.

      --
      Currently hooked on AMP
    98. Re:Seems reasonable by pod · · Score: 3, Insightful

      Exactly, although I echo the sentiment the presentation could have been better.

      Everywhere we turn there are people who think they are smart telling us what to do and what to think, because they know what is best for us. They're the experts with years of training, and we know nothing. Do not question the high priests, do not pay attention to the man behind the curtain.

      This is just following the general trend of late, culminating in "this time, it's different, trust us". We think we're smarter, we're better, we have more tools, we have more knowledge, we have more insight, and that things are somehow fundamentally different, and that today we can fix all the problems that our predecessors have been unable to fix in centuries past. In the end, the more we "fix", the more we break.

      As a lay person, I know we cannot predict what the weather will be like next week, and all I see around me is global climate hysteria. I don't see science, I don't see deliberation, I don't see openness, I don't see debate. I see politics and dogma. Enough of this "you're not smart enough to understand so just trust me" nonsense. Enough of this "science by consensus". It doesn't exist, and it's not scientific anyways even if it did.

      Show everyone the science, open up the process, accept opposing data (heck, accept ALL legitimate data to begin with), interpretations and views, so we can all see why it is that we need to undertake a complete reorganization of economy, society and personal life, at a cost of trillions of dollars and undoubtedly much resulting misery and suffering.

      It was global cooling and visions of frozen wastelands and a new ice age. Where did that go? Then it was the ozone hole that would fry anyone not wearing SPF1000 sunblock. Where did that go? Then it was global warming and sea level rise that would make disaster movies seem like documentaries. Where did that go? Now we have the amorphous all-encompassing "climate change".

      But THIS TIME, it's different. Really. This time, we're smarter, and we have better science, and we've learned, and we know better, we know for sure. Trust us.

      Well, sorry. You're gonna have to do better than that.

      --
      "Hot lesbian witches! It's fucking genius!"
    99. Re:Seems reasonable by wealthychef · · Score: 1

      lets not kid ourselves and pretend that every little program is fully commented, sanitizes inputs, and has an implementation of clippy

      Snarky references to Clippy aside, let's not kid ourselves and pretend that just because you release your source code you have to respond to every idiot out there who questions it. I mean, anyone can read a science journal right now and see some of your methods and object on ridiculous grounds, and what do you do? You consider the source and act appropriately. If someone doesn't know how to use the code and challenges you, you can just say "You don't know what you are talking about, fuck off." You're apparently pretty good at that already.

      --
      Currently hooked on AMP
    100. Re:Seems reasonable by Citizen+of+Earth · · Score: 1

      It's like having Joe Sixpack come into your workplace screaming that your professional work is bullshit and you should be fired.

      If my work was being used to justify the transfer of trillions of dollars to genocidal dictators, then Joe might have a point and a right.

    101. Re:Seems reasonable by wealthychef · · Score: 3, Insightful

      Just release your god damned code and don't worry about it. What are you afraid of? The sky will not fall. Your reputation will not crumble. Of course it's not perfect, duh. The point of releasing it is not to have people check for perfection, it's to see if there is a bug that could explain your surprising results. It's part of defending your results. Deal with it. I don't trust you.

      --
      Currently hooked on AMP
    102. Re:Seems reasonable by Citizen+of+Earth · · Score: 1

      print f(1, 2);
      f (a, b):
      print $b + $b;

      Here is my scientific model for proving AGW. You can examine all you want, but I guarantee it contains no programming errors. Run it as many times and with as many different data sets as you want. Suck on that, AGW deniers!

      10 PRINT "AGW IS CONFIRMED"
      20 END

    103. Re:Seems reasonable by CptNerd · · Score: 2, Interesting

      What it does is, it eliminates one possible cause of errors. Software that doesn't do bounds checking, for instance, is like uncalibrated measuring instruments. Writing 100 numbers into an array of 10 integers will cause 90 numbers to be written into random areas of memory, and you can't be guaranteed that they aren't affecting other parts of your model, including parts that have been calculated previously and which are now overwritten by false values. I saw something just like this when converting a legacy communications package from Fortran to C: all through the code the previous programmers had defined 16-character strings and were writing 256 characters into them, due to a change in one constant that wasn't used to define the array bounds. Fortunately the problem caused the C code to crash, but the Fortran code would occasionally produce strange results, caused by this coding error.
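
      A bounds check of the kind that would have caught this is only a few lines. This is a hedged sketch in Python rather than the original Fortran or C; `checked_copy` and both constants are hypothetical names standing in for the mismatched sizes described above.

```python
FIELD_WIDTH = 16      # size the buffer was declared with (assumed)
MESSAGE_LENGTH = 256  # constant changed later without updating the buffer (assumed)


def checked_copy(buffer, data):
    """Copy data into buffer, refusing to write past the end."""
    if len(data) > len(buffer):
        raise ValueError(
            f"refusing to write {len(data)} items into a buffer of {len(buffer)}"
        )
    buffer[:len(data)] = data
    return buffer


field = bytearray(FIELD_WIDTH)
message = bytes(MESSAGE_LENGTH)

try:
    checked_copy(field, message)
except ValueError as err:
    print("caught:", err)

# A write that fits goes through and leaves the buffer length unchanged.
checked_copy(field, b"OK")
print(len(field), bytes(field[:2]))
```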

      I've been a programmer for 30 years, and a science geek for longer than that, and I would assume that a scientist would be in favor of eliminating as many potential errors as possible in the instruments they use, whether the instruments are hardware or software.

      --
      By the taping of my glasses, something geeky this way passes
    104. Re:Seems reasonable by monoi · · Score: 1

      Yes, fair enough, I seem to have a small problem with Proper Nouns today. Call it a German moment, if you will.

      It's not my field any more, but here's an example from when it was: I wrote some code to perform a particular Monte Carlo simulation. Within the code were a number of functions which performed specific numerical computations. They were deterministic, so I unit tested them as you might expect using known results computed by an alternate technique. I found bugs, I fixed them.

      However, the core function of the program, the simulation itself, I did not and could not know the "correct" result of, as it was a numerical simulation using pseudo-random numbers (so repeatable, but possibly repeatably wrong).

      I had no other way of generating a specimen set of results. So, how was I supposed to test it? If you can offer a practical solution, I'll be impressed. I'm expecting some more hand-waving and hubris, however.

    105. Re:Seems reasonable by mathfeel · · Score: 2, Interesting

      Your argument is void. A bug is a bug. Either it affects the outcome of the program run or it doesn't - and I still don't need to know anything about what it's supposed to do to verify that. You just need to re-run the program with a specified set of inputs and check the output - also known as verified against its own test suite.

      Unlike many pure-software cases, scientific simulation can and MUST be checked against theory/simplified model/asymptotic behavior. The latter requires specialized understanding of the underlying science. The kind of coding bug you are talking about will usually (not always) result in a damningly unphysical result, which would immediately prompt any good student to check for said bug. Heck, my boss usually refuses to look at my code while I am working on it (besides advising on general structure) so that even if my code gets the expected result, he can still perform an independent inspection.
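
      As a toy version of checking a simulation against theory: the Monte Carlo estimator below has a known analytic answer (pi), so the run can be accepted or rejected against a statistical tolerance. The function names, the seed and the six-sigma band are assumptions for illustration; a real simulation would have to be compared against a limiting case where the answer is known.

```python
import math
import random


def estimate_pi(n_samples, seed):
    """Monte Carlo estimate of pi from points in the unit square."""
    rng = random.Random(seed)  # seeded, so the run is repeatable
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


def standard_error(n_samples):
    """Expected one-sigma error of the estimator, from the binomial variance."""
    p = math.pi / 4.0
    return 4.0 * math.sqrt(p * (1.0 - p) / n_samples)


n = 100_000
estimate = estimate_pi(n, seed=12345)
tolerance = 6.0 * standard_error(n)  # assumed acceptance band, roughly six sigma
print(estimate, tolerance, abs(estimate - math.pi) <= tolerance)
```

      A seeded run is repeatable, but repeatability alone only shows the code is deterministic; it is the comparison with the analytic value that says anything about correctness.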

      --
      The only possible interpretation of any research whatever in the 'social sciences' is: some do, some don't
    106. Re:Seems reasonable by Troed · · Score: 1

      In your case, you should have two different models that both should result in the same answer - that is one of the few ways you can test your core idea.

      For the models, you could have two different implementations, and they should arrive at the same answer.

      Now of course, I don't expect you to do that. You should however document enough of your thinking and release the code for your implementation. Someone else will do the other model, and/or the other implementation. That's reproducibility.

      Bar all of the above - someone could still check your original source code for errors. A code audit.

      You know the above to be true, so why argue against it?

    107. Re:Seems reasonable by Troed · · Score: 1

      You're confusing two different issues as being one and the same. The fact that your model might require domain knowledge does not invalidate the fact that your code (the specific implementation) should be verifiable as well. And no, there are many bugs that would not be immediately obvious, especially if they're in the middle of calculations.

      The good old "9999" for NaN when doing databases has a habit of suddenly being treated as a real value decades later by someone who's never experienced said usage.

      (I chose that example on purpose)
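
      To make the sentinel problem concrete, here is a small sketch with made-up readings. The constant MISSING, both function names, and the temperature values are hypothetical; the point is only how quietly a "9999 means no data" convention corrupts an average when a later reader does not know about it.

```python
import math

MISSING = 9999  # hypothetical sentinel meaning "no reading recorded"

def mean_naive(values):
    """Average everything, unaware that 9999 is a placeholder."""
    return sum(values) / len(values)

def mean_masked(values, sentinel=MISSING):
    """Drop sentinel entries before averaging; NaN if nothing is left."""
    valid = [v for v in values if v != sentinel]
    if not valid:
        return math.nan
    return sum(valid) / len(valid)

# Made-up temperature readings with one missing value.
readings = [14.0, 15.0, MISSING, 16.0]

if __name__ == "__main__":
    print(mean_naive(readings))   # sentinel counted as a real temperature
    print(mean_masked(readings))  # sentinel excluded
```

      Neither call raises an error, which is why this class of bug can sit unnoticed in the middle of a longer calculation.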

    108. Re:Seems reasonable by quanticle · · Score: 1

      If that's your test, then I'm pretty sure that every piece of software ever made would fail. Not all bugs are equally significant. Also, you can't assume that all effects on the outcome are equally significant. An implicit approximation that reduces the number of significant figures from six to four is not nearly as important as a module being sent measurements in the wrong units.

      Both bugs affect the outcome of the program, but I know which one I'd rather have in my code.

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    109. Re:Seems reasonable by quanticle · · Score: 1

      At the same time, how can you say whether the bug affects the output of the program enough to invalidate the results? Let's say you find a bug and remove it. The program output 0.3452 before the bug was removed. Afterward, the program outputs 0.3754. How do you judge whether that's a significant enough divergence to invalidate the results of the original program?

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    110. Re:Seems reasonable by quanticle · · Score: 2, Interesting

      A more important concern is that someone else who does have your background should have access to your code. That would be part of "peer review". Otherwise they're taking your computations on faith, with no way to reproduce.

      I fully agree. Perhaps something that scientific journals could do is to create a source code repository that allows researchers to publish the source code used to create the results along with the results themselves. At the very least, other researchers would be able to look at the code and see if there are any glaring errors or omissions.

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    111. Re:Seems reasonable by quanticle · · Score: 1

      I actually _have_ gotten personal responses from Theo DeRaadt on some OpenBSD issues but they all have the general form of "you're not interesting, don't waste my time".

      Funny, I thought that was the only type of response Theo De Raadt was capable of making.

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    112. Re:Seems reasonable by monoi · · Score: 1

      The point I'm trying to get across is that the reason for computer simulation is often as much to test (and perhaps supersede) known models as to "get a number out". So, if the only testing option is to compare against another model, then that's really no option at all.

      Document thinking and code? Fully agreed. Release the code? With you 100%, anyone who refuses is an intellectual fraud. Code audits? Yup, they ought to be routine in my opinion (and I'm pretty sure they aren't).

      However, none of those address your original claim, which is that all code is genuinely testable in the usual sense ("You just need to re-run the program with a specified set of inputs and check the output"). I think that in this special case, which is the case that TFA discusses, that is just not true.

      For 99.9999% of other problems you are indeed totally correct however, and I agree. Also, none of this excuses the rank laziness and ignorance of many academic programmers.

    113. Re:Seems reasonable by quanticle · · Score: 1

      Given the increasing importance of computer models in scientific research, I think that we need to make writing good code as important as writing good research papers. No journal would accept research that was filled with grammatical errors or lacked citations. So why are journals accepting results created by poor quality code? Attaching equal importance to the code and the paper would go a long way towards alleviating some of the problems you have described. For example, having to submit your code for journal review right along with your paper would motivate students to write clean, structured code that would be more fit for public release. Thus the student would have to spend less time doing cleanup after the fact.

      Another thing that would help is encouraging citations for code as well as for results. That way, if a graduate student comes up with a particularly original computer model, then they can point to the number of citations their code has received, in order to show its significance.

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    114. Re:Seems reasonable by Jane+Q.+Public · · Score: 1

      I will avoid the ubiquitous automobile analogy... and use airplanes instead.

      For the most part, Troed is correct. It is quite possible to do a diagnostic and checkup on a (single-engine, let's say) airplane and tell if it's fit to fly... without knowing a darned thing about what the pilot intends to do with it. In software parlance that is finding defects or "bugs".

      On the other hand, it is quite possible that the airplane will not do what the pilot intended it to do. For example, if the pilot intends to fly inverted or do other acrobatics, it might be necessary for the ailerons to move in one direction or the other more than normal. If they were not actually made to do that, then that is a design flaw. In software terms, such a design flaw would correspond to a logic error in the program, not a "defect".

      So you are really talking apples and oranges here. Yes, it's possible to find defects or inconsistencies in the software, without knowing much of anything about what the software is supposed to do in a real-world sense. You just know that if the software is working properly as written, if you put in this input, you had better get that output, or something is wrong.

      On the other hand, in order to be useful, the program had better do what it was designed to do. And to know if it was properly written takes knowledge about the intent. And testing may require such things as giving it known input, and comparing the output against values that are known to be correct in this context. (As opposed to what the software engineer knows it should be given the actual algorithm... those are two very different viewpoints.)
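
      The "put in this input, get that output" style of check can be sketched as a seeded regression test. Everything here is a stand-in: simulate is a toy function, not a real model, and regression_check is a hypothetical helper. It only demonstrates that a fixed seed makes a pseudo-random run repeatable, so a later code change that alters the output gets flagged.

```python
import random

def simulate(seed, n=1000):
    """Toy stand-in for a simulation: the mean of n pseudo-random draws."""
    rng = random.Random(seed)  # same seed, same sequence of draws
    return sum(rng.random() for _ in range(n)) / n

def regression_check(reference, seed, tol=0.0):
    """True if re-running with the same seed reproduces the stored output."""
    return abs(simulate(seed) - reference) <= tol

if __name__ == "__main__":
    reference = simulate(seed=42)            # recorded once, kept with the code
    print(regression_check(reference, 42))   # same input, same output expected
    print(regression_check(reference, 43))   # different input, different output
```

      This catches defects in the first sense above (the code no longer does what it did). It says nothing about the second sense: a stored reference value that was wrong from the start will be reproduced just as faithfully.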

    115. Re:Seems reasonable by Xyrus · · Score: 2

      This is the reason why scientists feel they are wasting their time engaging the public.

      You clearly already have your mind set in stone. You've got the whole thing figured out backed by 100% solid conspiracy theories. No amount of data, review, validation or verification will change your mind.

      It doesn't matter if the code is open. You don't care. You're not going to look at it because it is so much easier to wrap yourself in a blanket of your own crazy world view and discredit anything that might just impact your comfy existence.

      In fact, the arctic could melt completely and 20% of Florida could go underwater and you'd still deny anything was happening or that anything should be done about it.

      Scientists would just be wasting valuable resources in dealing with people like yourself. Just like Democrats would be wasting their time talking at a Tea Party convention, or Republicans would be wasting their time at a MoveOn.org convention.

      You will not be convinced. Ever. Like most followers of McIntyre and Watts. It doesn't matter what the scientists do it will never be enough. Even God itself would not convince you otherwise.

      Fortunately, the research will continue with or without your support.

      ~X~

      --
      ~X~
    116. Re:Seems reasonable by Anonymous Coward · · Score: 0

      What if we assume they don't use C or any other language that allows you to do strange shit with pointers and write out of the bounds of an array? Those features have their place but are also a major source of bugs, I can imagine scientists would choose to use a language that eliminates these problems (e.g. by throwing exceptions and not allowing pointers, like Java) because they won't need these features much.

    117. Re:Seems reasonable by Thiez · · Score: 2, Insightful

      > Then it was the ozone hole that would fry anyone not wearing SPF1000 sunblock. Where did that go?

      We stopped using the CFCs that were identified as a major contributor to the problem and it appears that is working. Oh sorry, I don't think that supports your argument.

    118. Re:Seems reasonable by Citizen+of+Earth · · Score: 1

      I merely pointed out that this work is extraordinarily important and requires way more scrutiny than boring science because the stakes are so high. I was also commenting on the bizarre social-engineering experiment that was intended to make us reduce our CO2 emissions. Surely you're not defending that. Cutting CO2 is a notion that is doomed to fail. If anything, our CO2 emissions will grow exponentially throughout this century, not fall by 80% from 1990 levels. The more sensible approach would be to counteract the AGW with engineering. This approach can actually work in reality and has a price tag about 1/1000th that of the CO2-reduction approach. Also, transferring large amounts of money to corrupt third-world dictatorships could have some negative repercussions on civilization as well as the environment. Do climate scientists even publish papers on these subjects? You certainly have a hair trigger.

    119. Re:Seems reasonable by 10101001+10101001 · · Score: 1

      there are well funded lobby groups and others with too much time on their hands looking for ANYTHING that is wrong.

      Errors are only errors if they are reported by the "right" people?

      No. Errors that don't change the result (for example, badly formed functions that don't abort on negative numbers) or change the result marginally should be fixed or at least considered. But, "when just one error -- just one -- will usually invalidate a computer program", who cares; clearly all that global warming stuff is absolutely wrong unless you can prove your software has zero defects.

      Do you want to know how many questions Linus Torvalds has answered for me? Zero.

      Sounds like Linus Torvalds has something to hide. If he's receiving public money, he should be answering any question I have.

      I actually _have_ gotten personal responses from Theo DeRaadt on some OpenBSD issues but they all have the general form of "you're not interesting, don't waste my time".

      Nevertheless, I rely on OpenBSD. The fact that Theo has neither the time nor the interest in having a deep meaningful conversation with me about his code neither changes the quality of his code nor prevents him from releasing every 6 months, on schedule.

      Which is relevant how, again?

      I don't think that there is an expectation that scientists stop doing their day jobs to do software support for people. I think there is an expectation that publicly funded research used to set public policy be easily available to all comers.

      I think the expectation should be that anything used to set public policy, publicly funded or not, should be easily available to all comers. Unfortunately, that isn't happening any time soon.

      I'm a bit frustrated by the apparent contradiction. For the first time perhaps in history in the USA, you have armchair folks trying to do technical audits of scientific tools, research, and publications -- for free.

      Um...there have been armchair folks trying to do technical audits pretty much forever. The things that have changed are (a) more armchair folk can now rant on their blogs about how much more they know about atmospheric research than people whose focus is entirely atmospheric research and (b) lobbying arms can fund those bloggers or use such blogs for their own marketing--this really isn't massively different than the yellow journalism or pulp rags of the past; it's just a matter of degree.

      I thought the "normal" problem in America is that the population is too apathetic to care and too stupid to provide any critical analysis. And yet we see this happening more and more frequently and the climate-science establishment is circling the wagons instead of celebrating the fact that there are a handful of people that for once give a damn about interesting research tools and methods.

      Funny. The vast majority of Americans are too apathetic to care and too stupid to provide critical analysis. I'm certainly in the latter category. The circling the wagons is a backlash against the way evolution has been treated by those with an agenda against it; ie, climatologists have learned the lesson that those with an agenda are perfectly willing to take even easily refutable "facts" and the vast apathetic and stupid American population will give it equal weight as tons of evidence provided by scientists. With a seeming need to fight constantly to "win" in the court of public opinion, is it any surprise that many would rather closet the information against those more interested in trying to sell their snake oil than seeking some truth?

      I must concede that there are some downsides to discussing your opinions and findings with others: When people disagree with you, it ends up taking some of your time.

      Or a lo

      --
      Eurohacker European paranoia, gun rights, and h
    120. Re:Seems reasonable by philipgar · · Score: 1

      The first part that you propose is COMPLETELY unreasonable. There is no way conferences or journals would remotely be able to "peer review" source code as part of their review process. You must think that scientific peer review is that strong; it really isn't. They read over the papers, and much of their decision is based on their own biases, although your ability to sell your idea in a paper can help it get in. I know conferences, and likely many journals, have papers with grammatical errors in them (have you ever read some of the foreign submissions to these things?); they get the most glaringly bad ones out most of the time, but they don't remove them all. The lack of citations is sometimes seen, but not all reviewers know of all the relevant work that is out there. Oftentimes as a submitter to a conference or journal, you look at who will likely be on your review committee, and ensure that you are citing their work, even if the work is only tangentially relevant.

      As far as "peer review" for correctness goes, that tends to be a joke depending on the field. Scientists aren't trying to reproduce your experiments before seeing if they should accept or reject a paper. They have to take it at face value that the work you do is real. Occasionally you'll hear stories about students who have been faking results for years, getting multiple publications etc. It's not until years later that this gets revealed, and the papers get invalidated. This can often ruin careers of relatively innocent bystanders, such as professors who trusted that their student was doing the work they said they were doing. There just isn't time to do all the fact checking necessary. Trying to add a code review to the process is akin to asking a scientist to stand behind another scientist whenever they're doing field or lab work to verify that their procedures are correct. Some things are taken at face value as being accurately done or not.

      As for simulations, at least in my field (roughly computer architecture), the simulator you use or base your research on can greatly influence what people think of your results. Using older less detailed simulators will get your work knocked down pretty quick. However, I will say that credit is MOST DEFINITELY given to the authors of the simulators used. This is generally done through the citations when describing the simulation infrastructure, or methodology of the work. Most simulators have a single paper written describing how it works etc. This is normally written a few years after the simulator was developed, and the group that developed the simulator has already gotten most of the use they could get out of the base infrastructure.

      Phil

    121. Re:Seems reasonable by 10101001+10101001 · · Score: 1

      Scientists would just be wasting valuable resources in dealing with people like yourself. Just like democrats would be wasting their time talking at a Tea Party convention, or Republicans would be wasting their time at a MoveOn.org convention.

      This is one part I disagree with. There is something of value when one sees "the enemy" and discovers that although their methods are different, they are interested in the same goals as you. When one starts to see that, one can start to work with "the enemy" towards those goals. As long as you believe your methods are sound, you should be able to stick to them, and perhaps your cooperation will convince them that your methods are better. Is it a perfect solution? Does it always work? Of course not. But, in the end, the people you speak of are part of the community of people which acts and are acted upon. Without consideration for them, how can you truly justify what you do to them? How do you really expect to reach your goals if you refuse to work with them?

      --
      Eurohacker European paranoia, gun rights, and h
    122. Re:Seems reasonable by Anonymous Coward · · Score: 0

      It depends on the paper, and the program, and the result. In short, is the difference statistically significant? Is it mathematically significant?
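
      One rough way to put a number on "significant", sketched under loud assumptions: if the program is stochastic, compare the before/after shift with the run-to-run spread of the fixed program. The 0.3452 and 0.3754 figures are the ones raised earlier in the thread; the ensemble values below are invented purely for illustration, and shift_in_sigmas is a hypothetical helper, not a standard test.

```python
import statistics

def shift_in_sigmas(before, after, ensemble):
    """How many run-to-run standard deviations separate two outputs.

    `ensemble` holds outputs of the corrected program under different seeds.
    """
    spread = statistics.stdev(ensemble)  # sample standard deviation
    return abs(after - before) / spread

# Invented ensemble of corrected-program outputs (illustration only).
ensemble = [0.371, 0.379, 0.374, 0.377, 0.376]

if __name__ == "__main__":
    sigmas = shift_in_sigmas(before=0.3452, after=0.3754, ensemble=ensemble)
    print(f"bug fix moved the output by about {sigmas:.1f} sigma")
```

      A shift of many sigmas means the bug mattered more than the noise did. Whether it changes the paper's conclusion is still a judgment that needs the domain knowledge, which is the "depends on the paper" part.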

    123. Re:Seems reasonable by budgenator · · Score: 1

      Suppose I'm adding a floating point to an integer. Is that a problem? Does it ruin everything? Or is it just sloppy coding that doesn't make a difference in the long run? Understanding what the code is doing is required for you to do an audit which will produce any useful results.

      When I was about 5 years old, I decided to help out my father by filling up the gas tank, I used the garden hose; mixing integers and floats tends to be like that in most computer languages. I also learned to program Fortran on punch cards, RPG to me is a computer language and DOS was an operating system on the IBM 360, so it must be senility that prevents me from recognizing your obvious skills.
      There is a book you may find interesting: Chaos: Making a New Science, by James Gleick. The first chapter talks about how minuscule errors in a computer program compound themselves in a feedback model.

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    124. Re:Seems reasonable by poopdeville · · Score: 1

      I have studied both physics and CS and I can hardly imagine a physics curriculum (especially one in theoretical physics) which doesn't include any course in numerical methods.

      Never mind that any physicist worth his salt knows that the real numbers are uncountable, and a computer is finite.
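
      A quick illustration of what a finite machine does to real arithmetic, using only the Python standard library. Variable names are arbitrary; the behavior shown is ordinary IEEE 754 double precision, not a defect in any particular program.

```python
import math

# 0.1 has no exact binary representation, so repeated addition drifts.
tenths = [0.1] * 10
naive = 0.0
for x in tenths:
    naive += x                      # plain left-to-right accumulation
compensated = math.fsum(tenths)     # error-compensated summation

# Mixing a large integer with a float silently discards the "+ 1":
# doubles carry 53 bits of mantissa, so 2**53 + 1 is not representable.
big = 2 ** 53
absorbed = (big + 1.0) == float(big)

if __name__ == "__main__":
    print(naive == 1.0)        # False
    print(compensated == 1.0)  # True
    print(absorbed)            # True
```

      Each individual error is tiny, but the gap between the naive and compensated sums is the same mechanism by which long computations lose significant figures.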

      --
      After all, I am strangely colored.
    125. Re:Seems reasonable by Toonol · · Score: 1

      The quasi/anti-science mad-dog drivel that makes up almost all the rest of the discussion is what they're circling the wagons about. It's like having Joe Sixpack come into your workplace screaming that your professional work is bullshit and you should be fired.

      And actually having people in power listen to him.


      But what if Joe's RIGHT? Joe may be smart. Joe may know how to program. Joe may have caught your error.

      Oh... never mind. He's unwashed. Got it.

      Crap; come to think of it, I'm not sure I'm worthy to respond to you.

    126. Re:Seems reasonable by poopdeville · · Score: 1

      The first part that you propose is COMPLETELY unreasonable

      No, it's not. Look up "literate programming". Instead of using LaTeX to typeset mathematics using mathematical symbols, typeset them using interpretable code.

      --
      After all, I am strangely colored.
    127. Re:Seems reasonable by jwhitener · · Score: 1

      "I'm a bit frustrated by the apparent contradiction. For the first time perhaps in history in the USA, you have armchair folks trying to do technical audits of scientific tools, research, and publications -- for free."

      One of the problems I see, is that there hasn't been one single valid argument from the "deniers" or whatever you want to call the armchair folks. Every time the media and blog world latches on to some "gotcha" concerning climate change, it is shouted around the world 10 times, aired on Fox News for weeks, and the damage has been done: namely, public perception of climate change has been altered permanently. And yet, when I follow up on whatever the "gotcha" was, it is explained by numerous other blogs, sometimes the scientist himself, and eventually....usually:) ends up on the Jon Stewart show, making fun of the armchair deniers.

      When attorneys go about jury selection, if those jurors have been exposed to aspects of the case, whether true or false aspects, the juror is dropped. I feel like climate change is on trial, and every time the attorneys start jury selection, the town they are in has a big media storm about the "client" "climate change", and all the jurors are tainted from exposure, so they have to move the trial to a new city.

      There is a reason that science has peer review, and multiple people/teams constantly checking each other's work. It is because 'armchair analysis' isn't skilled enough (or knowledgeable about the data set) to do it. And when you combine 'not skilled enough to do it' with some of the armchairs that have clear political motives, you end up with media storms based on bad information, yet directly influencing policy decisions because of public perception.

      It is really sad actually. I would normally favor more transparency, more eyes watching. But when it is obvious that we can't trust the media to double check sources, and check the credentials of their guests, or to take into account what the consensus is versus what one non-peer reviewed person is saying..... transparency in that case hurts the public more than it helps.

      I think it would be amusing for a news show to actually have their guests debate, but the guest ratio would need to reflect the actual support for each side. You'd have Fox News with their 1 denier (who is most likely trained in something completely different than climate, like econ) debating 5000 climate scientists.

    128. Re:Seems reasonable by joocemann · · Score: 1

      Careful, you are getting dangerously close to the conceited, "Holier than thou" attitude that many climate scientists are spewing out. You really don't know what you're talking about when you say the op doesn't know what he's talking about. I'm a software engineer, finding bugs, even when you don't know what the code is doing, is a lot easier than you would think.

      Climate scientists aren't 'holier than thou' in attitude; that is a creation from the defensive perception of the ignorant.

      The attitude is that they have very compelling evidence about something that can only be understood with a very strong education in that field and that it is too difficult to articulate the complex information into simple information.

      'Holier than thou' implies the scientists are of godly origin and that the 'thou' (the ignorant) do not have access to godliness. In reality you have access to education and can also gain 'godliness'.

      I'm sorry if it hurts to be ignorant and face people who know what they are talking about. I cannot fix that for you, but one thing that can help is a bit of humility and modesty; these are the tools that people should use more often when facing such a complex and diverse world. It not only restrains the ignorant from silly skepticism, it helps the ignorant feel 'normal' in their situation and feel less negativity from it.

    129. Re:Seems reasonable by budgenator · · Score: 0

      The more non-linear and complex a program is, the more important formal validation becomes; it's called "sensitive dependence on initial conditions" or "the butterfly effect".
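
      Sensitive dependence is easy to see in a toy system. The sketch below uses the logistic map x -> r*x*(1-x) at r = 4, a standard textbook example of chaos (not a climate model); the starting value 0.2, the 1e-10 perturbation, and the function names are arbitrary illustrative choices.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def separation(x0=0.2, delta=1e-10, steps=100):
    """Gap at each step between two orbits whose starts differ by delta."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + delta, steps)
    return [abs(p - q) for p, q in zip(a, b)]

if __name__ == "__main__":
    gaps = separation()
    for step in (0, 10, 20, 30, 40, 50):
        print(step, gaps[step])  # the gap grows roughly geometrically
```

      A rounding-sized difference in the input ends up as an order-one difference in the output within a few dozen iterations, which is why a small numerical defect in a strongly non-linear code cannot simply be assumed harmless.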

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    130. Re:Seems reasonable by budgenator · · Score: 1

      Watts, oh yes he's out there, how can you trust a GW denier that runs a headline like UAH global temperature posts warmest January, January 2010 UAH Global Temperature Update +0.72 Deg. C?

      --
      Apocalypse Cancelled, Sorry, No Ticket Refunds
    131. Re:Seems reasonable by ChrisMaple · · Score: 1

      About a year ago I worked with a loudspeaker box design program. I found some peculiarities in the results, but at first I wasn't very concerned. Eventually I found I could get different results without changing the inputs. I became concerned and examined the code. The parts generating the display I didn't understand, the parts getting the input I didn't look at. I studied the formulae that determined the frequency response, and part of it was just plain WRONG.

      You don't have to understand everything to find and fix errors.

      --
      Contribute to civilization: ari.aynrand.org/donate
    132. Re:Seems reasonable by ChrisMaple · · Score: 3, Insightful

      Something like a climate model has a very exclusive audience

      The final audience of a climate model is (economically) every person alive. If the models are as good as some climatologists claim, the final audience is every living thing on earth.

      Making their code public doesn't mean they have to answer their phone. But they're going to have to answer to someone if it can be shown that their code deliberately produced false results, as was the case with the "hockey stick" scandal.

      --
      Contribute to civilization: ari.aynrand.org/donate
    133. Re:Seems reasonable by Anonymous Coward · · Score: 0

      The methods are documented. If you want the code, recreate it! Your argument that there are people smart enough to use it also implies these people are smart enough to recreate it.

    134. Re:Seems reasonable by Anonymous Coward · · Score: 0

      You just need to re-run the program with a specified set of inputs and check the output - also known as verified against its own test suite.

      If I knew what the output was meant to be, it wouldn't be research.

    135. Re:Seems reasonable by Anonymous Coward · · Score: 0

      I'd like to see a sort of scientific equivalent to an open-source licence. Something like:

      "You may use this code, but if you publish any papers based on your use of it, you must release any modifications you made to it."

      That'd give you however much time it takes you to publish a paper - and once you did, the code would be free to anyone else who wanted to develop it further, so long as they abided by the same condition. Viral like the GPL, but applied to data products rather than compiled binaries.

    136. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Release the code, and if some dumbass decides to dig into it, you either are in the position of having to waste time answering ignorant questions, or you ignore them, giving them ammo for "teh code is BOGUS!!!!" Far easier to just keep the code in-house, and hand it out to the few qualified researchers who might be interested. Unsurprisingly, a lot of scientific code is handled this way.
       

      Code is a tool, just like the rest of your lab equipment. If a chemist had some grand discovery, but refused to tell anybody WHAT equipment was used in the experiments, it's not reproducible. Thus, it's not scientific (or at least peer-reviewable) and this leads to the possibility that the tools are somehow altering the results. For example, sometimes you'll get a vastly different result in a chemistry experiment if you were to use a plastic beaker rather than a glass one.
      The source to the software is no different than knowing the specifics of a physical procedure, including the method of manufacture of that equipment. As another example, problems in manufacturing process for lab equipment can pollute and taint your research, those things need to be known.

      It'd be a whole different story if the public wasn't filled with a bunch of ignorant whack-jobs, trying to smear scientists. When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox. If you want access to the data and the code, go to a school and study the stuff. All the doors are open then. The price of admission is just having some vague idea wtf you're talking about.

      A degree from a school does not a scientist make. There have been plenty of scientists who have produced great work despite not having the sanction of a school or the title of "scientist". Yes, school is a good idea. No, it is not required.

      I certainly understand the desire to avoid the criticism of the idiot self-proclaimed 'scientists'. It pisses me off every time our paper publishes a story on climate, and next thing you know a hundred assholes who failed high school algebra suddenly think their anecdotal observations have "proven" or "disproven" global climate theory. But that's no reason to not disclose your mathematical procedures or the methods you used to obtain your results.

      My research group has two major comparison studies of different climate models. We pulled in data from seven models from seven different universities, and analyzed the differences in CO2 predictions, among other things. The data was freely and openly given to us by these other research groups, and they happily contributed information about the inner workings of their models. This, in my book, is what it's all about. The relevant information was shared with people in a position to understand it and analyze it.

      Yes, yes you "analyzed the differences". And what, specifically, were the methods used? The formulas which were applied? In what order?
      Quite frankly, the likelihood of a "hack" digging up something in your source code is equal to the likelihood of the same "hack" digging up something in your pen-and-paper math that you published. It's not likely to happen: hacks very seldom bother reading the details. They'll smear you for where you got the data, or possibly question the validity of the data itself (without ever examining the procedures used, of course) but you'll get that no matter what.
      In fact, without publishing that data, a hack could use it against you with ease. All the hack has to do is say "Well, your super-secret code is obviously rigged, because I ran the same exact data through my own super-secret code, and it disagrees". So it really is, ultimately, in all of our best interests to have that data in the open.

    137. Re:Seems reasonable by Anonymous Coward · · Score: 0

      I'm seeing a pretty clear parallel between your view of how the code can be analyzed and the AGW ignoramus skeptic view of AGW science as a whole. I don't trust arguments for or against AGW that aren't by people with educations to demonstrate they at least *might* know what they are talking about.

      Which makes you a pompous ass, and not a scientist.

      All I should have to supply you is my data, methods, procedures, and results. Who I am, or what my experiences are is completely irrelevant. All you have to do is look at the data I've supplied, and you can determine if my point is valid or not.

      But when you hoard your code, you're no better than the run-of-the-mill hacks. Do you also refuse to reveal your data, how it was gathered, what procedures were used, etc.?
      If your results are not reproducible by a 3rd party, then your results are not science.

      Stop asking me to believe your code works based on my faith in your abilities. That's not science, it's religion.

    138. Re:Seems reasonable by belmolis · · Score: 1

      Yes, you can find some bugs. You won't always be able to tell whether they mean anything. If you find corner cases in which things don't work, you won't necessarily know whether those cases can occur with real input. More importantly, if you don't understand the science, you won't detect the bugs that are errors in the correspondence of intention to program. If the program solves the wrong equation or makes assumptions different from those of the model, there's a problem, but pure program analysis won't detect it.

    139. Re:Seems reasonable by Lunzo · · Score: 1

      Engineering a solution is a much worse idea than trying to cut emissions. Scientists have tried that before and there are almost always unintended consequences. See: Cane Toads being introduced into Australia.

      I agree with you that giving money to dictators in 3rd world countries could have unintended consequences and is likely a bad idea. I disagree with the do nothing approach too.

      And now we're back to needing to reduce our dependence on fossil fuels at home.

    140. Re:Seems reasonable by apoc.famine · · Score: 2, Insightful

      Spoken like a Software Engineer!

      A bug isn't just a bug. Either it affects the outcome of the program run or it doesn't. The issue is that if you don't know what the outcome should be, you won't be able to tell. Nobody in scientific computing just "re-run(s) the program with a specified set of inputs and check(s) the output". The input is 80% of the battle. We just ran across a paper which showed that the input can often explain 80%+ of the variance in the output of models similar to the one we use.

      So there's our dilemma - what we feed the model is very, very, VERY limited. If something crashes or returns an anomalous result when fed a string instead of an integer, we'll never notice. Why? Because we'll NEVER feed it a string. If all the climatological data we get to feed the model is from NCAR reanalysis, we'll make damn sure the model can handle that data input. Might there be serious issues if another format is fed it? Sure. But that will probably never happen.

      Scientific programming is garbage, by and large. Perform a code audit on it, and you'll find a lot of bugs. But largely, the parts that are in active use are relatively bug free. Why? Because we compare our output with that of other modeling groups. In my office there are two posters comparing seven models from seven different universities. I can tell you who treats oceanic uptake of carbon the same as our group does, and who treats it differently. If one model was a major outlier, we'd have identified that, and asked them what code they use to calculate oceanic carbon uptake.

      This is science, not Software Engineering. We troubleshoot and find bugs by comparing OUTPUT, not CODE. It's only when we find that output is significantly different that we look to code to figure out why. It's akin to having 7 browsers all try to render a page. If 5 of them render the same thing, one is close, and one doesn't look anything like the others, your first guess is to take a look at what that one oddball is doing. The same goes for scientific code.

      The people writing it aren't software engineers, by a long shot. But if they really screw up, everybody knows. It's not through a code audit - it's because their output doesn't match either what's observed in nature, or what other models output. Would rigorous code audits make our code better? Sure. Is CS volunteering to come do it for us? No. Would we have the time to deal with their nit-picking? No. We validate output, not code. And largely, it works.
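
      A minimal sketch of the output-comparison check described above, assuming each model reports a single number for the same quantity. The model names, the values, and the threshold `k` are made up for illustration; this is not any group's actual procedure. It flags a model whose value sits far from the median of the group, measured in median absolute deviations (MAD).

```python
from statistics import median


def flag_outliers(outputs, k=5.0):
    """Return the names of models whose value is more than k MADs from the median."""
    values = list(outputs.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that one wild value cannot dominate.
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # At least half the models agree exactly; flag anything that differs at all.
        return sorted(name for name, v in outputs.items() if v != center)
    return sorted(name for name, v in outputs.items() if abs(v - center) > k * mad)


# Hypothetical numbers, not real model output.
ocean_uptake = {
    "model_a": 2.1,
    "model_b": 2.3,
    "model_c": 2.2,
    "model_d": 2.0,
    "model_e": 2.4,
    "model_f": 2.2,
    "model_g": 5.0,
}

suspects = flag_outliers(ocean_uptake)
print(suspects)
```

      A check like this only says which output disagrees with the rest. It cannot say whether the majority is right, and it will not notice an error that all the models share.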

      --
      Velociraptor = Distiraptor / Timeraptor
    141. Re:Seems reasonable by mhwombat · · Score: 1

      Yes, running the code against a test suite is a no-brainer - if you've got a test suite.

      There can be problems with scientific code which a software engineer without an understanding of the science would not find. These are to do with the accurate formulation of the scientific model you're implementing. If a software engineer is given a test data set for validation with broad enough applicability, of course they will find ANY bug, but if the researchers have such a test data set they will probably notice the existence of bugs themselves! In many cases such data sets, with results that are known to be correct under the model, are difficult to generate.

      I do however think that a software engineer can find a lot of bugs that are simply bugs, without having domain knowledge.

      In my experience the more common reason people hide their code is that they see it as their competitive research advantage. The ethics of this are open to debate.

    142. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Dang, I've modded so I'm posting AC.

      The code for the NASA/GISS Model E (one of the main General Circulation Models) and other GISS software is available here as is much of the NOAA code at their site. If you want to invalidate global warming go for the Model E code because much of the rest of it is just about corroborating evidence.
       

    143. Re:Seems reasonable by xiox · · Score: 1

      Fortunately my code isn't doing much - it's mostly simple scripts to automate various other software and a few basic models. I have released some of my more complex software and do an OSS project in my spare time.

      The main problem with releasing code is having to support it. It takes a lot of time. Code often contains hard coded paths, assumptions and so on, which would need to be documented before it was safe for others to use. That just takes too much time for the average researcher. Also for code working in interesting areas, you need some time to have the code to yourself to exploit that area of research and not give others advantage.

    144. Re:Seems reasonable by Anonymous Coward · · Score: 0

      The NASA/GISS Model E code (one of the main climate models) is available here. The "hockey stick" graph (over 10 years ago now) has largely been vindicated by subsequent studies.

    145. Re:Seems reasonable by Troed · · Score: 1

      That's, again, a statement that displays ignorance as to how software development works. The fact that you have a model which should result in novel output has nothing to do with the testability of the software the model is implemented with.

    146. Re:Seems reasonable by Troed · · Score: 2, Insightful

      Sorry, no. You're just displaying your ignorance above. You cannot look at the output and say that just because it fits with your preconceived notions it's therefore correct. You do not know if you have problems in a farhenheit to Celsius conversion, a truncation when casting between units etc (yes, examples chosen on purpose). You might get a result that's in the right ballpark. You might believe you have four significant digits when you only have three. Your homebrewed statistical package might not have been audited by a statistician etc.

      You simply do not know all the things you claim above that you do know.
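
      A small sketch of the kind of bug meant here, with hypothetical readings and function names: a temperature conversion that silently truncates to an integer still lands in the right ballpark, but it biases the mean.

```python
def f_to_c(temp_f):
    """Convert Fahrenheit to Celsius, keeping the fractional part."""
    return (temp_f - 32) * 5 / 9


def f_to_c_truncating(temp_f):
    """Same conversion with a bug: int() silently drops the fractional part."""
    return int((temp_f - 32) * 5 / 9)


# Hypothetical readings in degrees Fahrenheit.
readings_f = [50.0, 51.0, 52.0, 53.0]

mean_exact = sum(f_to_c(t) for t in readings_f) / len(readings_f)
mean_truncated = sum(f_to_c_truncating(t) for t in readings_f) / len(readings_f)
bias = mean_exact - mean_truncated

print(f"exact mean:     {mean_exact:.4f} C")
print(f"truncated mean: {mean_truncated:.4f} C")
print(f"bias:           {bias:.4f} C")
```

      Both means look plausible on their own. Only comparing against a correct conversion, or reading the code, shows the roughly third-of-a-degree offset.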

    147. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Actually it supports his argument perfectly. That hole in the ozone layer was going to take decades to repair itself apparently, and all the while we were going to fry - or so I was told as a school-child. But nope, it's actually doing pretty well we're now being told - which is fair enough, science corrects itself as more data becomes available - or at least it would be fair enough if we hadn't all been scared shitless by rhetoric-spewing politiks and their scientist pets.

      You can only cry wolf about the end of human-kind so many times before we all get a bit cynical about it - and you only have yourselves to blame for it.

    148. Re:Seems reasonable by MrResistor · · Score: 1

      No, I really don't need to know much about what the software as a whole is doing in order to find meaningful bugs. All I need is basic competence in computational mathematics and enough knowledge of the language you used to make sense of your algorithm. Questions like "is your implementation of the equation you are using stable?" will tell me all I need to know about the validity of your results, and chances are good that I can do that with an undergraduate level of education in applied math. Your assertion that I need your rarefied level of expertise in climate science in order to analyze your code is pure arrogance.

      I don't claim to be an expert in any of the relevant fields, but I've done enough work on error analysis to know how deceiving results can be, and how easy it is to induce seemingly significant patterns into an analysis through what seem like innocuous coding decisions. Once the validity of your results is established, I have no problem relying on you to interpret them. Without being able to examine your code though, you're no more reliable than a street preacher yelling about how The End Is Upon Us!
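
      To make the stability question concrete, here is a textbook sketch that is not taken from any climate code: two algebraically equivalent ways of computing a sample variance. The one-pass formula subtracts two huge, nearly equal numbers and loses every significant digit once the data has a large offset. The two-pass version does not. The data values are illustrative.

```python
def variance_one_pass(xs):
    """Sample variance via sum(x^2) - sum(x)^2 / n. Algebraically fine, numerically fragile."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Sample variance via deviations from the mean. Needs two passes over the data."""
    n = len(xs)
    total = 0.0
    for x in xs:
        total += x
    mean = total / n
    sum_sq_dev = 0.0
    for x in xs:
        sum_sq_dev += (x - mean) ** 2
    return sum_sq_dev / (n - 1)


small = [4.0, 7.0, 13.0, 16.0]
shifted = [1e9 + x for x in small]  # same spread, large offset

print(variance_one_pass(small), variance_two_pass(small))
print(variance_one_pass(shifted), variance_two_pass(shifted))
```

      On the small values both functions agree. On the shifted values, with ordinary double-precision floats, the one-pass result is badly wrong even though the spread of the data is identical.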

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    149. Re:Seems reasonable by MrResistor · · Score: 1

      Yes, that is supposed to be the point of peer review. The problem is in defining who the peers are. How many mathematicians or software engineers are reviewing papers for climate science journals? I'm going to go out on a limb and say "zero". Considering the fact that most of the results being published are based on mathematical models that have been run on computers, it seems that having the code available to folks with expertise in those domains is necessary for validation.

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    150. Re:Seems reasonable by Anonymous Coward · · Score: 1, Interesting

      Wrong, actually. If it had been the CFCs, it would've taken another decade to see improvements. The most likely scenario is that the ozone hole (whose behavior we knew nothing about until we first discovered it in the 50s/60s) is created by UV and/or cosmic radiation, and thus modulated by the solar cycle.

      Yes, there's a really nice correlation there, with a suggested causality.

    151. Re:Seems reasonable by TheTurtlesMoves · · Score: 2, Interesting

      And when your code output does not match theirs, it's a bug in your code... because you know, we know it's not a bug in our code. Trust me! To replicate the results code should be available. It is a requirement to provide the source in many journals already.

      Science does not require trust. It requires transparency. Closed source is not transparent.

      --
      The Grey Goo disaster happened 3 billion years ago. This rock is covered in self replicating machines!
    152. Re:Seems reasonable by Anonymous Coward · · Score: 0

      (Yes, I'm a Software Engineer by education)

      LOL. Your education seems deficient.

      You can only find obvious bugs by code inspection. For anything else, you need to understand the domain's problems and their solutions. This is especially true for the simulation stuff that the GP is running. This is based a lot on heuristics and estimation. You cannot simply validate this kind of software with simple unit tests, which is what you proposed.

      However, not releasing the software source code at all is still wrong and simply not scientific.

      BTW: I'm a Software Engineer by education too.

    153. Re:Seems reasonable by bartwol · · Score: 1

      One of the problems I see, is that there hasn't been one single valid argument from the "deniers" or whatever you want to call the armchair folks.

      You describe a layperson who wants to examine the methodology behind a climate model as a "denier". You even use faux quotes and, in the style of passive-aggressive communication, intimate that that characterization came from elsewhere (that's what _you_ want to call them).

      Your implication: questioner equals denier. Do I misunderstand?

      Your bias and bigotry are dripping. You probably should have checked your baggage at the door. Anyway, thanks for revealing your position in your opening line...it should save many readers the trouble of reading the rest of your post.

    154. Re:Seems reasonable by wealthychef · · Score: 1

      You don't have to support it just because you publish it. People are not calling for that kind of code release. It's just available for reviews. The only people you have to "support" are your peers who have legitimate questions about its use and the only support you need give is to explain to them how the hell it works. You don't need to add new features or even fix any bugs -- just explain how the bug does not affect your results. I think it's appropriate you answer to them, don't you? "Peer review," you know? You are focused on the "review" and forgetting the "peer" part. I know you're busy but think of all the time the code saved you.
      Documenting your assumptions etc. sounds like something you had better damned well be doing, so I don't understand your point. And who cares if it's "safe" for others? You just have to demonstrate to your peers that it works as advertised, to their satisfaction.
      And if you are holding back to avoid others learning what you did, then your experiment and results are not reproducible by definition, which makes the idea of peer review a bit suspect in my mind.

      --
      Currently hooked on AMP
    155. Re:Seems reasonable by wealthychef · · Score: 1

      Fine, no computer science dude ever looks at your paper. So... publish it anyhow, bugs and all. The point is to have it reviewed by your peers. Anyone who says the point is to have CS majors review it is being silly I think, although yes it might be that CS people have something to contribute to it, in which case, would you have a problem with them looking at it? If they say something silly, fine, then you will learn to laugh at them. But some CS people say useful things sometimes, even in the area of climate modeling. Not always.

      --
      Currently hooked on AMP
    156. Re:Seems reasonable by JasterBobaMereel · · Score: 1

      The point is that the interface is likely to be very non-user friendly (probably almost non-existent), the code will be minimally commented, and the comments will be out of date, and the code will most likely have been modified multiple times by multiple people for their own purposes...without any regard for later fixes

      i.e. a software engineer's nightmare

      But it does not matter if it can be verified to be correct, this is not code to take raw data and produce an answer, it is code to take raw data and process it into a more usable form, the person using it already knows broadly what to expect and knows that it will be duplicated by other people using different tools, so anything unexpected in the output will show up ...

      This is not software engineering ...it's raw data analysis ... you can second guess the output and know when it's wrong

      --
      Puteulanus fenestra mortis
    157. Re:Seems reasonable by apoc.famine · · Score: 1

      Again, you're talking like a software engineer. These aren't "preconceived notions" we're talking about - they're physical processes. They're real-world observations.

      If I write a program to model ocean currents, and it spits out a map of oceans very, very similar to what's been well observed in the ocean, I can assume my code is good enough. If it spits out a map of sea surface temperatures which matches 90% of what's observed, I'm not really going to worry about the code behind it.

      This isn't "I have a fantasy, let me see if I can make a computer produce it." This is, "We've observed this in the real world, let's see if we can model it to try to understand it better." There is a definite check to most scientific computing - the real world. If your program doesn't match that, there's an issue.

      You're once again approaching this as a software engineer - your job is to make code that does arbitrary shit, and do it correctly, efficiently, and elegantly. My job is to use some code, churn out some data, then compare that to the real world and other people's data. I have an answer key to compare my data to. You don't. Thus, you rely upon code auditing to make sure the code is good. I rely on the answer key. If my data doesn't match that, then I go digging. Is it code, statistics, initialization data? (40% stats, and 40% initialization data, usually)

      And btw, what's "farhenheit"? You won't find conversions involving that in science!

      --
      Velociraptor = Distiraptor / Timeraptor
    158. Re:Seems reasonable by Troed · · Score: 2, Insightful

      If I write a program to model ocean currents, and it spits out a map of oceans very, very similar to what's been well observed in the ocean, I can assume my code is good enough.

      No. As long as you believe that, you're not doing science.

    159. Re:Seems reasonable by Anonymous Coward · · Score: 0

      Original AC you responded to here.

      You would be shocked how bugs crop up. Off by one on an array? Array full of junk but you don't know that about the language you use? Sprintf into a buffer and write too much into it. Silly little things that can change the stack. Recently I went through some code that had 'been in the field' for years, applying what I told you here. I didn't even really care what the code was SUPPOSED to do (I had the experts on hand though). The code was crashing daily. Customers were mega ticked. There were subtle memory overruns all over the place, in interface layers. The original article spoke to this a bit. For example, if you wanted to use a BSTR in Windows from C, would you malloc it or call something else to allocate it for you? That is only something someone who uses those APIs for a living would know. You can usually get away locally with a goof-up. But pass that goof-up into another API and the assumptions are very different. In my BSTR example another API may call the wrong free on your memory. It doesn't crash but subtly corrupts memory. As one dude I work with would say, 'you corrupt memory, all bets are off'. You cannot say deterministically how that program will work.

      I would never be arrogant enough to think I can understand what science you are doing. But I can tell you if your code is working to what you want it to do.

      Just as you are an expert in your field you need to realize there are experts in other fields. I would expect you to be at the code reviews. In fact if you didn't show I would just cancel the meeting until you can show. I would also expect your code to be in a form that others can easily understand. Correct me if I am wrong here, but part of science is describing what you are doing. Coding is just a form of that. You are describing to the computer what you are doing but you are also describing to some future self of yours and your peers.

      For example in carpentry I can build a desk. But if I want a desk that looks good and has drawers that are not out of whack I hire a carpenter to build one. Another way is to just go out and buy a desk that I know is good built by professional carpenters.

      This is a trap many academics fall into (hey I used to be the same way). They feel that they have a 'higher' degree than someone else that they can do it all. Which may be true. But I am smart enough to know if I want it done very well I hire someone who can do it.

      Or as one of my bosses said once 'any shmo can write code, but it takes years to be able to write GOOD code.'

    160. Re:Seems reasonable by MrResistor · · Score: 1

      Yeah, that's what I was saying. If your results are derived from computer simulations, then "peer review" should include CS people and applied math people to make sure that your math was right and it was not implemented in a way that gives bogus results. (Good math implemented poorly is no better than bad math implemented well).

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    161. Re:Seems reasonable by apoc.famine · · Score: 1

      It's amazing that you can get an insightful mod on a comment buried in an article that's not on the front page, while not knowing what you're talking about.

      Hats off to you, good sir!

      --
      Velociraptor = Distiraptor / Timeraptor
    162. Re:Seems reasonable by Troed · · Score: 1

      I think the moderators understand that Software Engineers are quite knowledgeable on the topic of software. You're correct in that I would not know (much) about modelling ocean currents, but that's completely irrelevant for this discussion.

      As a tiny, tiny, example: Your implementation might produce valid looking results for exactly one dataset, and you thus assume the code is good enough. It's then used, by you, six months later with additional data producing results that you publish.

      Your implementation, however, didn't do a correct leap year calculation - which wasn't seen in your first run and only changes the end result by 2% for some values and you thus don't notice it.

      It does however lower the confidence level of your published result close to where statistical fluctuations could've produced the same values. ... do you really want to pretend that fixing the above bug would be something a software developer wouldn't be able to do without knowing about ocean current modelling? :) Please.
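
      A sketch of the leap year example, with hypothetical function names. The "divisible by 4" rule agrees with the Gregorian calendar for every year from 1901 through 2099, so a test run on recent data would never expose it. It only goes wrong once the data crosses a century year such as 1900.

```python
import calendar


def is_leap_naive(year):
    """Buggy rule: treats every year divisible by 4 as a leap year."""
    return year % 4 == 0


def is_leap(year):
    """Gregorian rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_span(first_year, last_year, leap_rule):
    """Total number of days in the inclusive range of years under a given leap rule."""
    return sum(366 if leap_rule(y) else 365 for y in range(first_year, last_year + 1))


# The two rules agree over a recent span and disagree once 1900 is included.
recent_diff = days_in_span(1950, 2010, is_leap_naive) - days_in_span(1950, 2010, is_leap)
century_diff = days_in_span(1890, 1910, is_leap_naive) - days_in_span(1890, 1910, is_leap)

# Cross-check the corrected rule against the standard library.
agrees_with_stdlib = all(is_leap(y) == calendar.isleap(y) for y in range(1800, 2201))

print(recent_diff, century_diff, agrees_with_stdlib)
```

      Nothing about ocean currents is needed to spot or fix this, which is the point being made.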

    163. Re:Seems reasonable by Starlet+Monroe · · Score: 1

      Thanks, AC, this is the conversation I actually wanted to have.

      My particular angst is that I do come from a software background; architecture, specifically. I can write code, but it's not great code, and what's worse is that I know it. I'm also poignantly aware of the difference between how I'm doing things for this project and how I should be doing them. The gap there is due to money; they're paying me in a month what I can make in the business world in a day. And I'm the only person on my team. Behold the life of a graduate student, I suppose, but it's especially frustrating to do this after working in the "real" world for so long where if your client doesn't have the budget to do it right, they just don't do it. (Okay, and I wish that statement was actually true, too. If I had a dime for every time a client said, in the initial requirements gathering meeting, "Look, we don't have a lot of money, so don't comment it or anything, okay?" Right. Because that's the part that's costing you.)

      Anyhow, I don't have a great solution for my problem. I just can't imagine that this is the only project out there where there's a guy doing the calculations in a SQL database instead of in Excel spreadsheets or by hand, and importing data to that database using scripts that would break if you tried to pass them a bitmap image instead of a CSV file. What I do know is that we do our sciencey version of QA on it by analyzing test data to confirm that it returns what it should. The right answer? I have no idea. There is no right answer with this budget.

      --
      ++
    164. Re:Seems reasonable by MrResistor · · Score: 1

      What's wrong with adding a floating point to an integer? Isn't the result a floating point? And if you store that result in an integer, then the result is a truncation to integer, isn't it?

      The problem is that it isn't defined behavior (in C/C++ anyway), so the results are what the person who implemented that function in your compiler/library thought they should be. Yes, what you describe is the most common result if you don't explicitly cast it, but is certainly not guaranteed. C++ (maybe C also) has a variety of casting options for float to int conversion if you want to define the behavior, including truncation and a few different rounding styles. If you're doing a simulation based on a mathematical model, that behavior is certainly something you should be aware of, otherwise you run the risk of having junk results.

      Of course, the only reason I know this is because I took an (elective) class from a guy who thought it might be important for us to know. I would bet that a large number of computer scientists, even those well versed in C/C++, have never given it any thought at all. I'm fairly certain that my Numerical Analysis professors weren't aware of it, and it's quite clear that GP isn't. If you're expecting truncation, but your compiler thinks rounding is better, that's certainly going to be introducing errors into your results. If the coders of all the other models you're comparing to (per another post by the GP) are making the same assumptions, you run a high risk of erroneously assuming that your results are more accurate than they really are.
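
      For reference, the C standard defines conversion of an in-range floating value to an integer type as truncation toward zero, and leaves only the out-of-range case undefined. Whichever rule applies, the choice of conversion changes the numbers. The sketch below uses Python rather than C and only standard-library calls; the helper name and the sample values are arbitrary.

```python
import math


def to_int_variants(x):
    """Convert a float to an integer under four different, explicitly named rules."""
    return {
        "truncate": int(x),          # drops the fraction, rounds toward zero
        "floor": math.floor(x),      # rounds toward negative infinity
        "ceil": math.ceil(x),        # rounds toward positive infinity
        "round_half_even": round(x), # nearest integer, ties go to the even neighbour
    }


samples = [2.5, 3.5, -2.7]
for value in samples:
    print(value, to_int_variants(value))
```

      Truncation and floor agree for positive values and differ for negative ones, and the tie-breaking rule decides whether 2.5 becomes 2 or 3. A simulation that converts many values this way inherits whichever bias the chosen rule has.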

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    165. Re:Seems reasonable by MrResistor · · Score: 1

      In fact, the arctic could melt completely and 20% of Florida could go underwater and you'd still deny anything was happening or that anything should be done about it.

      Well yeah... it's Florida!

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    166. Re:Seems reasonable by MrResistor · · Score: 1

      Errors are only errors if they are reported by the "right" people?

      They're still errors, they just might not be relevant to the science. The real problem is errors being reported by the wrong people.

      Let's say we have a climate simulation in which our model is accurate and the code doing the actual analysis 100% bug free, but we used a toolkit for the UI that has 100 bugs. Does Rush Limbaugh care that those bugs aren't part of the code we wrote? Or that they don't affect the results at all? Or that they're in parts of the widget set that we didn't even use in our UI? NO! He'll be on the radio as soon as he hears about it, screaming to his army of dittoheads that our results are invalid because our model is full of bugs, and that's going to be echoed by every other conservative pundit and politician by the end of the week, entered into Congressional testimony related to emissions policy as if it were fact, putting yet another barrier in front of the actual scientists who have actual valid results.

      I'm not intending to argue against openness here, just pointing out a very large pitfall (notice I did not say "potential" pitfall, it's pretty much guaranteed to go like that).

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    167. Re:Seems reasonable by MrResistor · · Score: 1

      To me, the issue with peer review is "who are the peers?" If the peers aren't capable of vetting the methods, how valid are their reviews?

      For example, I've dated a few psychology majors, and spent time hanging out with their psych major friends. One thing is certain: they aren't mathematicians.

      This isn't surprising, since they don't want to be mathematicians, and any discussion of mathematical topics in such a group generally devolved into finding ways they could reduce the number of math classes they needed to take. Fair enough, I feel the same way about psychology classes. However, if any of them ever decide to do any research, and I have no doubt that a few will, their results will only be as meaningful as the quality of their statistical analysis. I reiterate here my point that they are not mathematicians, and add that I have seen no evidence to suggest that those with higher levels of education in the field are likely to be any more knowledgeable regarding mathematics, so how can they give a realistic opinion of whether the analysis is correct?

      Let's assume by some miracle that the author of the paper has the resources to get a mathematician to at least look over their math. As a computer scientist with a minor in applied math, I feel qualified to say that mathematicians are not computer scientists. There are whole classes of bugs, many of which are relatively obvious to someone with a reasonable grounding in computer science, that applied mathematicians seem to be unaware of. It's probably not a big deal if they're working in Matlab, Mathematica, or Minitab, but it certainly can be a problem on the occasions where they go out and write their own code (like, say, large climate simulations).

      All I'm saying is that, if the results of a paper are based on custom software, someone with some actual computer science training should be looking at the code as part of the peer review process. Additionally, I would hope that someone with some real math training is looking over the math, though I doubt this is the case for journals that aren't directly related to the field.

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    168. Re:Seems reasonable by l0b0 · · Score: 1

      So he does black box testing. That's certainly an important form of testing, but won't detect a huge swath of problems. For example, if your employer deals with enormous amounts of small sums per year, including currencies and interest rates, and the quality assurance is a "looks OK" from someone above you, would you be nervous about your savings? I know I would be, but unfortunately I am in no position to review their code. And some "error handling" code is really "error cover-up" code, which may end up creating sensible-looking data that is simply wrong.

    169. Re:Seems reasonable by jwhitener · · Score: 1

      I don't think you understood my point, or I did not convey it correctly.

      To date, every questioner I have seen (whose voice is public enough to be heard), in the public eye (Fox News, major web presence, etc..), has been nearly a fanatical denier of climate change. It is like their mind is made up, and they are grasping to find one little flaw to explode with.

      I have absolutely zero issues with people having full access to the code and complete transparency for even the unqualified to examine the data.

      The problem is that in this politically charged environment, even if 10 people question data set X, and 9 find it is completely fine, Fox News or something similar is going to grab the 1 unqualified person who doesn't like data set X and let them talk for weeks.

      Now, that is generally OK to let that one person talk, assuming they are somewhat respected in their field.

      But if that 1 person who disagrees with data set X is refuted, Fox (just an example, happens all over) never retracts the damage that was done. And they certainly don't have on the other 9 questioners who can point out why the data set X is just fine.

    170. Re:Seems reasonable by wealthychef · · Score: 1

      I wouldn't go so far as to say what should constitute peer review in a journal outside of my field. But it seems obvious to me the code should be published to the full extent possible.

      --
      Currently hooked on AMP
    171. Re:Seems reasonable by eh2o · · Score: 1

      The case currently in the US is that state and federal governments are rapidly divesting from basic academic research and the academic system as a whole. Many of the more successful departments in science and engineering are now heavily funded by contributions from individuals and corporations, and the IP flows back to the funders through technology transfer licensing and the hiring of fresh grad students into industry. Meanwhile many faculty positions in universities are paid for by private donations. At the most extreme end are the private universities that have enormous endowments and basically do everything with private funds.

      It's nice to say the taxpayers should get what they paid for, but the truth is they are not really paying for it, and until we elect a government that is actually willing to fund education and science in a serious way, that is not going to change in a significant way.

    172. Re:Seems reasonable by xiox · · Score: 1

      It's fine to give code to referees who want to see it under peer review. I have no problems with that.

      If you release code more generally, you need to support it. You will get questions. If you don't answer them, your work will be brought into question. What's this thing about "working as advertised"? Scientific code is quite often written to be used for a short time on specific inputs on a specific computer system. It won't "work as advertised" without a lot of support and hand-holding.

      By assumptions, I mean things such as filename standards, format of data, and so on. These aren't scientific assumptions, but assumptions of the code itself, so are different things.

      And keeping code private isn't to stop people reproducing what you did, but to not allow others an advantage in an area you are working on.

      Reproduction of results is about independent verification anyway, so they probably should be starting with the raw data and not working with an existing code.

    173. Re:Seems reasonable by wealthychef · · Score: 1

      It's fine to give code to referees who want to see it under peer review. I have no problems with that.

      Got it, great! We're 90% of the way there. LOL

      If you release code more generally, you need to support it. You will get questions. If you don't answer them, your work will be brought into question. What's this thing about "working as advertised"? Scientific code is quite often written to be used for a short time on specific inputs on a specific computer system. It won't "work as advertised" without a lot of support and hand-holding.
      By assumptions, I mean things such as filename standards, format of data, and so on. These aren't scientific assumptions, but assumptions of the code itself, so are different things.

      I think you are overstating this part. It's true it's a bit harder to have to explain yourself, but that's part of being a researcher. You do not need to explain every program feature or state your file format standards in general. Think outside your box for a second. You have this idea about supporting and it's getting us stuck. All you need to supply is your data, the code, and the commands you used to execute the code, and perhaps spend a half hour or an hour explaining your data format and how the code works. Voila! You are done. Yes, you might get questions about poorly documented parts of the code, and as a computer guy, I think if you cannot explain parts of your code then you don't know if your code is accurate, but if your colleagues don't care, then I don't. The point is you released it and it's being looked at. I'll let your community decide the rest. :-)
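
      A rough sketch of what "data, code, and commands" could look like in practice, in Python. Everything here is an assumption for illustration: the function names are hypothetical and the manifest layout is just one way to do it, not an established standard. It records a checksum for each file and the command line that was run, so a reader can confirm they are looking at the same inputs.

```python
import hashlib
import json
import platform
import sys


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files, command):
    """Describe one run: the command used, the environment, and file checksums.

    `files` lists data and source files; `command` is the argument list
    that was executed. The layout is hypothetical.
    """
    return {
        "command": list(command),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "files": {path: sha256_of(path) for path in files},
    }


def write_manifest(manifest, out_path):
    """Save the manifest as readable JSON next to the published results."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

      Something like `write_manifest(build_manifest(["input.csv", "analyze.py"], sys.argv), "manifest.json")` at the end of a run would do it (those file names are made up). It doesn't document the code, but it pins down exactly what was run on what.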

      And keeping code private isn't to stop people reproducing what you did, but to not allow others an advantage in an area you are working on.

      Of course, but it potentially has both effects. If it's such an advantage to keep it secret, then nobody is forcing you to publish anything. The fact is that publishing is an advantage to everyone, that's why you do it. You want to have your cake and eat it too, and I applaud you for it, but the real tension is not between publishing code and getting things done, it's between keeping your stuff secret and publishing your results. If your code is your secret weapon, then write a paper about it and copyright it. And if it's that hard to document, I would argue it's not much of an advantage anyhow, but that's just me. LOL

      Reproduction of results is about independent verification anyway, so they probably should be starting with the raw data and not working with an existing code.

      Well, this is a very debatable point. I would argue that they have a right to know how your code works in order to verify your results. If a chemist said, "I went into my lab and mixed some chemicals together, but I'm not going to tell you how, but this is what I ended up with," then other chemists would probably want to know more details. Your code is a huge part of your process. It stands to reason that your results are not reproducible without a detailed description. The best, easiest way to furnish the description is to publish the source.

      --
      Currently hooked on AMP
    174. Re:Seems reasonable by bartwol · · Score: 1

      The popular news media is not an outlet for dissemination of "science." This is appropriate...its audience has little interest in the tight details that are the realm of science.

      Instead, the news media is a platform for dissemination of information and entertainment that, in so doing, provides sufficient gratification to its audience that they choose to come back for more. I would guess, for example, that though you make many remarks about Fox News, you probably watch very little of it. (And it's not clear to me that you distinguish the news part of Fox's programming from the opinion/commentary parts.)

      So we should understand that putting a scientist in front of a camera and having him opine does not constitute the dissemination of scientific information; it constitutes the dissemination of a scientist's opinions. Effectively, the real science exists only in the form of the publications that constitute peer-reviewed research. Even the authors' remarks about their own papers stand outside the very critical borders of the peer-reviewed piece (that is presumed to have been produced through rigorous application of the scientific method).

      The bias of the media is more insidious than you suggest. While bias is obviously reflected in the opinions of the people speaking to the cameras, it is less obviously reflected by the choices of which people to put in front of the cameras, and EVEN MORE INSIDIOUSLY, in the choices of which stories to cover. Science news is in very small part a production of scientists, and in large part, a production of news editorial boards and science reporters (neither of whom is likely to be a scientist).

      In my lifetime, I have not seen any scientific theory get the kind of media support and air time that AGW has received. Even with the media's powerful inclination to make AGW a major public concern, the issue does get thrashed about and chewed apart by Joe Q. Public and his spinster aids.

      It is within this comfortable relationship between media outlet and audience that you want to see more what? Ahhh...yes...you want to see more bias toward what you believe. Oh. Wait. No. You want to see more bias toward "what science believes." No. Wait. You want more bias toward the "scientific consensus."

      Whatever it is that you want the media to put out, it isn't science. It's opinion. It's your opinion. And it's the opinion of many others. And you're just trying to help the "good guys" win.

      And to that, I say, get in line. Ethnic violence, civil wars, and totalitarian governments terrorize many millions of people in the world. Infectious disease plagues various regions, and the risks of future virulent epidemics are as real to me (and statistically more dangerous by my estimates) as potential damage due to climate change. Infant death is its own scourge. Proliferation of weapons fuels many destructive activities today and portends more destruction in the future. And the greatest scourge of all is the difficulty BILLIONS of people face TODAY in getting hooked up into an economy that can provide reliable access to food, sustenance, and social cohesion...for themselves and their families.

      Yes, the AGW battle is losing ground in the general public. That's no more a fault of media than are all our political problems. But if you want to score more wins for your side (yeah, yeah..."our" side), I suggest you dispose of your characterization of your opposition as being simple-minded "deniers", biased media, and the lack of science in the public consciousness. Those narratives are just the rhetoric that belies the greater political truths that they sustain: that we've got a bunch of really big challenges in this world and the funding of climate control quite reasonably faces formidable competition.

      In the debate about how to move forward in the world, I think climate science has already contributed about as much as it productively can.

    175. Re:Seems reasonable by jwhitener · · Score: 1

      The popular news media is not an outlet for dissemination of "science." This is appropriate...its audience has little interest in the tight details that are the realm of science.

      The audience might not care to hear the details, but I think the audience would care to know the general truth about an issue.

      Instead, the news media is a platform for dissemination of information and entertainment that, in so doing, provides sufficient gratification to its audience that they choose to come back for more. I would guess, for example, that though you make many remarks about Fox News, you probably watch very little of it. (And it's not clear to me that you distinguish the news part of Fox's programming from the opinion/commentary parts.)

      I try to watch Fox every so often just to see what's being said. And if you take an honest look at Fox, the "news programming" side of their organization is very much intertwined with the opinion/commentary side. I've seen the news programming side say "some folks are saying X", when those folks who said it are the opinion shows that just aired!

      So we should understand that putting a scientist in front of a camera and having him opine does not constitute the dissemination of scientific information; it constitutes the dissemination of a scientist's opinions. Effectively, the real science exists only in the form of the publications that constitute peer-reviewed research. Even the authors' remarks about their own papers stand outside the very critical borders of the peer-reviewed piece (that is presumed to have been produced through rigorous application of the scientific method).

      No one would argue that the news is an appropriate place to disseminate scientific detail. However, the news should certainly research enough to be able to paint a proper picture of the science it reports on. At least let the audience get a sense of what most scientists believe, instead of only airing the minority dissenter. It paints a picture that there is some 50/50 debate about whether climate change is even occurring! That is bad reporting.

      The bias of the media is more insidious than you suggest. While bias is obviously reflected in the opinions of the people speaking to the cameras, it is less obviously reflected by the choices of which people to put in front of the cameras, and EVEN MORE INSIDIOUSLY, in the choices of which stories to cover. Science news is in very small part a production of scientists, and in large part, a production of news editorial boards and science reporters (neither of whom is likely to be a scientist).

      Agreed. And it is getting worse. The news show '60 minutes' marked the downfall of unbiased reporting. It was the first time that the major networks realized that news was not a public service anymore, but rather, a money maker.

      It is within this comfortable relationship between media outlet and audience that you want to see more what? Ahhh...yes...you want to see more bias toward what you believe. Oh. Wait. No. You want to see more bias toward "what science believes." No. Wait. You want more bias toward the "scientific consensus."

      Bias towards the scientific consensus? Are you kidding me? The public deserves to know what level of evidence supports certain ideas. They don't have to know the details, but they should walk away from a news show with a sense of what is most likely the correct answer concerning an issue. That isn't bias, that is responsible reporting. Fox has clearly not provided a clear picture to its viewers concerning the levels of evidence on either side of this completely fabricated debate.

      Whatever it is that you want the media to put out, it isn't science. It's opinion. It's your opinion. And it's the opinion of many others. And you're just trying to help the "good guys" win.

      You probably don't fly in airplanes do you? After all, it

    176. Re:Seems reasonable by DarthVain · · Score: 1

      No offense, and I can understand your frustration with the current situation, however:

      "When we're trying to do science, we'd rather do science than defend ourselves against hacks with a public soapbox."

      Perhaps you're in the wrong business then.

      Science has always existed within the political realm, and it has always been a problem.

      Science is all about that "defense", whack job or otherwise, that is part of what being a scientist is all about.

      Believe it or not, I work in a "dirty" industry that is horribly misunderstood and whose data also takes extreme specialization to comprehend. We take all kinds of flak, mostly of the political kind, being influenced by potential voters, which are really just cottage and land owner associations (NIMBY groups) rather than legitimate environmental groups. Silly decisions are made in the form of compromise and conciliation, when really they are wrong and just don't understand. Most times, as I am sure you can relate, the audience has little or zero interest in actually understanding the subject matter; they have already decided, and no amount of data, reasoning, or logic will convince them otherwise.

      Having said all that, it is still our job to explain stuff, and defend why we feel that way. The fact that your audience either doesn't understand, or doesn't want to is immaterial. Hopefully over time, and with persistence, and mounting evidence, your views, if correct, will be accepted.

    177. Re:Seems reasonable by HiThere · · Score: 1

      Yeah, but the code being described was in FORTRAN. I may not remember what the rule was in FORTRAN, but I *do* remember that it was defined. (It could have been even that the variables are added together as bit patterns without conversion. Yuck, but that would be a defined result, and so would fit my memory sufficiently to be reasonable.)

      P.S.: I have a vague memory that said that automatic conversion only happened across the equal sign. This would have been FORTRAN IV, around 1965. I don't have any memory that said what happened if you multiplied or added an integer and a float. Perhaps it threw an error. I *DO* remember using an equivalence statement to flip bits in a float by treating it as an int. (This would have been on either an IBM 7094 or a CDC 6600, because floats and ints were the same length, and nobody bothered with double precision. Except numerical analysis people, astronomers, etc. Variable size ints didn't come in until byte addressed computers [i.e., the IBM 360] became common.)
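
      For anyone who hasn't seen the EQUIVALENCE trick, the same idea can be sketched in Python with the standard `struct` module. This is only an illustration under a loud assumption: it uses 32-bit IEEE 754 floats, which is not the word format of the 7094 or the 6600, and the helper names are hypothetical. The point is just that the same bytes are read once as a float and once as an integer, with no numeric conversion in between.

```python
import struct

# Assumption: IEEE 754 single precision, where the top bit is the sign.
SIGN_BIT = 0x80000000


def float_to_bits(x):
    """Reinterpret the 4 bytes of a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Reinterpret an unsigned 32-bit integer's bytes as a 32-bit float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def flip_sign(x):
    """Negate a float by flipping one bit of its integer view."""
    return bits_to_float(float_to_bits(x) ^ SIGN_BIT)


# 1.0 is stored as 0x3F800000 in IEEE 754 single precision.
print(hex(float_to_bits(1.0)))
print(flip_sign(1.5))
```

      Note that this is the opposite of a conversion: reading the bits of 1.0 as an integer gives 1065353216, not 1.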

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    178. Re:Seems reasonable by pod · · Score: 1

      Yes, thank you ACs, we still know squat all about the ozone hole, except that it has mysteriously disappeared, something you are causally linking to some minor human action, which was to take decades to be implemented and play out. Maybe it went away because of forces greater than the human activity of the western hemisphere?

      The same could be said of global warming. While CO2 is increasing, temperatures are decreasing globally, and northern polar ice is growing. Glaciers aren't melting, and Antarctic ice isn't shrinking one foot. There is well proven and documented research that shows temperature increase as correlating to CO2 concentration increase. We could roll back CO2 concentrations to pre-industrial levels, and that would reduce global temperatures less than .5 degrees. The thing with CO2 all the alarmists are missing is that it has very rapidly diminishing effects the more of it you stack. We could add twice as much as is there already and you would barely be able to pick out the temp difference from the noise. Maybe something else is at work here, hmm? I'm looking at that big fiery ball in the sky, the source of nearly 100% of the heat input into the planet.

      I could have also thrown in acid rain, and how all our buildings and trees would melt one day because of it. It was certainly a measurable problem, like global cooling, the ozone hole, and global warming, and unlike them it was almost entirely attributable to human activity. However, the dire predictions were again simply hysterical and wildly exaggerated, making a mockery of the whole thing and confusing the issues.

      We have enough problems on this planet that we don't need to make up new ones to spend money on. There is real air, water and ground pollution happening all over the place. It is much more immediate and much more dangerous to the environment and humans than any fantasy we've come up with to date. Except we don't need a world government and oppressive regulations regime to solve them, and it's not sexy like polar bears and going green, so no go on that front.

      --
      "Hot lesbian witches! It's fucking genius!"
    179. Re:Seems reasonable by bartwol · · Score: 1

      You're mis-stating the dynamics here.

      First, it is very wrong to characterize the media by citing Fox. Overwhelmingly, the media (broadcast and print) has been sensitive to the issues of climate change and, quite pointedly, affirms the theories of AGW. In the U.S., I would cite CNN, MSNBC, CBS, NBC, ABC, New York Times, Washington Post, LA Times, Boston Globe, USA Today, and many others as all having editorial stances that are not simply sitting on the fence, but in fact, favoring IPCC findings. I regret that the dominant convention of "balanced reporting" does, as you indicate, give presence to poorly supported opposing positions. But in so doing, they've spared you an even worse fate: presenting more nuanced and well-supported counter positions instead of the poor ones as they do.

      It is a fool who declares with certainty that warming is not occurring. And it is similarly a fool who declares with certainty that human born CO2 is not affecting climate. And yet, these are the "skeptics" that news organizations parade before us. Yes. Stupid "deniers". Perhaps if the news media presented the intelligent skeptics and the much narrower arguments that they are advancing, you would more readily acknowledge that the issues of global climate, particularly with respect to public policy and action, are not empirically deduced.

      Second, per this Pew survey, only 33% of Americans believe global warming isn't happening. 57% state definitively that they believe it is. And, yes, only 41% believe with certainty that the warming is anthropogenic. Overall, I'd say that's not a bad cheering section for such a complex issue.

      All that said, the focus on the deniers is a focus on the fools, and will not help you to understand why ground is being lost. It's being lost because the IPCC, many scientists, many politicians, governments, the news media (and you) are mixing science with politics, inferring public policy, and intimating that all this falls empirically from the tree of SCIENCE. Who is kidding whom? One need only read the IPCC's Summary for Policy Makers to witness this despicable breach of the borders of science. Imagine if an author of a study describing a clinical trial of the efficacy of a cholesterol-lowering drug published in his findings that people should take the drug. That would be absurd. The job of the scientist would be to report his methodology and results, perhaps showing a statistical correlation of mortality with the use of the drug. That's it. The scientist's job is done (and his integrity stands on the shoulders of the scientific method itself). Thereafter, it would be the job of pharmaceutical marketeers to use that information to get people to take the drug. The IPCC mixes the research with the marketing, and demonstrably blurs the lines between the two.

      As a lifelong consumer of good science, and as a tireless advocate of the application of empiricism and rational thinking, I teach people that the IPCC is only partially related to the doings of science. I teach them to look skeptically upon its guidance just as they should look skeptically upon the guidance of a pharmaceutical advertisement. I teach them how to spot the differences between deduction and inference. And I use the IPCC's own work products to demonstrate where empiricism ends and uncertainty (and falsely certain statements) begins. I teach them to look for the measurements, to assess provenance, to demand the well-established traditions of discipline in methodology. I explain to them the limitations of simulation and the challenges of establishing causation.

      Yes, you can swallow the conclusions of alleged experts. But that's dogma, and it's a poor substitute for looking behind the curtain, for there and only there is where you get to see and feel the true authority of science.

      I find the theory of anthropogenic global warming to be very compelling. Having spent some time examining its basis, I am not feeling significant inclination to doubt its u

    180. Re:Seems reasonable by c0d3g33k · · Score: 1

      Actually, I'm pretty sure everyone is fairly close with the current data they're generating to prevent other groups from beating you out the door with your idea.

      In the hard-scrabble world of the C-Level Scientist, perhaps. In the circles I moved in when I was a student (top-notch research institute with multiple Nobel laureates [Gilman, Brown/Goldstein, Deisenhofer] and NAS members), this was NOT the norm. A common occurrence was the PI (Principal Investigator) stopping by on their way to the airport to give a talk to see if there was any new data worth putting on a slide. The filter was whether or not the data was good, not whether it was too new to reveal. There was none of this secrecy you speak of. If the practice of scientific research has really descended to the levels you describe, then I have no response other than despair. It was a nice few hundred years while it lasted. Welcome to the New Dark Ages.

  2. great! by StripedCow · · Score: 3, Insightful

    Great!

    I'm getting somewhat tired of reading articles where there is little or no information regarding program accuracy, total running time, memory used, etc.
    And in some cases, I'm actually questioning whether the proposed algorithms actually work in practical situations...

    --
    If Pandora's box is destined to be opened, *I* want to be the one to open it.
    1. Re:great! by xtracto · · Score: 1

      And in some cases, I'm actually questioning whether the proposed algorithms actually work in practical situations...

      The problem is not only the algorithms but their implementation. I have read theses where a certain algorithm is given to explain the dynamics of a simulation, and when I actually looked at the code (closed, for in-house analysis only), several things were different.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    2. Re:great! by Anonymous Coward · · Score: 1, Interesting

      Yet more stupidity by people who know nothing. By the logic of the posters above, since the government paid EDS under a contract to perform some work for the government, ANY software developed by EDS should suddenly become freely available. Since Boeing was paid to develop the F-15, ANY software written to design or build the F-15 should now be freely available. I suggest you idiots try that with any company that does business with the government. I'll know when you do because the gales of laughter from the corporate offices will be heard around the world. Yet you think that software developed to perform work under contract to the government by a researcher mysteriously should be freely available. The funny part is, if the software written by researchers under contract to government wasn't freely available, how is it possible that the idiot writing for the Guardian was able to perform the analysis? Wups! I suggest he try the same analysis on the regenerative braking software that Toyota has on its Prius or maybe the Airbus 330 fly-by-wire software (think Air France). Wups!

      Yes, I am very aware of the legal requirements and their consequences. For the third time I've had large chunks of my code copied verbatim within commercial products after my institution was forced to release it to companies repeating the same "the research is publicly funded" line of bullshit. The companies had the brass balls to actually try to sell me my own software. Yes, it's a long, protracted process to get the companies to either remove the offending code or pay the university for it.

    3. Re:great! by mrxak · · Score: 1

      I'm quite sure there would be exceptions to state secrets. I'm also sure the folks who built F-15s did a heck of a lot more testing and verification than your typical scientific research at a university.

  3. More to the point, people increasingly don't by aussersterne · · Score: 4, Insightful

    seem to understand the very idea of scientific methods or processes, or the reasoning behind empiricism and careful management of precision.

    It's a failure of education, not so much in science education, I think, as in philosophy. Formal and informal logic, epistemology and ontology, etc. People appear increasingly unable to understand why any of this matters and they essentialize the "answer" as always "true" for any given process that can be described, so science becomes an act of creativity by which one tries to create a cohesive narrative of process that arrives at the desired result. If it has no intrinsic breaks or obvious discontinuities, it must be true.

    If another study that contradicts it also suffers from no breaks or discontinuities, they're both true! After all, everyone gets to decide what's true in their own heart!

    --
    STOP . AMERICA . NOW
    1. Re:More to the point, people increasingly don't by bsDaemon · · Score: 4, Insightful

      I think a lot of it has to do not just with failures in education, but also with the way science (in particular, but everything in general) is reported in the media. One week a study saying coffee will kill you gets reported, then a couple of days later a story saying another study says coffee will make you immortal is reported on, both with equal voracity, neither with expert commentary or perspective. C+ students who look good on camera banter back and forth about it, laughing jocularly and ultimately, through their own dismissal and misunderstanding, passing that same attitude on to their viewers.

      It's come to the point where many, many people just dismiss the whole business of science. "They can't even make up their minds!" they say, as if the point of science is to make up one's mind. Of course, this is where the failure of education to actually educate comes into play. Classical liberalism has been turned over, spanked and made into the servant of corporate mercantilism and we're all just now supposed to sit down and shut up. Science is, in its essence, a libertarian (note small 'l') pursuit through which one questions all authority, up to and including the fabric of existence itself -- all assumptions are out the window and any that cannot pass muster is done away with.

      But, just like socio-political anarchism (libertarian socialism), the spirit of rebellion and anti-authoritarianism inherent in science has been packaged and sold in a watered down and safe-for-children package at the local shopping mall only to be taken out of the box when the powers that be feel that they can use it for their own purposes. Not to be a downer or anything, it's just I really do think this is bigger than just science. It's to do with people willingly leading themselves as sheep to the slaughter on behalf of the farmer to make the dog's job easier.

    2. Re:More to the point, people increasingly don't by monoqlith · · Score: 1

      I'm with you that people don't seem to understand the motivation for empiricism or the scientific method, but I think that's an overly complex explanation of why.

      Science *is* somewhat an act of creativity, but in a different respect. In order to explain observation, one has to creatively intuit the path to the precise explanation in novel, non-obvious ways. Einstein was creative in his science, and he was also a brilliant scientist.

      You say, one has to try to create a cohesive narrative of a process. Well, yes, that's what science aims to do. What people don't seem to understand is that not all explanations are created equal, which is where I agree with you again. There is a critical difference between a framework that does explain and predict observation and is falsifiable and one that isn't. People by and large don't seem to get that.

    3. Re:More to the point, people increasingly don't by Anonymous Coward · · Score: 0

      I loved that post. But I would add that classical liberalism and Austrian economics are based on the old rationalist tradition and deductive reasoning, whereas modern science is based on empiricism and inductive reasoning.

      On the other hand, Einstein did say this:

      "There is no inductive method which could lead to the fundamental concepts of physics ... in error are those theorists who believe that theory comes inductively from experience."
      – Albert Einstein, The Method of Theoretical Physics, Oxford, 1933

      And I would assume that, though Edison said genius is 99% perspiration, Tesla would probably agree more with Einstein on the importance of imagination, and reason itself rather than grueling empiricism.

    4. Re:More to the point, people increasingly don't by ArcherB · · Score: 1

      so science becomes an act of creativity by which one tries to create a cohesive narrative of process that arrives at the desired result.

      And that is the problem. When the input is fixed, experiments are to determine a reaction when those variables are present, not to achieve a "desired result". Scientists should not "desire" any particular result when the inputs are unchangeable. They can predict the outcome, but they cannot change the DATA to receive a desired result. For example, when determining the energy output of two colliding celestial bodies, scientists cannot change the mass and velocity of those bodies. All they can do is input the data available. Changing that data can change the predicted result, but will not actually change the mass, the velocity or angular momentum of the bodies, and therefore will not change the true energy output from the impending collision. The same rules apply when dealing with climate science. The data is the data. It is fixed.

      Changing the data to match your preconceived result invalidates the model, even when your predictions are based on other models. Either all the models give the same result from the same data or they are wrong. End of story.

      And your sig:

      STOP . AMERICA . NOW

      Says the guy using the Internet (an American creation), via a computer (an American product), to post "STOP . AMERICA . NOW" on a website owned, operated and run out of AMERICA.

      Kind of hypocritical, don't ya think?

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    5. Re:More to the point, people increasingly don't by phantomfive · · Score: 2, Interesting

      people increasingly don't seem to understand the very idea of scientific methods or processes, or the reasoning behind empiricism and careful management of precision.

      Surprisingly, not true. In fact, it's getting better, despite what Idiocracy claims.

      An easy way to see this is to compare High School Musical to Grease. Both of them were roughly the same movie, separated by a few decades. In Grease, the smart kids were shown as dorks, and the cool kids were the ones who were most likely to drop out of school. In High School Musical, the 'brainiac' kids weren't portrayed as better or worse than the jocks, just different. So perceptions are changing.

      Mainly I don't know when this golden time period was that everyone understood formal and informal logic, epistemology, and ontology. At least now, most everyone understands [citation needed], twenty years ago most people had trouble with that (and Wikipedia spoils me: now when I read a newspaper I keep wanting to find the link to click on for the citation of their assertions.).

      --
      Qxe4
    6. Re:More to the point, people increasingly don't by blahplusplus · · Score: 1

      You act like what you said hasn't always been the case:

      "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it."--Max Planck

      He said that over 60 years ago.

    7. Re:More to the point, people increasingly don't by Anonymous Coward · · Score: 0

      I wrote a program last week to model the energy output of two colliding celestial bodies. I decided to take the average estimated mass for each by pulling some data sets from my fellow scientists. I called up my buddy to tell him how cool it was to see just how much energy could be released upon such a collision. He told me: "Yeah, you'll have to show me that--wait, you didn't use that mass table I sent you, did you? Turns out my equipment was miscalibrated." Oh, no problem, I thought. I'll just recompute the averages after I toss out his faul--OH HOLIEST OF FUCKS I'VE JUST COMMITTED THE WORST SIN KNOWN TO MAN. I CHANGED MY INPUT DATA WHICH IS SACROSANCT AND NEVER WRONG. DATA IS DATA. DATA IS PERFECT. IT IS FIXED.

      Seriously, if your apparent understanding of science is "either all the models give the same result from the same data or they are wrong," why even bother posting about it? You're simply ignorant. "End of story."

    8. Re:More to the point, people increasingly don't by nickthisname · · Score: 1

      I too have a deep longing to waste a lot of words without saying anything. Can I hold your hand?

      SHOP. AMERICA. WOW!

    9. Re:More to the point, people increasingly don't by ArcherB · · Score: 1

      I wrote a program last week to model the energy output of two colliding celestial bodies. I decided to take the average estimated mass for each by pulling some data sets from my fellow scientists. I called up my buddy to tell him how cool it was to see just how much energy could be released upon such a collision. He told me: "Yeah, you'll have to show me that--wait, you didn't use that mass table I sent you, did you? Turns out my equipment was miscalibrated." Oh, no problem, I thought. I'll just recompute the averages after I toss out his faul--OH HOLIEST OF FUCKS I'VE JUST COMMITTED THE WORST SIN KNOWN TO MAN. I CHANGED MY INPUT DATA WHICH IS SACROSANCT AND NEVER WRONG. DATA IS DATA. DATA IS PERFECT. IT IS FIXED.

      Seriously, if your apparent understanding of science is "either all the models give the same result from the same data or they are wrong," why even bother posting about it? You're simply ignorant. "End of story."

      Talk about missing the point. I never said data could not be changed due to errors. I said, "Changing the data to match your preconceived result invalidates the model."

      In other words, to use your example, you completed your model, looked at the results and said, "Hmmm. That's not right." So you change the mass and/or speed of the objects to have the model's results match what you thought the output should be.

      You should really work on your literacy skills before you run around calling other people ignorant.

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
  4. Stuff like Sweave by langelgjm · · Score: 3, Interesting

    Much quantitative academic and scientific work could benefit from the use of tools like Sweave, which allows you to embed the code used to produce statistical analyses within your LaTeX document. This makes your research easier to reproduce, both for yourself (when you've forgotten what you've done six months from now) and others.

    What other kinds of tools like this are /.ers familiar with?

    --
    "Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
    1. Re:Stuff like Sweave by StripedCow · · Score: 1

      This raises the question in what programming language the scientific code should be published.

      Should there be a universal language, so that stronger guarantees are obtained on the reproducibility of the work?

      Of course, this is a difficult topic since a lot of scientific programs are specifically designed for (specific) clusters.

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    2. Re:Stuff like Sweave by shabtai87 · · Score: 1

      There are a lot of nice extensions using the listings package in LaTeX. I use a lot of MATLAB so I usually end up using the mcode.sty available on mathworks (http://www.mathworks.com/matlabcentral/fileexchange/8015). It's got the color coded parts right too, which is nice for readability. More importantly I'll save the current code at the time of that report with the report itself, just in case I get really drunk and decide to try to "fix" any base code.

      --
      @humanity: *facepalm*
    3. Re:Stuff like Sweave by xtracto · · Score: 2, Interesting

      Should there be a universal language,

      It is called Z notation. I have seen it used in several articles and at least a book on multi-agent systems.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    4. Re:Stuff like Sweave by xtracto · · Score: 1

      The problem with "adding code to paper" is the length of the paper.

      I find it is better to submit the actual code to a "publisher repository" which can make it available on a long-term basis (as opposed to having it on the researcher's web page, which is closed when they leave the position, or which the researcher himself can remove after some time).

      Of course it may be useful to reproduce some snippets of the algorithm used in the article's text; however, I wouldn't suggest showing the actual code because not all of the audience will know such notation (very likely outside Comp.Sci and Soft Eng. circles).

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    5. Re:Stuff like Sweave by John+Hasler · · Score: 2, Insightful

      > This raises the question in what programming language the scientific code
      > should be published.

      The one it was written in. What should be published is the exact code that was compiled and run to generate the data. Think of it as similar to making the raw data available.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    6. Re:Stuff like Sweave by John+Hasler · · Score: 1

      > I use a lot of MATLAB...

      Therefore, with MATLAB being closed-source, not all of your software is available.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    7. Re:Stuff like Sweave by c_sd_m · · Score: 1

      This raises the question in what programming language the scientific code should be published.

      Should there be a universal language, so that stronger guarantees are obtained on the reproducibility of the work?

      Of course, this is a difficult topic since a lot of scientific programs are specifically designed for (specific) clusters.

      Good luck. Even within single small research programs I haven't seen consistency in code. Heck, my master's used Scilab on the cluster which called C when it needed to be fast, Matlab for prototyping and trialing different methods and visualization, Python to mate the Matlab engine to stuff and for some low-level processing, some declarative optimization languages, ... Even within my one little topic it would have been a huge waste of time and effort to choose just one of these tools and try to make it fit everything. I may have had 10-50% overlap in languages with people in my lab, at most, and there were people in my program that could not have used my tools for their work.

      Enforcing a single language (ignoring who could possibly police it) would result in a huge loss of productivity. Sometimes the point of research is to show something is possible, not to provide iron-clad proof for the 6 o'clock news bites.

    8. Re:Stuff like Sweave by StripedCow · · Score: 1

      The one it was written in. What should be published is the exact code that was compiled and run to generate the data. Think of it as similar to making the raw data available.

      That's nice, but then the experiment would be very difficult to reproduce. For example, if the code is in matlab, then you'd need a matlab license (as another poster noted). Also differences in operating system version, etc. could make it difficult. Further, if somebody wants to reproduce the experiment, say, 100 years from now, and the whole software ecosystem is different (very likely), then that somebody would have a real problem.

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    9. Re:Stuff like Sweave by TheThiefMaster · · Score: 1

      It should be published in the language it was written in.

      Errors during the translation process to another language could easily invalidate the entire program, which people would assume means flaws in the program output used in the paper. At least if it's published in its original language then any bugs found could genuinely matter.

    10. Re:Stuff like Sweave by StripedCow · · Score: 1

      Okay, but think in the longer term.

      Imagine something like an open version of the ".NET" framework, where you could have multiple languages mapping to a single virtual machine architecture.
      Also, you could have compilers which can transform a program from one language into another, etc.
      A language could be marked "scientifically approved" when it can be translated to the language which is the "scientific" standard (just like English is a scientific standard for most journals).

      Note that it doesn't need to be fast, reproducibility should have the highest priority.
      A "virtual clock" can be used to reproduce the timings.

      It may seem far-fetched, but again, we should think in the longer term.

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    11. Re:Stuff like Sweave by ColdWetDog · · Score: 2, Funny

      I think it should be Perl. Then it would be uniformly incomprehensible which would level the playing field. Nothing else would be as fair.

      --
      Faster! Faster! Faster would be better!
    12. Re:Stuff like Sweave by mrxak · · Score: 1

      The scientific community can actually run the code and verify that it produces the same output on another machine in another lab. What we're talking about here is programmers making sure the code is a perfect representation of the scientific formulas described by the research paper. Basically, did the researchers follow the methodology they claimed to? Is data being lost or altered due to limitations of the software? This doesn't take any special knowledge of the science involved, just the calculations involved.

    13. Re:Stuff like Sweave by SETIGuy · · Score: 1

      What we're talking about here is programmers making sure the code is a perfect representation of the scientific formulas described by the research paper.

      There's no such thing for any nontrivial calculation. Do you, offhand, know when it's best to use upstream derivatives versus standard derivatives versus downstream derivatives when solving a series of differential equations? There is a very large number of techniques for calculating the gradient of a quantity in a time-varying system. None is a perfect representation. To the untrained, both upstream and downstream derivatives look like an error, yet for many problems they are practically a requirement. Then there's the question of which method of integration to use.

      In some cases the only way to figure out if your code is "correct" is to figure out if it approximates the solution to a problem that can be calculated analytically. Or in the case of climate modeling, whether it can come close to fitting past data.
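
      As a rough illustration of checking numerical code against a problem with a known analytic answer, here is a minimal Python sketch. It assumes that "upstream" and "downstream" derivatives mean one-sided differences (which side counts as upstream depends on the flow direction in the actual problem) and that the "standard" derivative is a central difference. The function names, the test function sin(x), and the step size are all illustrative choices, not anything taken from a real model.

```python
import math

def forward_diff(f, x, h):
    """One-sided difference using the point ahead of x."""
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h):
    """One-sided difference using the point behind x."""
    return (f(x) - f(x - h)) / h

def central_diff(f, x, h):
    """Symmetric difference using points on both sides of x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Test problem with a known analytic answer: d/dx sin(x) = cos(x).
x, h = 1.0, 1e-3
exact = math.cos(x)

errors = {
    "forward": abs(forward_diff(math.sin, x, h) - exact),
    "backward": abs(backward_diff(math.sin, x, h) - exact),
    "central": abs(central_diff(math.sin, x, h) - exact),
}

for name, err in errors.items():
    print(f"{name:8s} error = {err:.3e}")
```

      None of the three reproduces the analytic derivative exactly. On this smooth test function the central difference happens to be the most accurate, which says nothing about which scheme is stable or appropriate for a given differential equation.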

  5. It should be released and under a free licence! by bramp · · Score: 3, Interesting

    I've always been a big fan of releasing my academic work under a BSD licence. My work is funded by the taxpayers, so I think the taxpayers should be able to do what they like with my software. So I fully agree that all software should be released. It is not always enough to just publish a paper, but you should release your code so others can fully review the accuracy of your work.

  6. About time! by sackvillian · · Score: 5, Informative

    The scientific community needs to get as far as we can from the policies of companies like Gaussian Inc., who will ban you and your institution for simply publishing any sort of comparative statistics on calculation time, accuracy, etc. from their computational chemistry software.

    I can't imagine what they'd do to you if you started sorting through their code...

    --
    Hey mate, spare a sig?
    1. Re:About time! by Anonymous Coward · · Score: 0

      Why keep picking on a small company? I don't see Microsoft making Excel code available for checking, or SAS making their source code available to see that the statistics run correctly... Why not make the argument that if you want it open, make it non-commercial.

      I'm not even sure the lab instrument people make their code/schematics/whatever open - how do you know the experimental data's ok?

    2. Re:About time! by je+ne+sais+quoi · · Score: 2, Informative

      One thing to point out is that there are now plenty of open source codes available for doing similar things as gaussian so it can be avoided now with relative ease. Two that come to mind are the Department of Energy funded codes: nwchem for ab initio work and lammps for molecular dynamics. I use the NIH funded code vmd for visualization. The best part about those codes is that they're designed to be compiled using gcc and run on linux so you can get off the non-open source software train altogether if you wish.

      --
      Gentlemen! You can't fight in here, this is the war room!
  7. Re:Why release it? by ShadowRangerRIT · · Score: 2, Insightful

    Please apply Hanlon's razor before leaping to conspiracy theories. Or Occam's razor might inform you that a conspiracy among thousands of scientists is a highly improbable occurrence; look for a solution that doesn't involve a perfect lid of secrecy among a group of (frequently) socially inept people.

    --
    $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
  8. Engineering Course Grade = F by BoRegardless · · Score: 4, Interesting

    One significant figure?

    1. Re:Engineering Course Grade = F by savanik · · Score: 1

      One significant figure?

      Yeah. My eyes bugged out when I saw that, too.

      This is why Statistics should be taught to anyone attempting to do scientific research. If you don't understand why this is happening and how to prevent it, please turn in your PhD now.

    2. Re:Engineering Course Grade = F by natoochtoniket · · Score: 2, Interesting

      That actually surprised me, too. Loss of precision is nothing new. When you use floats to do the arithmetic, you lose precision in each operation, and particularly when you multiply two numbers with different scales (exponents). The thing that surprised me was not that a calculation could lose precision. It was the assertion that any precision would remain, at all.

      Numeric code can be written using algorithms that minimize loss of precision, or that are able to quantify the amount of precision that is lost (and that remains) in the final answers. But, if you don't use those algorithms, or don't use them correctly and carefully, you really cannot assert _any_ precision in the result.

      If you know your confidence interval, you can state your result with confidence. But, if you don't bother to calculate the confidence interval, or if you don't know what a CI is, or if you are not careful, it usually ends up being plus-or-minus 100 percent of the scale.
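
      One example of an algorithm that limits accumulated rounding error is compensated (Kahan) summation. The sketch below is only an illustration, assuming IEEE 754 double precision as used by Python floats. The input (one million copies of 0.1) and the function names are made up for the demonstration, and math.fsum is used as the reference value because it returns a correctly rounded sum.

```python
import math

def naive_sum(values):
    """Plain left-to-right accumulation; rounding errors pile up."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Compensated summation: carry the lost low-order bits forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the error in total
    for v in values:
        y = v - compensation            # apply the correction from the last step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost
        total = t
    return total

values = [0.1] * 1_000_000
reference = math.fsum(values)  # correctly rounded sum of the stored doubles

naive_error = abs(naive_sum(values) - reference)
kahan_error = abs(kahan_sum(values) - reference)

print("naive error:", naive_error)
print("kahan error:", kahan_error)
```

      The compensated version costs a few extra operations per element. In exchange, its error stays close to a single rounding of the final result, while the error of the naive loop grows with the number of terms.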

    3. Re:Engineering Course Grade = F by khayman80 · · Score: 2, Informative
    4. Re:Engineering Course Grade = F by khallow · · Score: 1

      There are interesting things that can be done with merely a single significant digit, namely, order of magnitude calculations. It's useful to be able to determine the relative weighting of contributions to greenhouse gases, for example. A classic case from climatology is figuring out the relative contribution to carbon dioxide emissions of human activity and volcanism (the former is much greater).

    5. Re:Engineering Course Grade = F by khayman80 · · Score: 1

      Yes, but you want that uncertainty to come from limitations in the experimental data, not inadequate guard digits. What the article was describing was a situation where the accuracy of the results dropped from 6 significant figures to just 1. In some rare situations, this could be intentional (to obtain an order-of-magnitude estimate using shortcuts that sacrifice accuracy for speed), but it's more likely to be your garden-variety roundoff error that the programmer didn't even know was corrupting his results.

    6. Re:Engineering Course Grade = F by khallow · · Score: 1

      Losing 5 digits of accuracy (especially if you don't realize you are losing that much accuracy) is really ugly. I see your point there. I was just saying sometimes it is useful.

    7. Re:Engineering Course Grade = F by bunratty · · Score: 1

      When you use floats to do the arithmetic, you lose precision in each operation, and particularly when you multiply two numbers with different scales (exponents).

      Why would multiplying numbers cause you to lose precision? I think a more common way to lose precision would be to subtract two numbers that are nearly equal to each other.
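
      A small Python sketch of the difference, assuming IEEE 754 doubles. The operands and the helper name relative_error are arbitrary illustrations. The product of two numbers with very different exponents is checked against exact rational arithmetic, and the subtraction case uses the textbook expression (1 - cos x) / x^2, whose true value is close to 0.5 for small x.

```python
import math
from fractions import Fraction

def relative_error(computed, exact):
    """Exact relative error of a float against a rational reference."""
    return abs((Fraction(computed) - exact) / exact)

# Multiplication: operands with very different exponents.
a, b = 1.2345e-10, 6.789e12
product_error = relative_error(a * b, Fraction(a) * Fraction(b))
print("relative error of a * b:", float(product_error))

# Subtraction of nearly equal numbers: 1 - cos(x) for small x.
x = 1e-8
naive = (1.0 - math.cos(x)) / (x * x)             # cos(x) is almost exactly 1
stable = 2.0 * math.sin(x / 2.0) ** 2 / (x * x)   # same quantity, no subtraction
print("naive :", naive)
print("stable:", stable)
```

      The product carries at most one rounding error of about 1e-16 in relative terms. The naive subtraction loses essentially every significant digit, and rewriting the expression with a trigonometric identity avoids the cancellation.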

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    8. Re:Engineering Course Grade = F by Anonymous Coward · · Score: 0

      When you use floats to do the arithmetic, you lose precision in each operation

      This is false. You may lose precision, but not always. For example, using IEEE floating point, you can add 2+2 and get exactly 4.

      But, if you don't use those algorithms, or don't use them correctly and carefully, you really cannot assert _any_ precision in the result.

      This is also false. There are plenty of problems which are simple enough to be implemented by a naive programmer and have no trouble with precision.

      On the other hand, there are some operations that are very hard to get right. Large sums are a common problem. I'd also like to say how much I agree with your point about confidence intervals, and I wish it were standard practice to compute and store one along with the actual data value.
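
      To make the "not always" concrete, here is a short Python sketch that uses exact rational arithmetic to measure the rounding error of a single addition. It assumes IEEE 754 doubles, and the helper name rounding_error is made up for illustration.

```python
from fractions import Fraction

def rounding_error(a, b):
    """Exact error introduced by computing a + b in floating point."""
    return Fraction(a + b) - (Fraction(a) + Fraction(b))

exact_case = rounding_error(2.0, 2.0)    # small integers are exactly representable
inexact_case = rounding_error(0.1, 0.2)  # the sum of these doubles needs rounding

print("2.0 + 2.0 is exact:", exact_case == 0)
print("0.1 + 0.2 rounding error:", float(inexact_case))
```

      Fraction(x) converts a float to the exact rational number it stores, so the difference above is the true error of that one operation rather than another floating-point estimate.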

    9. Re:Engineering Course Grade = F by SETIGuy · · Score: 1

      What the article was describing was a situation where the accuracy of the results dropped from 6 significant figures to just 1.

      But what they don't tell you is that the purpose of the program was to convert a floating point number between 0 (inclusive) and 10 (exclusive) into a truncated integer value.

  9. MaDnEsS ! by Airdorn · · Score: 4, Funny

    What? Scientists showing their work for peer-review? It's MADNESS I tell you. MADNESS !

    1. Re:MaDnEsS ! by Anonymous Coward · · Score: 0

      This is not madness, this is Sparta! Now, where's that darn well ...

    2. Re:MaDnEsS ! by Anonymous Coward · · Score: 0

      What? Scientists showing their work for peer-review?

      It's MADNESS I tell you. MADNESS !

      Madness? THIS IS SCIENCE!

    3. Re:MaDnEsS ! by c_sd_m · · Score: 1

      The madness would ensue when we try to find people to review it. I've seen some pretty loony reviewers comments and I can only imagine what they would come up with if we gave them code.

    4. Re:MaDnEsS ! by DarthVain · · Score: 1

      I know it is a joke... but the argument being made by most scientists is that not everyone is a peer.

      Personally I think everything should be open, but I am a dirty hippy.

  10. I'd like to see the code... by argent · · Score: 1

    I'd like to see actual examples of the code failures mentioned in the T experiments paper.

    Or at least Figure 9.

  11. This is not science. by Coolhand2120 · · Score: 4, Insightful
    1. Re:This is not science. by Cyberax · · Score: 2, Insightful

      His colleague was _sued_ (by a crank) based on released FOIA data. It might explain a certain reluctance to disclose data to known trolls.

    2. Re:This is not science. by Idiot+with+a+gun · · Score: 5, Insightful

      Irrelevant. If you can't take some trolls, maybe you shouldn't be in such a controversial topic. The accuracy of your data is far more significant than your petty emotions, especially if your data will be affecting trillions of dollars worldwide.

    3. Re:This is not science. by Mashiki · · Score: 0, Flamebait

      I personally don't care if he was sued by the 4th Emperor of the Lastman Squealing dynasty. Post your work, put it up for review and suck it up buttercup when dealing with scientific review.

      --
      Om, nomnomnom...
    4. Re:This is not science. by Cyberax · · Score: 1

      I demand that you post all your work, including all your post-it notes, personal notebooks, and written per-hour documentation on all your movements. Trillions might depend on it!

      1) Do you seriously think that the whole climate science depends on one scientist's data?

      2) CRU was trolled by FOIA requests. They are a nuisance to deal with, as far as I was told.

      3) Scientists are people, people have emotions. That's why peer review is used.

    5. Re:This is not science. by Attila+Dimedici · · Score: 1

      His colleague was _sued_ (by a crank) based on released FOIA data. It might explain a certain reluctance to disclose data to known trolls.

      "known trolls" now equals people who have found significant errors in another scientist's released data?

      --
      The truth is that all men having power ought to be mistrusted. James Madison
    6. Re:This is not science. by Idiot+with+a+gun · · Score: 1

      1) Almost. Our politicians are retarded, and more interested in appeasing people than actually fixing things. They'll act on bad data.

      2) So? There are trolls on the internet too. They're receiving grants, get some lawyers.

      3) Clearly it didn't work too well. 1 sig fig.....

    7. Re:This is not science. by jgtg32a · · Score: 2, Insightful

      Shit like this is why I'm hesitant about going along with Climate Change. I'm in no way qualified to review scientific data, but I can tell when someone is shady, and I don't trust shady people.

    8. Re:This is not science. by Anonymous Coward · · Score: 0

      It seems somewhat unfair to foist the responsibility of trillions of dollars onto a scientist who does not get enough funding to validate his or her own research *to satisfy the trillions of dollars expectation* nor get personally compensated enough to shoulder that responsibility. The scientist is simply trying to uncover some truth: it is the response of the governmental officials that you should be worried about. To place the blame on the scientist would cause such a chilling effect that it would scare away *any* research into the topic at hand.

      I'm also trying to figure out how you equate *being sued* with "petty emotions".

    9. Re:This is not science. by Cyberax · · Score: 1

      1) Not even remotely close. CRU is an important institution, but we also have about 30 other institutions of varying levels of importance.

      2) Ha. Hahahhahaha. Haha. You have some very strange ideas about the size of grants.

    10. Re:This is not science. by crmarvin42 · · Score: 4, Interesting

      1) Do you seriously think that the whole climate science depends on one scientist's data?

      No, but his work does include suggestions that regulators pay close attention to based on his status within the community. If he were posting on this very same topic, but was not being used as a primary source by regulators then I could see your point. However, that is not the case and theoretical situations are not really relevant.

      2) CRU was trolled by FOIA requests. They are a nuisance to deal with, as far as I was told.

      Then hire someone to handle them for you, or have grad students do it.

      I could say the same thing about publishing and peer review. It's a major PITA to get formatting done just right, making sure that those outside of my small sphere of research can understand what I did without getting lost in all of the jargon. Suck it up! It is an unfortunate, but necessary part of doing research at a public institution.

      3) Scientists are people, people have emotions. That's why peer review is used.

      Not sure what this has to do with anything. Peer review is valuable and necessary, but it has never pretended to be about accuracy of the data. It's about cleaning up the presentation so that it is clear, reproducible, and free from OBVIOUS error.

      As a reviewer, I don't know what exactly was done, but if a list of numbers that should add up to 100 instead adds up to 120, then I can catch that. Whether the problem is due to a typo, or sloppy data fabrication, or a computer error is not something I can ascertain. I have to trust that the authors' explanation and fix are true and accurate, in which case I am trusting that they are honest, competent and attentive. The more of their data and methodology that they expose to scrutiny, the less faith I have to have and the more I can ascertain for myself directly.

      --
      Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
    11. Re:This is not science. by khallow · · Score: 1

      I demand that you post all your work, including all your post-it notes, personal notebooks, and written per-hour documentation on all your movements. Trillions might depend on it!

      Is the prior poster subject to FOIA laws? Do they do work that affects the global economy and billions of people like climatology and global warming does? No? Then it is a failed analogy.

      1) Do you seriously think that the whole climate science depends on one scientist's data?

      Yes. Need I remind you that Jones and the organization he managed, the CRU, occupied the most critical niche in climatology: aggregation of wildly different, temperature-sensitive data sets into a coherent picture of global temperature from the geological past to present? As far as I know, all climate models have been calibrated on this aggregation of data. It also formed the basis of IPCC announcements and decisions which is the internationally recognized organization for addressing global warming concerns.

    12. Re:This is not science. by Cyberax · · Score: 1

      "Is the prior poster subject to FOIA laws? Do they do work that affects the global economy and billions of people like climatology and global warming does? No? Then it is a failed analogy."

      FOIA laws in the UK do not give you permission to request any data at any time. There are valid and lawful ways to deny requests.

      "As far as I know, all climate models have been calibrated on this aggregation of data. It also formed the basis of IPCC announcements and decisions which is the internationally recognized organization for addressing global warming concerns."

      CRU surely is important, but there are other independent aggregated datasets: http://dss.ucar.edu/datasets/ - just look here.

      The most important data from CRU is in the HadCRUT3 dataset, and I don't think its validity was questioned. CRU's analysis of this data (the famous 'hockey stick') was called into question, but not the data itself.

    13. Re:This is not science. by acoustix · · Score: 5, Insightful

      "Why should I make the data available to you, when your aim is to find something wrong with it?"

      That used to be what Science was. Of course, that was when truth was the goal.

      --
      "A plan fiendishly clever in its intricacies"- Homer Simpson
    14. Re:This is not science. by tibman · · Score: 1

      Security guards get paid 30K$ a year to haul around truckloads of cash. If one of these guys left the doors unlocked, guess who would be at fault? So YES, your guy can shoulder the responsibility. Why would you turn down extra help?

      I personally like it when someone points out a problem in my work.. they go "This is wrong, this will cause all kinds of problems" and i go "ah, good catch *type typy* ok, product is fixed." And when they go "It would be better like this.." i say "that's in your opinion, but why do you think so?" What's the big deal?

      --
      http://soylentnews.org/~tibman
    15. Re:This is not science. by ae1294 · · Score: 3, Insightful

      1) Do you seriously think that the whole climate science depends on one scientist's data?

      Irrelevant, if you use public money to do your research your boss gets all that work.

      2) CRU was trolled by FOIA requests. They are a nuisance to deal with, as far as I was told.

      Irrelevant, FOIA requests are part of the deal when you take public money. Don't like it? Don't take public money. The whole idea that FOIA requests can be labeled troll sounds like a very bad idea. I for one don't want to start hearing the government claim that the EFF are trolls and thus are ignoring their FOIA requests.

      3) Scientists are people, people have emotions. That's why peer review is used.

      Irrelevant, ???

    16. Re:This is not science. by khallow · · Score: 1

      FOIA laws in the UK do not give you permission to request any data at any time. There are valid and lawful ways to deny requests.

      And as I noted, the most common way is to not be a party required by law to comply with the FOIA. Moving on, just because something is legal to do, which incidentally doesn't appear to be the case with the CRU FOIA requests, doesn't mean you should do it. Hiding important data and computer models is unscientific as noted by the original poster.

      CRU surely is important, but there are other independent aggregated datasets: http://dss.ucar.edu/datasets/ - just look here.

      Ok, I looked. Not sure what your point was supposed to be. As I understand it, there are three prime aggregators of paleoclimate data, the CRU, a NASA unit headed by James Hansen, and something similar in the NOAA (the last being the only one that isn't running some obvious unscientific agenda). If there are other aggregators, then maybe you could just mention them by name rather than throwing links at me?

      The most important data from CRU is in the HadCRUT3 dataset, and I don't think its validity was questioned. CRU's analysis of this data (the famous 'hockey stick') was called into question, but not the data itself.

      Well, there's a pile of articles from Dr. McIntyre. Many of these criticize HadCRUT3 or its components. So yes, the data itself has been called into question repeatedly.

    17. Re:This is not science. by Anonymous Coward · · Score: 0

      Shit like this is why I'm hesitant about going along with Climate Change...

      I don't think you'll have much of a choice. Climate Change is going to take you along whether you want to go or not.

    18. Re:This is not science. by Znork · · Score: 1

      CRU was trolled by FOIA requests. They are a nuisance to deal with, as far as I was told.

      Such requests are easy to deal with: publish all the data, code and any other relevant material on a public site and there is no longer any need for FOIA requests, and there certainly isn't any difficulty dealing with any that come anyway.

      Of course, FOIA requests may be a nuisance to deal with if you'd rather not disclose the data. Which is the actual problem here.

    19. Re:This is not science. by khayman80 · · Score: 1

      Well, there's a pile of articles [climateaudit.org] from Dr. McIntyre. Many of these criticize HadCRUT3 or its components. So yes, the data itself has been called into question repeatedly.

      Look at his peer-reviewed papers and follow their citations in google scholar. If there's a peer-reviewed paper that shows significant flaws in the HadCRUT3 dataset which hasn't been convincingly rebutted, I'd like to know.

    20. Re:This is not science. by Cyberax · · Score: 1

      "And as I noted, the most common way is to not be a party required by law to comply with the FOIA. Moving on, just because something is legal to do, which incidentally doesn't appear to be the case with the CRU FOIA requests"

      Mann was cleared of most of the accusations: http://www.research.psu.edu/orp/Findings_Mann_Inquiry.pdf

      "doesn't mean you should do it. Hiding important data and computer models is unscientific as noted by the original poster."

      Certainly. However, models were available to other researchers and most of the data is (and was) free.

      "Ok, I looked. Not sure what your point was supposed to be. As I understand it, there are three prime aggregators of paleoclimate data, the CRU, a NASA unit headed by James Hansen, and something similar in the NOAA (the last being the only one that isn't running some obvious unscientific agenda). If there are other aggregators, then maybe you could just mention them by name rather than throwing links at me?"

      Which data exactly do you need? I personally worked with the CSIRO HADISST dataset for ice coverage. There is GHCN-Monthly for temperatures. And GPCC from the Germans, who are also part of the conspiracy.

      A colleague here tells me that CISL also provides aggregated datasets: http://www.cisl.ucar.edu/dss/

      And anything (and I really mean _anything_) Steve McIntyre says is probably false and/or misleading.

    21. Re:This is not science. by IICV · · Score: 1

      If you're not qualified to review someone's scientific data, how do you know that he's shady? Is it because your shady-detector goes ping when you wave it over him? Then how do you quantify the accuracy of your shady-detector? How do you insulate it against the interference of the media, who love to paint things in an exaggerated light? How do you isolate it from the effect of your own emotions - after all, if global warming is happening, then you should probably behave in a different way and you might not want to.

    22. Re:This is not science. by bunratty · · Score: 1

      What about physics? Schön was a physicist who was discovered to have published papers based on outright fraudulent research. Do you not "go along" with physics because they're "shady people"?

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    23. Re:This is not science. by bunratty · · Score: 1

      It seems to me there's a difference between taking someone's data and doing your own science with it, and taking someone's data and nitpicking their science with it. Sure, you can go back and nitpick the science that Newton, Mendel, and Millikan did, but you're not showing that their conclusions were wrong. You're never going to show that a hypothesis is incorrect by critiquing the science of others. You're going to have to do your own science, and reach your own conclusions.

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    24. Re:This is not science. by khallow · · Score: 1

      Mann was cleared of most of the accusations: http://www.research.psu.edu/orp/Findings_Mann_Inquiry.pdf

      Two things to recall here. First, Pennsylvania State University has a strong interest in clearing Mann's name. The investigation was not unbiased. Second, Mann was not a member of the CRU nor responsible for the attempts to evade FOIA requests there. If PSU had bothered, they'd have found me clear of most accusations too for similar reasons.

      Certainly. However, models were available to other researchers and most of the data is (and was) free.

      Not the code though. And the identity of the data sources for IP protected data?

      Which data exactly do you need? I personally worked with the CSIRO HADISST dataset for ice coverage. There is GHCN-Monthly for temperatures. And GPCC from the Germans, who are also part of the conspiracy.

      Let me restate my concern here since I wasn't clear the first time. There are, as far as I know, only three paleoclimate aggregations that estimate global temperatures before modern times. Are there more of them?

      Also, CISL doesn't appear to aggregate data, it just provides it.

      And anything (and I really mean _anything_) Steve McIntyre says is probably false and/or misleading.

      So what? It's still criticism (and he has been other than "probably false and/or misleading" at times in the past, remember the "hockey stick" complaint?). You were claiming that the data hadn't been called into question even though here is an example where it has been.

    25. Re:This is not science. by khayman80 · · Score: 1

      It's still criticism (and he has been other than "probably false and/or misleading" at times in the past, remember the "hockey stick" complaint?).

      The NAS report found no significant problems with Mann's 1998 reconstruction, and it's been confirmed repeatedly by independent teams.

    26. Re:This is not science. by Anonymous Coward · · Score: 0

      But the scientists are not hired explicitly to direct policy. They're hired and paid to uncover truths. That's it.

      I think you'd be a bit annoyed if you had the same non-programmers coming up to you day in and day out claiming that you were fraudulent in your programming practices when you knew they had no evidence whatsoever and were just raging maniacs. Now let's take it one step further as was illustrated above and suggest that this same guy who said "it would go better like this ... " THEN SUES YOU and see how willing you are to talk to this individual.

    27. Re:This is not science. by Anonymous Coward · · Score: 0

      2) CRU was trolled by FOIA requests. They are a nuisance to deal with, as far as I was told.

      Irrelevant, FOIA requests are part of the deal when you take public money. Don't like it? Don't take public money. The whole idea that FOIA requests can be labeled troll sounds like a very bad idea. I for one don't want to start hearing the government claim that the EFF are trolls and thus are ignoring their FOIA requests.

      I think you need to be very careful about the "you don't like it? Don't take public money" type attitude. This is used to shit all over public servants and people who work within the public sector. You know what happens? People stop working in the public sector, because they're tired of getting shit all over. At some point, all the bright people figure out that you can get much, much more done outside of this idiotic framework and then you're left with only sycophants and bureaucrats.

      I've found this type of thinking quite prevalent in the USA in particular. "You're in the public sector, therefore I own you and you must jump through all these ridiculous hoops." followed up by "why is the public sector so ineffective, the public sector sucks and we should privatize everything ..."

    28. Re:This is not science. by jgtg32a · · Score: 1

      "Why should I make the data available to you, when your aim is to try and find something wrong with it"
      In grade school I was taught that reproducibility was a required part of science, was I misinformed?
      If there is nothing wrong why won't he release it?
      Is it not logical to think he is hiding something, especially when the fundamentals of science say that you must be able to reproduce your work?

    29. Re:This is not science. by Anonymous Coward · · Score: 0

      This is, unfortunately, a very naive approach. No data is perfect; it all requires understanding and correct interpretation. Even setting aside the effort needed to perfectly catalogue every single experiment, observation and conclusion (do you realize how much effort it takes to go beyond standard, required note-taking to documenting everything, no matter how insignificant or incorrect?), it still takes a keen eye and mind to understand which data is relevant and which is not. Sometimes the experiment has failed, and an untrained eye might not be able to distinguish important data from unimportant data.

      Of course, it's even worse when the eye perusing the data has an agenda to prove, and the current political climate is certainly ripe to be perverted by these people. Data is not always so clear cut as to prove unequivocally only one interpretation, especially when one party is not above deliberate distortion.

      One saying goes "There is no statement a scientist can make that another scientist may not find fault with". I'm sorry, data just isn't perfect.

    30. Re:This is not science. by khallow · · Score: 1

      I see that they found no significant problems with the McIntyre and McKitrick papers either.

    31. Re:This is not science. by khallow · · Score: 1

      How about this tidbit where the UK Met denies the FOIA request to access CRUTEM3 data and claims that "records were not kept" of where the data came from. Where is the convincing rebuttal for the years of runaround from the CRU, UK Met, and associated parties?

    32. Re:This is not science. by khayman80 · · Score: 1

      I see that they found no significant problems with the McIntyre and McKitrick papers either.

      They weren't convened to critique the MM03/05 papers, so describing MM's misunderstandings of selection rules in principal component analysis would be outside the scope of the report. I've listed some peer-reviewed papers here (see item 7d in the index-- ~3 pages from the top) which cover those topics in more detail.

    33. Re:This is not science. by khayman80 · · Score: 1

      How about this tidbit where the UK Met denies the FOIA request to access CRUTEM3 data and claims that "records were not kept" of where the data came from. Where is the convincing rebuttal for the years of runaround from the CRU, UK Met, and associated parties?

      Please note that I asked for a peer-reviewed paper, which would contain some kind of physics-based argument. Conspiracy theories bore me; science is really much more interesting!

    34. Re:This is not science. by Belial6 · · Score: 1

      1) Yes. This one, that one, the other one, etc...

      2) FOIA requests would be MUCH easier to handle if everything were published. Just give them a link to the publicly available data and code. The FOIA requests are made because the data and code are not publicly available, yet are being used to influence public policy.

      3) "Peer" is a loaded term. What is a peer? Is it someone that has all of the same biases as you? Is it someone that understands the material as well as you? Is it someone who has the same pieces of paper as you? Peer review is not an excuse to hide your work. This becomes an issue of "Who polices the police?", which is often the argument when corruption and/or incompetence is rampant, and the people with the power don't want to be outed.

    35. Re:This is not science. by ralphbecket · · Score: 1

      I think you're confusing verification ("you did what you claimed you did") with reproduction ("I tried my own experiment and got similar results").

      Put it another way, if I write a paper saying, "I have solved this [very difficult experimental problem]. The answer is seven." You have every right to ask me to show my working. If you can find an error in that then you know you don't have to move on to the much harder task of independently reproducing my result. Of course, the answer could still be seven, but not on the basis of my faulty reasoning.

      If, on the other hand, I refuse to show you my working, you have to take it on faith that seven really is the answer I got. Even if you try your own experiment and also get seven, that isn't really reproduction.

    36. Re:This is not science. by khallow · · Score: 1

      Please note that I asked for a peer-reviewed paper, which would contain some kind of physics-based argument. Conspiracy theories bore me; science is really much more interesting!

      I find your preferences irrelevant to this thread. Peer-reviewed papers aren't the only means of conveying knowledge. Cyberax claimed that this data set wasn't disputed. I showed otherwise. To be blunt, I don't know how to judge McIntyre's claims or the "convincing rebuttals". He sounds like something of a crank, but a crank who has gotten at least one or two things right.

      What I do know how to do is evaluate behavior of people for signs of fraud or incompetence. CRU and its allies (like the UK Met) exhibit such behavior. Given the stakes of AGW, I wouldn't be surprised to find that the CRU (and the corresponding NASA group under James Hansen) have been ideologically compromised and pumping out biased research for years. For example, we have:

      1. Supported by governments (for the CRU they are the UK under Blair and Brown, the EU, and the UN) that are invested in promoting AGW. That's even bigger than the oil companies.
      2. Hides its data and code from outsiders. Has little accountability to anyone. Controls information that is unmatched except in a very few places (the paleoclimate temperature estimates for the CRU, satellite data for the Hansen group, various climate models).
      3. Huge amount at stake. There is ample economic motive to generate biased science on the scale of the CRU and its associates.
      4. Makes extreme claims such as last Fall's paper expecting a 6C rise by the end of the century.
      5. A number of the scientists and bureaucrats in the program exhibit ideological bias. The infamous CRU emails show a lot of this bias.
    37. Re:This is not science. by khallow · · Score: 1

      They weren't convened to critique the MM03/05 papers, so describing MM's misunderstandings of selection rules in principal component analysis would be outside the scope of the report. I've listed some peer-reviewed papers here (see item 7d in the index-- ~3 pages from the top) which cover those topics in more detail.

      I see no evidence of your claim and they did discuss the MM papers. Second, the two papers you mention (Rutherford 2005 and Wahl and Ammann 2007) are based on CRU data, the Rutherford paper even has Jones and Mann as coauthors. There was ample opportunity to cook (deliberately or via unintentional observer bias) the CRU estimates to restore the hockey stick by 2005.

    38. Re:This is not science. by bunratty · · Score: 1

      There's a large body of research that shows AGW is happening. If someone points out a flaw in one paper that comes to the conclusion that AGW is happening, they have not shown anything about AGW. But what I hear many people say is that errors in AGW papers mean that they don't trust claims about AGW. What you would need to do to show AGW isn't happening is to write a paper that comes to the conclusion that AGW isn't happening. No amount of nitpicking or critiquing papers in favor of AGW can ever show that the hypothesis of AGW is incorrect. Can you understand?

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    39. Re:This is not science. by bunratty · · Score: 1

      Reproducibility doesn't mean you take someone else's data and get the same results. Reproducibility means you do your own experiment (i.e., your own design, your own data collection, your own analysis) and come to the same conclusion. There are hundreds of papers that come to the conclusion that AGW is happening. Therefore, the result has been reproduced. If you want to show it isn't happening, do your own experiment and show how you came to the conclusion that it isn't.

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    40. Re:This is not science. by ae1294 · · Score: 1

      I own you and you must jump through all these ridiculous hoops."

      I really don't understand which hoops you find to be so ridiculous or for that matter how you equate needing to respond to FOIA requests to being shit on. If you were to work in the private sector you would respond to them every day when your boss tells you to pull up what you've been working on or when he checks the company's servers to see who you've been emailing...

      all the bright people figure out that you can get much, much more done outside of this idiotic framework

      You can get out of one idiotic framework and into another but as far as I'm aware you can never escape them all. Working for someone, be it public or private, always has negatives as does working for one's self. The whole world is one big idiotic framework of laws, rules, procedures and the like.

    41. Re:This is not science. by khayman80 · · Score: 1

      While peer-reviewed papers aren't always correct, their signal to noise ratio is far higher than blogs, so I highly recommend learning science from them rather than the rantings of economists and miners. If you seriously think the overwhelming majority of the scientific community is spectacularly incompetent or involved in an evil conspiracy, then there's very little I can do. After all, that means I'm a drooling idiot or a conspirator too, right? I see no point in a conversation like that. Have a nice day.

    42. Re:This is not science. by ralphbecket · · Score: 1

      Can you understand?

      Charming. But let me address what you said anyway. Consider a paper X.

      (1) Verification is the first step one takes before moving on to replication. That is, you check that X's conclusion is supported by the data and methods. If it isn't, you don't need to take X seriously any more. There may be other papers with similar conclusions, but that in no way makes X's "(presumed?) right answer, wrong method" a contribution to our understanding of nature. Similarly, if the authors of X make verification impossible by refusing access to their data or code, I can't honestly build an argument on X (science is not about taking each other's word for things).

      (2) If I find that paper X is substantially flawed or simply cannot be verified, then my confidence in other papers that build on X or use X as a supporting argument is likewise reduced.

      (3) If there are enough flawed papers like X or papers that depend on X then I might start to wonder whether the scientific method is being properly applied (clearly peer review or verification hasn't worked in these cases).

      (4) I am not out to disprove AGW. It is, however, up to the people advancing AGW as a theory to present a convincing case. I do not consider talk of consensus to be serious argument; I would be swayed by a coherent theory backed up by experimental evidence (and when I say backed up, I mean strongly so, not just plausibly within some huge error bounds).

      For what it's worth, I don't have a problem with the A in AGW; what I have yet to see defended to my satisfaction is the work used to justify the scare stories motivating what seems to me to be a lot of very expensive legislation which will leave all of us worse off.

    43. Re:This is not science. by khayman80 · · Score: 1

      ... they did discuss the MM papers

      Of course, chapters 9 and 11 each mention McIntyre 3 times. Each time, their claim is briefly but not extensively discussed because their conclusions on page 117 include: "The instrumentally measured warming of about 0.6C during the 20th century is also reflected in borehole temperature measurements, the retreat of glaciers, and other observational evidence, and can be simulated with climate models."

      As far as I can tell, the largest caveats to emerge from the NAS report are concerns about the uncertainty estimates (especially prior to 1600 CE) and this point on page 115: Even less confidence can be placed in the original conclusions by Mann et al. (1999) that "the 1990s are likely the warmest decade, and 1998 the warmest year, in at least a millennium" because the uncertainties inherent in temperature reconstructions for individual years and decades are larger than those for longer time periods, and because not all of the available proxies record temperature information on such short timescales.

      Second, the two papers you mention (Rutherford 2005 and Wahl and Ammann 2007) are based on CRU data, the Rutherford paper even has Jones and Mann as coauthors.

      My point is that those papers can't be affected by the claimed MM PCA "mistake" because they use different methodologies.

      There was ample opportunity to cook (deliberately or via unintentional observer bias) the CRU estimates to restore the hockey stick by 2005.

      I've already linked the results of independent temperature reconstructions. And last year I said: Each time series in the graph I previously linked is referenced in chapter 6 here. Turn to page 469 and examine Table 6.1 (later, if you get bored, consider checking out column 2 of page 466 which reviews the claims of MM03 and MM05.) Every time series is referenced well enough to be found on google scholar-- for example here's one of them. As you've seen from the graph, they all support the abrupt temperature increase in Mann's graph. (I freely admit that all these authors could be drooling morons, sheeple incapable of independent thought, or evil conspirators... any of these scenarios or a linear combination of them would completely discredit my position.)

      Notice how all these reconstructions are consistent. Most interesting is PS2004, which reconstructs past temperatures using a borehole. By measuring the temperature of the ground at various depths, past temperatures can be reconstructed using heat conduction equations.

      This isn't based on CRU data at all, yet is consistent with it. That's not too surprising, because there's no evidence that the CRU data has been "cooked" as you imply.
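
      To make the borehole idea concrete, here is a minimal sketch of the forward problem only, assuming the textbook solution for heat conduction into a uniform half-space: a sudden surface temperature change of dT that happened a time t ago produces an anomaly dT * erfc(z / (2 * sqrt(kappa * t))) at depth z. This is not the method of PS2004 or of any paper cited above; the function name, the diffusivity value and the single-step history are all illustrative assumptions.

```python
import math

# Assumed, roughly typical thermal diffusivity of rock, in m^2/s.
KAPPA = 1.0e-6
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def borehole_anomaly(depth_m, step_kelvin, years_ago, kappa=KAPPA):
    """Temperature anomaly at depth_m caused by a surface step change.

    Uses the half-space conduction solution
        dT(z, t) = step * erfc(z / (2 * sqrt(kappa * t)))
    for a step of step_kelvin applied at the surface years_ago.
    """
    elapsed_s = years_ago * SECONDS_PER_YEAR
    return step_kelvin * math.erfc(depth_m / (2.0 * math.sqrt(kappa * elapsed_s)))


if __name__ == "__main__":
    # A 1 K surface warming 100 years ago: the signal fades with depth.
    for depth in (0.0, 50.0, 100.0, 200.0, 400.0):
        print(f"{depth:6.0f} m  {borehole_anomaly(depth, 1.0, 100.0):.4f} K")
```

      A real reconstruction runs this in reverse: it searches for the surface temperature history whose predicted depth profile best matches the measured one, which is a much harder, ill-conditioned inverse problem.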

    44. Re:This is not science. by MadMagician · · Score: 1

      Irrelevant. If you can't take some trolls, maybe you shouldn't be in such a controversial topic. The accuracy of your data is far more significant than your petty emotions, especially if your data will be affecting trillions of dollars worldwide.

      First, that sounds a lot like "if you're not willing to get beat up by my goons, don't say things I don't like."

      Second, your emotional attachment to dollars seems to be driving your brain.

    45. Re:This is not science. by MadMagician · · Score: 1

      "Why should I make the data available to you, when your aim is to find something wrong with it?"

      That used to be what Science was. Of course, that was when truth was the goal.

      That's still the goal of Science.

      But it's not the goal of everyone. Just as with tobacco and cancer, there are a lot of people with vested interests.

      But the ice is melting.

    46. Re:This is not science. by khayman80 · · Score: 1

      Just FYI, whenever I get a chance I'm going to copy my comments to my blog, which include your statements in blockquotes as above. As per usual, links will be provided to this page. I do this because I like to have a central archive of all my statements on a single page so I can find references using ctrl-F.

    47. Re:This is not science. by khayman80 · · Score: 1

      My comments have been copied here.

    48. Re:This is not science. by khallow · · Score: 1

      If you seriously think the overwhelming majority of the scientific community is spectacularly incompetent or involved in an evil conspiracy

      It doesn't have to be the overwhelming majority of the scientific community. It just has to be two or three gatekeeper groups controlling paleoclimate temperature data. It's feasible to suborn them. Globally, we're talking hundreds of billions to trillions of dollars per year at stake just in money, huge, powerful bureaucracies to control carbon emissions for an entire country, the opportunity to shape humanity.

      Having said that, there's a simple solution in my view. Wait and see. Even if there is a spectacularly incompetent scientific community or a vast evil conspiracy, they won't be able to get reality to fit over the span of coming decades. If there really is substantial global warming occurring (and I grant part of that may be masked by other kinds of pollution like particulate matter and sulfur dioxide), the effects will become too obvious to explain away with all but the more bug-eyed crazy conspiracy theories.

    49. Re:This is not science. by Lars+T. · · Score: 1

      His colleague was _sued_ (by a crank) based on released FOIA data. It might explain a certain reluctance to disclose data to known trolls.

      A crank who has yet to release his code that gives results that supposedly show that everybody else's code is wrong. Gee, don't you wish this were a two-way street?

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    50. Re:This is not science. by Cyberax · · Score: 1

      "A crank who has yet to release his code that gives results that supposedly show that everybody else's code is wrong. Gee, don't you wish this were a two-way street?"

      All code used in analysis was released several years ago and is public.

      Also, it's documented here: http://www.realclimate.org/index.php/data-sources/

    51. Re:This is not science. by Anonymous Coward · · Score: 0

      Way to completely misinterpret. I'm basically saying "Be right, and prove it". If you're government funded, your research and your methods belong to all of us. If public policy is going to be made on your numbers, you need to make sure your numbers are correct. When you say "your emotional attachment to dollars", basically you're saying "I want to make my decision on what I think is right, regardless of whether or not it's proven."

    52. Re:This is not science. by PastaLover · · Score: 1

      2) CRU was trolled by FOIA requests. They are a nuisance to deal with, as far as I was told.

      Then hire someone to handle them for you, or have grad students do it.

      Grad students? Not a good idea if you don't want to get sued. And hiring someone to handle them for you, yeah, because scientists are really raking in the cash, right?

      It's not that I don't think you're right about the basic point but I don't think you thought through this particular comeback very well.

    53. Re:This is not science. by crmarvin42 · · Score: 1

      I have to disagree. You can have the grad students do the majority of the work and then verify that it has been done right yourself, before submitting the files. As a graduate student, I wrote grants (successfully), designed and ran studies, and wrote up the 1st draft of all final reports for studies that had corporate sponsors (most of them). Graduate students, especially PhD students, are just as capable as a new faculty member. It'll be good practice for those PhDs who will end up as faculty and facing FOIA requests themselves in short order.

      Alternatively, if he decides to hire someone to handle the FOIA requests for him it wouldn't come out of his personal salary (which is paid by the university independent of current funding levels). It would come out of his operating budget. It might mean that he gets to hire one less graduate student, but it's all about priorities. My PhD advisor made about $112k/year, but spent far more than that each year on paying grad students (6 x 16 to 19k/year for MS and PhD student salaries without including ancillary costs like insurance, tuition remission, etc.). Never mind paying his lab tech, Post-Doc, secretary, actual cost of research projects, the university's cut (around 50 cents for every $1 of grant money) for keeping the lights on, travel expenses for himself and students, etc.

      Researchers may not be rich, but they do have access to a lot of money if they are successful.

      --
      Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
  12. Slashdot Egocentrism. by stewbacca · · Score: 2, Insightful

    My bet is there is a simple explanation...namely that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care. The egocentric Slashdot-worldview strikes at the heart of logic yet again.

    1. Re:Slashdot Egocentrism. by quadelirus · · Score: 2, Interesting

      Unfortunately computer science is pretty closed off as well. Too few projects end up in freely available open code. It hinders advancement (because large departments and research groups can protect their field of study from competition by having a large enough body of code that nobody else can spend the 1-2 years required to catch up) and it hinders verifiability (because they make claims on papers about speed/accuracy/whatever and we basically have to stake it on their word and reputation and whether it SEEMS plausible--this also means that surprising results from lesser known researchers might be less likely to get published).

      I think it is our duty as scientists to ALWAYS release the code, even if it is uncommented and unclean. I'm very glad to be researching under an advisor who requires that we always release our code as open source after papers have been published so that other groups can build on what we've done. This should absolutely be universal.

    2. Re:Slashdot Egocentrism. by FlyingBishop · · Score: 2, Insightful

      What's your point? If a Biologist has no understanding of code, they have no business running a simulation of an ecological system. If a physicist has no understanding of code, they have no business writing software to simulate atomic processes. If a Geneticist has no understanding of code, they have no business writing software that does pattern matching across genes.

      Those who don't want to write software to aid in their research may continue not to do so (and continue to lose relevance). But if they're going to use software, they have to use best practices. To do otherwise likewise makes their work fade quickly in relevance.

    3. Re:Slashdot Egocentrism. by Anonymous Coward · · Score: 0

      In my department (applied mechanics), that probably holds true for a lot of cases (which is also very troubling, as anyone who reads your work will almost certainly want to look at the code eventually), but there definitely is a large part of egoism behind this as well.

      I honestly don't understand why. Almost all the time, the code they have been tinkering with for a decade is not even remotely good enough to be sold anyway. And they already have a good job that pays well, so most of them probably want a good reputation to go with it. The easiest way to do this is to open up the code. Would I ever have heard of X and Y if it wasn't for them releasing their meshing or computational code under some FLOSS license? No fucking way.

    4. Re:Slashdot Egocentrism. by AlXtreme · · Score: 3, Interesting

      that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care.

      If their code results in predictions that affect millions of lives and trillions of dollars, perhaps they should learn to care.

      What I've personally seen of scientists is a frantic determination to publish papers anywhere and everywhere, no matter how well-founded the results in those papers are. The IPCC-gate is merely a symptom of a deeper problem within scientific research.

      If scientists are too busy because of publication quotas and funding issues to focus on delivering proper scientific research, maybe we should question our current means of supporting scientific research. Currently we've got quantity, but very little quality.

      --
      This sig is intentionally left blank
    5. Re:Slashdot Egocentrism. by Rising Ape · · Score: 4, Insightful

      Nonsense, they're not trying to produce code, they're trying to produce science. It doesn't matter how ugly the code is, or how inefficient, as long as it produces correct answers. Since software engineering "best practices" seem to change every week (and do not prove program correctness in any case), what are they supposed to do, spend huge amounts of time learning as much as a professional software engineer would? Do you do that for all the tools you use?

      Does anyone have any evidence that the code is *wrong*? I.e. does it actually produce significantly wrong answers? I suspect not - this is just the latest FUD-spreading trick.

      This is just typical programmer "when your tool's a hammer" mentality. Software's not the most important thing in the world, and science has better ways to verify correctness - have several independent analyses of the same thing for example, or different ways of measuring the same thing to check for consistency.

    6. Re:Slashdot Egocentrism. by crmarvin42 · · Score: 1

      I tried finding an introductory class in programming for non-CS students while in graduate school. I was unable to find anything in the CS department, and instead found an introduction to programming for the life sciences. It was taught by a professor with a joint Biology/CS appointment. The class had been around for 5 years off and on so that his students could get class credit for him teaching them basic Perl, HTML, and MySQL.

      Up until that point I was forced to buy programming books and teach myself. If you want researchers outside of CS departments to have high quality, documented, easily readable code it might make sense for classes to be offered that are targeted at students from the rest of the university. Kind of like statistics classes or English classes targeted at the rest of the community.

      --
      Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
    7. Re:Slashdot Egocentrism. by Daniel Dvorkin · · Score: 1

      scientists outside of computer science are too busy in their respective fields to know anything about code, or even care

      This is true, but it's not an excuse. If your interpretation of the data, or even the data itself, depends on your code, then if you say "I don't care about the code" then you're really saying "I don't care about the science." Scientists tend to be very picky about the quality of their lab equipment, as well they should be, but all too often are willing to let sloppy, untested code make that quality pretty much meaningless.

      As a bioinformaticist who used to work in industry as a programmer and DBA, I've spent an enormous amount of time going through other people's code just to get it to the point where it even starts to make sense, so I can be sure it does what it's claimed to do. And very often, it doesn't, which means I have to spend more time fixing it. I'm always happy to do this, and my collaborators are usually happy for my fixes ... but it really has to mean that there are an awful lot of published papers out there which depend on code that's never been through this kind of review.

      Scientists who want to teach themselves good software engineering should do so. They're certainly capable of it; if you can get a PhD in any scientific subject, you can damn well learn to program at the level of a competent industry developer. Those who don't (and I don't blame them, since they are, after all busy doing other things) need to hire assistants who have, or are willing to gain, the necessary level of knowledge. It makes absolutely no sense to buy a half-million dollars worth of lab equipment and then run the output through code written for the equivalent of a hundred dollars worth of work.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
    8. Re:Slashdot Egocentrism. by c_sd_m · · Score: 2, Interesting

      What I've personally seen of scientists is a frantic determination to publish papers anywhere and everywhere, no matter how well-founded the results in those papers are. The IPCC-gate is merely a symptom of a deeper problem within scientific research.

      They're trained for years on a publish or perish doctrine. Either they have enough publications or they get bounced out of academia at some threshold (getting into a PhD, getting a post doc, getting a teaching position, getting tenure, ...). Under that pressure you end up with just the people who churn out lots of papers making it into positions of power. In some fields you're also expected to pull in significant research funding and there are few opportunities to do so without corporate partnerships. So if you're going to fund students to publish papers, you need to accept limits on what you can publish. The only alternative is to leave the field.

      There's no shortage of problems with the research community these days.

    9. Re:Slashdot Egocentrism. by Troed · · Score: 1

      I suggest letting the scientists do science - i.e. the models - and let software engineers do the implementation.

      I'm quite sure your scientists don't go out and chop wood to produce their own desks and paper either, right?

      (... and when the scientists want to do statistics, they should consult statisticians)

    10. Re:Slashdot Egocentrism. by david_thornley · · Score: 1

      If their code results in predictions that affect millions of lives and trillions of dollars, perhaps the people giving out the grants should budget for some real programming types rather than expect the scientists to get necessary software done somehow or another.

      Just because people throw around large numbers of dollars in discussions doesn't mean any of that gets to the actual scientists trying to get their work done.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    11. Re:Slashdot Egocentrism. by Anonymous Coward · · Score: 0

      Does anyone have any evidence that the code is *wrong*?

      Well, no... because they won't let us see the CODE THEY USE. That's kinda the crux of the problem here.

    12. Re:Slashdot Egocentrism. by stewbacca · · Score: 1

      Like I said, I know plenty of scientists that can't even run a spreadsheet formula like, =SUM(cell:range), let alone write lines of code.

      My car analogy is a race car driver doesn't need to know how to build a race car, only how to drive one. I imagine scientists are the same way. They use technology to crunch numbers, but don't know how the numbers are crunched (nor should they, as long as it is a peer-reviewed/industry accepted method).

    13. Re:Slashdot Egocentrism. by jpmorgan · · Score: 1

      How could we tell? The predictions being made are far off in the future. And the predictions made to date have been, in fact, wrong.

      It's fine to explore a world beyond what is visibly apparent. This occurs all the time in mathematics and theoretical physics. But when we leave the realm of immediate testability, we must rely upon strict rigor to ensure that our results are still correct. Looking at the source-code is a critical step in validating rigor.

    14. Re:Slashdot Egocentrism. by Anonymous Coward · · Score: 0

      AGREED!

    15. Re:Slashdot Egocentrism. by SETIGuy · · Score: 1

      I suggest letting the scientists do science - i.e. the models - and let software engineers do the implementation.

      Do you have any idea what an NSF review panel would do to my proposal if I suggested paying a programmer? It would not be pretty. An NSF budget in my field is less than 10% of a faculty member's salary, plus a grad student, and maybe a trip to a conference. There's no way we would get more than 10% of a programmer's salary from an NSF grant. Unless we get 20 million people to suggest that Congress raise their taxes to pay for scientific programmers.

    16. Re:Slashdot Egocentrism. by SETIGuy · · Score: 1

      And the predictions made to date have been, in fact, wrong.

      What universe are you living in? The globe is warming rapidly in the universe the rest of us inhabit.

    17. Re:Slashdot Egocentrism. by enstrophy · · Score: 1

      I suggest letting the scientists do science - i.e. the models - and let software engineers do the implementation.

      I'm quite sure your scientists don't go out and chop wood to produce their own desks and paper either, right?

      (... and when the scientists want to do statistics, they should consult statisticians)

      Generally, the research grants I know of have just enough money to cover a grad student, some summer salary, and travel to conferences. Software engineers, as valuable as they may be, aren't in the budget. When the tools are available we do refrain from chopping wood, but often it is more cost effective to develop models in fairly dynamic research codes.

    18. Re:Slashdot Egocentrism. by Troed · · Score: 1

      I'm sorry, are you trying to say that it's ok to do shoddy science (yes, if you rely on bad software the science also becomes bad) just because your grants aren't large enough?

      (I'm quite sure you wouldn't need more than 10% of someone's time though - it's not like you need a full time developer since the code currently isn't likely being done full time and a proper developer would need a lot less time to do it anyway)

    19. Re:Slashdot Egocentrism. by SETIGuy · · Score: 1

      I'm sorry, are you trying to say that it's ok to do shoddy science (yes, if you rely on bad software the science also becomes bad) just because your grants aren't large enough?

      You are operating under a wrong assumption. You are assuming that only a professional developer can write code that functions properly and that there is no way to test whether a program is functioning properly. Moreover you are assuming that a typical software engineer would know whether 2nd order Runge-Kutta is sufficient for a problem or whether 4th order Runge-Kutta is required. Does a typical software engineer know when it's appropriate to add an artificial viscosity to a fluid mechanics problem? After all, if you look at the Euler equations, that would seem like an error. In interpolating across points in a spectrum, would you know why I'm weighting points by a sin() function rather than using straight linear interpolation? Is that an error or not? When is it proper? When is it improper?

      A programmer might be adept at detecting off-by-one errors, but pre-existing tools and a graduate student can do the same thing.

    20. Re:Slashdot Egocentrism. by Troed · · Score: 1

      Please read through all other posts by me in this thread. You're mistaken.

    21. Re:Slashdot Egocentrism. by PastaLover · · Score: 1

      My bet is there is a simple explanation...namely that scientists outside of computer science are too busy in their respective fields to know anything about code, or even care. The egocentric Slashdot-worldview strikes at the heart of logic yet again.

      I think it's worse than that. With some of the focus these days being on doing science in public-private partnerships (because the public money simply isn't there anymore) and generating spinoffs from ongoing research, the actual software often gets labeled "Intellectual Property". You can see where I'm going with this. Suddenly software is an asset, not to be shared openly with the rest of the world. Luckily this is a mindset the actual researchers almost never share and as a result plenty of software is out there in an open source form. In my field (bioinformatics) having freely available software out there is the norm rather than the exception, but that certainly doesn't apply to all research areas everywhere. In any field however I don't think you could get away with publishing results that nobody can verify (by rolling their own) because the basic algorithm is secret. "Trust me" doesn't quite cut it, unless you're publishing in Cranks R Us.

      You do come across the occasional result in a paper that is just not well documented enough to reproduce, even if you were to write all the software yourself. Any scientist would agree that this is just plain bad science (though you shouldn't assume it was done intentionally; it often isn't). In a decent journal, the review process should catch that, and I think this check will increasingly be implemented in the future as people become more aware of the issue.

  13. This is news? by andyh-rayleigh · · Score: 1

    Nothing seems to change ...
    30 years ago it was a standard joke that most "fundamental particles" were bugs in the Fortran programs of the day.

    I wouldn't be surprised to discover that some of the programs investigated are just the result of 30 years of further modification of the ones we knew ... and that nobody understands them now!

  14. Re:Why release it? by Anonymusing · · Score: 1

    And it's situations like this which make the general public distrust scientists, or even science in general.

    The media plays a major role, as well -- it oversimplifies and dramatizes scientific research as if it comes to conclusions that it usually doesn't -- but when it comes to light that a scientist has made a mistake, or that a research paper has had false premises or inaccurate results, then the average Joe Public thinks to himself, "Can't trust those scientists. Shoulda known."

    --
    Liberal? Conservative? Compare perspectives at Left-Right
  15. Re:Why release it? by Anonymous Coward · · Score: 0

    You are correct that it's not a conspiracy: it's more of an organic phenomenon. Politicians do not generally gain renown and reelection for doing nothing. So research dollars go to people who can show politicians that they should be doing something. Eventually, the profession selects for people who already have a certain lens for looking at the world. And here we are.

  16. Peer Review vs. Funding by stokessd · · Score: 2, Informative

    I got my PhD in fluid mechanics funded by NASA, and as such my findings are easily publishable and shared with others. My analysis code (such as it was) was and is available for those who would like to use it. More importantly my experimental data is available as well.

    This represents the classical pure research side of research where we all get together and talk about our findings and there really aren't any secrets. But even with this open example, there are still secrets when it comes to ideas for future funding. You only tip your cards when it comes to things you've already done, not future plans.

    But more importantly, there are whole areas of research that are very closed off. Pharma is a good example. Sure there are lots of peer reviewed articles published and methods discussed, but you'll never really get into their shorts like this guy wants. There's a lot that goes on behind that curtain. And even if you are a grad student with high ideals and a desire to share all your findings, you may find that the rules of your funding prevent you from sharing.

    Sheldon

    1. Re:Peer Review vs. Funding by PhilipPeake · · Score: 4, Insightful

      ... and this is the problem. The move from direct government grants to research to "industry partnerships".

      Well, (IMHO) if industry wants to make use of the resources of academic institutions, they need to understand the price: all the work becomes public property. I would go one step further, and say that one penny of public money in a project means it all becomes publicly available.

      Those that want to keep their toys to themselves are free to do so, but not with public money.

    2. Re:Peer Review vs. Funding by Anonymous Coward · · Score: 0

      Actually, this is one reason why many people are, rightly or wrongly, so skeptical nowadays of scientific studies funded by pharmaceutical companies.

  17. Does this apply to climate deniers too? by Anonymous Coward · · Score: 0

    Will ExxonMobil release all code that their scientists use?

    1. Re:Does this apply to climate deniers too? by harvey the nerd · · Score: 1

      ExxonMobil, et al, buy their large scale simulators that their business depends on from commercial 3rd parties.

    2. Re:Does this apply to climate deniers too? by Anonymous Coward · · Score: 0

      You mean software like Mathematica etc., bought from 3rd parties but running code (=scripts) written by ExxonMobil 'scientists'.

    3. Re:Does this apply to climate deniers too? by harvey the nerd · · Score: 1

      Exxon's are called reservoir simulators. Used to be, big companies in many industries had in-house simulators: pipeline, chemical, nuclear, etc. Decades ago I had professors, e.g. in mathematical sciences and engineering, that worked on the mathematics and implementations of various early versions. We students also did derivations from the general equations (PDE) of everything. Errors, simulator abuses and wishful fantasies, similar in nature to today, cost companies millions, perhaps even billions. The difference is that young Turks running companies' simulators were much more accountable. People could and did get fired quickly for fewer costly errors than what is going on with the climate comedy channel.

  18. Absolutely by RandCraw · · Score: 1

    Aside from logistics, there is no excuse for not doing this. In my experience, software innovations are notoriously sensitive to subtleties in input data (e.g. data mining, AI, image processing). Posting both code & data (and a test driver, of course) should be mandatory for all publications that claim to have found a signal in data, better or faster.

    The question is, how to maintain code & data long after the publication publishes? IMHO, any peer-reviewed publication should be required to maintain such a repository for perhaps 20-30 years, ideally under a GPL (or its kin) so access to it would be free in perpetuity.

    Maybe such a service would finally justify peer reviewed pubs' exorbitant fees for non-subscriber access.

  19. That's all wrong by Gadget_Guy · · Score: 2, Interesting

    The scientific process is to invalidate a study if the results cannot be reproduced by anyone else. That way you can eliminate all potential problems like coding errors, invalid assumptions, faulty equipment, mistakes in procedures, and a hundred other things that can produce dodgy results.

    It can be misleading to search through the code for mistakes when you don't know which code was eventually used in the final results (or in which order). I have accumulated quite a lot of snippets of code that I used to fix a particular need at the time. I am sure that many of these hacks were ultimately unused because I decided to go down a different path in data processing. Or the temporary tables used during processing are no longer around (or are in a changed format since the code was written). There is also the problem of some data processing being done by commercial products.

    It's just too hard. The best solution is to let science work the way it has found to be the best. Sure you will get some bad studies, but these will eventually be fixed over time. The system does work, whether vested interests like it or not.

    1. Re:That's all wrong by quadelirus · · Score: 1

      Not entirely. Another problem is that a research group may have a large body of code that is required to do research in an area. A new research group entering the area would currently have to duplicate all that code in order to be able to add to it. In my field, there are certain subfields that we won't even touch because it would take a year of coding to build up the necessary platform to be able to compete against established groups. If all code were open, anyone could download and begin improving/extending it. The result is that certain subfields that require a large body of background code to do study in have only a few players and no-one else can really enter the subfield without sacrificing a few years of publishing. This is bad, because then all research in an area is being done by a very small number of individuals and there isn't any cross pollination from other fields since the cost of entry is too high. The simple fix is to make the code open. Then anyone can make improvements.

      Also, as to verifiability. You can't spend a year writing code to verify someone else's results. That may be the utopian ideal, but in practice it never happens. You don't get papers out of spending a full year doing nothing but verifying that yes, the research was actually done correctly. Having open code would most definitely lead to more verification.

      Credentials: I am a doctoral student in the sciences.

    2. Re:That's all wrong by insufflate10mg · · Score: 1

      The scientists aren't being asked to release every piece of code in their repository, just the code they used to reach the conclusions they published.

    3. Re:That's all wrong by aflag · · Score: 1

      If not even you can reproduce your own research, what does it tell you about your methodology?

  20. Conspiracy? by Coolhand2120 · · Score: 2, Insightful

    Nobody said conspiracy, just plain crappy code. You don't need a conspiracy if you are "trying to prove" something, your crappy code spits out what you want to see and you run with it. You just need plain old incompetence.

    1. Re:Conspiracy? by obarthelemy · · Score: 4, Insightful

      Yes and no. Which assertion do you think more probable:

      1- "These are not the desired results. Check your code".

      2- "These are the desired results. Check your code".

      No conspiracy, but a conspiracy-like end result.

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    2. Re:Conspiracy? by bunratty · · Score: 2, Insightful

      Let's think through what would really happen if scientists released their code. The code has bugs, as all code does. People with an ulterior motive would point to the bugs and say "Look here! A bug! The science cannot be trusted!" And millions of sheeple would repeat "Yes! The code has bugs! And therefore I refuse to believe it!" It won't matter whether the bugs are relevant to the science; the fact that there are any bugs at all will cause people who want to disagree to say there's doubt about the results. Meanwhile, they will go about their business using computer systems that are riddled with bugs, but function well enough the vast majority of the time they're not even aware of the bugs.

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    3. Re:Conspiracy? by crmarvin42 · · Score: 4, Insightful

      And then they fix the bug and either...

      A. The results change, thus indicating that the bug was important in some way. In this case, fixing the bug not only silences the critics but also improves our understanding.

      or

      B. The results don't change, thus indicating that the bug, while still a bug, was not important to the final result. In this case, we've fixed a bug that the critics were using as a banner, and shown that they were mistaken about its importance. We don't get the improved understanding, but we do get a chance to politely say STFU to the more vocal/less qualified critics.

      Either way looks like win/win to me.

      --
      Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
    4. Re:Conspiracy? by xtracto · · Score: 3, Informative

      Agreed 100%.

      You would not believe the amount and crappy quality of the code produced during "research projects", especially when the research is in a field completely unrelated to Comp. Sci. or Soft. Eng.

      I have personally seen software related to Agronomy, Biology (Ecology) and Economics. The problem with a lot of that code is that sometimes researchers want to use the power of computers (say, for simulation) but do not know how to code. They then read a bit about some programming language and implement their program as they are learning.

      The result? you can imagine.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    5. Re:Conspiracy? by bunratty · · Score: 2, Insightful

      From recent events, I think both A and B are wrong. When an error is pointed out in research that shows AGW is happening, people use that error as an excuse not to believe any research that AGW is happening, even years after the error is corrected. When an error is pointed out in the IPCC report about a minor effect of climate change, people use that error to doubt all effects of climate change. Correcting the errors or pointing out they don't change the results will not silence the critics. It will only make the critics claim that their opinion is being suppressed even though the science has been indisputably proven to be flawed and therefore cannot be trusted!

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    6. Re:Conspiracy? by Anonymous Coward · · Score: 0

      Right, with the new results reported months later, after most people have stopped paying attention. Even if the new results confirm the old ones, most people just remember "buggy code". I think it's called the Primacy Effect.

    7. Re:Conspiracy? by Anonymous Coward · · Score: 0

      There is another possible issue, though. Computers are not perfect mathematicians. Here is an example where we simply want to multiply 1.1 by itself.

      #include <stdio.h>

      int main(void)
      {
              float f = 1.1f;     /* nearest float to 1.1, not exactly 1.1 */
              float f2 = f * f;   /* the product is rounded to float again */

              printf("%.10f * %.10f = %.10f\n", f, f, f2);
              return 0;
      }

      Expected result would be:
      1.1 * 1.1 = 1.21

      Actual results will be something more like this:
      1.1000000238 * 1.1000000238 = 1.2100000381

      This is why minding significant digits is important. It turns out that computers have a finite limit on significant digits, and poorly chosen mathematical algorithms could rapidly diminish the value of scientific computing, even when using "long double" values. These issues are worsened by problems that involve large ranges or domains. Algorithms may be used that minimize these impacts.

      Today, it is often taken for granted that computers can compute precisely. The reality is that they still have finite precision.

    8. Re:Conspiracy? by crmarvin42 · · Score: 1

      The goal when talking with critics is to address only those willing to consider your evidence. You are talking about the much smaller, although much more vocal, group of skeptics unwilling to listen. They've decided, and no amount of evidence will change their minds.

      You could call me a skeptic. Originally, I doubted that GW was happening. Now I don't doubt that global temperatures have been increasing, at least in the last several decades, but I do doubt the Anthropogenic explanation for GW. The data has been validated to my satisfaction, but I require further convincing as to the conclusions regarding the cause. (Probably because I'm involved in science where I can control most of the factors in the design experiments, as opposed to being dependent mostly on observational data.) However, I am willing to listen and consider new evidence as it is presented to me. I am equally frustrated with those unwilling to listen, because they make it much harder for me to get a reasoned debate going. They hijack threads and poison the waters with those in possession of the facts I need if I want to make up my own mind.

      Furthermore, I work in an industry that is plagued by just as many doubters. Although, they are not quite as unified in what specifically they are objecting to. I've learned that some are not worth my time, because they made an emotional decision and no amount of facts can really match faith. However, the vast majority of those with no training and a lot of questions are willing to listen. With those individuals I've made a lot of headway. The trick is discerning, with a high degree of accuracy, which camp someone new falls into as quickly as possible.

      Nobody said science was easy, and nobody said that communicating science to laypeople would be problem free. If anything that makes it more important that we communicate effectively with those actually interested in a dialogue, as opposed to the monologues so popular with the kind of people you are describing.

      --
      Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
    9. Re:Conspiracy? by Anonymous Coward · · Score: 0

      But that's a slippery slope. Put yourself in the place of the person that you claim has an ulterior motive. Would you particularly like it if someone told you, "No, you can't see the particulars because you'll nit-pick it and I know I am right"? They may very well nit-pick and run with completely stupid results based solely on preconceived notions. You can't shield them from facts just because they might do this.

      The best approach is to release it and in advance explain the bugs that you know are there. If they find something you didn't know about, great, they helped you. You might have to explain how the data wasn't corrupted by this but that is better than hiding it completely.

      The goal is truth or fact. Not you being right.

    10. Re:Conspiracy? by khallow · · Score: 1

      The problem here is what happens when the error is not corrected? For example, the IPCC's "assessment" reports have always had aggressive summaries that don't reflect the science present in the report. A lot more people read those summaries. There's also the attitude that has determined prior to consideration of valid evidence that AGW is a serious problem and carbon emission reduction is the required solution. Solving non-problems (which is what AGW may be) is an error. That sort of error can impoverish billions (see 20th century style Communism for a similar example).

    11. Re:Conspiracy? by pavon · · Score: 2, Interesting

      Yes, there are stubborn idiots that will believe what they want regardless of the evidence. There are self-entitled people that complain no matter how good of a service you provide. There are unreasonable assholes in this world.

      However, since nothing I do will appease them, why should I give a moment's consideration to them whatsoever? I am going to base my actions on what will best convince/serve the reasonable people, on top of what makes the best science. Hiding data and not being responsive to criticisms is counterproductive to those goals.

      Case in point. The recent inclusion of data that had not been peer reviewed in the IPCC report didn't convince me that everything in the report was garbage, but it meant that everything in there had to be weighed on its own merits, as I couldn't trust the vetting process done by the IPCC. It didn't discredit climate change itself, but it did undermine the ability of the IPCC to act as a credible distiller of the state of climate change research.

      These are the issues that you need to be concerned about, not how the ideologues and pundits are going to react.

    12. Re:Conspiracy? by Anonymous Coward · · Score: 0

      All computations have finite precision, but computers have arbitrary precision. Floating point math has a well-defined precision. It's not useful to use FP without also being assured that your own precision requirements are no more than that available through float, or you can track the imprecision in a parallel calculation and check if it's over a threshold. You can do everything the hard way, though, with as much precision as you please, to the limits of your memory capacity and the time you'll wait for results. Which is to say, algorithms may be used to *eliminate* these impacts.

      Computers ARE perfect mathematicians, delta hardware errors. No human could give you a better answer if forced to write it down in floating point form. In this case, you explicitly requested float in your C program, and suggested "long double" later.

      It's essentially the same problem as integer overflow.
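
      To make the precision point above concrete, here is a minimal Python sketch (the function names are made up for illustration, and Python stands in for the C program discussed in the thread). It adds 0.1 ten times in binary floating point, repeats the sum with exact rationals, and uses the exact "parallel calculation" to check the rounding error against a threshold.

```python
from fractions import Fraction


def float_sum(n):
    """Add 0.1 to itself n times using binary floating point."""
    total = 0.0
    for _ in range(n):
        total += 0.1
    return total


def exact_sum(n):
    """Same sum with exact rational arithmetic (slower, but no rounding)."""
    total = Fraction(0)
    step = Fraction(1, 10)
    for _ in range(n):
        total += step
    return total


def rounding_error(n):
    """Exact size of the error the floating point version accumulated."""
    # Fraction(float) converts the float's stored value exactly.
    return abs(Fraction(float_sum(n)) - exact_sum(n))


def within_tolerance(n, threshold=Fraction(1, 10**12)):
    """Check the tracked error against a precision requirement."""
    return rounding_error(n) <= threshold


print(float_sum(10) == 1.0)   # False: the float result is slightly off
print(exact_sum(10) == 1)     # True: the rational result is exact
print(within_tolerance(10))   # True: the error is far below 1e-12
```

      Whether the exact version is worth its cost depends on the precision the problem actually needs, which is the commenter's point.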

    13. Re:Conspiracy? by Anonymous Coward · · Score: 0

      Sounds good, but thanks to modern media you have certain news outlet(s) that only have an interest in headlining your case A and completely sweeping case B under the rug and possibly vice versa.

      I'll venture to say Fox news comes to mind for ignoring case B.

    14. Re:Conspiracy? by Urkki · · Score: 1

      Computers ARE perfect mathematicians, delta hardware errors.

      ...and software bugs!

      So end result is, computers are far from perfect mathematicians.

      I'll give you that they're pretty good calculators, delta hardware errors and software bugs.

    15. Re:Conspiracy? by mrxak · · Score: 1

      I never attribute to malice what I can attribute to laziness. All the more reason to allow those who are feeling particularly not lazy to come along and do the code QA that the original coder (likely an untrained scientist with no CS background) didn't know how to do or didn't want to because there were more pressing concerns like getting a new grant.

    16. Re:Conspiracy? by Anonymous Coward · · Score: 0

      Let's think through what would really happen if scientists released their code. The code has bugs, as all code does. People with an ulterior motive would point to the bugs and say "Look here! A bug! The science cannot be trusted!" And millions of sheeple would repeat "Yes! The code has bugs! And therefore I refuse to believe it!" It won't matter whether the bugs are relevant to the science; the fact that there are any bugs at all will cause people who want to disagree to say there's doubt about the results. Meanwhile, they will go about their business using computer systems that are riddled with bugs, but function well enough the vast majority of the time they're not even aware of the bugs.

      By your logic, we shouldn't release the data you gathered either, based on the fear someone might not understand what a bad data point is and say "Look here! You're ignoring evidence!".

      Every argument I've heard in this thread boils down to:

      Do not look behind the curtain. Place your Faith and Trust in the established Order of Scientists.

      If I wanted that line of bullshit I'd go to church. If you aren't providing the facts, you're asking me to believe you based on faith, and that's not science, it's religion.

    17. Re:Conspiracy? by Anonymous Coward · · Score: 0

      The problem here is that it costs money to do things.

      So A would be, lost a ton of money and fixed the issues.

      B would be losing a ton of money for no reason.

      If there's a chance of senseless spending of resources, some people are going to be against it (and rightfully so, in their case).

      Personally, I just despise anything that involves money. I live a very sad life.

  21. Not possible. by MindlessAutomata · · Score: 1

    Many scientists get their code from companies or individuals that license it to them, much like most other software. They're not in the position to release the code for many experiments...!

    1. Re:Not possible. by insufflate10mg · · Score: 1

      What if the code they use has errors that affect the outcome of their experiments? What should be done? Let it slide?

  22. one error will invalidate a computer program?!?!? by Anonymous Coward · · Score: 2, Insightful

    As it is written, the editorial is saying that if there is any error at all in a scientific computer program, the science is usually invalid. What a lot of bull hunky! If this were true, then scientific computing would be impossible, especially with regards to programs that run on Windows.

    Scientists have been doing great science with software for decades. The editorial is full of it.

    Not that it would be bad for scientists to make their software open source. And not that it would be bad for scientists to benefit from some extra QA.

  23. Science isn't set up for "political research" by Anonymous Coward · · Score: 1

    and by that I mean climate science.

    Science suffers from methodological "flaws", which are really just the rational interest of the people involved. One of them is that scientists tend not to disclose data. Scientists under the pro-climate banner have defended and explained this several times, effectively arguing that if they DID publish, then a) people would misuse the data, and b) more commonly and much more importantly (after all, most science isn't controversial), other scientists would use their data and programs and publish papers on the basis of them. A complex data set is hence the same as JOB INSURANCE. I kid you not, look up the statements yourself.

    Another is that advanced science with multiple obscure data sets needs advanced statistical knowledge, which by its very nature requires significant professional judgement. This is also obvious from having read the debates, e.g. the 700-page hacked analysis document.

    Now, in most science, none of these problems are really a problem. Firstly, the science is rarely _very urgent_. The scientists can therefore sit in an ivory tower and debate for a decade or so before the research "leaks" into the outside world, or is even required or wanted by the outside world. People who disagree can say "is that really the case though? look at this little piece here over in the corner which looks strange" and everyone can gather around for tea. Life-critical science is always checked by multiple independent people and in small scales as much as possible - like drugs research, for example. If data and models diffuse slowly, and models are subject to judgement, this doesn't have much of an impact in the long run, and can be worked out amicably and in preservation of everyone's dignity and efforts wholly within the "scientist sphere". Getting personally attached to a cause is also meaningless, because few causes provide enough job security or money to even risk allegations of misconduct.

    I feel it does represent a problem in the climate case though. One reason is that the science will impact _everyone_ before the data is fully comprehensive. There is no 'scientist sphere' or 'trial runs', all conclusions are implemented as soon as they are out. Secondly, the conclusions couldn't even be checked by nonscientists whom it would affect. Thirdly, the extreme passion and personal stakes in a strong climate movement make me very skeptical that the professional judgement of statistical analysis has been exercised dispassionately and objectively (if that is even possible). Scientists who say "is this really the case?" are pilloried. Pro-climate people will say that "that's not true, if anyone could disprove climate science they would be heroes!". Anti-climate people would respond that "much like proving takes a ton of work and hundreds of articles, so would disproving, and in the meantime their lives would be hell".

    As a result, I am convinced that the scientific community and method we have is totally unsuited to research something as complex as climate science and make a conclusion within a few years. I don't want to change my life and reduce my consumption on the basis of what might well be bullshit - so it's either very painful enforcement against the will and good conscience of a lot of people, or, the 'data' for a catastrophe would only be conclusively found when it happens.

  24. Reality is incompatible with academia. by Anonymous Coward · · Score: 0

    People understand the theory behind science just fine. The problem, however, is that it is nothing but theory. And theory, like most things in academia, only works properly under a highly controlled "reality".

    In the real world, people need to eat. People need vehicles. People need clothing. People feel the need for luxury items. To get these things, people need money. To get money, the vast majority of people need to do something of value for somebody who already has money.

    When it comes to scientists, they need to get their funding. Much of that funding these days comes from corporations. Corporations are often in the business of fucking over other people. Poor science often helps them achieve this. Who provides this lousy science? Scientists, of course.

    But wait, you'll say that some scientists get funding from the government. That is true. But keep in mind that in most Western nations, the governments are selected by the "people" from a slate of candidates funded by a small number of corporations or industry groups. So even the scientists who get their funding from politicians end up having to create "research" that fulfills the needs of the corporations funding the politicians.

  25. Then give legal liability shield too by orzetto · · Score: 0, Troll

    The reason many researchers, especially climate scientists, are not so happy about divulging their models and data is that they can be sued by crackpots, as it has already happened. Even if they are proven right, a lawsuit is an expensive business. I can already imagine hordes of Exxon sockpuppets suing any random climate scientist they don't like.

    Granting immunity from lawsuit should make them more willing to share data. Anyway, if something really bad is found in the research, the researcher will have their reputation tarnished, which in the environment is bad enough to ruin a career.

    By the way: I am a researcher, and I attached the source code of my models in the PDF version of my PhD dissertation.

    --
    Victims of 9/11: <3000. Traffic in the US: >30,000/y
    1. Re:Then give legal liability shield too by azaris · · Score: 1

      The reason many researchers, especially climate scientists, are not so happy about divulging their models and data is that they can be sued by crackpots, as it has already happened.

      If it has already happened, what additional harm can come from disclosure? In the US you can sue anybody at any time for any reason whatsoever.

    2. Re:Then give legal liability shield too by insufflate10mg · · Score: 1

      Off-topic, would you mind explaining the point of your signature?

    3. Re:Then give legal liability shield too by LordLucless · · Score: 1

      At a guess, it's a comment on the relative impact of terrorism and road fatalities, especially in view of the legislative changes rammed through on the back of the former.

      --
      Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
    4. Re:Then give legal liability shield too by mrxak · · Score: 1

      Truth is a defense against libel. If you disclose all your code, and problems with your code are found and fixed, it's actually harder to be sued over your results.

  26. Price of technocracy by tjstork · · Score: 1

    It's the price of a society that doesn't actually value the liberal arts and the technology. Studying the Greeks and Romans matters and you need to be a well rounded thinker.

    --
    This is my sig.
  27. It will never happen. by Anonymous Coward · · Score: 0

    As an example: Releasing automotive code falls clearly in the interest of public safety. Do you really think any company will release source code? If WOZ had the source for his Prius, the current problem might have had a happy ending.

    When people have confidence in their code, they release sources. When they are afraid of what might be hiding in their code, they lock it up.

    Very few have confidence, whether in industry or science.

  28. I concur by dargaud · · Score: 4, Interesting

    As a software engineer who has spent 20 years coding in research labs, I can say with certainty that the code written by many, if not most, scientists is utter garbage. As an example, a colleague of mine was approached recently to debug a piece of code: "Oh, it's going to be easy, it was written by one of our postdocs on his last day here...". 600 lines of code in the main, no functions, no comments. He's been at it for 2 months.

    I'm perfectly OK with the fact that their job is science and not coding, but would they go to the satellite assembly guys and start gluing parts at random?

    --
    Non-Linux Penguins ?
    1. Re:I concur by Anonymous Coward · · Score: 0

      the code written by many, if not most, scientists is utter garbage.

      That is an understatement. I can vouch for it too. And the reason is perfectly clear: most scientific courses do NOT include any serious CS notions; quite the opposite. Professors usually teach the shoddy programming habits they were themselves taught decades ago, and thus perpetuate computer illiteracy. Many, if not most, still write Fortran (and Fortran 77 or DEC specific at that) in full spaghetti style and without any non-naive algorithms. Try telling them about pointers and balanced search trees or even hash tables, or structured code (not even OOP!) and see their faces.
      So, while it is perfectly understandable that, say, physicists can't spend 5 years learning CS, at the very least they should be made aware that it requires trained people to write sane code and that they must hand the job to specialists, and spend their valuable time doing what they're skilled at. And the same can be said about numerical analysis, btw: throwing off-the-shelf Monte-Carlo or Molecular Dynamics at anything cannot make up for a lack of mathematical skills.

    2. Re:I concur by Rising+Ape · · Score: 2, Insightful

      > 600 lines of code in the main, no functions, no comments

      Does that make it function incorrectly?

      Looking pretty and being correct are orthogonal issues. Code can be well-structured but wrong, after all.

    3. Re:I concur by Rising+Ape · · Score: 3, Insightful

      >So, while it is perfectly understandable that, say, physicists can't spend 5 years learning CS, at the very least they should be made aware that it requires trained people to write sane code and that they must hand the job to specialists, and spend their valuable time doing what they're skilled at.

      And where will they get these specialists, and who will pay for them?

      Add the overhead of explaining exactly what the code is supposed to do, and the fact that the specialist won't know the physics purpose of it all, and I wouldn't be surprised if there were more errors this way, not fewer. Most science code is fairly short, so all the fuss about "structured programming" (or is it OOP these days?) isn't as important.

    4. Re:I concur by Anonymous Coward · · Score: 0

      Actually, they do hire CS people, if they have funding for it. If they don't have funding, they have to do it themselves. Money is not easy to get. Was that simple?

    5. Re:I concur by Peter+La+Casse · · Score: 1

      > 600 lines of code in the main, no functions, no comments

      Does that make it function incorrectly? Looking pretty and being correct are orthogonal issues.

      If nobody can tell if it's correct, it's worthless. "Looking pretty" is essential to be confident that it is correct.

    6. Re:I concur by jafac · · Score: 1

      Oh - let's not even get started discussing the difference in coding style between software industry professional developers and computer scientists.

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    7. Re:I concur by backwardMechanic · · Score: 1

      Most science code is fairly short...

      Well said. And lots of code is really important for a month, and then never used again. It's just a tool for getting something done. Sometimes abusing your tools to do something quick is perfectly valid. There really is no point writing a UML chart for a quick-and-dirty simulator, designed to check if your idea is crazy or not.

    8. Re:I concur by Rising+Ape · · Score: 1

      I disagree, it's neither necessary nor sufficient. It may be helpful, but its value is rather overstated. I've seen nicely laid out code that was buggy and spaghetti that worked fine.

      Since *proving* correctness in the mathematical sense is impractical for most programs, the best approach is to make sure it's well tested, and to have multiple analyses with different people and code.

    9. Re:I concur by Krahar · · Score: 1

      The point is that the nicely laid out buggy code was more easily recognized for what it was: buggy. It was also more easily corrected. A bunch of spaghetti might only work when the sun is out - if you can't read it then you can't know what it really does. Testing is necessary but it is no substitute for understanding, which is just as necessary, and proper code structure aids understanding. It also aids the understanding of the person who wrote the code in the first place, so on those grounds it is indeed more likely to work correctly than some nasty 1000-line function.

    10. Re:I concur by Anonymous Coward · · Score: 0

      They will get specialists from... the programming industry, like everyone else. And they will explain the purpose of the code through specifications... just like everyone else.

      This is not a new phenomenon, except in the sciences apparently, where it's perfectly acceptable to have what amounts to amateur coders writing complex models, on the basis that they're 'smart enough' to know what they're doing (and nobody else could possibly comprehend it).

      That's not how any other industries operate in a best practices sense, and those who do are labelled 'cowboys', and avoided where possible. I wonder how much of this is the scientists' ivory tower / ego thing, which is apparent in every single reply from a scientist in this thread (and others I've read).

  29. On what basis? by tjstork · · Score: 1

    The reason many researchers, especially climate scientists, are not so happy about divulging their models and data is that they can be sued by crackpots, as it has already happened.

    On what basis of damages can a researcher be sued?

    --
    This is my sig.
    1. Re:On what basis? by orzetto · · Score: 1

      Libel, for instance.

      --
      Victims of 9/11: <3000. Traffic in the US: >30,000/y
    2. Re:On what basis? by tjstork · · Score: 1

      But that's not suing over science. One doctor attacked another. Who's right? Let the courts decide.

      “I never intended any specific damage to Tim Ball’s reputation,” Dan Johnson said today. “But climate change is a critical global issue and I thought it was important to set the record straight. If people want to argue the science, I’m all for that, but Tim Ball was claiming expertise and specific credentials that he does not have. That needed to be corrected.”

      Essentially, it was two guys doing a hatchet job on each other.

      --
      This is my sig.
    3. Re:On what basis? by orzetto · · Score: 1

      Essentially, it was two guys doing a hatchet job on each other.

      No, it was an industry shill trying to intimidate a respectable scientist who had caught him in a lie through legal threats.

      --
      Victims of 9/11: <3000. Traffic in the US: >30,000/y
  30. Observations... by kakapo · · Score: 4, Informative

    As it happens, my students and I are about to release a fairly specialized code - we discussed license terms, and eventually settled on the BSD (and explicitly avoided the GPL), which requires "citation" but otherwise leaves anyone free to use it.

    That said, writing a scientific code can involve a good deal of work, but the "payoff" usually comes in the form of results and conclusions, rather than the code itself. In those circumstances, there is a sound argument for delaying any code release until you have published the results you hoped to obtain when you initiated the project, even if these form a sequence of papers (rather than insisting on code release with the first published results)

    Thirdly, in many cases scientists will share code with colleagues when asked politely, even if it is not in the public domain.

    Fourthly, I fairly regularly spot minor errors in numerical calculations performed by other groups (either because I do have access to the source, or because I can't reproduce their results) -- in almost all cases these do not have an impact on their conclusions, so while the "error count" can be fairly high, the number of "wrong" results coming from bad code is overestimated by this accounting.

    1. Re:Observations... by Anonymous Coward · · Score: 1, Insightful

      Exactly this,
      In my field (Hydrology and Statistics) writing the code in its final form is a long process of experimentation with different approaches and tests. Publishing it before one is truly done with a subject is the same as inviting other people to "scoop" you.

      Also, many people seem to fail to understand the protectionism of scientists. Of course we like to build on other peoples results, and see the field grow. However, if we make this too easy (like handing them our code), they just might scoop us. Hence, we make it a little bit harder to ensure that we can feed ourselves as well.

      Finally, about the errors. I have yet to see a single piece of error-free scientific code, however, the results are rigorously tested with an array of tests. The chances of these all coming up the same over a coding error is small.

    2. Re:Observations... by quadelirus · · Score: 1

      I think conferences and journals should require complete code submission alongside paper submissions. Then the code won't be published unless the paper is published, but we will create a more open and honest system and stimulate more advancement (since other groups can then build on top of a good sturdy platform instead of always having to start from scratch to build up their own code base).

    3. Re:Observations... by Anonymous Coward · · Score: 0

      > (and explicitly avoided the GPL)

      this is probably the most boneheaded decision you will make this year.

      read the damn thing instead of relying on biases.

      you still own the copyright to your code and can, in parallel, sell it without restriction to the highest bidder. but 3rd parties can not.

      long live the GPL and the scientific method, they go hand in hand.
      rah rah

  31. Code isn't good enough. by FlyingBishop · · Score: 2, Interesting

    Back in college, I did some computer vision research. Most people provided open source code for anyone to use. However, aside from the code being of questionable quality, it was mostly written in Matlab with C handlers for optimization.

    In order to properly test all of the software out there you would need:

    1. A license for every version of Matlab.
    2. Windows
    3. Linux
    4. Octave

    I had our school's Matlab, but none of the code we found was written on that version. Some was Linux, some Windows, (the machine I had was a Windows box with Matlab) consequently we had to play with Cygwin...

    I mean, basically, you need to distribute a straight-up VM if you want your results to be reproducible. (which naturally rules out Windows or Matlab or anything else proprietary being at the core.)
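
    Short of shipping a whole VM, a much lighter (and much weaker) step is to record the environment next to the results, so a reader at least knows what the numbers were produced on. The Python sketch below is only an illustration under that assumption: the function names are hypothetical, it captures just the interpreter and operating system, and it says nothing about Matlab versions or compiled C handlers.

```python
import json
import platform
import sys


def describe_environment():
    """Collect basic facts about the machine and interpreter in use."""
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "byte_order": sys.byteorder,
    }


def build_record(results):
    """Bundle results with the environment they were computed in."""
    return {"environment": describe_environment(), "results": results}


def to_json(record):
    """Serialize the record so it can be published alongside the code."""
    return json.dumps(record, indent=2, sort_keys=True)


# Example: attach the environment to a (made-up) result.
print(to_json(build_record({"mean_error": 0.25})))
```

    This does not make a result reproducible by itself; it only documents one of the things a VM image would have pinned down.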

    1. Re:Code isn't good enough. by shabtai87 · · Score: 1

      Well, you can't really make someone code in what's convenient for you, at least all their code was available, so someone with the money to get the license could test it. You don't see physicists handing out large hadron colliders (or any other piece of large expensive machinery) with their results so that you can have fun with them in your basement... The point is that they did everything in their power to make you able to replicate their results.

      --
      @humanity: *facepalm*
    2. Re:Code isn't good enough. by wfolta · · Score: 1

      Yes, that's one of the things I dislike in Machine Learning and Computer Vision: the lingua franca seems to be Matlab. Ugh. An ugly, primitive language with a culture that seems to value Perlesque code obfuscation and the proprietary lock-in you get with Matlab. Octave is helpful, and at least if the Matlab code is published you stand a chance of reproducing the experiment in a reasonable alternative. I'm the odd man out, though, as I've used R throughout graduate school, even for ML and Biometrics classes.

  32. all by rossdee · · Score: 1, Insightful

    So if scientists use MS Excel for part of their data analysis, MS should release the source code of Excel to prove that there are no bugs in it (that may favour one conclusion over another)
    Sounds fair to me.

    And if MS doesn't comply then all scientists have to switch to OO.org ?

    1. Re:all by vlm · · Score: 1

      So if scientists use MS Excel for part of their data analysis,

      If they use Excel, they should have their PhDs/Tenure revoked.

      An absolute complete piece of dung for data analysis.

      It's on the order of using micrometers as c-clamps, or bypassing safety switches, or not wearing goggles in the chem lab.

      And if MS doesn't comply then all scientists have to switch to OO.org ?

      And the problem would be... Um... Trust me, it's a lot easier to switch from old office to OO.org than from old office to new office.

      --
      "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    2. Re:all by John+Hasler · · Score: 1

      > So if scientists use MS Excel for part of their data analysis...

      Then their results are suspect since Excel is known to be unreliable.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    3. Re:all by mjwalshe · · Score: 1

      > If they use excel, they should have their PHDs/Tenure revoked.

      Quite, if you want to do real analysis use mathcad or spss.

  33. social.... science. by Anonymous Coward · · Score: 0

    Sure it's nice for peers to review, but to make it a mandatory thing? I thought science was based on rules with the assumption there are no rules.

    I wonder what Newton, Einstein, Kepler, or Goddard would have thought if they were demanded to follow this type of review. Really. Peer review has become the Java of the research world if you know what I mean (the solution for everything)--there should be multiple forums for informal discussion/verification, formal proposals/verification.

  34. Re:Why release it? by INT_QRK · · Score: 1

    You're right, Occam's Razor. Conspiracy is generally too hard, even if you know what you're doing. Who needs conspiracy? Group-think, socio-political cliques, popular public funding streams, fashion, peer pressure, yearning for acceptance by an in-crowd. Know what really brought the US to its knees in Viet Nam? Hippy Chicks. Wanted to get laid? You were anti-war.

  35. Also true for CS research by DoofusOfDeath · · Score: 2, Interesting

    I'm working on my dissertation proposal, and I'd like to be able to re-run the benchmarks that are shown in some of the papers I'm referencing. But most of the source code for those papers has disappeared into the aether. Without their code, it's impossible for me to rerun the old benchmark programs on modern computers so that I and others can determine whether or not my research has uncovered a better way of doing things. This is very far from the idealized notion of the scientific method, and significantly calls into question many of the things that we think we know based on published research.

    1. Re:Also true for CS research by Anonymous Coward · · Score: 0

      I also conduct research in CS, and it seems this topic pops up once in a while and never seems to have a definite answer. Minimally, when we publish work we include the algorithm or approach used to obtain the results. We also keep mandatory data backups for source code, documentation, data, etc in our lab servers. We occasionally get requests for access to these files, however, as others have mentioned, not everything is well intended. There are no guarantees that other researchers will not trash your code or data, or even do minimal modifications and push it as their own research (or for comparison purposes). With so many independent journals, conferences, and workshops, I would not be surprised if many researchers that do post their work freely are hurting themselves in the process too. In addition, others have brought up the issue of a single error in a program invalidating all results. This generalization, however, may not necessarily be true. Errors are not errors are not errors. An error in the model of what you are trying to solve may definitely invalidate results, however, an error in a display module may or may not necessarily have the same effect. There is a reason why stronger papers typically conduct a battery of experiments and present statistical measures to draw conclusions with some meaning. After all, isn't this the same issue with other physical sciences, where you always run into measuring errors and variances in the observed results?

  36. fear over fact by xzvf · · Score: 1

    Humans are hardwired for fear and have to learn to think factually. Like most scientific issues that become political, fear and misinformation dominate over political fact. There will always be a certain segment of the population that believes vaccines cause autism and global warming is a trick to tax us with cap and trade. With vaccines you wait for the kids of autism avoiders to die of measles and polio. With global warming change the message from tax to disincentive to tax credit to incentive. Energy independence (make it a defense issue), tax credits for solar and wind that make the payback for a home owner less than a decade (I suspect a five year payback will get homeowners and home builders forking over for energy improvements).

    1. Re:fear over fact by INT_QRK · · Score: 1

      "Political fact"? Freudian slip?

  37. Not a good idea by petes_PoV · · Score: 5, Insightful
    The point about reproducible experiments is not to provide your peers with the exact same equipment you used - then they'd get (probably / hopefully) the exact same results. The idea is to provide them with enough information so that they can design their own experiments to measure the same things and then to analyze their results to confirm or disprove your conclusions.

    If all scientists run their results through the same analytical software, using the same code as the first researcher, they are not providing confirmation, they are merely cloning the results. That doesn't give the original results either the confidence that they've been independently validated, or that they have been refuted.

    What you end up with is no-one having any confidence in the results - as they have only ever been produced in one way - and arguments that descend into a slanging match between individuals and groups of vested interests who try to "prove" that the same results show they are right and everyone else is wrong.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    1. Re:Not a good idea by Anonymous Coward · · Score: 0

      I have to agree, the best way to see if a program is calculating incorrectly is to have a second program doing the same calculation a different way.
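
      A minimal sketch of that idea in Python, assuming a toy calculation (sample variance) and hypothetical function names: two independently written implementations run on the same data, and any disagreement means at least one of them is wrong.

```python
import math

def variance_two_pass(xs):
    """Sample variance: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_welford(xs):
    """Sample variance via Welford's single-pass running update."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)

def implementations_agree(xs, rel_tol=1e-9):
    """True if both implementations give the same answer within rel_tol."""
    return math.isclose(variance_two_pass(xs), variance_welford(xs), rel_tol=rel_tol)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(implementations_agree(data))
```

      Agreement does not prove both are right (they could share a misunderstanding of the method), but a mismatch is cheap evidence that something needs a closer look.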

    2. Re:Not a good idea by shabtai87 · · Score: 1

      What about comparison? Sometimes the goal isn't to reprove the same concept but to test speed/efficiency etc. In these cases the results should be able to be cloned so that one person can say "A is better than B" without the person who proposed "B" sitting up and saying, "well you must have run that on a bad machine" or "your code for our algorithm was obviously not as optimized as ours". Even so in some cases at least the raw data should be provided (garbage in/garbage out).

      --
      @humanity: *facepalm*
    3. Re:Not a good idea by John+Hasler · · Score: 1

      > What you end up with is no-one having any confidence in the results - as
      > they have only ever been produced in one way...

      Software does not produce results. Experiments produce results. Software assists with the math required to analyze those results. We are asking that you publish all the steps in your analysis. That means publishing the software.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    4. Re:Not a good idea by Anonymous Coward · · Score: 0

      I think you are missing the point of putting the code out there. The point is not, as you suggest, so that others can run your software and see the same results. This is more akin to writing up your experimental methodology so the reader can judge if the experiment was performed correctly, and can make their own assessments if the methodology introduced any potential experimental errors or confounds. In the case of software, it would involve assessing the quality of the code and looking for places where the software itself may have introduced errors, not running the code and seeing if the same results come out (though if it doesn't that is even more damning).

      Of course, the secondary reason to share the source code is that it is frequently part of the product of the research, and could be used by others to take the research further, etc... However, that is not the main thrust of this article.

    5. Re:Not a good idea by Rising+Ape · · Score: 1

      If they use an oscilloscope, do they have to publish full details of its internal structure? Or can we just assume that it behaves in a certain, standard way?

      Similarly, with code, what's published is a description of the process that the code implements. Then anyone can go off and write their own code to do the same thing. Results should not depend on a particular implementation - that would be quite as silly as demanding that the analysis should be done by one particular individual.

    6. Re:Not a good idea by quadelirus · · Score: 1

      The problem with what you are saying is that a code base for a single experiment may take years to write. No one will ever validate that result if they have to start coding from scratch. Somewhat faulty verification based on having an open code base is much better than no verification at all. The code should always be open.

      Credentials: A doctoral student and coder in a scientific field.

    7. Re:Not a good idea by Goalie_Ca · · Score: 1

      I find that scientists don't often bother replicating results because their funding is to do new stuff, not old stuff. It can be very costly and time consuming to thoroughly review.

      --

      ----
      Go canucks, habs, and sens!
    8. Re:Not a good idea by petes_PoV · · Score: 2, Insightful

      Experiments produce results

      Errrm, experiments produce data. It's the analysis of that data plus the insight and knowledge of the analysts and scientists that turn it into results. The problem is that if everyone uses the same software they'll never notice any systemic failures in the processing it performs.

      --
      politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    9. Re:Not a good idea by dj_tla · · Score: 1

      By your logic, no one should have access to a programming language because what if that language is wrong. A ton of scientific research is done in Matlab, yet you don't hear anyone complaining that code produced in Matlab may not be correct because you aren't looking at each individual bit of information as it is processed.

      The idea of providing your code, in fact, solves the exact problem that you're describing. The point of research is not to reproduce results -- certainly, that's a first step in many lines of research, but it by itself is not a result. However, if you find that the code used to produce some results is incorrect, then you do have a result, and this furthers that field of research immensely, because no longer are a string of papers going to build on an incorrect result. Everyone can be confident in a result and move on from that point.

      Your line of reasoning may make sense when you talk about "equipment" in terms of sensors and proprietary hardware, but when we're talking about software that runs the same on whatever hardware you're using, being able to reproduce a result is only a first step. Shortening that step and allowing everyone to scrutinize how you did it only catalyzes further research.

    10. Re:Not a good idea by DwySteve · · Score: 1

      If they use an oscilloscope, do they have to publish full details of its internal structure? Or can we just assume that it behaves in a certain, standard way?

      Are you purposefully putting a strawman out there so you can 'prove' you're right? Do it the same way you do it in industry: document the model number of the oscilloscope, the firmware revision and every important setting you can get your hands on. Then someone can buy the same scope, put the same firmware on it, load the same settings and get the same results. If they don't there's two options: you left something out of your description (purposefully or not) or there's some voodoo happening on the manufacturer's part.

      Documentation is not rocket surgery. It's standard practice everywhere people don't like wasting their time. Who's the bad person in this exchange:

      Player 1: 'Hi, I used the same data you did and the same algorithm but I didn't get the same results you did. Do you know what version of PROGRAM you used?'
      Player 2: 'Uh, nope'
      Player 1: 'Well, do you have the test setup you used so we can take a look?'
      Player 2: 'Naw, the university tore it down to use it for someone else after I was done.'
      Player 1: 'So... you couldn't even reproduce the results if you wanted to?'
      Player 2: 'LOL NOPE BUT I'M SO RIGHT!111'

      --
      http://angryee.blogspot.com
    11. Re:Not a good idea by khallow · · Score: 1

      The point about reproducible experiments is not to provide your peers with the exact same equipment you used

      If that could be done easily and cheaply, then it would be required that you provide the experiment as it was run as part of your publication. Imagine if you will, that you could put on page 34 of your paper, the lab set up, not a facsimile or drawing, the actual lab as it was set up when you were gathering data and the very actions you followed as you went. All someone had to do was take the lab out of the paper and watch the experiment unfold exactly.

      That's what you can do with computer programs and why they should be included as part of any publication.

    12. Re:Not a good idea by mrxak · · Score: 1

      Other scientists shouldn't be using the same software to reproduce the results. That's pretty obvious.

      What we're talking about here is an added layer of assurance that the initial results are as good as they can be. Having an independent auditing of code means that the methodology described in a paper is the one actually used in the software. It fixes simple errors, and ensures that there's no fraud. The programmers who would do this are not being taken away from creating new scientific research, because they're not scientists, nor are they competing for the same research grants. What's more, it's free QA that keeps the science more scientific.

      You have a whole lot of coders looking at the code and making sure it's a perfect implementation of the model or algorithms described by the scientists, and it won't matter if somebody uses the same code to reproduce the results, because you'll know the software is working perfectly. All that's left then is for other scientists to criticize the methodology and assumptions of the research as part of the normal peer review process. It turns the code into just another tool at a scientist's disposal.

      We're talking about science here. The whole point of science is challenging something repeatedly and thoroughly, and if it stands up, you know it's true. Adding more challenges is the most scientific thing you can do.

    13. Re:Not a good idea by Rising+Ape · · Score: 2, Insightful

      >Do it the same way you do it in industry: document the model number of the oscilloscope, the firmware revision and every important setting you can get your hands on.

      There's no reason on earth why you'd even do that. Just say "the voltage was measured to be x +- y". Results from science experiments should *not* depend on specifics of equipment any more than they should depend on a specific scientist. In fact, the wider the variety of equipment, code and analysis methods used to measure the same thing, the better - it makes the result more robust.

      In your example, both people should recheck their results independently, perhaps try different methods, even do another experiment.

      There are some situations where seeing the code is useful, but only after all other methods to reproduce the result have failed. Sharing code is just inviting common errors.

      In your hypothetical scenario below, the result could be reproduced by writing a new program to do the same thing.

    14. Re:Not a good idea by John+Hasler · · Score: 1

      > The problem is that if everyone uses the same software they'll never notice
      > any systemic failures in the processing it performs.

      I'm not suggesting that everyone use the same software. I'm suggesting that everyone have the opportunity to examine everyone else's software.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    15. Re:Not a good idea by ralphbecket · · Score: 1

      Hmm, you should contact the econometrics journals which require all code and data to be submitted alongside articles.

      As I understand it, even Nature requires something similar, although for some reason climate scientists seem to be given a free pass.

  38. Recent example Keith Baggerly vs Duke Clin. Trials by bloosqr · · Score: 1

    If you ever get a chance, take a look at some of Baggerly's (MD Anderson / bioinformatics/stats) analysis of the rather embarrassing number of mistakes that were made in developing genomic biomarkers used for a clinical trial at Duke. He has been giving talks about this at stats conferences (and at pharma companies); it's one of the best talks I've heard in recent years. What it boils down to is that the analysis (and input) programs used by Duke had a series of fundamental mistakes in them, causing the results to be incorrect and leading to incorrect conclusions, which unfortunately led to a series of clinical trials that certainly should not have happened. After Baggerly attempted to respond negatively to the original series of articles, he republished in a stats journal and basically got the clinical trial shut down. For Slashdot readers, one of the many egregious mistakes here was that the analysis program's instructions call for a header line, and the input the Duke researchers used did not include one, causing a shift in the results relative to their input. My understanding is that Nature Medicine refused to publish Baggerly's initial correspondence with full details as it was "too negative", so he published in a stats journal, which then got the critical coverage to shut everything down.
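
    To make the header-line mistake concrete, here is a hypothetical Python sketch. The file layout, labels, and function name are all made up for illustration and are not the actual Duke tool or data; the point is only that a reader which always skips the first line silently shifts every value when the input has no header.

```python
def read_values(lines, has_header=True):
    """Parse one value per line, skipping the first line if it is a header."""
    body = lines[1:] if has_header else lines
    return [float(v) for v in body]

# Hypothetical labels kept in a separate list and matched to values by position.
labels = ["gene_A", "gene_B", "gene_C", "gene_D"]

# Input that, contrary to the tool's instructions, has no header line.
raw = ["0.1", "0.9", "0.2", "0.8"]

# The tool assumes a header: the first real value is dropped, the rest shift up.
wrong = dict(zip(labels, read_values(raw)))
# Reading the same input without skipping a line keeps labels and values aligned.
right = dict(zip(labels, read_values(raw, has_header=False)))

print(wrong)
print(right)
```

    Nothing crashes and no warning is printed in the wrong case, which is why this kind of error is hard to spot without seeing the code and the input together.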

    Here are some random links

    Here is the original Potti genomics article:
    http://www.nature.com/nm/journal/v12/n11/abs/nm1491.html

    Here is one of the baggerly nature medicine letters describing what is wrong in summarized form:

    http://www.nature.com/nm/journal/v13/n11/full/nm1107-1276b.html

    here is the halt of the trials :

    http://cancerletter.com/tcl-blog/copy108_of_whats-going-on-with-nih

    http://cancerletter.com/tcl-blog/copy111_of_whats-going-on-with-nih

  39. Re:Why release it? by Foolicious · · Score: 0, Offtopic

    Since you brought up the socially inept idea, I might also suggest taking a look at Bic's razor or Gillette's razor.

    --
    Please don't use "umm" or "err" or "erm".
  40. Don't use that word by Anonymous Coward · · Score: 1, Insightful

    ...so science becomes an act of creativity by which one tries to create a cohesive narrative of process that arrives at the desired result.

    As someone who listens to Talk Radio on occasion, that sounds like you're creating a work of fiction. Rush and Hannity would have a whole week of shows based on that statement.

    I would put it more like "piecing the narrative from the evidence" or "from facts" or something like that.

    Scientists need to realize that if they're going to get public support, they really need to be very careful with their choice of wording. Like it or not, the scare mongers, and I mean scare mongers in the sense that there are people who are trying to scare folks into believing that Global Warming is some sort of wealth redistribution scheme by the socialists, are going to use any hint, real or not, that scientists are making up their findings.

    1. Re:Don't use that word by harvey+the+nerd · · Score: 3, Interesting

      Real scientists don't use simulators with incomplete equations and fudge factors to match highly manipulated historic data to "prove" their case with game machines that have no predictive capability or other external validation. That simply is not the way you build a valid fundamentals based model starting from the equations of motion. IPCC reports previously noted whole terms in the equations' energy terms that were inadequately described or represented, then have done no research to fill the terms, modellers just zeroing them out or putting in small constants for significant *variables*. These are not real scientists, their processes and practices have been clearly shown to be antithetical to valid science.

      These models are just primitive speculative tools, often reflecting personal biases in data selection and derivation, NOT fundamental equations. The models are NOT valid physics data or experiments.

      On prediction failure, Hansen's 1988 "A,B,C" forecasts of rising temperature are rapidly diverging from the cooling we are actually experiencing right now, where case C assumed we massively limited CO2 also. Missed the side of a barn with a shotgun, tsk, tsk, tsk.

    2. Re:Don't use that word by khallow · · Score: 1

      Like it or not, the scare mongers, and I mean scare mongers in the sense that there are people who are trying to scare folks into believing that Global Warming is some sort of wealth redistribution scheme by the socialists

      Truth is a terrible weapon in the arsenal of the scare-monger. For example, consider the carbon emission credit markets in Europe. People are complaining because they don't cost enough to force businesses to change over to non-carbon emitting technologies. Ideology remains to them more important than the process. And where does the money from these credit sales go? That's wealth redistribution by a government, classic symptom of socialism.

      The point ultimately though is that you are absolute right. There are huge stakes and plenty of powerful special interests. Sloppy terminology will be exploited by someone. But my view is that it is fundamentally a losing strategy to obsess over making bad sound bites. If you have to speak often in public, you'll make them. Good science will eventually trump propaganda.

    3. Re:Don't use that word by ArcherB · · Score: 4, Interesting

      Scientists need to realize that if they're going to get public support, they really need to be very careful with their choice of wording. Like it or not, the scare mongers, and I mean scare mongers in the sense that there are people who are trying to scare folks into believing that Global Warming is some sort of wealth redistribution scheme by the socialists, are going to use any hint, real or not, that scientists are making up their findings.

      Scare mongers? Let's take a look at some of these "hints" that scientists are making up their findings. From May 7, 2002

      Dozens of mountain lakes in Nepal and Bhutan are so swollen from melting glaciers that they could burst their seams in the next five years and devastate many Himalayan villages, warns a new report from the United Nations.

      From January 17, 2010:

      In the past few days the scientists behind the warning have admitted that it was based on a news story in the New Scientist, a popular science journal, published eight years before the IPCC's 2007 report.

      It has also emerged that the New Scientist report was itself based on a short telephone interview with Syed Hasnain, a little-known Indian scientist then based at Jawaharlal Nehru University in Delhi.

      Hasnain has since admitted that the claim was "speculation" and was not supported by any formal research.

      Do I need to pull the quotes that claim NY and Florida will be underwater?

      As for the "fear mongers" saying that GW is a socialist wealth redistribution scheme.

      Some officials from the United States, Britain and Japan say foreign-aid spending can be directed at easing the risks from climate change. The United States, for example, has promoted its three-year-old Millennium Challenge Corporation as a source of financing for projects in poor countries that will foster resilience. It has just begun to consider environmental benefits of projects, officials say.

      Industrialized countries bound by the Kyoto Protocol, the climate pact rejected by the Bush administration, project that hundreds of millions of dollars will soon flow via that treaty into a climate adaptation fund.

      Strange. When did Rush and Hannity start writing for the NY Times?

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    4. Re:Don't use that word by bunratty · · Score: 3, Informative

      On prediction failure, Hansen's 1988 "A,B,C" forecasts of rising temperature are rapidly diverging from the cooling we are actually experiencing right now, where case C assumed we massively limited CO2 also

      In the past ten years, we've seen warming of 0.18 degrees Celsius, which is less than the 0.25 degrees Celsius that was predicted, but it certainly hasn't been cooling. This is why the Arctic ice and Antarctic ice are melting. Yes, stop the presses, the globe is warming!

      --
      What a fool believes, he sees, no wise man has the power to reason away.
    5. Re:Don't use that word by enstrophy · · Score: 1

      I'm curious, which of the references above are from scientists?

    6. Re:Don't use that word by ArcherB · · Score: 1

      I'm curious, which of the references above are from scientists?

      Well, let's see. The first two deal with the IPCC and the third deals with politicians. Hmmmm. Looks like you're right. There are no scientists there.

      It's a shame that these are the people who will make the policies that will shape our lives. These are also the same people who pay for the science and decide what science gets done. Hmmmm. Let's see. People who love power are paying for science that gives them more power. What could possibly go wrong? And people will gladly hand over your rights to them for the "security" they do not deserve (according to B. Franklin).

      --
      There is no "I disagree" mod for a reason. Flamebait, Troll, and Overrated are not substitutes.
    7. Re:Don't use that word by SETIGuy · · Score: 1

      We're not experiencing cooling right now (unless you think a single cooler year means we're in a long term cooling trend). That you think we are shows that you're getting your data from biased sources.

      Just because it's colder today than it was yesterday doesn't mean spring is canceled.

    8. Re:Don't use that word by harvey+the+nerd · · Score: 1

      Biased sources like satellites, instead of pre-selected urban heat islands.

    9. Re:Don't use that word by SETIGuy · · Score: 1

      Yeah, satellite measurements seem really different from ground based ones... http://en.wikipedia.org/wiki/File:Satellite_Temperatures.png

  41. engineering or science by Anonymous Coward · · Score: 0

    For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C.

    Ah, a classic example of the difference between s/w engineering and s/w development. The problem is that it's hard to tell an engineer to think like a scientist and vice versa.

  42. Well known problem... by Wdi · · Score: 1

    The code quality of many well-known scientific software packages is abysmal.

    In chemistry, you should at least expect that the outcome of descriptor computations on a set of molecules is independent of the order of atoms and bonds in a molecule, and the order of file records.

    Well, this is disturbingly often not the case, as we discovered in a recent study.
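
    A rough Python sketch of what such an order-independence check could look like. The descriptors and helper names below are hypothetical toys, not the ones from our study or from any real chemistry package, and only atom renumbering is tested here.

```python
import itertools

def toy_descriptor(atoms, bonds):
    """Hypothetical descriptor: sum over bonds of the product of atomic numbers."""
    return sum(atoms[i] * atoms[j] for i, j in bonds)

def order_dependent_descriptor(atoms, bonds):
    """Deliberately broken descriptor: depends on which atom is listed first."""
    return atoms[0] * len(bonds)

def relabel(atoms, bonds, perm):
    """Renumber atoms so that old index i becomes perm[i]."""
    new_atoms = [0] * len(atoms)
    for old, new in enumerate(perm):
        new_atoms[new] = atoms[old]
    new_bonds = [(perm[i], perm[j]) for i, j in bonds]
    return new_atoms, new_bonds

def is_order_independent(descriptor, atoms, bonds):
    """Check the descriptor value under every possible atom renumbering."""
    reference = descriptor(atoms, bonds)
    return all(
        descriptor(*relabel(atoms, bonds, perm)) == reference
        for perm in itertools.permutations(range(len(atoms)))
    )

# Heavy-atom skeleton of ethanol (C-C-O) as atomic numbers and index pairs.
atoms = [6, 6, 8]
bonds = [(0, 1), (1, 2)]
print(is_order_independent(toy_descriptor, atoms, bonds))
print(is_order_independent(order_dependent_descriptor, atoms, bonds))
```

    Trying every permutation only works for tiny molecules; for anything realistic you would sample random renumberings instead.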

    In an attempt to raise awareness of this problem, we have launched a public Web-accessible computational result verification service (http://www.xemistry.com/cv). A poster explaining this app and some background, including sample test results, can be found at http://www.xemistry.com/Presentations/verifier_panel_2009.pdf.

    Unfortunately, the worst application we have encountered so far appears to be a standard tool for adding Wikipedia data for chemicals, systematically poisoning it with incorrect data.

  43. Re:one error will invalidate a computer program?!? by alan_dershowitz · · Score: 1

    That statement was kind of breathless, but the study he was citing focused on bugs that specifically affected the accuracy of the output, and found that they were a common occurrence. I agree with the author, if you are going to use a computer program to get results, you need to publish code otherwise your methods are packaged in a black box. A lot of people don't want to do this because scientific code is not usually done by people we can say are knowledgeable in how to write reliable, verifiable code. It's usually a pieced together means to an end. Not that there's anything wrong with that, IF it can be available for verification. I HATE reading studies that for example constantly refer to a dataset and then never give you the dataset. I guess unlike many people I don't naturally trust the authors to be perfect.

  44. What about McIntyre's faulty data? by Anonymous Coward · · Score: 1, Interesting

    What about McIntyre's faulty data?

    Ah, no FOIA there, because he's toeing the party line.

    Note: He's not the only denial ditto who refuses to release his code:

    http://www.realclimate.org/index.php/archives/2009/12/please-show-us-your-code/

    Oh, the meeja is quiet about that, isn't it...

    1. Re:What about McIntyre's faulty data? by Anonymous Coward · · Score: 0

      You didn't get the memo? We don't post links to realclimate anymore. It's an opinion blog filled with lies where no dissident is accepted.

  45. Re:Why release it? by jgtg32a · · Score: 1

    Who said it was a conspiracy of a thousand people? Isn't the claim that the IPCC reports are written by a thousand people? That's not exactly true. The actual report was written by something like 50 people; the thousand number comes from the supporting data. There were a lot of groups of people who did independent research that showed "whatever", and the "whatever" supports the "conclusion" so it is used, and things that don't support the "conclusion" are suppressed.

    Now we have a fancy report, we pass that around, and it is solid and everyone agrees with what they see.

  46. Nothing to do with CS by nten · · Score: 2, Insightful

    I am suspicious of the interface reference. Are they counting things where an enumeration got used as an int, or there was an implicit cast from a 32-bit float to a 64-bit one? From a recent TV show: "A difference that makes no difference is no difference." Stepping back a bit, there will be howls from OO/Functional/FSM zealots who look at a program and declare that its inferior architecture, lack of maintainability, etc. indicate its results are wrong. These are programs written to be run once to turn one set of data into a more understandable and concise one. A truth test set run through it is good enough; they don't need ISO-compliant, triply refactored, perfectly architected code to get the right answer. I don't think any of my CS profs would have cared about such inane drivel; they barely paid attention to what language we each picked to solve the assignment in. My software engineering prof would have yelled about comment density and coding standards compliance, but I consider that a different discipline, primarily applicable to widely used and/or safety-critical code.

    *However*
    Keeping track of digit precision through a calculation isn't CS, it's fundamental grade school science. That is only one step from forgetting to do unit analysis for a sanity check. If they are forgetting that, they are probably also not looking at numerical conditioning, or trying to get by with doubles when they need bignums. None of this is CS egocentrism; it's stuff we learn in math and science courses.
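
    As a small illustration of how digits can vanish mid-calculation, here is a Python sketch (function names are hypothetical) that evaluates (1 - cos x) / x^2 for a tiny x in two mathematically equivalent ways. The naive form subtracts two nearly equal doubles and loses every significant figure; the rearranged form, using 1 - cos x = 2 sin^2(x/2), keeps them.

```python
import math

def naive(x):
    """(1 - cos x) / x**2 computed directly; cancels badly for small x."""
    return (1.0 - math.cos(x)) / (x * x)

def stable(x):
    """Same quantity rewritten as 2*sin(x/2)**2 / x**2 to avoid the subtraction."""
    s = math.sin(x / 2.0)
    return 2.0 * s * s / (x * x)

x = 1e-8
print(naive(x))   # cos(x) rounds to exactly 1.0 in double precision here
print(stable(x))  # close to the true limit of 0.5
```

    Both versions are "correct" implementations of the same formula on paper, which is exactly why this kind of problem does not show up in an algorithm description.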

    --
    refactor the law, its bloated, confusing and unmaintainable.
  47. This should probably be tagged RANDU by Anonymous Coward · · Score: 0

    After the flawed pseudorandom number algorithm whose use may have invalidated quite a few statistical simulations.
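
    For anyone who hasn't seen it, a short Python sketch of the flaw (the function name is mine). RANDU is x_{k+1} = 65539 * x_k mod 2^31, and because 65539 = 2^16 + 3, every three consecutive outputs satisfy x_{k+2} = 6*x_{k+1} - 9*x_k (mod 2^31). That is why triples plotted in 3D fall onto just 15 planes.

```python
def randu(seed, count):
    """Generate `count` RANDU values: x_{k+1} = 65539 * x_k mod 2**31."""
    values = []
    x = seed
    for _ in range(count):
        x = (65539 * x) % 2**31
        values.append(x)
    return values

xs = randu(seed=1, count=1000)

# Check the linear relation between every three consecutive outputs.
all_on_planes = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2**31 == 0
    for k in range(len(xs) - 2)
)
print(all_on_planes)
```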

  48. Why the fuck do people increasingly do by xtracto · · Score: 1

    start their posts in the title? It makes it seem as if they do not think about what they are going to write.

    On topic with the article, I completely agree with this "release the scientific code" position. I am currently working within a EU project in which we are developing ABM*. In my project it was made clear from the beginning that the code license will be GPL.

    However, the place where I work has some programs they have used to conduct simulations, and these programs are completely closed (only a handful of people have access to the code, all of them within the institute). Nevertheless, simulations performed with these programs have been used for several publications (journals, congresses, symposiums and even PhD theses!).

    And people in my line of work wonder why simulations are not taken more seriously (e.g. accepting papers) by people in more "classical" research fields.

    --
    Ubuntu is an African word meaning 'I can't configure Debian'
  49. Not that simple by khayman80 · · Score: 3, Interesting

    I'm finishing a program that inverts GRACE data to reveal fluctuations in gravity such as those caused by melting glaciers. This program will eventually be released as open source software under the GPLv3. It's largely built on open source libraries like the GNU Scientific Library, but snippets of proprietary code from JPL found their way into the program years ago, and I'm currently trying to untangle them. The program can't be made open source until I succeed because of an NDA that I had to sign in order to work at JPL.

    It's impossible to say how long it will take to banish the proprietary code. While working on this project, my research is at a standstill. There's very little academic incentive to waste time on this idealistic goal when I could be increasing my publication count.

    Annoyingly, the data itself doesn't belong to me. Again, I had to sign an NDA to receive it. So I can't release the data. This situation is common to scientists in many different fields.

    Incidentally, Harry's README file is typical of my experiences with scientific software. Fragile, unportable, uncommented spaghetti code is common because scientists aren't professional programmers. Of course, this doesn't invalidate the results of that code because it's tested primarily through independent verification, not unit tests. Scientists describe their algorithms in peer-reviewed papers, which are then re-implemented (often from scratch) by other scientists. Open source code practices would certainly improve science, but he's wrong to imply that a single bug could have a significant impact on our understanding of the greenhouse effect.

    1. Re:Not that simple by Anonymous Coward · · Score: 0

      Fragile, unportable, uncommented spaghetti code is common because scientists aren't professional programmers.

      There's a large amount of scientific code that's written to be run once, for a one-off analysis for which there will never be an exact duplicate. Why would the author go to the trouble of finding an optimal solution when a brute-force method is faster in meat-time?

    2. Re:Not that simple by mrxak · · Score: 1

      Hopefully your experiences will cause you to use OSS to begin with instead of using proprietary code in the future. It may be a pain now, and I hope you keep at it. In the long run, it'll be worth it.

    3. Re:Not that simple by wfolta · · Score: 1

      Scientists describe their algorithms in peer-reviewed papers, which are then re-implemented (often from scratch) by other scientists.

      I recently went through this exercise for a graduate class in biometrics. In researching a particular fingerprint-evaluation algorithm, I found 6 papers that had an algorithmic description, and they all disagreed. Eventually, I could see that 5 out of the 6 were due to various kinds of typos. They simply would not work as written.

    4. Re:Not that simple by Anonymous Coward · · Score: 0

      In the part of NASA I'm familiar with, data is publicly released along with software tools.

      I searched for GRACE and found that they make data available: http://www.csr.utexas.edu/grace/asdp.html.

    5. Re:Not that simple by khayman80 · · Score: 1

      Yes, but I'm not using the vanilla level 1-b or level 2 data products. I'm using accelerations that have already had background models like FES2004 and various dealiasing products subtracted. That data was only given to me when I promised not to share it or publish papers on certain topics using it.

  50. Maybe it's my Berkeley roots: we release source by PeterM+from+Berkeley · · Score: 1

    All the software I and my team write is version controlled (CVS and SVN) and releasable (some of it with export restriction.) I do admit that our software engineering is not the best, but we also have a rule that code cannot be committed before an extensive suite of tests is run on the code (for our main scientific application--'helper' tools are not so tightly controlled.)

    Our scientific colleagues, provided they can satisfy the export control restrictions, can get source code and poke around all they want. Some have even contributed back valuable changes, and most have contributed back valuable feedback.

    In fact, we look down upon colleagues who do not release source code. Are you really doing serious science if others cannot delve into your methods?

    --PeterM

  51. The problem is with government funding standards by Anonymous Coward · · Score: 1, Interesting

    NIH funding standards promote commercialization of publicly funded software. This appears to have been implemented before the modern internet, and the idea may have been that a commercial product would make the code more available, and perhaps fix some of the quality issues with code cobbled together by "non-programmers". The result is that companies like Accelrys own a huge amount of software developed under public funding. Now, the public has to pay to use software that they paid to develop, and it is impossible for other scientific research to extend that publicly funded effort.

    I want to see an NIH version of SourceForge, and mandate all government funded software development to be stored there. Unlike SourceForge, there could be delayed release to the public so that researchers have time to publish their work.

  52. Peer Review / publication process by Wardish · · Score: 2, Insightful

    As part of publication and peer review, all data and the provenance of the data, as well as any additional formulas, algorithms, and the exact code that was used to process the data, should be placed online in a neutral holding area.

    Neutral area needs to be independent and needs to show any updates and changes, preserving the original content in the process.

    If your data and code (readable and compilable by other researchers) isn't available then peer review and reproduction of results is foolish. If you can't look in the black box then you can't trust it.

    --
    Ward

    . Silence! Be thankful thy species is unpalatable! .
    1. Re:Peer Review / publication process by Xyrus · · Score: 1

      So let's say I supply the source code and you run it and get the same answers. That has proved jack shit.

      Research papers provide the ways and means, including algorithms, for the work being reviewed. The reviewers are responsible for verifying the research. Handing them the source you used is counterproductive at best, as it does nothing to validate the research. If you implement code following the means and methods in the paper and you find the answers are different, then you have potentially found a flaw.

      If you can't recreate the program from what's in the paper in a reasonable amount of time, then either the paper is lacking critical information or you don't have the understanding you need to legitimately review the research. A lot of papers in regards to climate research already make assumptions that the readers have a fairly good understanding of climate dynamics. And most code used to support these papers is fairly short as well. In fact, I'm using several papers in my current work which describe algorithms, each of which can be written in fewer than 100 lines of code.

      ~X~

      --
      ~X~
    2. Re:Peer Review / publication process by drizzd · · Score: 1

      That is not always possible for practical reasons, such as proprietary software that was used to find the results. Besides, submissions which describe a black box with the only value being supposedly correct simulation results will not be accepted anyways. Simulation is often simply a means to visualize and quantize results which are already expected from the theory developed by a publication. Simulation results prove a theory neither right nor wrong.

    3. Re:Peer Review / publication process by Wardish · · Score: 1

      I have reviewed many things over the years, including software.

      Getting a different answer only tells you it's different. Could be your code.

      Reviewing his code can not only show a problem(s) but can let you know if it's significant. As stated before, most but not all coding errors are not relevant to the final conclusions.

      Last but not least, if you are going to complain about my work, it might be helpful to offer some useful information. "I got different results" isn't useless, but it's in the neighborhood.

      --
      Ward

      . Silence! Be thankful thy species is unpalatable! .
    4. Re:Peer Review / publication process by Wardish · · Score: 1

      I agree, however if you make it a requirement to publish, I believe that the necessity of publishing will quickly outweigh the "need" for proprietary code.

      Simulations can be the key to turning data into something understandable. Especially for those who can't see patterns in datasets. Not absolutely required but reproducing someone's simulation... There are so many ways to massage a large complicated dataset.
       

      --
      Ward

      . Silence! Be thankful thy species is unpalatable! .
  53. mechanically verifiable proofs too please by StripedCow · · Score: 1

    Besides code, it would be nice to have mechanically verifiable proofs too!

    But code would be a nice first bit of progression.

    --
    If Pandora's box is destined to be opened, *I* want to be the one to open it.
  54. Accuracy decreasing during execution? by Bromskloss · · Score: 1

    the accuracy of results declined from six significant figures to one significant figure during the running of programs

    What is that supposed to mean?

    --
    Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
    1. Re:Accuracy decreasing during execution? by Areyoukiddingme · · Score: 1

      Exactly what it says. Read about significant figures here. For practical hands-on use to gain a better understanding, take a high school or community college chemistry class.

    2. Re:Accuracy decreasing during execution? by Bromskloss · · Score: 1

      Exactly what it says. Read about significant figures here.

      I know what significant figures are, of course. The question is what they mean when they say that the accuracy decreases during the running of programs - as if they had the answer with excellent accuracy already from the start and then messed it up along the way.

      --
      Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
    3. Re:Accuracy decreasing during execution? by Areyoukiddingme · · Score: 1

      Ah. They didn't have the answer with excellent accuracy at the start. They had input values of excellent accuracy that then lost accuracy because of their limited understanding of how math works in computers. Binary circuits can't represent certain values exactly, so they store an approximation. If you're not careful about how you handle your data during calculations and tabulation, binary circuits will lose significant digits.

      The worst culprit of accuracy loss, mentioned by other posts, is a workflow that performs some floating point calculations, then converts the numbers to a human readable string, then converts them back into floating point and performs more calculations. "Human readable" almost never means "all available accuracy." IEEE 754 single precision floating point can represent roughly 7 significant digits. When printing into tabular form, many people truncate or round to as little as 3 or even 2 significant digits. Depending on the nature of subsequent math, this can be a dramatic loss in accuracy.
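
      A minimal sketch of that workflow, assuming Python and two made-up readings (102.4356 and 102.4312 are illustrative, not real data). It stores both values as IEEE 754 single precision, then compares their difference before and after a trip through a 3-significant-digit table:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Two hypothetical readings that differ in the fifth significant digit.
a = to_float32(102.4356)
b = to_float32(102.4312)

# Difference computed directly from the stored binary values.
direct = a - b

# The same values after being printed with 3 significant digits
# and parsed back, as in a "human readable" intermediate table.
a_text = float(f"{a:.3g}")
b_text = float(f"{b:.3g}")
round_trip = a_text - b_text

print("direct difference:    ", direct)
print("after text round trip:", round_trip)
```

      Both readings print as "102", so the difference that the later calculation depends on vanishes entirely, while the direct subtraction still recovers it to within single-precision rounding.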

      Consider global warming. The IPCC report middle of the road scenario predicts an average of one Fahrenheit degree of warming globally over the course of the next century. That of course means an average of 1/100th of a degree annually. That means you'd really like to have temperature measurements accurate to at least 6 significant digits, such as 102.435F degrees. If at any time during your climate modeling you converted your temperature values to, for instance, 102 degrees, you may have completely eliminated any possibility of detecting the deviation you're expecting. (Not that I'm saying anybody has done that. It's just an example.)
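
      To make the arithmetic concrete, here is a small sketch assuming Python and an entirely hypothetical 30-year series that warms by exactly 0.01 degree per year. The starting value of 102.1 is chosen so that every value rounds to the same whole degree; a series that crossed a rounding boundary would show a single step rather than a flat line. The slope function is an ordinary least-squares fit written out by hand, not anyone's actual climate code:

```python
def slope(ys):
    """Least-squares slope of ys against x = 0, 1, 2, ... (units per step)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Hypothetical series: 0.01 degree of warming per year for 30 years.
temps = [102.1 + 0.01 * year for year in range(30)]

# The same series after someone rounded every value to whole degrees.
rounded = [float(round(t)) for t in temps]

print("trend from full-precision values:", slope(temps))
print("trend from rounded values:       ", slope(rounded))
```

      The full-precision series gives back a trend of about 0.01 degree per year; the rounded series is a constant 102 and its fitted trend is exactly zero.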

      Modern science revolves around high accuracy measurements, for the most part. We've solved all the easy problems, that can be worked out by eyeballing lengths and weights and things, and we've solved all the not as easy problems that you have to work out by using a nice set of calipers for your lengths. What's left are the finicky picky difficult problems that can't be modeled correctly if your input data doesn't count the number of atoms correctly. (More or less. Little bit of hyperbole there...)

      So if you're sloppy in your data handling when running your processing programs, it's quite possible to come to erroneous conclusions, because you screwed up your resulting numbers so badly that they're giving you the wrong impression. Yet it might not be obviously wrong; they're the right order of magnitude, and look roughly like what you expected given the input data, so you may not notice that there's a problem until some bastard statistician analyzes your code and points out the dumb parts.

  55. They should write their code in Type Theory... by Anonymous Coward · · Score: 0

    ...so they can also release the proofs.

  56. Re:Seems reasonable - up to a point by Jaydee23 · · Score: 2, Interesting

    Code should be released, but this should not be confused with replicating scientific results. If you want to replicate research, you need to write your own code according to the methods described in the research. Your answer then needs to match the original to test the code.

  57. Lucrative by zogger · · Score: 1

    Not sure on any "lucrative" fields exclusion as basically all science eventually has engineering then lucrative aspects to it, even if it might be some time down the road and is obscured now.

        Climate science and this huge acrimonious debate has trillions associated with it, such as the artificial carbon credits market,(among others) which had no previous business market or demand that existed, else there would have been a big demand for carbon credits previously. If anything, this lucrative "science" should therefore be as open as possible, precisely because trillions is a huge inducement for outright fraud and obfuscation of same.

    Personally, if you want a cleaner environment with the possible long range benefit of possibly keeping the climate more moderate, the carbon tax then on to the wall street gamblers idea is quite a stretch as to real effectiveness.

        I will contend there is a much cheaper way, a way wherein those same trillions would go *directly* to deployment of the new technologies, and that is the tax credit instead of the tax itself. If governments were serious, they could offer 100% tax credits for deployment of new cleaner tech, from a personal level to any size corporate level, then that would be that, the tech would get used, quite willingly, and all that money go to it, rather than getting most of it shuffled off to the already bloated artificial financial products "industry", which already takes an inordinate amount of wealth out of the system and places it into the international casino.

    1. Re:Lucrative by fuzzyfuzzyfungus · · Score: 1

      I don't suggest any exclusion of scientists in potentially lucrative fields. If the public is paying for your research, they get your results, period. If they aren't, they have no necessary right to them; but you should still expect to have them peer reviewed if you want to publish anywhere.

      I was just noting that it would almost certainly be much easier to get code out of scientists whose work has no immediate commercial applications than out of scientists who could just slap a python GUI on their work and have an immediately saleable product.

      The guy who is working on modeling arctic grouse populations as a function of seasonal temperatures and introduced predator interactions is almost certainly not publishing his code just because he is a busy person, and there are no easy, obvious(particularly for people whose primary specialization isn't in software development) means and norms for doing so. He has nothing to gain by hiding it, and would probably publish immediately if reputable journals had an "upload your source tarball/CVS checkout/GIT/whatever" option and expected researchers to use it.

      Somebody doing research on some sort of machine vision stuff, or automated mineral deposit detection based on seismography data, or high speed modeling of some interesting drug target site, though, might well be able to dump a little spit and polish onto their existing code during a sabbatical and have a product worth real money. They would probably be more reluctant, in general, to provide more than what was explicitly demanded of them.

      As for the area of climate science: The assertion that CO2 emissions are, in fact, a huge externality implies economic consequences in the trillions; but the actual science saying so has little direct commercial application. All the real money is in activities implied by; but not directly connected to, the climate modeling(ie. if some sort of carbon credit trading scheme is adopted, Goldman Sachs will probably make a huge pile of money on it somehow; but they will have no need of Dr. Climate Model and his arctic ice cores to do so. It'll be a purely financial-instrument-manipulation game at that point).

      The contrast to this would be something like a hypothetical cure to malaria, which would have economic consequences in the billions(due to improved health and productivity and reduced mortality across the tropical and subtropical world); but would put the scientists involved more or less directly in line to reap some of the rewards: their research would be needed in order to manufacture the drug. Climate scientists, on the other hand, will be largely superfluous once they've defined the size of the problem. Most of the real money will then be in the hands of the finance guys and legal types(with a modest amount ending up in the hands of carbon credit/geoengineering researchers).

  58. They have a solution to the inaccuracies ... by tomhudson · · Score: 1

    This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program.

    They hire statisticians to argue that the errors average out ...

    ... because after all, when neither lies nor damn lies will do the job ...

  59. Re:Seems reasonable - up to a point by khallow · · Score: 1

    Code should be released, but this should not be confused with replicating scientific results. If you want to replicate research, you need to write your own code according to the methods described in the research. Your answer then needs to match the original to test the code.

    Just in case there is some confusion, code release is part, but not the whole of replicating scientific results.

  60. Undeclared function prototypes a source of error?? by Anonymous Coward · · Score: 0

    People are taking the article the wrong way. If you actually read it, you will see that most of the errors are "Undeclared function prototypes". I hardly see how that can scientifically affect the outcome of any result. Next, the article is awful in that it hardly defines what it is saying at any moment; it doesn't mention actual programs except Motif and X11??? Are you serious? Since when is X11 a scientific program?

  61. OO.org seems more suited to science anyway by maccallr · · Score: 1

    I switched from MS Excel to OOcalc for some analysis I'm doing at the moment. Excel was slow, made huge files and the bar charts were hideously shaded by default.

  62. Film at 11 by Hognoxious · · Score: 0, Flamebait

    OMG there is bugzorz in teh codezor'z? Oh Noooooesonehundredandeleventyoneoneone!!!!!

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  63. Wolfram Alpha, IDL, MATLAB, etc.? by forand · · Score: 1

    This brings up an interesting point: what about all the different closed source packages which are used in the hard sciences? On a regular basis I am required to use code written using closed source frameworks à la IDL or MATLAB. These frameworks are invaluable in the sense that we do not have to write the interface between the detectors and a standard PC, but they are often essentially black boxes. Do people really want to require scientists to write (inevitably bad) code to interface their scope to a PC just so they can release that code to the public?

    Finally, I waste enough time fixing bugs in code that will likely never be used by anyone else that I would be loath to have to clean it all up for others to use and understand. Scientists are writing tools to get things done; if they had the resources, most would love to hire a programmer to write and document the program from a requirements document, but very few if any have such resources. There should be a balance between the public's desire to look over the shoulder of scientists and scientists' desire to do science and not waste their time getting 60k emails regarding a bug which has no effect on their result.

    1. Re:Wolfram Alpha, IDL, MATLAB, etc.? by quadelirus · · Score: 1

      We want you to release your MATLAB code. It is fine that the platform is closed. MATLAB isn't the scientific part of what you are doing. Your code is.

    2. Re:Wolfram Alpha, IDL, MATLAB, etc.? by Anonymous Coward · · Score: 0

      Really? Haven't written much code have you? So you think that there isn't a difference in results when switch between versions of Matlab running on Linux versus Windows? If you are so interested in the "scientific part" why wouldn't the paper describing the methodology be better since if you implement the method yourself you would understand the science and by implementing the method using a different tool set you might catch numerical limitations. If you are so interested in the code why don't you download it and take a look at it? I haven't seen anything yet that I cann't download in a few minutes. That's source, results and documentation. FOSS is so much apart of the scientific community these days few scientists make everything freely available. No,your not interested in the "scientific part" your just repeating the garbage you hear from the tinfoil hat brigade.

    3. Re:Wolfram Alpha, IDL, MATLAB, etc.? by quadelirus · · Score: 1

      Really? Haven't actually read very many good papers, have you? The point is that many results take a body of code that takes YEARS to develop. And most of that code isn't that scientific and therefore doesn't need to be verified, but without it you can't do the verification. It is framework, or a body of code built up as a research program runs over years that enables research. Many of these are closed source. FOSS is not "so much apart [sic] of the scientific community these days." That is just wrong. I'm a doctoral student in computer science and almost NONE of the important movers in my field publish any of their code. You are simply wrong. The bottom line is that all too many research groups are NOT publishing the code and they should be.

      And since you went for a personal attack: my credentials are that I am a doctoral student who has been coding for years both professionally and as a student and I am a scientist with published work in several important journals. Judging by the fact that you posted anonymously, used incredibly poor grammar, and displayed a lack of true understanding about what goes on in scientific research, I'm guessing you aren't qualified to comment.

    4. Re:Wolfram Alpha, IDL, MATLAB, etc.? by quadelirus · · Score: 1

      I feel somewhat the same way as you about cleaning up code. But forget about the public. It is too important that other researchers be able to evaluate and use the code. For instance, let's say I don't really want to verify that your work is correct, but I want to build on it. If your code is not available I have to duplicate effort and work through all the annoying engineering details that you, presumably, have already solved in order to get to the same place as you were when you published, just so I can test an addition to your ideas. If you released the code, however, this would not be the case. I could download your code and work from there. Not releasing the code hinders scientific advancement and the only benefit is that you (or me, I do it too) can hide your sloppy code, or worse, you can protect your sub-area of research by making the cost to enter too high for another research group. I say this because there are some interesting areas of research in my field that I would like to enter but I've been warned to stay away because to catch up would take 1-2 years of straight coding without any publishing. Why? Because in some areas you need extensive frameworks of code just to be able to do research (let's not even talk verification). This means that the one or two groups that started in these areas early are the only ones investigating. If their code were open, then anyone would have the ability to start adding to the body of knowledge on the subject and advancement would accelerate.

  64. Re:one error will invalidate a computer program?!? by mjwalshe · · Score: 1

    yes but its the Guardian and their grasp of technology is a little weak at the best of times.

  65. Scientific research as a real open source project by __roo · · Score: 1

    I would go further than just publishing the code used in scientific research. I would build the code by running it as a real open source project. In fact, I've done exactly that, and it worked out incredibly well. I believe our open source approach led to better science, and also better software.

    I worked with researchers from MIT and Columbia on a research project that involved gathering and analyzing a large amount of publication data. The results of the study are about to be published (you can read the working paper at the lead researcher's website).

    We intended the code for this project to be released from the beginning, so we ran it as an open source project. I followed the basic formula from Karl Fogel's excellent (and free to download) book, Producing Open Source Software: set up a website for the project, created lots of documentation, tried to make it as easy as possible for someone to get up and running, made the source available via Subversion, and made it easy to contact us.

    Quality was really important for us, so we put a lot of effort into testing. I definitely believe that the fact that we intended the project to be open source from the beginning helped with that. We weren't treating the code as some piece of throwaway or replaceable lab equipment. I'm convinced that treating it as a real product of the research caused us to take the development and the quality much more seriously than a lot of researchers. I've since heard from other researchers who are starting to use the software as well, and everyone who sees it feels that it came out really well.

    There was another scientific benefit that should definitely appeal to anyone who lives in the publish-or-perish world of science research. We published a paper specifically on the project (Azoulay P, Stellman A, Zivin JG. PublicationHarvester. An open-source software tool for science policy research. Research Policy 35 (2006) 970-974. -- there's a link to the PDF on the lead researcher's website.)

    It's funny -- I wrote an article a few years ago with Jennifer Greene for O'Reilly ONLamp called What Corporate Projects Should Learn from Open Source. I'm now convinced that science research projects can also learn a great deal from open source as well.

  66. You mean scientists who don't toe the party line by Anonymous Coward · · Score: 0

    You mean scientists who don't toe the party line, that AGW is just a communist plot to let Obama take all USians into a Marxist state and kill their grannies.

  67. Precisely by Sycraft-fu · · Score: 3, Insightful

    The more important the research, the larger the item under study, the more rigorous the investigation should be, the more carefully the data should be checked. This isn't just for public policy reasons but for general scientific understanding reasons. If your theory is one that would change the way we understand particle physics, well then it needs to be very thoroughly verified before we say "Yes, indeed this is how particles probably work, we now need to reevaluate tons of other theories."

    So something like this, both because of the public policy/economic implications and the general understanding of our climate, should be subject to extreme scrutiny. Now please note that doesn't mean saying "Look this one thing is wrong so it all goes away and you can't ever propose a similar theory again!" However it means carefully examining all the data, all the assumptions, all the models and finding all the problems with them. It means verifying everything multiple times, looking at any errors, any deviations, and figuring out why they are there and if they impact the result and so on.

    Really, that is how science should be done period. The idea of strong empiricism is more or less trying to prove your theory wrong over and over again, and through that process becoming convinced it is the correct one. You look at your data and say "Well ok, maybe THIS could explain it instead," and test that. Or you say "Well my theory predicts if X happens Y will happen, so let's try X and if Y doesn't happen, it's wrong." You show your theory is bulletproof not by making sure it is never shot at, but by shooting at it yourself over and over and showing that nothing damages it.

    However that this process is done right becomes more important the bigger the issue is. If you aren't right on a theory that relates to migratory habits of a sub species of bird in a single state, ok well that probably doesn't have a whole lot of wider implications for scientific understanding, or for the way the world is run. However if you are wrong on your theory of how the climate works, well that has a much wider impact.

    Scrutiny is critical to science, it is why science works. Science is all about rejecting the ideas that because someone in authority said it, it must be true, or that a single rigged demonstration is enough to hang your hat on. It is all about testing things carefully and figuring out what works, and what doesn't.

  68. Only if... by captainpanic · · Score: 2, Insightful

    Only if the real programmers out there promise to be nice to us scientists.

    Most scientists will know a lot about, well, science... but not much about writing code or optimizing code.

    Like my scripts. All correct, all working... lots of formulas... but probably a horribly inefficient way to calculate what I need. :-)

    The last thing I need is someone to come to me and tell me that the outcome is correct but that my code sucks.
    (And no, I am not interested in a course to learn coding - unless it's a 1-week crash course).

    1. Re:Only if... by mrxak · · Score: 1

      There's going to be people criticizing code for improper documentation, but I feel that's something that should be a concern in general for scientific research. If you don't document your code properly, you aren't properly documenting your methodology. Code that works, though, regardless of style will probably not get too much criticism. I can't imagine some suggestions for efficiency would hurt though. If you can do your calculations more quickly, or with less computing resources required, you can accomplish more science in less time, for cheaper.

      Really what we're talking about here is verifying that your desired algorithms are what actually ended up in the code. Having some extra eyes on your code can only be a good thing for that reason. Programmers will leave the science behind the algorithms up to the scientists.

  69. outrageous by pydev · · Score: 1

    Not all code needs to be released; sometimes, the code isn't needed for reproducing a result or the formulas in the paper are sufficient.

    However, climate modeling is such a complex process, and its implications so serious, that it is outrageous if climate modeling code isn't fully and completely available. Climate modeling code needs to be extensively peer reviewed and until people can reproduce the results and show that the models behave reasonably under different conditions, the results cannot be trusted.

    1. Re:outrageous by Anonymous Coward · · Score: 0

      Try

      http://www.ccsm.ucar.edu/models/ Freely downloadable since June 1996
      http://www.giss.nasa.gov/tools/modelE/ (Hansen's model) Freely downloadable since the mid 1990's (version dependent)

      Then of course there is the one stop shopping site

      http://www-pcmdi.llnl.gov/

      Where data and model source has been collected and distributed since 1989

      At least take the time to use google before you make an even bigger fool of yourself

  70. Re:Why release it? by mcgrew · · Score: 1

    I don't subscribe to Hanlon's razor; I have my own. Mine says "never ascribe to incompetence or stupidity that which can be explained by greedy self-interest".

    Although in this case, Occam rules. The "mcgrew's razor" almost never applies to scientists and almost always applies to corporate types.

  71. Comment removed by account_deleted · · Score: 2, Insightful

    Comment removed based on user account deletion

  72. It's an old story by jc42 · · Score: 4, Informative

    This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program.

    Back in the 1970s, a bunch of CompSci guys at the university where I was a grad student did a software study with interesting results. Much of the research computing was done on the university's mainframe, and the dominant language of course was Fortran. They instrumented the Fortran compiler so that for a couple of months, it collected data on numeric overflows, including which overflows were or weren't detected by the code. They published the results: slightly over half the Fortran jobs had undetected overflows that affected their output.
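
    For readers who have not run into it, here is a rough sketch of what an undetected overflow looks like, assuming Python and 32-bit signed integers as a stand-in (the study above concerned Fortran jobs on a mainframe, and I don't know which numeric types were involved). The function names add_unchecked and add_checked are made up for illustration:

```python
import ctypes

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def add_unchecked(a: int, b: int) -> int:
    """Add as a 32-bit signed integer, silently wrapping on overflow."""
    # ctypes integer types do no overflow checking; the value is truncated.
    return ctypes.c_int32(a + b).value

def add_checked(a: int, b: int) -> int:
    """Add as a 32-bit signed integer, raising instead of wrapping."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 32 bits")
    return result

# Unchecked: the job finishes and prints a plausible-looking wrong number.
print(add_unchecked(INT32_MAX, 1))

# Checked: the job stops, which is slower but cannot be mistaken for a result.
try:
    add_checked(INT32_MAX, 1)
except OverflowError as err:
    print("overflow detected:", err)
```

    The unchecked version returns a large negative number with no warning, and anything computed from it downstream inherits the error.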

    The response to this was interesting. The CS folks, as you might expect, were appalled. But among the scientific researchers, the general response was that enabling overflow checking slowed down the code measurably, so it shouldn't be done. I personally knew a lot of researchers (as one of the managers of an inter-departmental microcomputer lab that was independent of the central mainframe computer center). I asked a lot of them about this, and I was appalled to find that almost every one of them agreed that overflow checking should be turned off if it slowed down the code. The mainframe's managers reported that almost all Fortran compiles had overflow checking turned off. Pointing out that this meant that fully half of the computed results in their published papers were wrong (if they used the mainframe) didn't have any effect.

    Our small cabal that ran the microprocessor lab reacted to this by silently enabling all error checking in our Fortran compiler. We even checked with the vendor to make sure that we'd set it up so that a user couldn't disable the checking. We didn't announce that we had done this; we just did it on our own authority. It was also done in a couple of other similar department-level labs that had their own computers (which was rare at the time). But the major research computer on campus was the central mainframe, and the folks running it weren't interested in dealing with the problem.

    It taught us a lot about how such things are done. And it gave us a healthy level of skepticism about published research data. It was a good lesson on why we have an ongoing need to duplicate research results independently before believing them.

    It might be interesting to read about studies similar to this done more recently. I haven't seen any, but maybe they're out there.

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
  73. Losing significant digits? Then RTFM. by rnturn · · Score: 1

    "For example, interface inconsistencies between software modules which pass data from one part of a program to another occurred at the rate of one in every seven interfaces on average in the programming language Fortran, and one in every 37 interfaces in the language C. This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program. What he also discovered, even more worryingly, is that the accuracy of results declined from six significant figures to one significant figure during the running of programs.'"

    The best way to keep this from happening is to avoid passing formatted, human-readable data between programs. That's what FORTRAN's unformatted I/O was meant for. Same thing for C. Don't convert to a convenient human-readable form until the very end.
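    As a rough illustration of the point, the Python sketch below passes the same value through a formatted text field and through raw binary storage. The six-digit field width and the helper names are assumptions chosen for the example; the binary path stands in for what unformatted I/O does.

```python
import struct


def roundtrip_text(x: float, digits: int = 6) -> float:
    """Pass a value through a formatted, human-readable field."""
    return float(f"{x:.{digits}g}")


def roundtrip_binary(x: float) -> float:
    """Pass a value through raw 8-byte IEEE 754 storage."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


if __name__ == "__main__":
    value = 1.0 / 3.0
    # The text field keeps only six significant digits, so the value changes.
    print(roundtrip_text(value) == value)
    # The binary form is bit-for-bit identical after the round trip.
    print(roundtrip_binary(value) == value)
```

    Each formatted hand-off between programs discards digits in this way, and the loss is paid again at every stage of a pipeline.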

    --
    CUR ALLOC 20195.....5804M
  74. For climatology, this is a non-issue by azgard · · Score: 2, Informative

    While I am a fan of open source and this idea in general, for climatology, this is a non-issue. Look here: http://www.realclimate.org/index.php/data-sources/

    There's more code out there than one amateur can digest in a lifetime. And you know what? From the experience of the people who wrote these programs, there aren't actually many people looking at it. I doubt that any scientific code will get many eyeballs. This is more a PR exercise.

  75. Not necessary by Sycraft-fu · · Score: 1

    The problem isn't so much one of not releasing source code, but not releasing any code at all, not releasing the methods used. If someone uses an excel sheet to do calculations and releases that sheet, that's good enough. You can then test it and see how it works. You don't need to audit the excel code, you only need to verify that the results you are getting are valid.

    What is important is full disclosure of methods. Someone else should be able to replicate your findings independently. That means you need to disclose all your data (and how you obtained it) as well as all your methods. Well, if you wrote custom code, that code would be part of your methods. If you used an off-the-shelf program, say which one.

    In the case of the Excel situation, so long as they released the Excel sheet everything would be fine. You could of course load it up in Excel yourself and see what happened. You could also test Excel with your own data if you thought there was a bug. You could take the calculations they used in Excel and put them into OO or SPSS or whatever other calculations package you liked and see if the result was the same. You could even do it by hand, if you had the time.

    The problem starts when methods and/or data are kept hidden away. When there is a "We analyzed this data using our special program and got this result but no you can't see it," situation. The experiment can't be replicated at that point, and there's all sorts of room for error. On point with the climate thing would be the "hockey stick" program. When it was analyzed, by running it not by looking at the source, it turned out the thing just liked to generate graphs that curved up like a hockey stick, regardless of the data input. Ok well clearly there was a problem with the method, the code. Doesn't really matter what that is, you don't need to audit the code and find it, just to find out that the method was flawed, so the conclusion doesn't follow the data.

  76. Example of Disastrous Programming Error by structural_biologist · · Score: 1

    A good example of how disastrous one error in data analysis code can be comes from the field of biochemistry, specifically analyzing the x-ray diffraction patterns from crystallized proteins to determine the three-dimensional structure of the proteins. Geoff Chang, faculty at the prestigious Scripps Research Institute in San Diego, had published a series of landmark papers describing the structures of various important membrane proteins, whose structures had never been solved before. Although these results did not mesh with other researchers' models for how these proteins worked, many researchers used Chang's structures as starting points to develop and test new models for how these proteins might work. However, in 2006, things came crashing down:

    In September, Swiss researchers published a paper in Nature that cast serious doubt on a protein structure Chang's group had described in a 2001 Science paper. When he investigated, Chang was horrified to discover that a homemade data-analysis program had flipped two columns of data, inverting the electron-density map from which his team had derived the final protein structure. Unfortunately, his group had used the program to analyze data for other proteins. As a result, on page 1875, Chang and his colleagues retract three Science papers and report that two papers in other journals also contain erroneous structures.

    (From a Science news article; a subscription is required to view the full article.)

    Of course, there were other factors that caused this situation (because the proteins are notoriously difficult to work with, the data were fairly poor quality. Had the data quality been better, Chang would have likely realized the mistakes prior to publishing the papers). However, it is sobering to note the resources wasted on research following up on these incorrect structures produced by a simple coding error (I know a few people whose entire theses were invalidated by these retractions).

    It is, however, unclear whether releasing the data analysis code would have fixed this situation. The software for analyzing x-ray diffraction data is fairly standard (I don't know why Chang was using his own homemade software) and various open-source packages are available. Furthermore, in his field, it is common to release the raw diffraction data (I'm not sure if it was released in the case of these five structures), so it may have been possible for others to double-check his work with their own analysis software. Perhaps the greater error here lies with Chang, the peer reviewers of his publications, and the scientific community for believing Chang's conclusions (based on relatively poor quality diffraction data) over the conclusions of many other researchers whose techniques may not have been as sophisticated, but who had generated data of much higher quality.

  77. Qualification is what matters! by laxsu19 · · Score: 1

    So what if there are bugs? Isn't what matters that the answer is correct? This is done, at least in my organization, as follows:

    1) With a test problem. You can run a calculation that is easily solved analytically and then compare results. You can also challenge yourself a little more by running a problem for which experimental results are readily available, and then again comparing results. Only after this first step is done (many, many, many times) does confidence in the code build up, and it can begin to be used in new areas. This is called qualification, verification, or validation.

    2) When attacking problems not previously solved (which, after all, is the reason for writing the code), the scientist/engineer/end-user must have an expectation of what the results will be like. They may not know values, but they should know what to expect from whatever has changed since the last model they ran (i.e., "if I increase temperature by 10 degrees in my model, then this should happen...").

    3) While this may not be possible in all fields, either an experiment or a manufactured product should be tested to ensure that you got out what you predicted with your code. Engineering organizations can do this. Climatologists probably can't, I'd imagine.

    While I don't care about the debate over opening up the source code, I take issue with the fact that a comp sci guy looked at some scientist/engineer's code and said 'omg bugs!' Sure, it may not be how the comp sci expert would program, but it doesn't matter in the end (provided qualification is done adequately). Personally, I don't want to open up my code beyond those who need to see it. The code is not the end, it's just a means. It's like a doctor allowing everyone to see his personal diary on patients: it's only work to support the diagnosis, and not the diagnosis itself.
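    The first step, checking the code against a problem with a known analytic answer, can be sketched in a few lines of Python. This is a toy stand-in, not any organization's actual qualification procedure: the trapezoid integrator, the test integral of sin(x) from 0 to pi (exactly 2), and the tolerance are all assumptions picked for illustration.

```python
import math


def trapezoid(f, a: float, b: float, n: int) -> float:
    """Integrate f over [a, b] with the composite trapezoid rule on n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def qualify(n: int = 2000, tolerance: float = 1e-6) -> bool:
    """Compare the numerical result with the analytic answer for a test problem."""
    exact = 2.0  # integral of sin(x) from 0 to pi
    approx = trapezoid(math.sin, 0.0, math.pi, n)
    return abs(approx - exact) < tolerance


if __name__ == "__main__":
    print("fine grid passes:", qualify(n=2000))
    print("coarse grid passes:", qualify(n=4))
```

    A check like this says nothing about whether the source is tidy; it only says that, for this one problem, the code reproduces an answer that is known independently.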

  78. Models & Algorithms by drooling-dog · · Score: 2, Insightful

    It seems to me that what's important is the theory being modeled, the algorithms used to model it, and of course the data. The code itself isn't really useful for replicating an experiment, because it's just a particular - and possibly faulty - implementation of the model and as such is akin to the particular lab bench practices that might implement a standard protocol. Replicating a modeling experiment should involve using - and writing, if necessary - code that implements the model the original investigators intended to implement, but distinct from that which they actually used.

    Running the same code on the same data demonstrates very little, and finding bugs in the original code tells you nothing about what results would/should have been achieved had the model been implemented correctly. But of course it's great for throwing stones and "discrediting" a result without actually adding anything constructive to the issue at hand.

    1. Re:Models & Algorithms by Troed · · Score: 2, Interesting

      Wrong. You're taking two separate issues and trying to claim that since there are two, one is irrelevant. Of course it's not; both are relevant. However, verifying the model DOES take more domain knowledge than verifying the implementation. We're currently discussing verifying the implementation, which is still important.

    2. Re:Models & Algorithms by Anonymous Coward · · Score: 0

      Troed, If you're such a good software engineer go get the NASA/GISS Model E code and analyze it. It is available here. The Model E is one of the biggies in climate modeling.

  79. Re: tools by jrvz · · Score: 1

    I recommend the "reproducible research" methodology described at http://reproducibleresearch.net/index.php/Main_Page . The idea is that for each paper you publish, you make available an archive with the software, data, scripts, etc., so the user need only type "make" to reproduce every figure in the paper.

  80. Arrogant Scientist Are Not Project Managers by Shannon+Love · · Score: 2, Insightful

    I hate to break it to you but all programming is highly specialized. Climatology is in no way special in this regard.

    Neither do programmers have to understand the abstract model of the program to write it or evaluate it. The vast majority of professional programmers do not understand the abstract model of the code they create. You do not have to be a high-level accountant to write corporate accounting software and you don't have to be a doctor to write medical software. Most programmers spend most of their time implementing models created by non-programmers from fields of which the programmers have no detailed knowledge.

    Does that mean that programmers can't spot crappy code just because they don't understand the details of the model? No, it does not. Most software errors don't arise from the model but from sloppy practices in the management of the software project itself. An experienced programmer doesn't even have to know the language of the project to see that its creation and maintenance were incompetently handled.

    You don't have to be a climatologist to know that the CRU software was utter crap that would produce sound outputs only by divine intervention. For any experienced programmer, it was immediately obvious that it was a great reeking gob of amateur coding with no structure, no plan and no standards. In my experience, most scientific software is like the CRU software. It evolves in an ad hoc manner over many years with no governing organizational structure.

    Commercial software developers have created a wide range of tools and procedures to manage large, vital projects. In the main, scientists use none of these tools, and most of them appear unaware that the tools even exist, much less why and when they are needed. As a result most scientific software project management is completely amateurish. If most scientific software were written for commercial applications, the developers would be sued or imprisoned for fraud.

    Scientists tend to be arrogant and dismissive of the work of others, especially those who work in the commercial sector. You believe that because you understand climatology you therefore understand all the tools you are using. Well, you don't. You think that because no one can understand your abstract model, they therefore cannot find significant errors in your code. Well, they can. You think we should reengineer our entire civilization based on your unquestioned and unexamined computerized ivory tower auguries.

    Well, we won't.

    You're just going to have to suck it up and withstand at least the same scrutiny we give important commercial software.

    1. Re:Arrogant Scientist Are Not Project Managers by mrxak · · Score: 1

      Exactly. Let the scientific community evaluate the algorithms, and let the computer science community evaluate its implementation as code. The end result is better code and better science. How is that a bad thing? Any true scientist can't possibly object to something that will result in ensuring the validity of experiment results. As I understand it, scientists are after the truth, not merely research grants and political attention, right? Right??

      I really don't think anybody actually expects a few million people challenging the basis of the algorithms here. Programmers will just be challenging the implementation and making sure it's correct.

    2. Re:Arrogant Scientist Are Not Project Managers by poopdeville · · Score: 1

      Are you offering to pay for software engineers to do code reviews and audits? I know thousands of scientists who would love a software developer to work with.

      --
      After all, I am strangely colored.
  81. Duh? by b4upoo · · Score: 1

    I'm old school but we were drilled to death on knowing the degree of accuracy of any mathematical results. I can't imagine scientists who do not know the degree of inaccuracy in any work product.

  82. hmmmm. by jafac · · Score: 1

    Does this include the "science" of Economics?
    If so - I think that would be a VERY good thing.
    Particularly in the area of derivatives trading.

    It's estimated that 20% of the global economy now relies on trading derivatives, whose valuation is based on formulae, that are considered "proprietary" - and therefore, you're supposed to just trust the seller on the valuation. (by the way, I've got this bridge I'm selling, in Alaska . . . )

    Ironic that Climate science is regarded with such skepticism, yet people in the high institutions of banking or the University of Chicago, Economics department so feverishly cling to the belief in some "invisible hand".

    --

    These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    1. Re:hmmmm. by ChrisMaple · · Score: 1

      The "invisible hand" quote from Adam Smith is an analogy, and people like you are engaging in nothing but smear tactics when you claim that Chicago School economists believe in a literal invisible hand.

      The gears in a transmission are the invisible hand that links the engine to the wheels.

      --
      Contribute to civilization: ari.aynrand.org/donate
  83. Reproducible Research with sweave by akakaak · · Score: 1

    If you do your data analysis in R and write your paper in latex, then you can use sweave to create a single file with the R code embedded in the latex. When you process the latex file, the R code is run, generating your stats and figures on the fly for the resulting document. If this file (plus any data files it reads, and any non-standard code it calls) is posted as "supplementary material" along with the PDF journal article, then there is no question about the software or analysis that led to the figures in the paper. Of course, the data itself is still open to question.

  84. Dirty Laundry by ThatsNotPudding · · Score: 1

    No one likes their And_Then_A_Miracle_Occurs() subroutines aired out in public.

  85. Re:Seems reasonable - up to a point by mrxak · · Score: 1

    Verifying that the code used by the researchers is consistent with the methods described, without accidental errors, is basically what we're talking about here, and is essentially equivalent to writing your own code. Eliminate researcher errors in their coding, and you're left with one less possible flaw in an experiment.

  86. We need a good central repository by dj_tla · · Score: 1

    If a proposal like this is to succeed -- and I hope so hard that it does -- we need a central repository to store code and meaningfully link it to the papers that use it.

    This repository should have the same amount of peer review and, therefore, authority that scientific journals have now. Maybe that can happen by existing journals adding the ability to link code to a paper (and enforce that any code used to generate results is included), maybe a new organization has to rise up to the challenge (I would love to see a code.arxiv.org).

    Already I can hear the outcry of scientists claiming that their code is "sloppy" and "not ready to be released," but those concerns are simply irrelevant: all that matters is that the code produces the output cited in the paper given the input cited in the paper. That's it. If another researcher finds your result interesting, then let it be up to them to parse out your code -- it's probably still way better than trying to reproduce your result based on the prose that describes your algorithm.

    1. Re:We need a good central repository by mrxak · · Score: 1

      As for a central repository, why not make it government, if we're talking about government-funded research? Something like a Library of Congress for scientific OSS.

  87. reproducible research = code to generate figures by peter303 · · Score: 1

    My thesis advisor rewrote his textbooks every five years or so. But he bemoaned losing the original copies of figures and not being able to regenerate them in ever-improving computer/print media. So he started the requirement of reproducible documents: the computer programs, both scientific and graphic, along with the raw data, must be assembled together for every figure in your thesis and scientific paper. These would be assembled into a makefile-like system to create whatever portion one needed. Then they were archived for posterity. The ultimate test was to "burn" your figures, i.e. erase them from the upcoming document. Then the programs would be run to regenerate them. In practice there are "degrees" of regeneration. Sometimes the figure data is the output of a multi-month supercomputer run. So just the document formatting programs would actually be run on the raw output data.
    This system was just mentioned in Science magazine Jan 22 2010 p 415. (no free link)
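
    A minimal Python sketch of the "burn" test is below. It is not the makefile-like system described above: the CSV file layout, the squaring step standing in for the analysis, and the function names are all hypothetical, chosen only to show the delete-then-regenerate cycle.

```python
import csv
import tempfile
from pathlib import Path


def make_figure_data(raw_path: Path, out_path: Path) -> None:
    """Regenerate one figure's plotted values from the archived raw data."""
    with raw_path.open(newline="") as src:
        rows = [(float(x), float(y)) for x, y in csv.reader(src)]
    with out_path.open("w", newline="") as dst:
        writer = csv.writer(dst)
        for x, y in rows:
            writer.writerow([x, y * y])  # hypothetical analysis step


def burn_and_rebuild(raw_path: Path, out_path: Path) -> str:
    """Delete the figure output, rebuild it, and return the new contents."""
    out_path.unlink(missing_ok=True)  # the "burn"
    make_figure_data(raw_path, out_path)
    return out_path.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.csv"
        fig = Path(tmp) / "figure1.csv"
        raw.write_text("0,1\n1,2\n2,3\n")
        first = burn_and_rebuild(raw, fig)
        second = burn_and_rebuild(raw, fig)
        # A reproducible figure comes back identical every time.
        print(first == second)
```

    For a figure that depends on a multi-month run, `raw.csv` would be the archived simulation output rather than something regenerated on demand.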

  88. In the word of Feynman by BlueParrot · · Score: 1

    "Philosophy of science is about as important to scientists as ornithology is to birds." - Richard Feynman

    You know how you guys usually cringe when some clueless Economist or Law professor wants to introduce some new law for how the internet should work? Like demanding that all TCP connections be logged and the data kept for 10 years ( or something even more stupid ).

    Well, a fair share of you should now know exactly why they do this, because you keep doing the exact same thing every time a Slashdot article about science is published. To put it simply, if you're one of the idiots who thinks it's a good idea to demand that every science paper to be published should document everything right down to what color of toilet paper the researchers used to blow their noses, then you never again get to complain when management wants all decisions to be accompanied by a cost-benefit analysis.

  89. I've seen it by Anonymous Coward · · Score: 0

    It's been 15 years since I worked in university labs, but I clearly recall that the code written by most Physics professors was horrid.

    It was only the people like myself who had studied Computer Science who knew how bad most of that software was. Nightmares to debug and maintain.

    But it doesn't stop there. Consider the data handling techniques used in the social sciences. I've seen first hand how the data from surveys were manipulated over and over, replacing the original data sets on each step. The original data was completely lost and the steps to get from the original to the published data were completely untraceable.

    My conclusion is that (in general) academic scientists are no better than the general public at data handling and data verification techniques.

  90. Impediments to releasing scientific code. by Anonymous Coward · · Score: 0

    There are benefits and drawbacks to releasing code. First, let me say I am a scientist, and I do release my code, primarily under the GPL unless I am contractually obligated by license agreements to do otherwise. The exception would be trivial single use code or limited use code. Despite that, I don't get a lot of eyes looking at my code.

    Some people here have claimed that journal policies are a part of the problem. Not as far as code releases go. Journals don't publish code. I think there are real impediments to code release, even when most scientists would claim that the release is a good idea.

    • A large amount of code is written by students with limited programming experience. Therefore it is difficult to read, poorly commented, and poorly optimized. That's a strong argument that it should be released. But a lot of people would be embarrassed if that code was released. So there will be some push back. Especially by established, famous, and respected professors who really don't want their crappy code to be a source of ridicule.
    • A lot of large bodies of code are controlled by scientific consortia. A library for simulating particle accelerator collision products might have several hundred authors. In many of the files the authors will have placed a copyright statement naming themselves as the copyright holder, in others they may have named the institution they work for as the copyright holder. Since the code was donated to the consortium, the consortium can claim its members have the implied right to use the code. But the consortium probably can't release the code to the public, since it would need the permission of all authors (many of them deceased) and the institutions they were working for when they developed the code in question.
    • A lot of scientific programmers use code that they don't have a license to release. For example, we've all pulled a routine out of Numerical Recipes, because it was easy, did the job we wanted, and was fast enough. Oops, sorry, can't distribute that code. We've all used a GPL library in an application linked to closed source or limited distribution libraries. Oops, sorry, it's illegal to distribute that code. (Don't tell my University about this, because they'll make me consult with a lawyer before releasing any code.)
    • Scientists may, in general, be intelligent and well informed, but when it comes to the intricacies of licensing agreements and copyright law most don't have more than a very limited understanding.
    • A lot of scientists don't want to be scooped by competitors.
    • There are some people who think releasing code or making it available to colleagues works against the scientific method by discouraging reimplementation of an application for the purpose of replicating earlier results.

    In the case of climate modeling code, I'm guessing that #2 is probably the biggest hurdle. In my code, #3 is probably the biggest hurdle, and it does take significant effort to work around it.

  91. Re:Why release it? by WhiplashII · · Score: 1

    Please read up on the history behind measuring the charge of an electron. You are wrong in essence; the conspiracy theorists are wrong about motivation.

    But either way, you cannot trust scientists' data as truth. Especially early science, when the experiment has not been copied by millions.

    Scientists are human, just like anyone else. To misquote, "the scientific method is the least terrible method of finding truth that we have yet discovered."

    --
    while (sig==sig) sig=!sig;
  92. Impediments to releasing scientific code. by SETIGuy · · Score: 1

    Damn slashdot and its awful posting interface.

    There are benefits and drawbacks to releasing code. First, let me say I am a scientist, and I do release my code, primarily under the GPL unless I am contractually obligated by license agreements to do otherwise. The exception would be trivial single use code or limited use code. Despite that, I don't get a lot of eyes looking at my code.

    Some people here have claimed that journal policies are a part of the problem. Not as far as code releases go. Journals don't publish code. I think there are real impediments to code release, even when most scientists would claim that the release is a good idea.

    • A large amount of code is written by students with limited programming experience. Therefore it is difficult to read, poorly commented, and poorly optimized. That's a strong argument that it should be released. But a lot of people would be embarrassed if that code was released. So there will be some push back. Especially by established, famous, and respected professors who really don't want their crappy code to be a source of ridicule.
    • A lot of large bodies of code are controlled by scientific consortia. A library for simulating particle accelerator collision products might have several hundred authors. In many of the files the authors will have placed a copyright statement naming themselves as the copyright holder, in others they may have named the institution they work for as the copyright holder. Since the code was donated to the consortium, the consortium can claim its members have the implied right to use the code. But the consortium probably can't release the code to the public, since it would need the permission of all authors (many of them deceased) and the institutions they were working for when they developed the code in question.
    • A lot of scientific programmers use code that they don't have a license to release. For example, we've all pulled a routine out of Numerical Recipes, because it was easy, did the job we wanted, and was fast enough. Oops, sorry, can't distribute that code. We've all used a GPL library in an application linked to closed source or limited distribution libraries. Oops, sorry, it's illegal to distribute that code. (Don't tell my University about this, because they'll make me consult with a lawyer before releasing any code.)
    • Scientists may, in general, be intelligent and well informed, but when it comes to the intricacies of licensing agreements and copyright law most don't have more than a very limited understanding.
    • A lot of scientists don't want to be scooped by competitors.
    • There are some people who think releasing code or making it available to colleagues works against the scientific method by discouraging reimplementation of an application for the purpose of replicating earlier results.

    In the case of climate modeling code, I'm guessing that #2 is probably the biggest hurdle. In my code, #3 is probably the biggest hurdle, and it does take significant effort to work around it.

  93. Seems to me that there are several issues here... by wfolta · · Score: 1

    Seems to me that there are several issues here:

    1. Making code available. What do we mean by "code"? Do we mean 100K lines of f77 that's been hacked for 30 years? Or do we mean R, SAS, or Mathematica code? The former is probably not so useful to others, though the latter may well be incredibly illuminating, even for non-experts in the specific field.

    2. Making DATA available. Yes, in the leaked climate emails, there was some amateurish code. But the big issue there was actually sharing DATA, not code. In fact, emails indicated that there were threats of deleting data, and data that was never actually confirmed, hence a scandal.

    3. Unqualified people using your data/code to attack your position. Seems to me that unqualified people are already attacking positions without data/code access, so it's not really a winning position to refuse to share. In fact, it's quite legitimate to suspect someone who is unwilling to share their data/method for arriving at a conclusion. The 1950's "according to scientists" simply doesn't fly anymore.

    4. Unqualified people wasting your time with questions about your data/code. The open source movement has had to deal with this for quite some time. Not only that, a question or objection: a) can often be judged by its own statement, and b) may be a common question that needs to be answered. In the first case, someone writing to a climatologist and saying "I downloaded your data and put it in excel and used that curve fitting thing and my curve shows temperatures peaking and going down" needs nothing beyond a canned reply, whether in public or private. (And a public discussion of why this is NOT a scientific objection would actually advance the overall state of education and science in the world.)

    I'm not sure that 300K of crufty f77 code would be very useful to anyone to see. Though I'd also say that knowing someone's conclusions were based on 300K lines of crufty f77 code would be a point in the "not so sure" column. Which I think is much of the objection of releasing code: it takes guts to put not only your conclusions on the line, but also your assumptions and reasoning and most people are simply not willing to do this. Scientists (capital "S") should be willing to do this, but it would be pretty embarrassing to say, "My conclusions are based on a model that involves 400K lines of fortran code that has been tweaked for the last 30 years and which no one living actually understands. It seems to interpolate data very well, and we have reasons X, Y, and Z to believe that it extrapolates well also."

    You may be right, but how many people are willing to say this? Easier to say, "Our proprietary model, developed by [Cue authoritative music] the most EMINENT SCIENTISTS on the PLANET, says A, B, and C," and then Appeal To Authority (tm).

  94. Never trust code by drizzd · · Score: 1

    It is already common knowledge among researchers that simulated results cannot be trusted. A program is only as good as it is tested. But for scientific programs, testing is not usually as trivial as with application software. For example, you may have to analyze a system statistically and verify that the simulation results match the expected statistics. The scientific value is in that analysis and in simulation results verified against such an analysis. While I welcome the idea of making the code used in scientific work public, I believe a much better reason to do so is that other researchers can improve on it. Of course, research is not always meant to be for the common good...
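
    As a toy version of that kind of check, the Python sketch below runs a trivial "simulation" (averaging uniform random draws) and tests the result against the analytically known mean and standard error. The simulated system, the sample size, the fixed seed, and the six-sigma threshold are all assumptions made for the sake of a short example.

```python
import math
import random


def simulate_mean(n: int, seed: int = 0) -> float:
    """Toy simulation: the average of n uniform(0, 1) draws."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


def matches_theory(n: int = 100_000, sigmas: float = 6.0) -> bool:
    """Check the simulated mean against the analytic value 1/2.

    A uniform(0, 1) variable has variance 1/12, so the standard error
    of the mean of n draws is sqrt(1/12) / sqrt(n).
    """
    std_err = math.sqrt(1.0 / 12.0) / math.sqrt(n)
    return abs(simulate_mean(n) - 0.5) < sigmas * std_err


if __name__ == "__main__":
    print("simulation agrees with theory:", matches_theory())
```

    A real scientific code would need the analysis done for its own system, which is usually the hard part; the comparison itself is the easy part.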

  95. Keep the lid on the trash closed by namgge · · Score: 1

    I believe that scientists should publish their algorithms and methods, but publishing code may be counterproductive for several reasons. Firstly, another group trying to replicate and/or verify the method should start from scratch to ensure their work doesn't simply import flaws from the original. Secondly, I don't believe it is possible to debug science code to the point where it is defect free - people keep debugging only until the results agree with their intuition. Thirdly, scientists should not waste their lives sifting through thousands of lines of Fortran written by long-gone grad students hoping to find errors; they should be creating and investigating new stuff. Namgge

  96. Yeah right by SoftwareArtist · · Score: 1

    This is hugely worrying when you realise that just one error -- just one -- will usually invalidate a computer program.

    If you believe that, you are forced to conclude that most likely no valid program has ever been written in the history of the human race. All programs have bugs in them (yes, I truly mean all), and in most cases those bugs do not invalidate all their results.

    Don't misunderstand me. I definitely support releasing the source code for scientific programs, and I believe that finding bugs in them will ultimately lead to better science. But nothing useful is achieved by absurd hyperbole like the quote above.

    --
    "I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
  97. The actual paper is not quite as convincing by Krahar · · Score: 1

    From reading the actual paper, the analysis of source code is based on automatic static analysis, NOT on humans actually reading the code. Static analysis software can generate a large number of false positives, so knowing that lots of software triggers the static analysis criteria for a fault doesn't necessarily mean anything at all about the number of bugs or the quality of the software. It does mildly suggest that there might be a problem, though.

    The second part of the paper compares different commercial software packages for doing calculations on seismic data. The result is that even though the operations these packages are supposed to carry out are rigorously defined mathematical algorithms, the answers you get in the end differ wildly, and in some cases the different packages agree on only 1 figure! There is no one package that seems to generate better output than the others. As the paper states, this could result in drilling a 20 million dollar oil well in the wrong place, since up to three significant figures can be needed for that purpose.
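    To make "agree on only 1 figure" concrete, here is one rough way to count how many significant figures two outputs share, sketched in Python. The measure (floor of minus log10 of the relative difference) is an assumption chosen for illustration, not the method used in the paper, and the sample numbers are invented.

```python
import math

def agreeing_sig_figs(a, b):
    """Rough count of leading significant figures on which a and b agree.

    Defined here as floor(-log10(relative difference)), clipped at 0.
    Identical values are reported as infinite agreement.
    """
    if a == b:
        return math.inf
    scale = max(abs(a), abs(b))
    rel_diff = abs(a - b) / scale
    return max(0, math.floor(-math.log10(rel_diff)))

# Hypothetical outputs from two packages running the same algorithm.
print(agreeing_sig_figs(1.234567, 1.234568))
print(agreeing_sig_figs(1.2, 1.3))
```

    Under this definition the first pair agrees to 6 figures and the second to only 1, which is the kind of gap the paper describes between packages.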

    To sum up, this is absolutely scandalous, and there is no reason to suspect that software for seismic calculations is any worse than for lots of other areas.

  98. Feeding Climate Denialist Trolls? by benjfowler · · Score: 1

    Who's gonna wager their hard-earned that this is a tactic by the climate denial industry to open up another avenue to discredit researchers?

    And how long until idiots like Monckton miraculously become expert software engineers overnight, then start ripping climate researchers new assholes because their computer models contained logic errors?

  99. The article is old by Anonymous Coward · · Score: 0

    The article cited for the error rate in scientific programs is from 1997, on research done in 1990 & 1994. Programming has changed since then, for example the "Fortran" examined was Fortran 77, rather than Fortran 95/2003 which provides interface checking. This isn't to say that all scientists are good programmers.

  100. Some practical steps by mhwombat · · Score: 1
    I work in "IT for research". I see a lot of researchers from different disciplines who write code, or in some cases hire developers. In my experience there are a few things that can make a big difference:
    • Appropriate Methods journals - scientists are MUCH more likely to go to the effort of releasing their code if they will get cited by scientists who re-use it, and cited in a journal with some respectability. In fields with these kinds of respected journals there is much more code publication. And if they're going to release it, they will look harder at the quality.
    • Pressure on instrument manufacturers to release standards - proprietary standards can be a major barrier in developing open code, not to mention a huge waste of time. These instruments are very expensive and bought with public money. Funding bodies which fund hardware purchases could apply a LOT of pressure here.
    • Research IT support from universities (not just IT support for the educational branch of the university). The quality of this can be wildly variable.
    1. Re:Some practical steps by mhwombat · · Score: 1

      And let's not forget - a lot of research is done not with researcher-written code, but with standard bought software packages which are proprietary and not available for review.

  101. Yay for public research code by Frigo · · Score: 1

    I did always wonder what would happen if we fed random data into these proggies they use to calculate temperature anomalies. Would they still produce the hockey stick?

  102. Anonymous Coward by Anonymous Coward · · Score: 0

    Not so anonymous: I am Darrel Ince, who wrote the article. It has generated a lot of reaction. It is a really tricky subject which 800 words in an article does no justice to. A major point is that there are a number of valid reasons why someone will refuse to divulge their code which seem fine by me: commercial agreements, the issue of intellectual property rights (something which clouded the whole issue of the hockey stick code). Also there is the problem that developing computer code is tough and a scientist should be allowed time to develop publications from the work; Steve Schneider of MIT has suggested a two year moratorium.

    There is also the problem of vexatious demands. As far as I am concerned, if you have released the code and it is complete, then that is it. The recipient is on their own. The computer is changing so much about science that one of the good things to come out of the CRU incident is that it has highlighted this, and the work that fourth paradigm people are doing.

    Darrel Ince

  103. Re:Why release it? by Anonymous Coward · · Score: 0

    What brought the US to its knees in Vietnam was the fact that the Vietnamese knew eventually we would have to leave and they would still be there. The costs (not just monetary ones either) were too great for the US to stay. The same thing is happening in Afghanistan.

  104. Simply look for the value of Pi by niftymitch · · Score: 1

    Having looked at some of these "research" codes I was astounded to see "PI=3.14" in one. I got access to this bit of code because a parallel version would differ in the 19th digit of an IEEE float as the number of processors changed and I was supposed to fix the machine.
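    For anyone wondering why the answer moves with the processor count: floating-point addition is not associative, so splitting a sum into a different number of partial sums can change the rounding. A minimal Python sketch follows. The chunked_sum helper and the input values are made up for illustration and are not the code in question; the explicit loop is used instead of the built-in sum() because newer Python versions compensate for rounding there.

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_workers):
    """Sum as n_workers partial sums, then combine them,
    roughly like a simple parallel reduction (hypothetical helper)."""
    size = math.ceil(len(values) / n_workers)
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]        # invented worst-case style input
one_worker = chunked_sum(values, 1)      # 1.0
two_workers = chunked_sum(values, 2)     # 0.0
exact = math.fsum(values)                # 2.0, correctly rounded

# Relative error from hard-coding PI=3.14, about 5e-4.
pi_rel_error = abs(3.14 - math.pi) / math.pi

print(one_worker, two_workers, exact, pi_rel_error)
```

    With an error of roughly 5e-4 baked in by PI=3.14, anything past the third or fourth significant figure is already noise, so chasing the last digit on the hardware side rather misses the point.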

    --
    Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.