UK University Researchers Must Make Data Available
Sara Chan writes "In a landmark ruling, the UK's Information Commissioner's Office has decided that researchers at a university must make all their data available to the public. The decision follows from a three-year battle by mathematician Douglas J. Keenan, who wants the data to do his own analysis on it. The university researchers have had the data for many years, and have published several papers using the data, but had refused to make the data available. The data in this case pertains to global warming, but the decision is believed to apply to any field: scientists at universities, which are all public in the UK, can now not claim data from publicly-funded research as their private property."
There's more at the BBC, at Nature Climate Feedback, and at Keenan's site.
The public pays for gathering the data, the public should have access to that data. Kinda hard to find fault with that.
Making publicly funded non-military research public has nothing to do with privacy. Public money is spent for the public good, and there is no justifiable reason to keep it hidden from the public... especially if it's meant for the betterment of society.
if you want your data to be private, get your own privately funded money
Science journals have long fought this, because their profit model is strongest when they own copyright and are the exclusive publishers of a paper. Peer review and scientific principles don't mesh well with copyright though, and many academics have either "published" their papers on their own websites or found other ways to try to work around the journals.
Ridding peer review and science of copyright would be a great improvement.
For every problem, there is at least one solution that is simple, neat, and wrong.
"Scientists" scared of goofy analysis are priests, not scientists. Take their funding away and use their PhD parchment for toilet paper.
Free if you pay your TV tax or pirate I believe
On the other hand, this will likely produce a whole stream of deliberately inaccurate analyses with ulterior motives behind them.
But with the data public, it'll be easier to shoot them down for picking, choosing, skewing, and what else.
There is no reason why this kind of data should ever be "secret"
Phil Willis, a Liberal Democrat MP and chairman of the Science and Technology Select Committee, said that scientists now needed to work on the presumption that if research is publicly funded, the data ought to be made publicly available.
That doesn't seem unreasonable to me. Appendices with raw data are often included already in the online editions of journals. Of course, if the ruling applies to all data generated in the course of a study, whether it is used in publications or not, it could be onerous indeed.
does anyone know if the NSF has similar requirements?
Yes, yes, and yes. What is the problem? If they are racing, there is obviously something worth racing TO. If both teams have all the data, that goal will be reached no later, probably sooner.
Does this mean every biology, chemistry, physics, and engineering research group (I'm talking about grad students and postdocs, here) would have to open their lab notebooks to anyone who asked?
Researchers who ply their trade on the cutting edge of science live in perpetual fear of being "scooped" by another group who publishes their discovery first. These are sometimes literally "races." So now a group at one university could demand access to the notebooks of a group at another university? And vice versa?
Not at all.
It means they have access to each other's results and source data when published (once the group is done researching this phase, and is ready to publish). There's no "opening notebooks", simply because that's a terrible metaphor for how data is collected these days.
You only have to publish your data after publishing your article, which means "you won". You don't have to publish data for research in progress.
We can agree that the whole scientific process does not make much sense if we have to believe in the interpretations without seeing the actual data. From this perspective it is crucial for all scientific data to be open.
The other perspective comes from the individual scientist. It might take years to put together a complete data set of a particular phenomenon via experiments, literature review, digging in the ground or looking at the stars. So after looking for something special you finally discover something new and write a small article about it. This will just be something along the lines of: "hey, there is something interesting going on here." Now you go back and look carefully at all your data for similar events, filter out noise because you have a better idea of what to look for, and then hopefully publish more about it. So the next article will not only contain more information but also some analysis of the possible origins of the phenomenon and so forth.
Imagine you had to open your carefully put together data right the second after you recorded it. Other people might grab your stuff and your research might not even be cited because they just looked at all the steps that you took that were not successful and repeated the experiments or used other available data.
This interest in keeping your data private cannot be avoided with the current system of judging a scientist by his or her publications.
That doesn't matter. The important thing is that the attacks are made. Even if every one is shown to be completely wrong, people will still remember all those (erroneous) anti-global warming reports. Especially since the media will enthusiastically report the initial attack and relegate the news of its rebuttal to a small paragraph on page 34, if they report it at all.
I am more concerned with the time and effort it will take to format data for external users.
An accompanying more detailed methodology will surely have to be provided for the data to be used correctly.
That is indeed an issue. Presumably the methodology is already published, as is the rule for scientific papers. What could happen is that competent scientists have to waste their time debunking incompetent analyses by axe-grinding cranks.
Actually, if the requirement is specified up front as terms for the grant, I'm not opposed to it. I don't think it'll do any good, mind you: as a rule, all that's useful is published, and scientists are generally happy to cooperate if you need more, as long as you have honest intent. But the current system is a charter for arseholes using FoI requests to harass scientists.
OK, why does this argument not also apply to teaching? I am paid to teach and do research from the public purse. My teaching is available to anyone who meets certain standards and pays a user fee. Access to data should be the same.
But with the data public, it'll be easier to shoot them down for picking, choosing, skewing, and what else.
Not sure what regulations are on "release all data to the public" but seems like there are loopholes big enough to drive a bus through. For instance, in my field, no one but me knows how many cells I looked at. Maybe that thing I said happens in these cells happens in all those cells. Maybe I looked at 300 before seeing one doing what I said, took a picture of that one, and that was that. All my data would be that one cell I cherrypicked.
Even if I did take pictures of all 300, no one knows but me. Those other 299 can disappear.
If I'm -not- evil though, this could hurt me. If I looked at say 3000 cells, and 10 were doing a thing that I thought was significant, I could have my reasons. Maybe the other 2990 were the wrong cell type or something. Being the expert, that might be obvious to me just from looking at them. A non expert looking at them might not see that. They would just see that out of 3000 cells, I chose the 10 that supported my data. They might call foul without bothering to have me explain myself.
There's no reason the data should be secret, but most data doesn't stand on its own, and writing up supporting information for -all data gathered- just isn't going to happen.
wants to use data that wasn't used for climate change and models in order to prove that the studies that didn't use them are flawed.
Add to that a reporter who continually overstates anything the climate change denialists say, and I'm sure it will confuse even more people.
This should be fun.
The Kruger Dunning explains most post on
You say that until he gets on a major talk show, talks about his improperly interpreted results, and suddenly 20 million people are parroting his incorrect results.
Suddenly it's not a good thing because those same outlets will not give the same time to actual experts.
Scientists are always concerned when people who have no idea what they are doing try to interpret data. It has nothing to do with being scared.
For example:
Let's say this guy cherry-picks some data to support his belief, and Oprah finds out about his 'findings' and puts him on the air. Suddenly 25 million people who aren't qualified to judge his assessment are now hounding politicians over incorrect data.
I just spent about 10 years watching this very thing happen to vaccines. Some idiot's bad study gets on Oprah, and a year later people are dying.
It's a real and serious problem, and the people causing it (the media) are doing nothing to fix it.
What could happen is that competent scientists have to waste their time debunking incompetent analyses by axe-grinding cranks.
It's much more likely that incompetent scientists will be debunked by more competent analysis, because as soon as there is any controversy regarding a study the scientific community swarms to verify one way or the other.
Also, it's just as important to know what data was disregarded, and why (there are a plethora of valid reasons, but there are even more invalid reasons) as it is to know what was included. The GP's point about the tree ring data that was collected but never used, why wasn't it used? Was it simply because they weren't interested in doing a tree-ring study, and used the data for something else entirely? Or did it make their model not work quite right so they tossed it out? How is anybody to know if they can't look at the data they collected?
Furthermore, if the raw data is not provided, you cannot verify that the models and statistical conclusions are correct. What if there is a problem with the model the researchers were using? Well, if you plug the data into a better model, or even just a different model, you'll see a big difference if one of them is wrong. Climate science relies heavily on computer models, and often multiple researchers will use the exact same model in their study, so it's not hard to get a systemic error across multiple studies.
In other words, how can you verify anybody's science without the original data they observed to begin with? I'm never going to look at this data, I wouldn't have a clue what to do with it, but I know there are a lot of climate researchers who are chomping at the bit to verify these studies.
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
I believe that all public universities (in the US) that cannot prove public money was not used on research, should be required to release the findings/data to the public shortly after it is published. Of course there are exceptions for things involving national security and what not.
I wonder how this conflicts with the laws on data privacy. For example, suppose a company shares a dataset that contains sensitive information with a university (this CAN be done, at least in my country, with a contract. We commit to safeguarding the data and to not violating privacy laws; consultants also do this everywhere) for the purpose of developing a model or some other application that needs the data. The professor then publishes a paper with the main (non-corporate secret) results, and uses public funds for an undergrad or something. Does this mean the professor can be sued into giving the information away? Doing this clearly violates the laws on privacy, but refusing would conflict directly with the Freedom of Information Act. Complying with one law contradicts the other... I do not think that this can be upheld in court for EVERY case; instead it would have to be analyzed on a case-by-case basis using (possibly costly) lawsuits. Then again, IANAL, so maybe I'm wrong... (Full disclosure: I am a researcher in data mining)
Of course you would. And if you truly did find such a strange sample set, you would document those reasons with just such a sentence. Maybe they WERE the wrong cell type, and in your paper you would be expected to say precisely that. Odds are fairly good you would have a citation concerning why those cells are the wrong type, if not several, since any such assertion that 99.7% of your sample set is junk would be unusual enough to require justification of your methodology. Perhaps no better cell culture or separation method is available. This should be easy enough to document and explain. It took me just one sentence fragment, after all.
Most likely, if there really isn't any better method, there have been multiple papers describing what the limitations are and why, in an effort to formulate a better method. There is probably also active, ongoing research into creating a better method, since any line of inquiry with such poor sources is bound to attract attention. To paraphrase Heinlein: To score an academic coup, find out what everyone agrees is impossible. Then do it.
Yes, you're probably going to experience an uptick in the noise floor if you're a UK researcher. The timecube guys are out there, and audible now. But complying with the UK directive is easy. Provide the data. You're not required to address specific demands of every crank who claims your data proves the existence of aliens. The people who control your tenure/salary/book deals/whatever read your paper, saw your cite, and moved on. The guy worried about aliens doesn't affect much of your life. Just your inbox.
I don't know. The USA (and a lot of other countries) might not be too happy, since it means the UK is saying it's OK for these scientists to release the USA's proprietary data. So I guess you're right in that those jerks like the USA (and a lot of other countries) that wanted to profit from this data will get their comeuppance, but I wonder if we now need to increase taxes in order to pay for these services that used to make a profit. So that means that we all need to pay more money because of this.
I also wonder what it means for the university to release data that is illegal for them to release. I mean, on one side the court says they need to release it, but on the other side other courts say it's illegal to release it. Should be interesting in the UK for a while.
Science makes progress through experiments. You design an experiment; you figure out what measurements you need to make; you make those measurements according to the requirements and specifications of your experiment. What do you need to control for? What calibrations are important? How much data do you need for a statistically significant sample? The answers to all of these questions are different depending on the experiment you want to do. Using data from someone else's experiment means you have to go through all of these steps and then try to account for the fact that the way the data were gathered isn't quite right for what you want to do, you need to control for different things than the original experimenters, etc. This generally takes expertise in both the original scientific question and the new one.
I get enough citations and questions from good-intentioned, responsible astronomers who use our data in published papers in subtly, but significantly, incorrect ways. I try to deal with such occurrences helpfully, but it often takes a long time to guide the interested fellow astronomer through the relevant literature explaining why what they did isn't quite right. When I write about something in a field that's new to me, I'm quite sensitive to this and try to check extensively that I'm not making a classic 1st-year graduate student mistake in that field. Don't even get me started on all of the email I get with re-analyses of our data by retired engineers.
If people cannot replicate your results it isn't science.
And with Climate Science part of the process is showing how you collected and interpreted the data. If you are not willing to share the raw data so other researchers can attempt to replicate your methods and results then don't bother publishing.
MOD PARENT UP!!
The problem that the climate scientists have created for themselves is that they are hiding the data from everyone. Up until a few months ago, these requests were relatively rare. Some of the requesting parties actually have fairly strong credentials. Steve McIntyre may be hated by the folk at realclimate, but he is an IPCC reviewer. To stonewall him is a little different than refusing to provide it to Jenny McCarthy.
As opposed to the proselytizers who are funded by the NGOs and the new "Green" capitalists and rent-seekers.
One of the more interesting bits of the Climategate emails showed that Mann was happy to share his data EXCEPT to people who he thought would disagree with his methods and results.
And in this case Mann was also the recipient of the tree ring data, showing again that if you agreed with the owner's ideas, he had no problem getting you copies of what you needed.
I am a pretty big cynic, and I remain unconvinced that AGW is a significant problem. It doesn't help that the raw data isn't disclosed. I wish scientists would go back to doing science and quit trying to be policy makers.
'Political power grows out of the barrel of a gun.' - Mao Tse-tung
The problem is that we don't apply the same standard to a talk show as we do to a scientific institution.
If a talk show spreads incorrect information absolutely nothing happens, if a scientific institution does the same there will be a royal commission, investigation, scrutiny and even if they are found innocent someone's career is still ruined.
What we need is to get rid of the double standard. Let's just say that if Box News makes a deliberately misleading statement about the Australian Hoop Snake, they should be investigated and charged, and the editor, producer and reporter fired and barred from working in the media field again. If we started giving news agencies the same scrutiny and punishments as universities, then the level of misinformation would drop dramatically.
Published scientific reports should also have the data published publicly, however there should be severe punishments for the misuse of this data to spread misinformation and attempts to ruin careers.
Calling someone a "hater" only means you can not rationally rebut their argument.
People don't matter. Science doesn't advance by asking what Aunt Rosie from Ohio thinks about a particular result. Those who matter are scientists, and scientists read peer reviewed journals. Peer review is all about filtering out all those attacks so that nobody who matters needs to read them.
Opening the data up for free access means that other groups, who have more interest in scooping than being right, have more ability to do that scooping. That leaves the people who did the work in the cold.
That is not hard to achieve: someone has to make an FoI request, the cost to prepare the data has to be estimated, someone has to get hired to collect and format the data and then the data is released. That can take a considerable amount of time.....but that's not the only issue. In my field of particle physics raw data is generally useless unless you understand how it was collected and how to analyse it.
Even assuming that you had several petabytes of disk/tape available to store it, raw data from ATLAS would be completely useless to you unless you really understand the detector "warts and all". Trying to understand this data without access to the detector itself and the ability to test and cross-check ideas looking at (and sometimes carefully tweaking) the hardware is literally impossible....and that is before you get into the thorny international issues about who did what and so whether it falls under any one country's laws.
These issues were discussed on a previous experiment I worked on in the US and the conclusion was that it did not serve the public to have data released in just about any form: the raw data was useless and even the processed data still had considerable "quirks" which required understanding (e.g. acceptance drops at detector boundaries etc.). This was aptly demonstrated by a pilot project which resulted in no interest at all from the public but which worryingly attracted a few nutters who were more interested in proving their pet theory than in doing science.
So while I am very sympathetic to the "the public paid for it the public should be able to access it" argument I do not think that the public's interest is best served by releasing raw data in all (most?) cases. The best way to serve the public interest is to ensure that results and ideas arising from that research are freely available to all and allow the public to build on that.
If a lab has been spending my tax money for 10 years, I want my employees to give me my data right Goddamn now. .....if you're taking my money, you work for me.
Just stop and think for a second about exactly what it is that us scientists are being paid to do. We are NOT being paid to collect data; we are being paid to figure out how the world works and how to apply that knowledge for the betterment of mankind. The data is a means towards that end.
Now, do you REALLY want us to spend a serious fraction of our time and money preparing and making available the raw data in a form which will probably be useless to you instead of analysing and coming up with results which you are far more likely to find useful? Is that REALLY the best way for us to serve the public interest?
Examples of how this could go horribly wrong immediately come to mind: it could delay finding medical cures as researchers spend time releasing data instead of analysing it; companies could request the data and develop/patent drugs which YOU will then pay through the nose for; nutters will start horribly misrepresenting the data to "prove" their pet theory on warp drive, etc. How does any of this serve the public interest?
If you want an even clearer example: taxpayers fund each country's intelligence agencies. So does this mean that since you own all the data every tax payer should be able to request to see it whenever they want? Obviously not because it would not be in YOUR best interest for such data to be public. While the reasons are different the conclusion is the same for scientific data. It may be your data but you are paying us to collect it, analyse it and come up with results which ultimately improve yours, and everyone else's, standard of living.
Unfortunately, Climategate proved that, at least in the field of climate research, "peer review" is worthless; Mann et al were actively conspiring to ensure that only "friendly" eyes carried out the reviews; anyone thought to be showing signs of scepticism was blacklisted, whether individuals or publications.
To add to that, Glaciergate proved that much of what was claimed to be peer-reviewed was actually just regurgitated propaganda, often based on anecdotal evidence (reminiscences of mountaineers published in a student rag? Puh-lease!)
So, appeals to authority ("oh but all this research has been peer reviewed") just don't hold any more. Not until all the data and all the methods used to arrive at the results are made available, and the results can be independently confirmed or denied, can we say whether the research was worth the weight of mouldy notebooks it was archived on.
"Life is like a sewer - what you get out of it depends on what you put into it" - Tom Lehrer
Simply generating massive amounts of data isn't considered science - figuring out what it means is. I say this as someone who is very good at generating data quickly, but not particularly good at interpreting it.
Spot on. I have a PhD in Comp. Sci. (Multi-Agent Systems / Market Based Control). One of the things you learn (maybe in your university degree courses or in your first paper presentation) is that data does not mean *anything*; what matters is the interpretation of such data.
Nevertheless, I am of the opinion that programs used for the generation / manipulation of such data should also be free / open to scrutiny, especially those developed during the research, as they are also being paid for with taxpayers' money.
In the field I am working in now (agent-based computational economics) a lot of people do these so-called agent-based simulations, then they write a nice paper about what their simulations showed and try to publish it. The problem is that they keep their code! And in that respect they are definitely removing a good chunk of the "methods" part of their research. It is absolutely impossible to duplicate that work without the code.
That is indeed an issue. Presumably the methodology is already published, as is the rule for scientific papers.
There is at least one case in two climate research papers where what the methodologies claimed was impossible because the data to do it didn't even exist. This didn't come out for 16 years, and was only discovered because a FOI request was finally honored.
In this case, the authors of the papers had claimed that the station data that they used was from stations that had "few, if any, changes in instrumentation, location or observation times." (quote from one paper) and "selected stations have relatively few, if any, changes in instrumentation, location, or observation times" (quote from the other paper)
"Hey! We only used great data!"
Now, these two authors used the same data, and one of these authors was actually a co-author of the other paper. These authors are Jones (hello climate gate) and Wang.
Now, they finally sourced the data as being from the Chinese Academy of Sciences, which coincidentally had co-published a report with the US Department of Energy at about the same time as those two research papers, stating quite specifically that DATA OF THAT QUALITY DID NOT EXIST. The report was specifically about the quality of the Chinese climate record.
Both papers concluded that the Urban Heat Island effect was minimal. Too bad that they didn't actually have data good enough to draw that conclusion. They said they did, tho.
None of this would have come out if it wasn't for the Freedom of Information Act. Jones and Wang both obstructed the release of the data (denying FOI requests, etc) for nearly 2 decades.
This all came out several years ago, but the media didn't give a fuck. They did care about hacked emails tho. Go figure. Now, as it turns out it probably wasn't Jones who was lying his ass off. Wang was a co-author on Jones's paper and supplied the "data." Jones gets credit for having his email hacked.
I am the story's submitter. My original submission included a link to the mathematician's web page about this; the page has many more details. There have also been other news stories, e.g. at the BBC.
The UK Freedom of Information Act has exemptions for data that has not yet been used in publications, vexatious requests, etc.
But you haven't given a reason why it's actually bad
It wastes scientists' time that would be better spent analysing the data rather than releasing it, it wastes money collecting and disseminating the data, it pollutes the real scientific results with those of nutters trying to prove their pet theory and, in the case of commercially useful data, it risks having companies use the data to develop something commercially useful that will then be locked away behind patents and the public will be charged through the nose for.
There is also the more subjective, human issue that if you don't let people who have worked like crazy to get the data have at least the first shot at analysis then recruiting scientists is going to become extremely hard and motivating them to perform large-scale experiments will be even harder if they just have to give the data away - why would you bother if you can just sit around and get the data as soon as it is collected?
Is that bad enough? There are ways you could mitigate some of the above, but the bottom line is that nothing is free: it will cost more money to make the data publicly available and, as a taxpayer myself, I see no real benefit from doing it and some serious potential pitfalls.