UK University Researchers Must Make Data Available
Sara Chan writes "In a landmark ruling, the UK's Information Commissioner's Office has decided that researchers at a university must make all their data available to the public. The decision follows from a three-year battle by mathematician Douglas J. Keenan, who wants the data to do his own analysis on it. The university researchers have had the data for many years, and have published several papers using the data, but had refused to make the data available. The data in this case pertains to global warming, but the decision is believed to apply to any field: scientists at universities, which are all public in the UK, can now not claim data from publicly-funded research as their private property."
There's more at the BBC, at Nature Climate Feedback, and at Keenan's site.
The public pays for gathering the data, the public should have access to that data. Kinda hard to find fault with that.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Why doesn't this apply to the BBC?
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Making publicly funded non-military research public has nothing to do with privacy. Public money is spent for the public good and there is no good justifiable reason to keep it hidden from the public... especially if it's meant for the betterment of society.
if you want your data to be private, get your own privately funded money
good news for everyone but those jerks who wanted to profit.
Science journals have long fought this, because their profit model is strongest when they own copyright and are the exclusive publishers of a paper. Peer review and scientific principles don't mesh well with peer review though, and many academes have either "published" their papers on their own websites or found other ways to try to work around the journals.
Ridding peer review and science of copyright would be a great improvement.
For every problem, there is at least one solution that is simple, neat, and wrong.
More data in more hands is a good thing. It sounds like this specific case was driven primarily by the nonsensical quackings of a global-warming denialist, but whatever; information is beautiful and the more we share the love the better off we all are.
"Scientists" scared of goofy analysis are priests, not scientists. Take their funding away and use their PhD parchment for toilet paper.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
On the other hand, this will likely produce a whole stream of deliberately inaccurate analyses with ulterior motives behind them.
But with the data public, it'll be easier to shoot them down for picking, choosing, skewing, and what else.
There is no reason why this kind of data should ever be "secret"
Does this mean every biology, chemistry, physics, and engineering research group (I'm talking about grad students and postdocs, here) would have to open their lab notebooks to anyone who asked? Researchers who ply their trade on the cutting edge of science live in perpetual fear of being "scooped" by another group who publishes their discovery first. These are sometimes literally "races." So now a group at one university could demand access to the notebooks of a group at another university? And vice versa?
Phil Willis, a Liberal Democrat MP and chairman of the Science and Technology Select Committee, said that scientists now needed to work on the presumption that if research is publicly funded, the data ought to be made publicly available.
That doesn't seem unreasonable to me. Appendices with raw data are often included already in the online editions of journals. Of course, if the ruling applies to all data generated in the course of a study, whether it is used in publications or not, it could be onerous indeed.
I think this should be the case for all government sponsored research.
This will be a very good thing, as the act of publication relies on condensing/compressing actual data results.
Often things may be left out simply because the author didn't think they were important.
I am more concerned with the time and effort it will take to format data for external users.
An accompanying more detailed methodology will surely have to be provided for the data to be used correctly.
does anyone know if the NSF has similar requirements?
weinersmith
We can agree that the whole scientific process does not make much sense if we have to believe in the interpretations without seeing the actual data. From this perspective it is crucial for all scientific data to be open.
The other perspective comes from the individual scientist. It might take years to put together a complete data set of a particular phenomenon via experiments, literature review, digging in the ground or looking at the stars. So after looking for something special you finally discover something new and write a small article about it. This will just be something along the lines of: "hey, there is something interesting going on here." Now you go back and look carefully at all your data for similar events, filter out noise because you have a better idea what to look for and then hopefully publish more about it. So the next article will not only contain more information but also some analysis about the possible origins of the phenomenon and so forth.
Imagine you had to open your carefully put together data right the second after you recorded it. Other people might grab your stuff and your research might not even be cited because they just looked at all the steps that you took that were not successful and repeated the experiments or used other available data.
This interest in keeping your data private cannot be avoided with the current system of judging a scientist by his or her publications.
That doesn't matter. The important thing is that the attacks are made. Even if every one is shown to be completely wrong, people will still remember all those (erroneous) anti-global warming reports. Especially since the media will enthusiastically report the initial attack and relegate the news of its rebuttal to a small paragraph on page 34, if they report it at all.
Yeah, sure, it's perfectly easy for people to shoot down bad analyses (which don't just include incorrect selection of data, but also choosing inappropriate statistical techniques, bad models, etc.).
But the people with the most vested interest in producing deliberately bad analyses are the ones with the loudest megaphones and the greatest access to the press. Scientists all over the world, overwhelmingly, think that there's significant evidence of anthropogenic climate change as the result of our carbon output. Notice, though, that the public at large isn't nearly as convinced - mostly because 1) the science is actually complicated, and 2) because oil companies and their allies have a shitload of money to fund PR campaigns aimed at distracting the issue.
Make the data public, sure. But don't for a second think that you'll get any improvement in the bullshit factor.
Troll mod? Er, what? I don't think so. Note to moderator: "troll" does not mean "I don't agree".
This has nothing to do with journals. The data was not available anywhere - not in a for-pay journal, not on a website, not on request. It was the researchers that refused to release the raw data - the publishers have no motivation to suppress its release, because it is the published paper that earns them money, not the raw data.
The worse it gets, the more my colleagues and I wish we could see the raw data and draw our own conclusions.
I don't know about you, but I'm tired of listening to reports. I'm tired of hearing scientists say this and scientists say that.
Give me some FACTS, and I'll draw my own conclusions. If I'm wrong, you can correct me, and show me where I'm wrong. If you can't do that, I know who is lying.
You can read the publications. Well, the ones that aren't behind journal paywalls, unfortunately, but that's not by the choice of scientists.
The raw data will not help you, but you can judge the validity of the analyses if you like.
So what was your UK tax burden? Can UK taxpayers get US data for free now too?
On the other hand, this will likely produce a whole stream of deliberately inaccurate analyses with ulterior motives behind them.
But with the data public, it'll be easier to shoot them down for picking, choosing, skewing, and what else.
There is no reason why this kind of data should ever be "secret"
Surely it's not hard to see the dynamic that will unfold from this. Yes, the truth is "out there" for better or for worse. However, scientists will have to spend an enormous amount of time and money defending their work against cheap public shots from unqualified critics, instead of a smaller number of competent but dissenting colleagues. That will mean less time for doing research, preparing publications and writing grant proposals.
Consider also that even an expert can misinterpret raw data. Usually it takes an intimate knowledge of how the data was collected, the characteristics of the instruments, etc., in order to interpret it properly. Frequently, the only people with the proper knowledge to interpret raw data are those who collected it, built the instruments used for the measurements, or both. Without some kind of curation, raw data can generate far more "noise" than "signal."
If it weren't for deadlines, nothing would be late.
Funny how nacturation is against sharing when it comes to other intellectual property, but 150% behind this one.
"Scientists" scared of goofy analysis are priests, not scientists. Take their funding away and use their PhD parchment for toilet paper.
Nonsense. I have much better things to do, like reading and posting on slashdot, than respond or deal with every crack pot who has an axe to grind but has no real idea how to do it. Seriously, you want to analyze the data? Go collect it yourself.
Mod parent up!
This is most definitely not a troll.
Do you think grad students were collecting data in the field on iPads in the 1980s?
Most of the data is probably in the form of moldy old penciled notebooks, core samples, B&W photo negatives and microscope slides. I hate to break it to you, but you know what, except maybe in physics or electrical engineering, not all experimental data was systematically recorded digitally until 15-20 years ago.
They collated, analyzed their data at the time, published their results in peer reviewed journals, and that was good enough. Now this judgement will require them to waste countless hours digging through old archives, scanning lab notebooks and so on, and for what? For one fucking idiot who apparently didn't even bother to read their latest paper in which they demonstrate that tree ring data for the trees they studied (oak or something) does not correlate with temperature but with summer rainfall.
OK, why does this argument not also apply to teaching? I am paid to teach and do research from the public purse. My teaching is available to any one who meets certain standards and pays a user fee. Access to data should be the same.
Unless you happen to be a scientist in a related field, raw data tends to be next to useless. Anybody can draw pretty graphs in Excel and get worried about a rising trend line, declining trend line or anomalous result but it takes someone who knows what they're talking about to explain what they actually mean.
How many people can read hex if only you and dead people can read hex?
But with the data public, it'll be easier to shoot them down for picking, choosing, skewing, and what else.
Not sure what regulations are on "release all data to the public" but seems like there are loopholes big enough to drive a bus through. For instance, in my field, no one but me knows how many cells I looked at. Maybe that thing I said happens in these cells happens in all those cells. Maybe I looked at 300 before seeing one doing what I said, took a picture of that one, and that was that. All my data would be that one cell I cherrypicked.
Even if I did take pictures of all 300, no one knows but me. Those other 299 can disappear.
If I'm -not- evil though, this could hurt me. If I looked at say 3000 cells, and 10 were doing a thing that I thought was significant, I could have my reasons. Maybe the other 2990 were the wrong cell type or something. Being the expert, that might be obvious to me just from looking at them. A non expert looking at them might not see that. They would just see that out of 3000 cells, I chose the 10 that supported my data. They might call foul without bothering to have me explain myself.
There's no reason the data should be secret, but most data doesn't stand on its own, and writing up supporting information to -all data gathered- just isn't going to happen.
wants to use data that wasn't used for climate change and models in order to prove that the studies that didn't use them are flawed.
Add to that a reporter who continually overstates anything the climate change denialists say, and I'm sure it will confuse even more people.
This should be fun.
The Kruger Dunning explains most post on
Except they aren't experts at knowing what is picking, choosing and skewing and what is a correct and practical analysis of data. Prepare to see a lot of cherry picking and so-called 'experts' interpreting this data incorrectly, many of whom won't even know what a P value is.
The Kruger Dunning explains most post on
Scientists are always concerned when people who have no idea what they are doing try to interpret data. It has nothing to do with being scared.
For example:
Let's say this guy cherry picks some data to support his belief and Opera finds out about his 'findings' and puts him on the air. Suddenly 25 million people who aren't qualified to judge his assessment are now hounding politicians over incorrect data.
I just spent about 10 years watching this very thing happen to vaccines. Some idiot's bad study gets on Opera, and a year later people are dying.
It's a real and serious problem, and the people causing it (the media) are doing nothing to fix it.
The Kruger Dunning explains most post on
Peer review and scientific principles don't mesh well with peer review though [...] Ridding peer review and science of copyright
I assume the first "peer review" was meant to be "copyright"?
Also, I think you can do peer review with restrictive copyrights just fine. It's the whole sharing-of-results that goes away if there's a for-profit journal owning all the results.
Also, the article really isn't about sharing the results themselves, but sharing the data that informs and/or is the foundation for the results. Copyright really doesn't enter into this question (IANAL TINLA) unless the data is contained within the article. By way of analogy: I can collect data on my computer's fan speed, you can come ask me for that data, and I can say "no way, go 'way"; I don't need copyright on my data collection to not give it to anyone.
"Scientists" scared of goofy analysis are priests, not scientists. Take their funding away and use their PhD parchment for toilet paper.
Big business has a long history of setting up think tanks and foundations in order to churn out disinformation.
In the past I've brought up the tobacco industry as a prime example of business producing bad science in order to stave off regulation.
Less pernicious, but equally anti-science, are creationists, anti-vaxxers, and those pushing abstinence only.
I disagree with locking the data behind university walls, but it's amazingly naive to think that scientists shouldn't be scared of "goofy analysis".
It only takes a few morons in a hurry to poison what could be an otherwise rational debate.
See: Death Panels
[Fuck Beta]
o0t!
Curiously, the decision was sent rather obscurely... OFFICIAL NOTICE -AYBABTU.
What does a web browser have to do with it?
If one desires real transparency in an experimental procedure, one should also release the code that turns data into the published experimental findings. There can be "black boxes", but the source that implements the experiment should be public so that the experiments are reproducible.
Ultimately, the results of any "computational" experiment should depend only on the data and the random seed.
To go even further, an autonomous authority (e.g. a scientific journal) could assign the random seed to any prospective published research and require that experimental results be derived from the published source code, the data and the key.
The random seed is applicable to methods that introduce random elements into the calculation, which is quite common these days.
Assigning the random seed would prevent an experiment with random elements from being repeated until satisfactory results come out.
It would be quite hard to
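To make the seed idea above concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the data values, the bootstrap_mean helper and the "journal-assigned" seed are assumptions, not anything from the actual case. It only shows that a calculation with random elements gives the same answer every time once the data and the seed are fixed.

```python
import random
import statistics


def bootstrap_mean(data, seed, n_resamples=1000):
    """Bootstrap estimate of the mean that depends only on `data` and `seed`."""
    # A private generator avoids any hidden global random state.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample the data with replacement, same size as the original.
        sample = [rng.choice(data) for _ in data]
        means.append(statistics.mean(sample))
    return statistics.mean(means)


# Hypothetical measurements and a hypothetical seed assigned by a journal.
data = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5]
ASSIGNED_SEED = 20100422

first = bootstrap_mean(data, ASSIGNED_SEED)
second = bootstrap_mean(data, ASSIGNED_SEED)
print(first == second)  # same data + same seed gives the same result
```

Because the seed comes from outside, the author cannot quietly rerun the analysis with different seeds until a nicer number comes out; anyone with the published code, data and seed can check the reported figure.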
There is no law that researchers need to retain their raw field data(?). After I publish my paper, this kind of hassle could be avoided by dragging the 'research data' directory into the rubbish bin on the desktop. Oops, I just dropped those notebooks into the shredder too, they were a fire hazard anyway...
If someone wants to peer-review, why don't they get their own grad students to drill their own tree cores, measure them, and come up with their own conclusion? It's not like the institution is hoarding a Rosetta Stone for themselves. Collective independent research will converge on the most likely scenario and prove or disprove the controversial hypothesis put forth, and using data from the suspect study would only pollute further endeavors.
peer review is good
The trouble is that I don't consider most deniers of AGW to be peers of those who are doing genuine research. They are more likely to be peers with their friends/sponsors in the oil industry and other big businesses.
Just in case someone wonders, no I don't believe that everyone who does not believe in Climate Change is in the pay of the corporates. There will be some few who have done research and come to different conclusions. They are probably in the minority.
Peer review is productive when someone actually reviews it. That is not what those FOI requests were made for. They are made with the specific aim of disproving a conclusion they don't like. A review is neutral and could end up agreeing with the other person. This is not likely to happen here. They have made up their mind already.
I'll see your Constitution and raise you a Queen.
The worst thing you can do is to hide your data because some fool may make a hash of it. The proper thing to do is, if an erroneous analysis gets enough circulation, you point out the error. Chances are, you won't be alone in pointing it out and so you probably won't have to bother yourself - more so for more frivolous and trivially inaccurate arguments.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
I believe that all public universities (in the US) that cannot prove public money was not used on research, should be required to release the findings/data to the public shortly after it is published. Of course there are exceptions for things involving national security and what not.
I wonder how this conflicts with the laws about Privacy of Data. For example, suppose a company shares a dataset that contains sensitive information with a University for the purpose of developing a model or some other application that needs the data (this CAN be done, at least in my country, with a contract. We commit to safeguarding the data and to not violating Privacy laws; consultants also do this everywhere). The professor then publishes a paper with the main (non-corporate secret) results, and uses public funds for an undergrad or something. Does this mean the professor can be sued into giving the information away? Doing this clearly violates the laws on privacy, but refusing would conflict directly with the Freedom of Information Act. Complying with one law means violating the other... I do not think that this can be upheld in court for EVERY case; instead it would have to be analyzed on a case-by-case basis using (possibly costly) lawsuits. Then again, IANAL, so maybe I'm wrong... (Full disclosure: I am a researcher in data mining)
Fair enough. Just return the public grant money.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
Of course you would. And if you truly did find such a strange sample set, you would document those reasons with just such a sentence. Maybe they WERE the wrong cell type, and in your paper you would be expected to say precisely that. Odds are fairly good you would have a citation concerning why those cells are the wrong type, if not several, since any such assertion that 99.7% of your sample set is junk would be unusual enough to require justification of your methodology. Perhaps no better cell culture or separation method is available. This should be easy enough to document and explain. It took me just one sentence fragment, after all.
Most likely, if there really isn't any better method, there have been multiple papers describing what the limitations are and why, in an effort to formulate a better method. There is probably also active, ongoing research into creating a better method, since any line of inquiry with such poor sources is bound to attract attention. To paraphrase Heinlein: To score an academic coup, find out what everyone agrees is impossible. Then do it.
Yes, you're probably going to experience an uptick in the noise floor if you're a UK researcher. The timecube guys are out there, and audible now. But complying with the UK directive is easy. Provide the data. You're not required to address specific demands of every crank who claims your data proves the existence of aliens. The people who control your tenure/salary/book deals/whatever read your paper, saw your cite, and moved on. The guy worried about aliens doesn't affect much of your life. Just your inbox.
Most ethical standards of scientific societies are quite clear that retaining records and the original data are a vital part of the scientific process.
Studies on private data of people, like medical data, personal data, etc?
The point is that the data presented and even peer reviewed isn't good enough. I've seen lots of errors in published articles, and I would have wanted both code and raw data in order to judge the validity.
How can you even claim the raw data would not help me or others? You are just making this up without any argument.
And open publishing is mostly the choice of scientists.
From Climate Audit: http://climateaudit.org/2010/04/21/mann-of-oak/
Notwithstanding the considered opinion of Baillie and Wilson that oaks are “virtually useless as a temperature proxy” and “dangerous” to use in a temperature reconstruction, no fewer than 119 oak chronologies were used in Mann et al 2008.
Among Mann’s oak chronologies were three Baillie chronologies: brit008 – Lockwood; brit042 – Shanes Castle, Northern Ireland; brit044 – Castle Coole, Northern Ireland.
If people cannot replicate your results it isn't science.
And with Climate Science part of the process is showing how you collected and interpreted the data. If you are not willing to share the raw data so other researchers can attempt to replicate your methods and results then don't bother publishing.
That's charmingly naive or deliberately ignorant of the state of affairs in our modern society.
Only the initial report gets made - that someone's "proven" something or "found a mistake" in published research. When what the person's found is shown to be his own error, even a deliberate distortion, that's seldom reported, and never as widely or clearly.
MOD PARENT UP!!
The problem that the climate scientists have created for themselves is that they are hiding the data from everyone. Up until a few months ago, these requests were relatively rare. Some of the requesting parties actually have fairly strong credentials. Steve McIntyre may be hated by the folk at realclimate, but he is an IPCC reviewer. To stonewall him is a little different than refusing to provide it to Jenny McCarthy.
For many academic scientists (i.e. professors, post-docs, graduate students), a part of their pay is the ability to publish their research findings. It takes long thought and work to devise and carry out experiments which gather pertinent data. It's not unreasonable to allow some time for these scientists to analyze their data and properly understand it.
If you mandate all data be immediately made public, the researcher can be "scooped" by anyone. This is bad for science because it removes the incentive to actually gather the data. This is one argument for why data may be kept internal, at least for a while.
I am a pretty big cynic, and I remain unconvinced that AGW is a significant problem. It doesn't help that the raw data isn't disclosed. I wish scientists would go back to doing science and quit trying to be policy makers.
'Political power grows out of the barrel of a gun.' - Mao Tse-tung
People don't matter. Science doesn't advance by asking what Aunt Rosie from Ohio thinks about a particular result. Those who matter are scientists, and scientists read peer reviewed journals. Peer review is all about filtering out all those attacks so that nobody who matters needs to read them.
Now if only the same rules were applied to the fraudsters who promote evolutionism as it is to the fraudsters who promote global warming. Instead, the evolutionists will just hide behind the same bullshit arguments about the science being "settled" and we will have another 200 years of lies before the real truth comes out.
Why is this marked Troll?? It's a beautiful example of satire, and a perfect argument against climate deniers.
This is a straw man argument. The data has to be public - I don't want the scientific community to impose their own version of the nanny state.
Scientists are universally interested in protecting science (disclosure; IAAS). The problem is not in sharing data with other scientists (i.e., those trained in data analysis and objectivity), it's sharing the data with "cynics" who have a conclusion they'd like to cherry-pick supporting data for. It won't pass peer review, but that won't stop an ideologue from posting his "analysis" on the web, etc. and feeding non-objective BS into the policy debate.
An archive of professors 'teaching' would be useless except as a cure for insomnia.
Putting together a proper multimedia course is hard work.
Putting up 3000 bad videos with even worse sound of 99% bad lecturers all teaching the same material differently and in different order is damn near useless.
So build a web site, convince the profs to all contribute (yeah right), collect data and rank the videos in order of quality and organized by relevance to each other.
The undergrads will still sleep off their hangovers through the original lecture.
You know it's true.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
What's the big deal? Publish the data.. so what? Maybe someone will find a few errors and then it can be corrected. IMHO it's a great idea... if some climate denier wants to mangle the data they should be free to do so, so what? It's not like their bullshit will be published in scientific journals.... it will die a quiet death in the extremists' blogs.
Nothing to be afraid of here.. in fact, these guys should be proud of the work they did and be willing to give to everyone that asks.
"Scientists" scared of goofy analysis are priests, not scientists. Take their funding away and use their PhD parchment for toilet paper.
The issue is that much of the work in science is producing a high quality interpretation of data. That requires knowledge of the field, the methodology used to collect those data, statistics, good judgment, etc. Conversely, it takes nothing to create a poor interpretation of data that is indistinguishable from the real thing. That is what is potentially dangerous.
Of course science invites criticism, but it really only benefits from well informed criticism. And the criticism needs to be subject to criticism. Everything else is just a lot of noise. Better access to data is fine, at face value, but it will only ever really help in a meaningful way when it comes coupled with good critical thinking from all parties involved.
Opening the data up for free access means that other groups, who have more interest in scooping than being right, have more ability to do that scooping. That leaves the people who did the work in the cold.
That is not hard to achieve: someone has to make an FoI request, the cost to prepare the data has to be estimated, someone has to get hired to collect and format the data and then the data is released. That can take a considerable amount of time.....but that's not the only issue. In my field of particle physics raw data is generally useless unless you understand how it was collected and how to analyse it.
Even assuming that you had several petabytes of disk/tape available to store it, raw data from ATLAS would be completely useless to you unless you really understand the detector "warts and all". Trying to understand this data without access to the detector itself and the ability to test and cross-check ideas looking at (and sometimes carefully tweaking) the hardware is literally impossible....and that is before you get into the thorny international issues about who did what and so whether it falls under any one country's laws.
These issues were discussed on a previous experiment I worked on in the US and the conclusion was that it did not serve the public to have data released in just about any form: the raw data was useless and even the processed data still had considerable "quirks" which required understanding (e.g. acceptance drops at detector boundaries etc.). This was aptly demonstrated by a pilot project which resulted in no interest at all from the public but which worryingly attracted a few nutters who were more interested in proving their pet theory than in doing science.
So while I am very sympathetic to the "the public paid for it the public should be able to access it" argument I do not think that the public's interest is best served by releasing raw data in all (most?) cases. The best way to serve the public interest is to ensure that results and ideas arising from that research are freely available to all and allow the public to build on that.
"... it takes nothing to create a poor interpretation of data that is indistinguishable from the real thing."
Good thing you posted as AC, cuz that's one stupid statement - you are implying the science is crap.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
If a lab has been spending my tax money for 10 years, I want my employees to give me my data right Goddamn now. .....if you're taking my money, you work for me.
Just stop and think for a second about exactly what it is that us scientists are being paid to do. We are NOT being paid to collect data; we are being paid to figure out how the world works and how to apply that knowledge for the betterment of mankind. The data is a means towards that end.
Now, do you REALLY want us to spend a serious fraction of our time and money preparing and making available the raw data in a form which will probably be useless to you instead of analysing and coming up with results which you are far more likely to find useful? Is that REALLY the best way for us to serve the public interest?
Examples of how this could go horribly wrong immediately come to mind: it could delay finding medical cures as researchers spend time releasing, instead of analysing data, companies could request the data and develop/patent drugs which YOU will then pay through the nose for, nutters will start horribly misrepresenting the data to "prove" their pet theory on warp drive etc. etc. How does any of this serve the public interest?
If you want an even clearer example: taxpayers fund each country's intelligence agencies. So does this mean that since you own all the data every tax payer should be able to request to see it whenever they want? Obviously not because it would not be in YOUR best interest for such data to be public. While the reasons are different the conclusion is the same for scientific data. It may be your data but you are paying us to collect it, analyse it and come up with results which ultimately improve yours, and everyone else's, standard of living.
It is neither the duty nor the right of scientists to be the gatekeepers of public interpretation or opinion.
That is akin to claiming that only ABA lawyers should be permitted access to the law or computer technicians access to benchmarks.
As far as I am concerned the media can stick anyone they like in front of a microphone to speak on any issue they like. It IS the duty of the VIEWER to assess the competence of the individual speaking.
If you have a problem with the fact that viewers do not do their due diligence then I suggest supporting efforts to add critical thinking and logic to the curriculum in our grade schools.
In that case the results will be natural selection at its finest. What is the problem here?
"If I'm -not- evil though, this could hurt me. If I looked at say 3000 cells, and 10 were doing a thing that I thought was significant, I could have my reasons. Maybe the other 2990 were the wrong cell type or something. Being the expert, that might be obvious to me just from looking at them."
And no doubt it will be equally obvious to all your expert colleagues should you have to defend your decision. The support for your conclusions would thus be in the data.
If you are basing decisions on things that are not in the data then you're a quack or a fraud and it's best it be uncovered anyway.
"You're not required to address specific demands of every crank who claims your data proves the existence of aliens."
This annoys me since intelligent life elsewhere more likely than not exists (regardless of whether or not it mutilates our cows or is physically possible for us to reach).
How about this instead:
"You're not required to address specific demands of every crank who claims your data proves the existence of a deity."
Science journals have long fought this, because their profit model is strongest when they own copyright and are the exclusive publishers of a paper. Peer review and scientific principles don't mesh well....
You are getting somewhat confused: raw data is not the same as published papers. If anything releasing raw data would increase the number of papers being published - although likely at the cost of greatly reducing the signal to noise ratio! Peer review and scientific principles mesh incredibly well. Science is founded on reproducibility so you have to explain your work in a manner that your peers can understand and follow in order for them to be able to reproduce and check your conclusions: without that it is not science. That does not mean that peer review cannot be abused but if you find a rotten apple that does not mean that you should never eat an apple again.
Regarding copyrights you are correct that many of us academics are finding ways to get around the restrictions and/or getting increasingly annoyed with them. However this only applies to published, scientific papers NOT the raw data. So far nobody has come up with a good way to still provide peer review without ending up with a paid journal or one in which you pay to publish (which raises different ethical issues) but I have heard that people are working on developing solutions.
That sort of data is only useful if it is anonymous and double-blind anyway.
So, we're supposed to believe that your selection of 10 supporting cases out of your sample of 3000 is correct just because you're an "expert"? Have you heard of the fallacy of authority?
If your data doesn't stand on its own (and sure, I'll grant you there might be perfectly reasonable situations where that might be the case) then it is open to criticism, and RIGHTLY SO! Your claim to being an expert doesn't grant you a free pass, sorry to tell you. If that criticism does indeed happen then you're free to either a) write up supporting information for the data UNDER QUESTION, no need to support all data gathered, or b) ignore the criticism, say, on the basis that it comes from non-experts. Your peers will decide if they're OK with what you say, either way. But keeping your government grants without being able to properly justify why it's OK to cherry-pick 10 out of 3000, expert or no expert, ain't gonna happen either ...
The US has had a similar policy for some time. If your research is funded by the NIH (the largest funding source for life science research) you are required to make your published works available freely. Even if you publish in a high-impact expensive journal, your published works must be available freely through the NIH.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
does anyone know if the NSF has similar requirements?
I don't know if NSF does, but NIH definitely does.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Just post the results on the Usenet.
Have gnu, will travel.
A friend had a bitter complaint about the recent United States Supreme Court ruling about dog fighting videos. She thought that any video that depicted cruelty to animals...yada...yada...yada. If people actually looked at what words were spoken and written by the people producing data, this question about public release would go away. It is that people are not able to look at the data that there are requests for it. If professional statisticians are not working on the data, at least let amateurs. I will get a citation (of the global warming statistics questions) if you absolutely are unable to find it yourself on google/yahoo/bing/askoxford.com
Simply generating massive amounts of data isn't considered science - figuring out what it means is. I say this as someone who is very good at generating data quickly, but not particularly good at interpreting it.
Spot on. I have a PhD in Comp. Sci. (Multi-Agent Systems / Market Based Control). One of the things you learn (maybe in your university degree courses or in your first paper presentation) is that data does not mean *anything*; what matters is the interpretation of such data.
Nevertheless, I am of the opinion that programs used for the generation / manipulation of such data should also be free / scrutinable. Especially those developed during the research, as they are also being paid for with taxpayers' money.
In the field I am working in now (agent-based computational economics) a lot of people do these so-called agent-based simulations, then they write a nice paper about what their simulations showed and try to publish it. The problem is that they keep their code to themselves, and in that respect they are definitely removing a good chunk of the "methods" part of their research. It is absolutely impossible to duplicate that work without the code.
Ubuntu is an African word meaning 'I can't configure Debian'
Hire people to get the data to you then. I would imagine that most researchers would be fine releasing data if the time and money burden of doing so was lifted from them.
People should note there is a big drive at UK universities to get more outside funding from corporations and other external funding bodies. Only last week I heard that one of our guys was close to clinching a deal worth several million from a multinational company that would employ several new people and research students.
Do you think their data will be made publicly available?
I recently made a grant application to one of the UK research councils (govt funding bodies) and explicitly wrote in that code would be open-sourced and data would be made available. Got the grant :)
I think he meant "indistinguishable to the layperson". That is a real issue. Jenny McCarthy can say all she wants, and to Joe Blow, it is all the same as what scientists say. Scientists know she is full of shit, but people not trained in science might not be able to tell.
There are some studies in which such raw data could be partially patient identifiable - for instance, in rare diseases and small studies.
People should not forget that climate science is an exception. In most sciences, this isn't necessary because experiments are supposed to be replicable by other scientists. If the experiment can't be replicated, the original paper and conclusions are invalid. Publishing the original data really serves little purpose and may actually discourage (necessary) replication of the experiments.
Climate science is a special case because it is based on massive observational data that cannot be replicated and that was derived from tax-payer funded satellites and other data sources.
Really? From his web site "I used to do mathematical research and financial trading on Wall Street and in the City of London; I now study independently"
What could possibly go wrong?
>>...which are all public in the UK
Not so - there's one private university!
@peetm
You know the multi-billion dollar LHC? Guess what they did their first physics on. Not finding new exotic particles
I could do that all day long for $200/hr. I'm not finding new exotic particles right as we speak. And last night I did it in my sleep. No takers??? Okay how about a one off special just for today $50/hr?
These posts express my own personal views, not those of my employer
I am the story's submitter. My original submission included a link to the mathematician's web page about this; the page has many more details. There have also been other news stories, e.g. at the BBC.
The UK Freedom of Information Act has exemptions for data that has not yet been used in publications, vexatious requests, etc.
One place where your black and white view doesn't work is medical research. Much of the data needed for evidence based medicine and other research is full of private patient data.
Many researchers need to access "secret" data under restrictions set by confidential data use agreements, or they cannot do the linked analysis necessary for their research. This data cannot be released to the public without violating the privacy of patients, and the "anonymization" techniques people use to try to remove this privacy aspect also destroy much of the utility of the data for future research. It may allow someone to repeat/verify some analysis results, but they cannot for example validate whether the cleaning and anonymization process was correct or whether it damaged the data in a way that affects all results. They need access to the raw data for full verification, as well as for novel retrospective studies.
To really support the desired scientific process, you need this kind of private data to be put into a sort of trusted escrow. Curated and made available to future researchers, but only under the approval of institute review boards who make sure that appropriate ethical and legal considerations are met. It's not the same as publishing, i.e. releasing to the public.
But you haven't given a reason why it's actually bad
It wastes scientists' time that would be better spent analysing the data rather than releasing it, it wastes money collecting and disseminating the data, it pollutes the real scientific results with those of nutters trying to prove their pet theory and, in the case of commercially useful data, it risks having companies use the data to develop something commercially useful that will then be locked away behind patents and the public will be charged through the nose for.
There is also the more subjective, human issue that if you don't let people who have worked like crazy to get the data have at least the first shot at analysis then recruiting scientists is going to become extremely hard and motivating them to perform large-scale experiments will be even harder if they just have to give the data away - why would you bother if you can just sit around and get the data as soon as it is collected?
Is that bad enough? There are ways you could mitigate some of the above but the bottom line is that nothing is free: it will cost more money to make the data publicly available and, as a taxpayer myself, I see no real benefit from doing it and some serious potential pitfalls.
The only thing that is proven is that denialists are funded by the oil industry and the same "libertarians" funding the teabaggers, such as the bootstrappy Koch heirs.
At the end of the day, the opinion of the vast majority of scientists, and in particular that of almost all climatologists, has not changed. A few years ago most national academies of sciences issued a joint statement supporting the IPCC, each with the overwhelming approval of their members. Have any of them gone back on it?
No, it's just fucking criminal PR bullshit, the same kind that was used to justify the Iraq war or implement liberticide "anti-terrorist" policies.
Two things:
1. Great, I want all the data on how to make an atom bomb provided in a neat easy to use format.
2. I accept that the viewer is ultimately responsible for their own due diligence, but why am I paying for a newspaper if the media aren't being held to reasonable standards of diligence?
Conservation of angular momentum makes the world go round.
How come this isn't the case for open source? And there's an awful lot more data than source code, and that data needs some serious maths to work with, therefore the number of people who CAN look at the data is a far, far, FAR smaller section of the people who have the data available than have the Linux source code.
Yet the less useful data is a great potential for ALL OF US, yet source code is worthless.
Yeah, right.
Except in the case of MMR/autism, it wasn't just some crank that started things using data. It was a doctor writing a peer-reviewed paper for The Lancet. It was actually the credentials of a real doctor that lent the sceptics' case so much weight.
Much drama ensued.
Lacking <sarcasm> tags,
"1. Great, I want all the data on how to make an atom bomb provided in a neat easy to use format. "
Your point being? That comment doesn't relate to anything I said.
"2. I accept that the viewer is ultimately responsible for their own due diligence, but why am I paying for a newspaper if the media aren't being held to reasonable standards of diligence? "
I don't know. Do you read The Weekly World News? Deciding which news sources have such standards is part of the due diligence of the consumer.
the time it would take to gather the data and put it on disk for him. The question I have is if this data has been used in climate models, it's already gathered and consolidated. Why not just run a copy of the data for him, which was used to do the climate models? That couldn't take longer than an hour to put on a dvd, since it's already sitting on a hard drive somewhere. It has to be or they couldn't have used it for the climate modeling.
I smell a rat... Why don't they want anyone from the outside to verify their conclusions? It would only validate their findings and conclusions if someone else were to run the numbers. Why would they not want anyone else to have the data? It's in their best interest, if they are honest, to give out as many copies as they can. In fact they should have a zip file sitting on their web server for anyone and everyone to download it so they can get back to their research and not be bothered by data requests.
Don't kid yourself. It's the size of the regexp AND how you use it that counts.
Data is not a set of instructions, so your request makes no sense.
So, we're supposed to believe that your selection of 10 supporting cases out of your sample of 3000 is correct just because you're an "expert"?
More or less. I was the one capturing the data. If I rejected the other 2990, I probably had a reason, one that may not be obvious to someone not manning the microscope or someone who has not stared at millions of those cells for hours at a time.
If I were to publish a paper saying "These 10 cells out of 3000 are the only ones that matter" then I'd have to support that claim. If, on the other hand, I'm publishing a paper that doesn't explicitly claim that those 10 out of 3000 cells are the only ones relevant, then I'm not going to explicitly show why those other 2990 cells aren't relevant. The data is incomplete without supporting figures showing what was obvious to me.
To make it a little more concrete, say I'm saying "Cells condense their DNA before dividing" and looked at cancer cells in a dish (all vertebrate cells I'm familiar with do in fact condense their DNA before dividing). I look at the 3000 cells and see that only 100 are dividing, based on what the cells look like. I'm not going to prove that the other 2900 are not dividing; it should be obvious to anyone familiar with that cell line. Of the 100 that are dividing, let's say 50 aren't marked properly with the signal I'm looking at, which could be a DNA marker, so they get thrown out. Of the 50 that are dividing and are marked properly, maybe 30 are on top of one another, preventing good analysis, and 10 of the ones remaining are cells that look unhealthy, like they're dying. That leaves 10 cells out of 3000.
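For what it's worth, that chain of rejections is just successive subtraction, and writing it down as a log is cheap. A minimal sketch in Python, assuming the made-up counts from the example above; the stage labels and the function name `apply_exclusions` are hypothetical and not from any real analysis pipeline:

```python
# Hypothetical exclusion log for the 3000-cell example.
# Counts are the illustrative numbers from the comment, not real data.
TOTAL_CELLS = 3000

# (reason for rejection, number of cells dropped at that stage)
STAGES = [
    ("not dividing", 2900),
    ("marker signal not usable", 50),
    ("overlapping another cell", 30),
    ("looks unhealthy", 10),
]


def apply_exclusions(total, stages):
    """Return a list of (reason, dropped, remaining) after each stage."""
    remaining = total
    history = []
    for reason, dropped in stages:
        if dropped > remaining:
            raise ValueError(f"cannot drop {dropped} cells, only {remaining} left")
        remaining -= dropped
        history.append((reason, dropped, remaining))
    return history


if __name__ == "__main__":
    for reason, dropped, remaining in apply_exclusions(TOTAL_CELLS, STAGES):
        print(f"{reason}: dropped {dropped}, {remaining} remaining")
```

A log like this records how many cells went at each step and why, but it does not show that each rejection was justified; that still needs the supporting figures.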
If you're not a cell biologist, most of that wouldn't be obvious to you, some of those criteria wouldn't even be obvious to you unless you were a cell biologist familiar with that specific cell line.
To be convinced that the rejections were appropriate, either you could become an expert in that cell line, which I find unlikely, or I could write up 5 or 6 supporting figures demonstrating that non-dividing cells looked like this etc. That's a lot of wasted time that I'd rather be wasting on slashdot.
Just like the other denialists, you're just parroting talking points. Of course you don't care what national academies of sciences think, you only care what your masters think. Atta boy.
Any way, I was responding to the assertion that "Climategate" has proved anything. It hasn't.
We expect you folks to spend some time thinking up a way so that you don't spend any time at all on "preparing" the supposedly "raw" data _and_ still make it available to the desirous public. Like you know putting up a file on a website with some footnotes. I hear universities have some websites.
Congratulations - you have identified two media by which the information could be disseminated. Now, would you like to explain how it is possible to write the documentation, in a form that the general public can understand, without taking any time? ...and before you claim that we scientists should just find a way please remember that we are scientists, not magicians, and so are limited by physical reality. I could also point out that time spent designing any such system would also be time NOT spent doing science which is what we are actually getting paid to do.
"This is motion tracking data, format is X Y Z, its from ..."
You seem to have the impression that science is just like the physics you may have done at school. The experiment I work on has 1MB data per event using a complex, compressed data format and even then you need access to a parameter database to know the configuration of the detector given any particular data run. A couple of comments at the top of the file, or even a document describing the format is not enough. You have to understand the detector and have access to the database. It would require CONSIDERABLE effort to make this even accessible to the general public....but I tell you what if you have the tens of petabytes of data storage needed and the budget to fund the required development we'll give it a try.