Freeing and Forgetting Data With Science Commons

Again with the IP by Anonymous Coward · 2009-02-20 15:04 · Score: 1, Insightful

Einstein said "If I have seen farther than most it is because I have stood on the shoulders of giants."
Where does that begin to apply in a society of lawyers, profiteers, and billion dollar industries based on exploiting shortsighted IP management?

Re:Again with the IP by Anonymous Coward · 2009-02-20 16:43 · Score: 1, Informative

Your faith in Wikipedia is misplaced; it was both, actually.
Perhaps Sir I.N. was the first, so you do earn the proverbial "first quote"
Re:Again with the IP by wisty · 2009-02-20 17:54 · Score: 3, Insightful

There is a rumor that Newton meant it as an insult to Hooke. Newton had refined Descartes's wave theory, while Hooke had backed the corpuscular theory. Also, Hooke was a short man.
Re:Again with the IP by HadouKen24 · 2009-02-20 20:42 · Score: 1

Both Newton and Descartes were corpuscularians, actually.
Re:Again with the IP by Hognoxious · 2009-02-21 02:32 · Score: 1

Also, Hooke was a short man.
Now I see how he discovered his law - hanging weights on his feet to try and get taller.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

I don't know! by blue+l0g1c · 2009-02-20 15:05 · Score: 2, Insightful

I was reading through the summary quickly and almost had a panic attack at the deluge of questions at the end. We get the point already!

Re:I don't know! by TapeCutter · 2009-02-20 16:07 · Score: 0

While I sympathise with any effort to make scientific data more accessible, the deluge of questions was long ago answered by the philosophy and method of science, i.e., train oneself to think critically.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

1000 by Anonymous Coward · 2009-02-20 15:06 · Score: 0

Exactly 1000 more bytes? Wow!

Re:1000 by Anonymous Coward · 2009-02-20 17:45 · Score: 0

What's so amazing about 0.9765625 kilobytes?
Re:1000 by Anonymous Coward · 2009-02-21 03:57 · Score: 0

ITYM kibibyte.
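
For what it's worth, the arithmetic in this subthread checks out. A minimal sketch, assuming the binary convention of 1024 bytes per unit that the 0.9765625 figure implies (the constant and function names are made up for illustration):

```python
# Illustrative only: names are hypothetical, chosen for this sketch.
BYTES_PER_KIB = 1024  # kibibyte (KiB), IEC binary prefix
BYTES_PER_KB = 1000   # kilobyte (kB), SI decimal prefix


def bytes_to_kib(n_bytes):
    """Convert a byte count to kibibytes (1 KiB = 1024 bytes)."""
    return n_bytes / BYTES_PER_KIB


def bytes_to_kb(n_bytes):
    """Convert a byte count to SI kilobytes (1 kB = 1000 bytes)."""
    return n_bytes / BYTES_PER_KB


print(bytes_to_kib(1000))  # 0.9765625
print(bytes_to_kb(1000))   # 1.0
```

So 1000 bytes is exactly one SI kilobyte, and only falls short of a whole unit when counted in kibibytes.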

Does it really matter? by Anonymous Coward · 2009-02-20 15:08 · Score: 0, Flamebait

After all, we are a nation of cowards.

What's most important to keep. by MoellerPlesset2 · 2009-02-20 15:11 · Score: 2, Insightful

What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.

It's an important and distinctive feature of Science that results are reproducible.

Re:What's most important to keep. by Anonymous Coward · 2009-02-20 15:13 · Score: 2, Insightful

How can the results be reproducible if you don't keep the original data?
Re:What's most important to keep. by Al+Kossow · 2009-02-20 15:17 · Score: 1

What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.
It's an important and distinctive feature of Science that results are reproducible.
At what cost? Would you suggest discarding the data sets of nuclear bomb detonations since they are easily reproduced? How about other data sets that may need to be reinterpreted because of errors in the original processing?
Re:What's most important to keep. by MoellerPlesset2 · 2009-02-20 15:27 · Score: 4, Insightful

How can the results be reproducible if you don't keep the original data?
The relevant results are supposed to be included in the paper, as well as the information necessary to reproduce the work. Most data doesn't fall into that category.

To make an analogy the computer geeks here can relate to: All you need to reproduce the output of a program is the source code and parameters. You don't need the executable, the program's debug log, the compiler's object files, etc, etc.

The point is you want to reproduce the general result. You don't usually want to reproduce the exact same experiment with the exact same conditions. Supposedly you already know what happens then.
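
The software analogy above can be sketched roughly as follows. This is only an illustration under loud assumptions: `run_experiment` and its parameters are hypothetical, and a real lab experiment has no random seed to record, which is exactly where the analogy gets stretched.

```python
import random


def run_experiment(seed, n_samples, noise):
    """Hypothetical 'experiment': noisy measurements of a true value of 10.0."""
    rng = random.Random(seed)  # the seed is one of the recorded parameters
    samples = [10.0 + rng.gauss(0.0, noise) for _ in range(n_samples)]
    return sum(samples) / n_samples  # the 'result' that would be published


# "Source code and parameters" are all that is kept; no output is stored.
params = {"seed": 42, "n_samples": 1000, "noise": 0.5}

first = run_experiment(**params)
second = run_experiment(**params)  # exact re-run from source + parameters
print(first == second)  # True

# Reproducing the *general* result: different seed, same conclusion (about 10).
independent = run_experiment(seed=7, n_samples=1000, noise=0.5)
print(abs(independent - 10.0) < 0.1)
```

The second check is closer to what the parent means by reproducing the general result rather than the exact same run.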
Re:What's most important to keep. by repepo · 2009-02-20 15:28 · Score: 3, Interesting

It is a basic assumption in science that given some set of conditions (or causes) you get the same effect. For this to happen it is important to properly record how to set up the conditions. This is the kind of thing that scientific papers describe (in principle at least!).
Re:What's most important to keep. by MoellerPlesset2 · 2009-02-20 15:39 · Score: 2, Interesting

At what cost? Would you suggest discarding the data sets of nuclear bomb detonations since they are easily reproduced?
Nobody said results are easily reproduced. But a-bomb tests are hardly representative of the vast majority of scientific results out there.

How about other data sets that may need to be reinterpreted because of errors in the original processing?
That's a scenario that only applies when the test is difficult to reproduce, and the results are limited by processing power rather than measurement accuracy. That's a relatively unusual scenario, since, first, most experiments are easier to reproduce than that, and second, methods and measurements improve over time. The much more common scenario is that it's more efficient to simply re-do the experiment with modern equipment and get both more accurate measurements and better processing.
Re:What's most important to keep. by mako1138 · 2009-02-20 15:47 · Score: 5, Insightful

Let's say the LHC publishes its analysis, and then throws away the data. What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
For a billion-dollar experiment like the LHC, that dataset is the prize. The dataset is the whole reason the LHC was built. Physicists will be combing the data for rare events and odd occurrences, many years down the road.
Re:What's most important to keep. by Sanat · 2009-02-20 16:13 · Score: 0, Offtopic

Mod up this important position please.

--
And in the end, the love you take is equal to the love you make
Re:What's most important to keep. by MoellerPlesset2 · 2009-02-20 16:14 · Score: 2, Insightful

Let's say the LHC publishes its analysis [..]
Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
Second: Primary data, actual measurement results, are already kept, as a rule.
Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.

What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
Truth is, you'd still have to rebuild the LHC then, because you didn't test your 'corrected' assumption against the actual machine to show that your 'corrected' results are valid. Until the actual experiment is re-done it'll remain an unanswered question.
Re:What's most important to keep. by TapeCutter · 2009-02-20 16:20 · Score: 2, Interesting

"You don't usually want to reproduce the exact same experiment with the exact same conditions."

That's right, I want an independent "someone else" to do that in order to make my original result more robust. If I were an academic I would rely on post-grads to take up that challenge; if they find a discrepancy, all the better, since you now have another question! To continue your software development analogy - you don't want the developer to be the ONLY tester.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:What's most important to keep. by Mr+Z · 2009-02-20 16:24 · Score: 5, Interesting

With a large and expensive dataset that can be mined many ways, yes, it makes sense to keep the raw data. This is actually pretty similar to the raw datasets that various online providers have published over the years for researchers to datamine. (AOL and Netflix come to mind.) Those data sets are large and hard to reproduce, and lend themselves to multiple experiments.
But, there are other experiments where the experiment is larger than the data, and so keeping the raw data isn't quite so important as documenting the technique and conclusions. The Michelson-Morley interferometer experiments (to detect the 'ether'), the Millikan oil-drop experiment (which demonstrated quantized charges)... for both of these the experiment and technique were larger than the data, so the data collected doesn't matter so much.
Thus, there's no simple "one size fits all" answer.
When it comes to these ginormous data sets that were collected in the absence of any particular experiment or as the side effect of some experiment, their continued existence and maintenance is predicated on future parties constructing and executing experiments against the data. This is where your LHC comment fits.

--
Program Intellivision!
Re:What's most important to keep. by oneiros27 · 2009-02-20 16:52 · Score: 4, Insightful

Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule. First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.

There are two types of science. What you're referring to is called 'Little Science' (not to be derogatory), but it's the type of thing that a small lab can do, with a reasonable amount of funding. And then there's what we call "Big Science" like the LHC, Hubble Space Telescope, Arecibo Observatory, Large Synoptic Survey Telescope, etc.

Second: Primary data, actual measurement results, are already kept, as a rule.

I wish. Well, okay, it might be kept, but the question is by who, and have they put it somewhere that people can analyze it?
I was at the AGU last year, and there was someone from a solar observatory that I wasn't familiar with. As I do work for the Virtual Solar Observatory, I asked them if we could put up a web service to connect their repository to our federated search. They told me there was no repository for the observatory -- the data walks out the door with whoever the observer was.
Then there's the issue of trying to tell from the published research exactly what the original data was. But then, I've been harping on the need for data citation for years now ... it's an issue that's starting to get noticed.

Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.

For the type of data that I deal with, none of it is technically reproducible, because it's observations, not experiments. And that's precisely why it's important to save the data.

Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.

In your field, maybe. But we have folks who try to design systems to predict when events are going to happen and need training data. Others do long-term statistical analysis with years or decades of data at a time. Still others find a strange feature that hadn't previously been identified as important (eg, coronal dimmings) and want to go back through all of the data to try to identify other occurrences.

--
Build it, and they will come^Hplain.
Re:What's most important to keep. by Anonymous Coward · 2009-02-21 00:45 · Score: 0

Being that the LHC records on the order of Terabytes per second, it doesn't under any circumstance have a means of retaining all the data.
Re:What's most important to keep. by Rockoon · 2009-02-21 01:19 · Score: 2, Informative

On the subject of reproducibility, I am reminded of a situation with Wei-Chyung Wang, a climate scientist.

He was involved in the paper Jones et al (1990), which is where the situation begins.

After *17 YEARS* of requests, Jones FINALLY released some of the data used in Jones 1990 through demands under the terms of the U.K. Freedom of Information policy on publicly funded research.

Wang himself is free from FOI requests because Wang is an American and operates in America, where FOI requests regarding publicly funded studies have no legal weight.

The result of the eventual disclosure by Jones is that several researchers have concluded that Wang fabricated research steps: that some of the steps could not have been performed, then or even now, and that for many of the climate stations used in his work the existing station histories directly contradict Wang's stated assessments about his data set.

Specifically, he claimed that only a few of these recording stations had been moved during the time-frame significant to the research, and that they were free from significant urbanization changes (the research was to measure the "Urban Heat Island" (UHI) Effect.) In short, Wang claimed that the station histories showed that they were largely "homogeneous."

According to the DOE CAS study, in regards to the quality of Wang's other station data, "details regarding instrumentation, collection methods, changes in station location or observing times are not known." The CAS bills itself as the most comprehensive history of Chinese climate available to date. Note that Wang actually cited the CAS as one of the sources for his data.

Essentially, both Wang et al 1990 and Jones et al 1990 were fraudulent pieces of work that were never independently verified, and could not have been verified given both the straight-out fraud and the failure to disclose the data set used.

(Jones denies knowledge of Wang's fabrication of data.)

Sparked by this controversy, new research specifically addressing the UHI based on the Chinese climate record paints an entirely different picture with regards to China: that the effect is in fact much more significant than concluded by Jones et al 1990.

FULL DATA DISCLOSURE IS NEEDED.

This is especially true in some areas of science, where all the big players not only know each other, BUT WORK, PUBLISH, AND PEER REVIEW TOGETHER.

One specific small group of people is directly influencing global policies regarding climate change through their direct involvement with the IPCC, all the while hiding their own work and obstructing validation of their work.

--
"His name was James Damore."
Re:What's most important to keep. by jschen · 2009-02-21 06:04 · Score: 2, Informative

How can the results be reproducible if you don't keep the original data?
As others noted, there are cases where raw data is king, and others where raw data is virtually useless. LHC raw data will be invaluable. Raw data from genetic sequencing is a waste of time to keep. Why store huge graphics files when the only thing we will ever want from them is the sequence of a few letters? One must be able to distinguish between these two possibilities (and more subtle, less black and white cases, too), and there is no one size fits all solution.
That said, you may be surprised how well really valuable data is stored by good principal investigators. I recently helped my PI re-digitize a prized result from 1988 (showing the first example of a synthetic enediyne compound cleaving DNA). The journal did not do a good job of scanning it, and it therefore was hard to interpret in the printed journal. So we dug up the original raw data (the original UV photograph of the DNA gel showing this result), which had been carefully filed away in our offsite storage location all these years, and re-digitized the image for a recent review article.
Re:What's most important to keep. by digitalunity · 2009-02-21 08:17 · Score: 1

Maybe you haven't noticed, but quantum mechanics seems to indicate there is not always one outcome for one set of conditions. This works on the macro scale, but not necessarily always on the subatomic level.

--
You can't legislate goodness. Let each to his own destiny, by will of his freely made choices.
Re:What's most important to keep. by Patch86 · 2009-02-21 10:04 · Score: 2, Interesting

5 Insightful?
Seriously, read the OP again.
"What's most important to keep is quite simple and obvious really: The results. The published papers, etc."
He never suggested you throw out the results. No-one is going to throw out the results. Why would anybody throw out the results? Whichever body owns the equipment is bound to keep the results indefinitely, any papers they publish will include the results data (and be kept by the publishers), and copies will end up in all manner of libraries and file servers, duplicated all over the place.
The most important things to keep from any experiment are 1) the results (no point in doing it if you don't keep the results) and 2) the methodology (if they don't know how you got the data, it's worthless). What you could throw away without too much harm is the analysis and interpretations, since you can always reanalyze and reinterpret (and any interpretations made now may prove wrong in the future anyhow). Even then, anything interesting is likely to be kept in the grand scheme of things anyway.
The place which TFA is actually talking about is less dramatic, lower budget science. It's still important (it's the bread and butter of science and technology), but will be found in the vaults of far fewer publishers, libraries and web servers. And it's lower budget science where it's far easier to reproduce results, as in GP.
Re:What's most important to keep. by mako1138 · 2009-02-21 11:06 · Score: 2, Insightful

You seem to be using "results" in a wider sense than "published papers". Yes, nobody is going to throw out papers. But the raw data from instruments? It is not clear whether those will be kept.
You say that the analysis and interpretations can be thrown out, but those portions are precisely what go into published papers. And for small-scale science, it makes little sense to throw away anything at all.
Re:What's most important to keep. by mako1138 · 2009-02-21 11:36 · Score: 2, Informative

Let's say the LHC publishes its analysis [..]
Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility.
They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
I admit that I jumped on the LHC as an extreme example. But even in an "ordinary" lab these days, you'll find some specialized and complex equipment. This is true for the cutting edge of any field.

Second: Primary data, actual measurement results, are already kept, as a rule.

As oneiros27 notes, this is not guaranteed, either by design or circumstance.

Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
Not sure what kind of point you're trying to make here.

Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.
It's not necessarily a matter of re-interpreting existing results. You may be adding an old dataset to a new dataset, and finding new results in the combined set, or finding a glimmer of something new in an old dataset. Even for "small" experiments, having somebody else's raw dataset can make your life a lot easier.

What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
Truth is, you'd still have to rebuild the LHC then, because you didn't test your 'corrected' assumption against the actual machine to show that your 'corrected' results are valid. Until the actual experiment is re-done it'll remain an unanswered question.
No, I am talking strictly about analysis. For example, the use of neural networks in particle/track finding has recently met greater acceptance in High Energy Physics. But what happens if, a few years down the road, evidence turns up that neural networks are fundamentally flawed? If you have kept the data, you can re-run the analysis with different methods. If you have thrown out the data, it's time to build a new LHC.
Granted, High Energy Physics, with its requirements for large datasets in order to find extremely rare processes, is perhaps the only branch of science to require so much data. In HEP, we want to keep as much as possible, but there are realistic limits. In other fields, since there are no difficulties, why not keep everything?
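
The "keep the data, swap the method" argument can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the numbers are invented, and a mean versus a median merely stands in for the neural-network scenario; it is not how HEP analyses actually work.

```python
import statistics

# Hypothetical raw measurements; the last value is a detector glitch.
raw_data = [9.8, 10.1, 10.0, 9.9, 10.2, 55.0]


def original_analysis(data):
    """The 'published' method: a plain mean, which assumes no outliers."""
    return statistics.mean(data)


def revised_analysis(data):
    """A later method that is robust to outliers: the median."""
    return statistics.median(data)


published = original_analysis(raw_data)

# Years later the no-outliers assumption is found to be flawed. Because
# raw_data was archived, the analysis can be re-run with a different method
# without repeating the experiment.
reanalysed = revised_analysis(raw_data)

print(f"published:  {published:.2f}")
print(f"reanalysed: {reanalysed:.2f}")
```

Had only the published mean been kept, there would be no way to tell that one bad reading was responsible for most of it.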
Re:What's most important to keep. by mako1138 · 2009-02-21 11:55 · Score: 2, Insightful

I agree that there is no simple answer, but I am uneasy with your "experiment is larger than the data" concept. Today we think of the Michelson-Morley and Millikan experiments as canonical and definitive investigations in Physics. But we do not often remember that each was preceded by a string of less-successful experiments, and followed by confirmations. It is the accumulation of a body of data that leads to the gradual acceptance of a physical concept.
See chart:
http://en.wikipedia.org/wiki/Michelson-Morley_experiment#The_most_famous_failed_experiment

Comment removed by account_deleted · 2009-02-20 15:15 · Score: 4, Informative

Comment removed based on user account deletion

eh by Anonymous Coward · 2009-02-20 15:23 · Score: 1, Informative

That's not true. Any tax funded study requires more documentation and publication than a private one. Anyone who reads them knows.
All studies worth anything are aimed at an audience proficient in the subject; they are not meant for general audiences, and are often proven wrong. You need repeatable results.

And the scientists goes mooo! by Ostracus · 2009-02-20 15:35 · Score: 1

"And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them. "

I predict the dumbing down of science.

--
Shai Schticks:"You don't make peace with friends, you make peace with enemies"

Re:And the scientists goes mooo! by Vectronic · 2009-02-20 15:46 · Score: 2, Interesting

Although likely, not necessarily...
I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.
There are geniuses of all sorts, someone might be completely lost trying to understand it linguistically, but may find a fault in it instantly visually, or audibly.
However that is somewhat redundant, as the original (as it is now) can be converted into that by people, but a mandate saying it must contain X, Y and Z, will open it up to more people, quicker.
Re:And the scientists goes mooo! by wisty · 2009-02-20 17:57 · Score: 2, Insightful

Why should science be more complex than necessary? For every String Theory area (where complexity is unavoidable) there are plenty of theories like economics, which just rely on weird jargon to fence out the interlopers.
Re:And the scientists goes mooo! by Fallingcow · 2009-02-20 18:24 · Score: 2, Interesting

I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.
Don't count on that being at all helpful.
Take the math articles on Wikipedia: I can read one about a topic I already understand and have no idea what the hell they're talking about in entire sections. It's 100% useless for learning new material in that field, even if it's not far beyond your current level of understanding. Good luck if you start on an article far down a branch of mathematics--assuming they bother to tell you the source of the notation in that article, it'll take you a half-dozen more articles to find anything that sort-of translates some of it for you.
Some sort of mouseover tool-tip hint thing or a simple glossary is all I ask, but I think the people writing that stuff don't even realize how opaque it is to people who majored in something other than math.
Re:And the scientists goes mooo! by Repossessed · 2009-02-21 01:06 · Score: 1

Or the scientists just stop writing in third person passive, and start writing in a manner people outside of the scientific community are used to. Though I think the summary refers more to trying to extract data you do understand from complicated papers that talk a lot about things you neither understand nor care about.

--
Liberte, Egalite, Fraternite (TM)

What? Nobody has ever read... by NotQuiteReal · 2009-02-20 15:39 · Score: 1

Has nobody ever read The tragedy of the commons?

However, in the case of the non-physical, I guess no one can "waste" or "steal" it, only copy and use.

--
This issue is a bit more complicated than you think.

Re:What? Nobody has ever read... by Anonymous Coward · 2009-02-20 16:05 · Score: 1, Funny

Has nobody ever read The tragedy of the commons?
Nope, can't afford the fees.
Re:What? Nobody has ever read... by Fallingcow · 2009-02-20 16:40 · Score: 1

I'm quite familiar with it, and I'm not seeing the connection.
Help?
Re:What? Nobody has ever read... by TapeCutter · 2009-02-20 17:20 · Score: 1

I'm not sure what your point is but I don't see libraries turning to dust because nobody cares.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:What? Nobody has ever read... by drinkypoo · 2009-02-21 02:51 · Score: 1

I'm not sure what your point is but I don't see libraries turning to dust because nobody cares.
Libraries are closing because nobody cares.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:What? Nobody has ever read... by TapeCutter · 2009-02-21 03:40 · Score: 1

Cite please.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:What? Nobody has ever read... by drinkypoo · 2009-02-21 04:08 · Score: 1

They saved Salinas libraries, but look into the story: Since 2002, cuts in library funding have approached $100 million around the country, with more than 2,100 jobs eliminated and 31 libraries closed, according to the American Library Assn.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:What? Nobody has ever read... by TapeCutter · 2009-02-21 13:18 · Score: 0, Offtopic

Well that really is a shame for the US, glad I don't live there.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

One format to govern them all by ProfMobius · 2009-02-20 16:23 · Score: 1

I have been waiting all my life to see how my simulation data would look in Excel. And everyone is supposed to have it (damn you Linux users! damn you people with not enough money to buy it!).

On a more serious note, a common ground for data formats would be nice. You already have some generic formats, like HDF5 and others, but I must admit right now, it is a bit of a jungle in the astrophysics department, and it is not going to change anytime soon (unless someone makes an awesome generic, one-size-fits-all library in... Fortran77...).

--
EULA : By reading the above message, you agree that I now own your soul.

Re:One format to govern them all by Logic+Worshiper · 2009-02-20 18:03 · Score: 1

Linux reads Excel files in OpenOffice, so .xls is pretty universal; one can also install CrossOver Office or use Wine to install Office (though I don't know how well that works).

What's the goal, really? by Rostin · 2009-02-20 16:24 · Score: 4, Insightful

I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?

It reminds me of the XKCD this morning...

Re:What's the goal, really? by onionlee · 2009-02-20 16:56 · Score: 2, Interesting

agreed. most sciences that have been around for a long time and have developed their own specializations within them, such as physics, have specific journals that target their "demographics" (such as the journal of applied physics a, b, c, d, letters). anything outside of those journals has most likely been rejected by those journals and is irrelevant. furthermore, the relatively young sciences such as linguistics use (what i personally think is lame) a system of keywords so that anyone can easily find articles that they're interested in. truly, i have yet to find any researcher who has complained about this "problem".
Re:What's the goal, really? by TapeCutter · 2009-02-20 17:06 · Score: 1, Insightful

"I'm a working scientist (ok, PhD student), so I read journal articles pretty often."

And how would you read them if your institution did not foot the bill for subscriptions?

"In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists."

When you amalgamate "almost all cases" you end up with "almost all publications". The rest of your post smacks of elitism, trivializes scientific curiosity and completely ignores the social and scientific impact of radical improvements in communicating knowledge.

I would have thought working scientists would actually be proud of their work and want to disseminate it to the largest audience possible but in your case I'm obviously mistaken.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:What's the goal, really? by robbyjo · 2009-02-20 17:43 · Score: 1

To be honest, if your institution does not foot the bill for subscriptions, try inter-library loans. That's easy. Most credible institutions in the US do have some subscription for more mainstream journals. Unless you're in a third world country.
The problem with scientific publication is that you need to be terse. Papers are limited to 8-12 pages. If you are required to spend time on background knowledge for the uninitiated, you'll produce a 1000 page book instead. Moreover, the reviewers will think that you spend too much time on things that are assumed to be familiar to the intended audience.
Face it: The knowledge we have so far is the agglomeration of previous knowledge. Those who are at the cutting edge are expected to know the background already. Try explaining measure theory to high school students who have no idea what calculus is. And then if your research has anything to do with measure theory, your result is pretty much unreachable to those students. Let alone any data that correspond to that research.
Scientists want to have their work known. But they don't have the patience and 5 years to explain to aspiring noobs. Sorry. They have a lot more research to do. If you want to know the research, do your homework and study the subject carefully for a few years. Then you'll appreciate whatever data or paper the scientists are publishing.

--
Error 500: Internal sig error
Re:What's the goal, really? by Beetle+B. · 2009-02-20 18:13 · Score: 3, Insightful

Typical comments from someone in the first world.
First, just on the side, I know lots of people who got PhD's but did not really stay in research and academia. They still want to read papers, though, as they still maintain an interest.
But the main benefit of opening up journal papers is for the rest of the world to benefit. Yes, if you have a very narrow perspective, you could just dismiss that as charity. If you're open minded, you'll realize that shutting out most of the world from scientific output means much less science globally, and far fewer benefits to you as a result.
Imagine if all researchers in Japan published papers only in Japanese, and the journals had a copyright condition that prevented the content from ever being translated to another language, and you'll see what I mean. Whereas current journals require a lot of money for access, these ones also have a price: Just learn Japanese. It's not exactly promoting science.
Then again, of course, journals do need a base amount of money to operate. Just that Elsevier kind of companies charge so much more than is needed to make a profit.

--
Beetle B.
Re:What's the goal, really? by martin-boundary · 2009-02-20 18:49 · Score: 1

The real problem is hoarding knowledge, which over time leads to elitism, then guilds, and finally priesthoods. The fix is literally trivial: open access to electronic publications for everybody, ie bypass all the elaborate subscription checks. This isn't rocket science. The only thing stopping it from happening are greedy publishing companies who like the status quo.
You're right that every single modern scientific publication has a very small intended readership, yet the argument for opening up everything is the same as the argument for having well stocked libraries.
Consider a library which monitors you whenever you enter it, and prevents you from reading or perusing any and all books, except for exactly five books in your immediate area of expertise which you are allowed to borrow or read, say. That is pretty much the current situation with electronic archives of scientific journals. If you don't see anything fundamentally wrong with that, then you've wasted your life as a student.
Re:What's the goal, really? by Grym · 2009-02-20 18:52 · Score: 2, Interesting

In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
I'm a poor medical student, but a medical student with--quite frequently--interdisciplinary ideas. I can't tell you the number of times I have been interested in pursuing a subject for independent research and have been stymied or effectively stopped in my tracks because of my lack of ability to pay or lack of online access to experimental data and results. You might think that modern science is highly specialized and, for most bleeding-edge topics, you're probably right. In these cases, the affected researchers can all afford the one or two subscriptions they need to stay up to date. However, in overlapping areas, non-specialists (or specialists in other fields) might have a unique perspective and possibly insightful findings to add. What harm could be done by letting them take a look?
Take for example one of my hare-brained ideas. There is a disease called Pellagra, which is caused by diets deficient in certain amino acids. These amino acids are lacking in corn. In the United States, corn is, by far, the largest cash crop. Now, diets in the U.S. are varied enough that modern Americans do not get Pellagra, but this isn't the case in developing nations, where Pellagra can sadly be endemic. So, my idea is this: why not introduce conservative substitutions into the genetic sequence of the gene encoding the major structural protein of corn (zein) in such a way as to make corn a (more) complete amino acid food source? By doing this, you'd be turning one of the world's most abundant and cheap foodstuffs into an effective cure for a common, debilitating disease.
Now, to me, as an outsider to Agriculture, this seems like a rather basic idea. I was convinced that someone had to have tried something similar to this. But you'd be surprised. I have yet to find a single paper that has ever attempted such a thing. Almost all of them focus on crop yields or the use of zein in commercial products. Now, maybe (for reasons unbeknown to me) my idea is untenable, such that people in the field have never given it a thought. But what if that isn't the case? What if the leaders of the field (or at least the emergent behavior of the scientists and scientific institutions) are pushing so hard in one direction that an obvious area for research or advancement was overlooked? Let's hope it's not the latter...
Regardless, it's a travesty how petty scientific institutions are in this regard considering how often they talk to the public about high-minded ideals when extolling the virtues of public funding of Science. This information should be available to all: specialists and non-specialists alike.
-Grym
P.S. Oh yeah, and in case any of you were wondering: somebody already patented the general idea described in my post. So don't get any wild ideas about trying to use it to help the poor, now! (/facepalm)
Re:What's the goal, really? by Logic+Worshiper · 2009-02-20 18:58 · Score: 1

I know people who'd have an easier time learning Japanese than C++, does that mean we should write computer code in English? The same thing applies to science. Non-technical science isn't science, so when scientists publish something for each other to read, they publish it in their language. There are people who translate it back to English, such as teachers and writers, and those of us who don't have the background to compile science code in our minds need to find the binary version or learn the language of science, not complain we don't understand code.
Re:What's the goal, really? by TapeCutter · 2009-02-20 19:19 · Score: 1

"If you are required..."

I don't think anyone in TFA is seriously suggesting that hand holding noobs be a requirement for publication and this is probably where the confusion sets in. I also understand that you may want to keep your own data close to your chest until you have extracted a paper out of it (ie: publish or perish).

"To be honest, if your institution does not foot the bill for subscription, try inter-library loans...[snip]...The problem with scientific publication is that you need to be terse. They're limited to 8-12 pages."

Einstein managed to get away with three elegant pages and zero references; chasing down the English translation in that link took a couple of minutes. I'm interested in quality not quantity, I would be delighted with the 8-12 pages at my finger-tips because like most educated laymen I do not have "too much time on my hands". The internet and aforementioned lack of time is the reason I have not set foot in a library for almost a decade and the last time I studied/taught at a tertiary institution was quite possibly before you were born...

"If you want to know the research, do your homework and study the subject carefully for a few years. Then you'll appreciate whatever data or paper the scientists are publishing."

Precisely why I chose to use the folk at realclimate as an example, following the science for 25+yrs does not make me a climatologist but it has given me a deep understanding of what they are banging on about.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:What's the goal, really? by Anonymous Coward · 2009-02-20 22:12 · Score: 0

Go and read about what is happening in the bio + semantic web communities.
IE: http://www.slideshare.net/fbelleau/bio2rdf-presentation-at-www2007-hcls-workshop2
It is a real problem (too much data in too many formats); and the foundation technologies to solve it already exist - there is just a truckload of modelling to do and tool building to happen now.
This is just like open source - I personally don't know a thing about the hardware driver for my network card; but if the code is open, someone else can come along and look at it from the outside; maybe submit a patch or two; or point out interesting things I did not notice about it.
Re:What's the goal, really? by lbbros · 2009-02-20 23:04 · Score: 1

An example would be if the data could be re-analyzed or reviewed when new methods for looking at it appear, or simply to try out new stuff. I work with high-throughput data (DNA microarrays) and about half of my work is applying my ideas to data that others have published, to validate an approach in an independent data set.
Some fields require access to the data more than others. In the case I'm talking about, you should take a look at the MIAME (Minimal Information About a Microarray Experiment) checklist published by the MGED society, and at the letter MGED sent to Science ("Standards for microarray data" by Ball et al., no link provided as you need a subscription to read it...) to urge adoption of such modus operandi.

--
A CC-licensed illustrated horror novel
Re:What's the goal, really? by Anonymous Coward · 2009-02-20 23:05 · Score: 0

I don't even know where to start.. your comment is just ignorant and assumes that everyone that wants to access scientific publications/knowledge is in academia or is supported (in some way) by an institution. Of course, if you are not in academia (and in a first world country), you don't deserve knowledge. *rolls eyes*
You seem to be the prototypical "ivory tower scientist" who doesn't care/understand much of science outside your field and so you assume that others are the same.
Yes, knowledge is built upon previous knowledge and "most" scientific fields will seem esoteric to an outsider, but just because you "have no patience and 5 years to explain to aspiring noobs" doesn't mean that they can't understand it better than you do (given enough time and information).
It would be nice if information that is often (almost always) charged to the taxpayer would be freely accessible to those who want it (regardless of their status in society).
Oh, and of course, this is all hypothetical stuff.. we don't really want _YOUR_ publications in Science Commons. prick.
Re:What's the goal, really? by smallfries · 2009-02-20 23:22 · Score: 3, Informative

Trickle-down. Dissemination of knowledge.
You don't know it yet (not meant as a jibe but it is something that clicks in after your PhD) but your primary function as a scientist is not to make discoveries. It is spreading knowledge. Sometimes that dissemination will occur in a narrow pool, through journal papers between specialists in that narrow pool of talent.
This is not the primary goal of science, although it can seem like it when you are slogging away at learning your first specialisation well enough to get your doctorate. Occasionally a wave from that little pool will splash over the side - maybe someone will write a literature review that is read by a specialist in another field. A new idea will be found - after all sometimes we know the result before we know the context that it will be applied to.
The pools get bigger as you move further downstream. Journal articles pass into conference publications, then into workshops. Less detail but carried through a wider audience. Then after a time, when the surface seems to have become still, textbooks are written and the knowledge is passed on to another generation. We tend to stick around and help them find the experience to use it as well. This is why all PhD students have an advisor to point out the best swimming areas.
That was the long detailed answer to your question. The simple version is that you don't know who your target audience is yet. And limiting it to people in institutions that pay enormous access fees every year is not science. As a data-point - a lot of European institutes don't bother with IEEE fees. They run to about £50k/year which simply isn't worth it. As a consequence results published in IEEE venues are cited less in Europe. So even amongst the elite access walls have an effect.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:What's the goal, really? by Hognoxious · 2009-02-21 02:47 · Score: 1

just because you "have no patience and 5 years to explain to aspiring noobs" doesn't mean that they can't understand it better than you do (given enough time and information).
So your average high school student can understand almost any science paper, if you just wait for him to get a degree, PhD and ten years postdoctoral experience in the relevant field?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re:What's the goal, really? by robbyjo · 2009-02-21 06:37 · Score: 1

Einstein managed to get away with three elegant pages and zero references
Science has evolved much from 1905. Even with his zero references, he's still implicitly citing the results of Lorentz. By today's standards, omitting citations like that is unacceptable.
Let me ask you this: Can you honestly ask a high school student or a freshman to understand even that paper without grasping the concepts of differential equations (DE)? They can't. Sure, you can understand the motivation and introduction of that paper, just like those of typical scientific papers. But when you start to delve into the formulas, i.e. what the "meat" is all about, you suddenly need to know everything involved much beyond words explained by the scientists. I have no background in physics and I can't even follow the derivations of the formula in section I.3 of that paper although I know DE. In other words, I'm lost at section I.3 and I cannot see how Einstein arrived at his conclusions. Maybe a little knowledge of physics will help. There is some baseline knowledge you'd expect your audience to know. You can't explain everything.
Let's face it: English is an ambiguous medium of transfer for scientific knowledge. Mathematical formulas are far more succinct and far less ambiguous. If you think you can sidestep the formula part of the paper, you're dreaming. You might be better off reading popular science magazines.
The folks at RealClimate are just commenting on their results, not real papers. This sort of writing is more of popsci magazine style. They're glossing over way too much on how they arrive at their results. To some degree, it's useful. But to me, the gory details are of more importance because only then can I know their assumptions and theoretical limitations on the underlying assumptions or formulas and how to further advance the knowledge or make the estimates more precise.
Scientists do not take other scientists' words at face value and neither should laymen. Given the climate crisis pro-contra, I think reading just the research comments will add to the confusion in the public mind. Or worse, create camps. We don't want that to happen. So, I think it's wise for the public to read far beyond pop sci writings.

--
Error 500: Internal sig error
Re:What's the goal, really? by radtea · 2009-02-21 07:25 · Score: 1

Could someone explain to me why this is a real problem
I'm a physicist who runs a business that amongst other things does data analysis in the life sciences, mostly genomics. In this area data collection is relatively expensive (hundreds or thousands of dollars per sample) and disease states are relatively generic--follicular lymphoma is pretty much the same regardless of whether you are in Kansas or Karachi.
I recently invented a new algorithm for combing gene expression data for patterns of expression that distinguish sample classes. There are two ways to get this algorithm applied to more data: one is to develop an application and hope that a few thousand researchers will bother downloading it and using it on their data. The other is for me to go out and find published datasets and apply the algorithm to them myself.
I'm pursuing both paths, but I have seen how hard it is to get people to adopt new software, even stuff that's dead easy to use (which my application is). Furthermore, even with really good software, having an expert look at the data carefully is highly desirable. We have yet to find a way to fully automate good judgement, and a certain amount of judgement is required in any hard analysis problem.
So I'm figuring that the most use I'll get out of this algorithm is applying it to other people's data myself. I've already done this on a public schizophrenia dataset with some success, although I'm still trying to figure out what to do with the results (as a scientist I'd like to see them used for the betterment of humanity, as a businessperson I'd like to somehow get paid at least a little for my contribution.)
The widespread publication of well-curated datasets is absolutely vital to getting the most value out of the considerable amount of money spent on collecting data of this kind. How we deal with rewarding the various contributors to any commercially useful discoveries that result is an ongoing problem that can be dealt with on a case-by-case basis for now.

--
Blasphemy is a human right. Blasphemophobia kills.
Re:What's the goal, really? by ceoyoyo · 2009-02-21 07:27 · Score: 1

Sucks to live in the developing world and be told that if you want to publish your results it's $1000 a paper.
Re:What's the goal, really? by Midnight+Thunder · 2009-02-21 07:47 · Score: 1

To be honest, if your institution does not foot the bill for subscription, try inter-library loans. That's easy. Most credible institutions in the US do have some subscription for more mainstream journals. Unless you're in third world countries.
Anything that complicates the retrieval of knowledge ends up reducing access to that knowledge. Why should someone have to put up with a manual process when we have this thing called the internet? The internet is designed to facilitate access to knowledge, so it is the tool of choice.
While some readers of papers may not understand the content fully, it is sometimes enough to start the quest of understanding. Science suffers from a lack of people entering the field, so anything that can make it easier to access knowledge makes the idea of entering less daunting. In many ways this can be seen as part of the PR process.
The other way of approaching the issue, is simply asking why journals should be the only ones allowed to publish the information? They aren't paying anyone for the content, yet they are requiring a monopoly of the publishing of the given paper.
Journals have long had the role of being the only providers of papers to the community and see their position in jeopardy. I think this new source of competition is the chance for them to see where they can matter. In my opinion journals can matter by being the filter, where the 'best' papers are published.

--
Jumpstart the tartan drive.
Re:What's the goal, really? by Scrameustache · 2009-02-21 09:07 · Score: 1

I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
Replace "scientific data" with "satellite imagery".
There's nothing to gain by letting anyone look at it? Only highly trained experts can decipher it?
People have found hidden forests, ancient ruins, and a few meteor impacts. You don't know what's to find in the data until you let people look.

--
You can't take the sky from me...
Re:What's the goal, really? by Beetle+B. · 2009-02-21 11:25 · Score: 1

Find me an open access journal that does not lower the price for people in the developing world.

--
Beetle B.
Re:What's the goal, really? by robbyjo · 2009-02-21 12:38 · Score: 1

Anything that complicates the retrieval of knowledge ends up reducing access to that knowledge. Why should someone have to put up with a manual process when we have this thing called the internet? The internet is designed to facilitate access to knowledge, so it is the tool of choice.
Yes, and there are open-access journals already. Guess what? The scientists (i.e. the paper authors) are required to pay much more for the open access. Heck, they're required to pay for non-open journals as well. Don't believe me? Ask your fellow scientists. Some scientists simply wise up and don't pay the extra charges and still fulfill the publish or perish call.
While some readers of papers may not understand the content fully, it is sometimes enough to start the quest of understanding.
That's the task of textbooks, popsci magazines, universities, or even wiki/encyclopedias. In general, papers are to communicate novel results among scientists of the field, not newcomers.
Science suffers from a lack of people entering the field, so anything that can make it easier to access knowledge makes the idea of entering less daunting. In many ways this can be seen as part of the PR process.
Anything to make it easier? I think there are no shortcuts in science, much like there are no shortcuts in computer programming. Shortcuts make bad scientists just like shortcuts make bad programmers. PR process doesn't do any good. Aren't you afraid of quantum physics scientists that don't know squat about calculus? You should be, just as you would be of OS programmers who don't know squat about subroutines. Getting papers is the least daunting task for a budding scientist (unless perhaps he/she has a phobia of libraries). The most daunting task is typically the math and finding a suitable mentor.
The other way of approaching the issue, is simply asking why journals should be the only ones allowed to publish the information? They aren't paying anyone for the content, yet they are requiring a monopoly of the publishing of the given paper.
Journals double-dip. They charge the scientists who author the papers and the customers who read them. I agree that a free, no-charge journal should be formed. But somebody needs to pay the bills.
Establishing a journal is tedious and very involved (get the scientists to do peer review, get the papers edited, published, build reputation, get $$$, etc). I can't foresee how we could solve this problem. It's a chicken-and-egg problem. Scientists won't volunteer doing review for no-name journals. But journals' reputation is built upon quality publications, which are highly peer-reviewed. Scientists won't publish their quality works in no-name journals either. After all, scientists need to get tenure, right? Tenure is evaluated upon how many publications are published in famous journals. I think that having "a competition" won't solve the problem much. But some journals already open-access their paper collections. I think it's a matter of time before open access becomes the norm.

--
Error 500: Internal sig error
Re:What's the goal, really? by TapeCutter · 2009-02-21 13:16 · Score: 1

"Science has evolved much from 1905."

Document procedures have evolved (precisely what TFA is banging on about), the philosophy and methodology of science are pretty much the same, no?

"Let me ask you this: Can you honestly ask a high school student or a freshman to understand ..."

I could but as you say they may have difficulty understanding. More puzzling is why are you asking me? - I'm 50 and I am talking about myself and other educated laymen (particularly those in the less developed countries), why are you focusing on kids that are younger than my own adult children? - because it makes a good strawman?

"The folks at RealClimate are just commenting on their results, not real papers."

Speaking of a lack of background understanding, yes the site is commentary but look deeper and unlike the mass-media opinion columns you will find pointers to the original papers splattered all over it. Climate related papers are amongst the simplest to find on the web BECAUSE OF THE PUBLIC INTEREST, physics and math are up there too but that's got more to do with the common-sense and intelligence of the community surrounding those subjects. I just happened to become interested in climate before it became popular, ditto with computers where it led to my CS and OR qualifications and then on to a good living OUTSIDE the ivory towers where papers can cost an arm or a leg (ie: an un-natural barrier to self-education).

"Given the climate crisis pro-contra, I think reading just the research comments will add to the confusion in the public mind. Or worse, create camps. We don't want that to happen. So, I think it's wise for the public to read far beyond pop sci writings."

Huh? - The first part of that sounds like the objections of a pre-Gutenberg scribe, the second part contradicts it.

"Scientists do not take other scientists' words at face value and neither should laymen."

Thank you captain obvious. You have spent the entire post arguing against things I did not say, and views I do not hold.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
Re:What's the goal, really? by cmaley · 2009-02-21 15:17 · Score: 1

Actually, the few times I know of that a good data set was put up on the web, it generated a lot of research and progress. I'm thinking of Pat Brown putting up some of the first data on gene expression arrays. Probably hundreds of people worked on that data - everything from statistical methods, to reverse engineering the gene network. It was great. This is probably most valuable when the data is from a new type of experiment that is likely to be widely used.
I hope to do something similar but there is a big problem for geneticists like me. If you post your volunteers' genetic data on the web, there is no way to anonymize it. It would be a simple thing for a medical insurance company to take a cheek swab, run the genetics and then match it against all public datasets to see if an applicant has a known disease. I know of patients that have lost their medical insurance because their insurer found out that they were participating in a research study, and inferred (incorrectly) that the patient had a disease.

--
- living sig-free and proud
Re:What's the goal, really? by robbyjo · 2009-02-21 20:23 · Score: 1

The original post made a point that "In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists." I completely agree with him. The public mostly has no use for any such data unless they know how to process the data and all the rationale behind them (which implies that they must know all the underlying scientific process). I agree to that as well. However, you stressed the communication issue to the uninitiated--which I think is misleading. And that accounts to the data? If it were only main results and summaries, I would heartily agree. But the data? No! The public doesn't know anything about the process behind them. They don't care about the data in general.
I agree that the results have to be communicated to general public, but that's not the primary goal of typical scientific papers: It's to inform other scientists in the field. For general public, there are popsci magazines, textbooks, and universities. And that has NOTHING to do with the data.
physics and math are up there too but that's got more to do with the common-sense and intelligence of the community surrounding those subjects
Yet there are lots of pros and cons in that issue alone, even though many PhDs with lots of brain power are on both sides. If you don't follow the math and understand the assumptions down to the gory details, it's virtually impossible to decide which analysis is valid and which is not. You can use your so-called common-sense and intelligence. After all, this is our environment and we've got to do something regardless of the analysis of global warming, right? Now, you can safely chuck all valid analyses that belong to the "other camp" and subscribe to whatever analyses belong to "your camp". Presto, problem solved, right? What I know is each side has their valid points, but I'm not qualified to judge them because I don't know the gory details.
If only the results or summaries are of importance to you, you don't need any access to those papers (thus saving $$$) and can subscribe to your favorite popsci magazines (which cost much less, thus invalidating your ivory-tower $$$ claim). While the papers discuss how the data was gathered, processed, and transformed into the result, numerous subtleties on the assumptions, and inherent limitations on the employed methods, they are of no use to the general public. Without knowing them, however, the understanding of the result would not be complete and might be misleading. If you only read the scientists' conclusions / results, like the general public would, you're essentially taking their words at face value. This is dangerous since I've seen too many occasions where even so-called seasoned scientists are not aware of the subtleties of the methods they're using and thus misinterpret their reports (yes, that's despite the peer review).
While I agree that reading the original paper is very important to make an informed decision, it's inaccessible to the general public anyway due to the sheer amount of required background knowledge. Even the so-called simple three-page paper of Einstein that you've linked actually fails to provide any sufficient elucidation for the general public. I doubt that the general public would benefit from open access to any actual research paper, let alone the data. So, your accusation of elitism is completely unfounded. It's not elitism. It's the way the science is done.
I would urge the public to educate themselves far beyond popsci writings so that they make an informed decision, but that's not for everyone. Those who are really interested should devote their time to study the subject and only then are they worthy of access to the data.

--
Error 500: Internal sig error
Re:What's the goal, really? by TapeCutter · 2009-02-22 00:35 · Score: 1

"worthy of the data" - Thank you for confirming my suspicions of elitism or is it just plain arrogance? Either way the rest of the post that precedes your conclusion of who is "worthy" reads as an attempt to define what others should or should not be interested in. If you don't want to take part in open access then fine, nobody is forcing you to do so. Please do not obstruct the efforts of others just because it does not fit your worldview as this would imply you are not only elitist but also a control freak.

I also suspect you're a "climate skeptic", this is fine by me as long as your arguments are intellectually honest and backed by hard evidence (ie: genuine scientific skepticism). If you have such arguments I would love to hear them but please don't link to papers that I have to pay for out of my own pocket!

Finally I ask you again to stop putting words in my mouth to build your strawman. Eg: I did not say Einstein's paper was simple I said it was elegant, nor did I claim it provided "sufficient elucidation", AFAIK PhDs are still arguing over its implications.

--
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.

The value of Data by a-zA-Z0-9$_.+!*'(),x · 2009-02-20 16:25 · Score: 1

It would be good to share, for a paper, both all the data as well as the conclusions on the web. Then the reasoning is easier to check. More importantly, other people can query the data differently to produce other conclusions without special requests.
"Reproducibility" doesn't mean using the same data, it means using the same procedures on different data and seeing if the same conclusion is reached.
If data were commonly provided as experiments are done, this would be of value to others even if the experimenters' paper(s) came later. In other words, as well as the market for articles, a market for data!

--
Epitaph: At last! Root access!

Re:The value of Data by Logic+Worshiper · 2009-02-20 18:48 · Score: 1

The data is reviewed by people qualified to review the data; that's what peer reviewed journals are for.
Re:The value of Data by Rockoon · 2009-02-21 19:29 · Score: 1

Peer review doesn't require reviewing the data. In most cases, peer review simply has a peer signing off as if to say "yeah, if he went through the steps he claims to have gone through, then his conclusion is probably reasonable."

Do you HONESTLY think that publications such as Nature and Science have teams of people sifting over supplied data?

Look into what these publications require of the researchers sometime. They do not require the data. They instead require that the data be made available upon request. In practice their peer review processes never request the data, and the editors of these publications ignore complaints from 3rd parties who were refused the data after requesting it. They only care if the peer reviewer himself got refused the data (but as noted, reviewers do not, in practice, request it).

Remember, the publisher WANTS the article published because it BENEFITS from publishing it.
Remember, the peer reviewer often knows the author personally.

The peer review process does not protect against fraud and incompetence. That is why we need to adopt a policy that the data should be publicly available BEFORE publishing, and why a new class of "peer review", perhaps called "open peer review", should be adopted. The weight of an "open peer reviewed" study should be greater than that of a contradicting "peer reviewed" study.

Too many times it has come to be known that if the data had been publicly available, errors in a study would have been noticed much earlier. I recall a certain problem with the NOAA climate data set overseen by Hansen, having a "Y2K" bug, but because they kept the raw data a secret and only published "filtered" and "adjusted" data, it took *8* years to notice that the "filtering" and "adjusting" software had such a boneheaded bug in it. When an independent researcher finally discerned the serious problem with the data set, the NOAA had to retract many public claims they had made about what year was "hottest" over the last 100 years (not 1998 after all... more like 1934).

I realize that most of the problems with the peer review process that I know about are climate science related, but that's thanks in part to the diligence and openness of a group of people often called "oil company shills" by the very people they are exposing. That these "shills" use open methodology all the way through is quite possibly the reason that they are so damn good at finding the problems with all the "research" in climate science.

--
"His name was James Damore."
Re:The value of Data by Logic+Worshiper · 2009-02-24 19:59 · Score: 1

Publications like Nature and Science are not peer reviewed journals.

Is storage an issue? by blue+l0g1c · 2009-02-20 16:34 · Score: 2, Interesting

Data storage is something we've gotten very good at and we've made it very cheap. A Petabyte a day is not as staggering as it was even five years ago.

Re:Is storage an issue? by DerekLyons · 2009-02-20 17:17 · Score: 1, Interesting

Not as staggering as it was five years ago only means it is not as staggering as it was five years ago - not that it still isn't staggering. Especially when you consider a petabyte a day means 36.5 exabytes a year.
Re:Is storage an issue? by dkf · 2009-02-20 20:34 · Score: 1

Data storage is something we've gotten very good at and we've made it very cheap. A Petabyte a day is not as staggering as it was even five years ago.
It still has to be paid for. It still has to be actually stored. It still has to be backed up. It still has to be kept in formats that we can actually read. It still has to have knowledge about what it all means maintained. In short, it still has to be curated, kept in an online museum collection if you will. And this all costs, both in money and effort by knowledgeable people.
The problem doesn't stop with copying the data to a disk array.

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Is storage an issue? by Anonymous Coward · 2009-02-21 03:50 · Score: 0

Do you mean 365 petabytes a year?
Re:Is storage an issue? by blue+l0g1c · 2009-02-21 09:52 · Score: 1

36.5 exabytes should be more than enough for anybody.
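
For anyone following the numbers in this sub-thread, the correction above can be checked with a few lines of arithmetic. This is only a sketch, assuming decimal (SI) units where 1 PB = 10^15 bytes and 1 EB = 10^18 bytes, and a 365-day year; the function and constant names are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as PiB/EiB would give slightly different figures).
PETABYTE = 10**15
EXABYTE = 10**18
DAYS_PER_YEAR = 365


def yearly_volume_bytes(bytes_per_day: int, days: int = DAYS_PER_YEAR) -> int:
    """Total bytes accumulated over `days` at a constant daily rate."""
    return bytes_per_day * days


# One petabyte per day, as in the comments above.
per_year = yearly_volume_bytes(1 * PETABYTE)
petabytes_per_year = per_year / PETABYTE
exabytes_per_year = per_year / EXABYTE

print(f"{petabytes_per_year:g} PB/year = {exabytes_per_year:g} EB/year")
```

Under those assumptions a petabyte a day works out to 365 petabytes a year, which is 0.365 exabytes rather than 36.5.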

not results- grant dollars by SuperBanana · 2009-02-20 17:26 · Score: 1, Insightful

The results. The published papers, etc. It's an important and distinctive feature of Science that results are reproducible.

Having worked around academic groups that do medical research for three years now, I can tell you that is absolutely not what drives research.

Researchers will love to tell you about how it is the quest for knowledge and other pie-in-the-sky ideals, but when it comes down to it- it's mostly about making a living (or more than a living), and fame/prestige.

See, journals have what's called an "impact factor." An impact factor is how many times an article in a particular journal ends up being cited by other papers. In one lab I worked at, it was closely tracked who was published where, and how many times.

At the end of the year, when it came time to decide who went and who stayed, the scores were lined up and however many people needed to go came from the bottom. The top ones get a little closer to becoming a PI (Principal Investigator, aka someone who has postdocs and grad students working for them.)

PIs, all the people you read about in the paper- they survived the process, but they're now nothing more than management. They don't do lab work, they don't do research. They solicit ideas from their postdocs, put the final polish on a grant proposal the postdoc slaved over, and get big fat checks from NIH for millions of dollars. The PIs then pass the work down to postdocs, who dole it out to grad students. The grad students do it because a PhD is dangled in front of them while they run on the treadmill of endless, monotonous, repetitive lab work and analysis work. The postdocs do it because faculty positions and PI slots are dangled in front of them.

The problem with "the system" is that nobody is rewarded for reaching that brass ring. Just like Ford has no incentive to build a very durable car (no service/parts sales after the vehicle hits the end of the warranty, and the market quickly becomes saturated) researchers have no incentive to completely solve issues facing us today; their incentive is to come close enough to say "aha, look, we did find SOMETHING, so your grant money wasn't wasted."

What incentive does a massive industry have to solve cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless- their training and knowledge is so focused, so narrow- they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions also would be useless. Billions of dollars of materials, equipment, supplies, chemicals- gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.

--
Please help metamoderate.
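
A note on the "impact factor" mentioned in the comment above: as commonly defined, the two-year journal impact factor is a per-journal average rather than a count for a single article. It is the number of citations received in a given year by items the journal published in the previous two years, divided by the number of citable items it published in those two years. A minimal sketch, with a hypothetical function name and made-up figures:

```python
def two_year_impact_factor(citations_this_year: int,
                           citable_items_prev_two_years: int) -> float:
    """Average citations per citable item over the two-year window.

    citations_this_year: citations received this year by items the journal
        published in the previous two years.
    citable_items_prev_two_years: number of citable items the journal
        published in those two years.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("journal must have published at least one citable item")
    return citations_this_year / citable_items_prev_two_years


# Hypothetical journal: 400 citable items over two years, cited 1200 times.
example_if = two_year_impact_factor(1200, 400)
print(example_if)
```

The figures are invented for illustration only; a lab ranking people by where they publish, as described above, would be using the journal-level number as a proxy for each paper.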

Re:not results- grant dollars by smallfries · 2009-02-20 23:09 · Score: 4, Insightful

What incentive does a massive industry have to solve cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless- their training and knowledge is so focused, so narrow- they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions also would be useless. Billions of dollars of materials, equipment, supplies, chemicals- gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.
That's an old argument and although it sounds reasonable it is completely unsound. An industry does not function as a single cohesive entity with wants and desires. It is composed of many different individuals with their own wants and desires.
I know enough academics to say for certain that if any one of those individuals could discover a cure that would put their entire employer out of business then they would leap at the chance. The fame that would follow would make another job easy enough to get, and the recognition is what they're really in it for anyway.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:not results- grant dollars by Bowling+Moses · 2009-02-21 06:46 · Score: 2, Informative

I've been doing research in the biological sciences for 12 years now, including some work that was at least tangentially related to human health. I am not in it for the paycheck--if that's all I wanted, my friends and I joke that we'd go to KFC School of Business Management and be assistant managers at fast food restaurants making more than we do in science. I, and the majority of the people I know, don't want to be professors either. It's extremely rare for a professor to actually do any lab work themselves, but if you ask they'll tell you they miss it. Besides there are 300 people applying for each professorship at a decent university. Then if you are unlucky enough to get the job, you have to successfully fight in a viciously competitive funding environment to get tenure and not lose your mind or your liver in the process. It's actually hard enough to keep a job in academic science, period. My boss and I are applying for grants. Hers are in part to keep my position funded, she's got one out and is writing a second. I've got one out, and am applying for two or possibly three more. Contrary to what you wrote, my grants are largely my ideas and my writing, and should I get funded is my money, not the boss's. However science funding is so obscenely bad (most grants have ~5% success rate, the best one I'm applying for has ~25%) that I'm also going to look for a new job, with the boss's full knowledge and support, even though we'd both very much like me to stick around for another couple years and get our proposed butt kicking science done.

So why do it if there's nothing but nonstop stress, Burger King assistant manager pay, and institutionalized job insecurity? I get to solve problems. I get to figure things out. I get to do things (sometimes, not often, but sometimes) that nobody has ever done before, see things nobody else has ever seen before. Work in a small way on projects that could impact millions of people's lives. I'll never be famous, which is fine with me. I'll never be rich, which, well, I can tolerate. I might not ever have job security...which okay, I'll admit is seriously grinding down my enthusiasm and idealism. But the things I've gotten to do--even paid a pittance to do--I wouldn't trade. Catching jellyfish off the docks in Oregon. Turned loose on a billion dollar synchrotron, unsupervised at 3 am to understand how an enzyme known to be a virulence factor in several diseases functions at an atomic level. Making radioactively labeled mosquitoes to understand lipid trafficking, working with cell culture (It's a cell from an insect's midgut...that under laboratory conditions can endlessly propagate itself. How cool! And here's what I'm going to do with it...), genetically engineering fluorescent organisms, using high-throughput screening to find new drug lead compounds. A lot of hard work, but sometimes that's damn good fun. Plus along the way you get to understand phenomena on a level that most people don't even know exists. I'm of course not claiming god-king knowledge here, but I could spend a long time talking about the terrible beauty of host:pathogen and vector:pathogen relationships for example, or protein structure, or anything else I've studied a while, just like any other scientist. That's fun too, although not cool in most of society. But my mom still thinks I'm cool. Ok, no, she doesn't.

If you expect to get rich and famous doing science, no wonder your post seems bitter. It isn't going to happen and isn't a right reason to do science in the first place. Those pie-in-the-sky ideals are.
Re:not results- grant dollars by cmaley · 2009-02-21 15:08 · Score: 1

I'm a cancer researcher and I agree. Though I'm in it more for the good of society and because it is an engaging problem. I would jump at the chance to cure cancer even if it put my institution out of business and I didn't get the recognition. The reality (of this fantasy) is that most institutions and researchers could easily move on to other diseases/problems. We do it all the time.
In addition, there is BIG money to be made from a drug that cures cancer. Even the ones that cure a small percent of cancer can make Gigabucks these days. This is why big pharma really does try to find new cancer drugs.

--
- living sig-free and proud

Hello? Is there a fucking editor in the house ... by Potor · 2009-02-20 17:32 · Score: 0

Wilbanks also points of that as the volume of data grows from new projects...

I'm sorry, but that makes no sense. 'Points of'???? Come on.

Science is hard - news at 11 by jstott · 2009-02-20 17:36 · Score: 2, Insightful

And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them.

I know that this is a real shock to you humanities majors, but science is hard. And yes, for the record, I do have degrees in both [physics and philosophy, or will as of this May — and the physics was by far the harder of the two].

Here's another shocker. If you think the papers are hard to read, you should see the amount of work that went into processing the data until it's ready to be written up in an academic journal. Ol' Tom Edison wasn't joking when he said it's "1% inspiration and 99% perspiration." If you think seeing the raw data is going to magically make everything clear, well, I'm sorry, the real world just doesn't work that way. Finally, if you think professional scientists are going to trust random data they downloaded off the web of unknown provenance, well, I'm sorry but that isn't going to happen either. I spend enough time fixing my own problems; I certainly don't have time to waste fixing other peoples' data for them.

-JS

--
Vanity of vanities, all is vanity...

Re:Science is hard - news at 11 by Anonymous Coward · 2009-02-20 18:40 · Score: 0

I agree. Scientists can't, and shouldn't be expected to, dumb down science so Joe Shome can understand it. Science is inherently technical; if that bothers people they just need to accept that the data analysis used to draw those conclusions is over their head, and trust the people who did it. Most people trust the makers of computer software, understanding they don't know how to program; the same applies to science. You don't have to understand the source code to use open office, but the source code being available not only helps with the program's verifiability and allows advanced users to change it, it also can be a teaching tool for beginners. It's not the programmer's fault that Joe Shome, who can't plug a flash drive in, doesn't understand the source code. The same is true with an open database for scientific research; it could be used to teach students, and interested members of the public, as well as to share data more easily. Of course it would have to be verifiable, not just any old website.
Re:Science is hard - news at 11 by Anonymous Coward · 2009-02-20 19:10 · Score: 1, Insightful

I fully agree.
Furthermore, I've read the entire, long interview and get the feeling this is a person looking for a problem. Yes, taxpayer-funded research should be freely available. Yes, we could all benefit from more freely available data. But he builds up a massive and poorly defined manifesto with very little meat around a few good points.
I'd love to have access to various data sets that I know exist, because others have published their results and described the data collection. But they likely invested multiple years of experimental work (and grant writing) to generate said data - so I see why they may be reluctant to hand it out to others. The solution must be based on giving credit where credit is due, yet this is precisely the problem: if I use someone else's experimental data, even old data, for a new analysis, how do I ensure they receive credit? My sense is that even if my paper clearly stated the generous source of the data, and cited the publications describing the original research, many readers would read past these bits and focus on my results only.
I write this from the point of view of someone who a) has data and would like to share it, b) spends most of his time painstakingly generating more experimental data, c) is a bit conflicted, because I'd feel bad to see someone else get credit for an analysis of my hard-earned data without some of that rubbing off on me.
As for the storage problem: there is none, in general. Certain special disciplines, yes, but those call for specific, appropriate solutions. The cost of storing most experimental data that I am aware of is completely dwarfed by, say, the cost of a single experiment's reagents, or the monthly health insurance of a single graduate student.
Finally, the suggestion given in the interview that scientists must come up with standard "ontologies", etc. is misguided at best. Every area of specialization already maintains, and long has maintained, an ongoing, iterative process whereby new terms are introduced, debated, used, and accepted. And yet, as methods and ideas change, sometimes new things don't fit into the established vocabulary: but it would be wrong to suppose (or require) a consensus procedure for such cases. It's science, it's original, it's creative, and every scientist feels some ownership for their ideas and their specific use of terminology. So let us sort it out, we're not bad at that stuff. Besides, "standard ontologies" usually become out of date the moment they are "ratified" ... better to be flexible and unstructured. That's how Google sees the world's information, and I'm fine with that.
Re:Science is hard - news at 11 by Anonymous Coward · 2009-02-21 00:55 · Score: 0

"I spend enough time fixing my own problems; I certainly don't have time to waste fixing other peoples' data for them."
I'm sure encyclopedia editors said that as well: I spend enough time doing this at my work, who'd want to spend time actually making it for free. Then came Wikipedia.
Re:Science is hard - news at 11 by ceoyoyo · 2009-02-21 07:33 · Score: 1

Bravo.
There ARE big shared datasets, when it makes sense, from trustworthy sources. They tend to cost a lot to assemble, make available, and maintain. I'm starting a post doc at a new lab and they showed me one they're working on: the price tag was $40 million.
We also have a mechanism by which anybody can read scientific papers, for free, if they choose to put in a little effort. They're called libraries.
Yes, the journal publishers probably need to cut their prices now that nobody actually wants the printed version anymore. Yes, open access journals are an interesting idea. I'm currently submitting a paper to one, and writing a chapter for an open access textbook. BUT, they can't completely take over or you shut out poorer research groups who can't afford to pay to have their papers published.

Is it? by Anonymous Coward · 2009-02-20 17:46 · Score: 0

Is it that I'm posting on slashdot? Is it that we all read this article? Is it that you read this comment? Is it that you read this comment on slashdot? Is it that you read slashdot which had this article which had this comment? Or is it?

Re:Is it? by Anonymous Coward · 2009-02-20 21:42 · Score: 0

"Gucci by Gucci - Pour Homme"

Euclids... by gmuslera · 2009-02-20 17:47 · Score: 1

said once to a king "there is no royal road to geometry". The nature of some things is in fact complex, and there is no way to represent them that is both easy and accurate at the same time.

Is it a goal of science, or of religion, that the universe be made in such a way that it is easy to explain to humans?

Re:Euclids... by metageek · 2009-02-20 23:17 · Score: 0

Very good point.
I never bought into Occam's razor. I think it shaves off important stuff all too often.

--
metageek

Doesn't work yet: by Anonymous Coward · 2009-02-20 17:50 · Score: 0

Tried to use in recent paper, and got this reply:

"IEEE have advised that they are unable to accept
the Science Commons license at this time.

If you want your paper to be published, you will
need to sign off a plain IEEE copyright form
and scan/email it to me."

Re:Doesn't work yet: by janwedekind · 2009-02-21 00:58 · Score: 2, Informative

Actually IEEE allows you to make your paper available on the internet at *one* location. However the material must not be reprinted/republished without permission from the IEEE. They also don't allow making your work part of another world-wide indexed collection. That's still far from perfect but at least it allows you to make your work accessible on your homepage or your university's Digital Commons repository. I don't know what the future plans of IEEE are.

Cumbersome... by going_the_2Rpi_way · 2009-02-20 17:56 · Score: 1

This is just as likely to add burden as to remove it.

I can't count the number of times I've seen attempts to 'standardize' data, or even just notation, in a given field. It all works very well for data to that point, but then the field expands or changes, or new assumptions become important, and the whole thing becomes either unwieldy or obsolete. This is one reason why every different field, it seems, has their own standards in their literature.

Speaking of the literature, most of these proposals are quickly followed by a 'let's just ask authors to conform to this now' approach to adopting these things. Papers get rewritten (or rejected), key points get lost, and the community gets weaker, all so that some standard with a half life of 12 months can be implemented.

This might be different. I applaud people trying to solve hard problems, and this is certainly one. I do think that more of the burden should be on demonstrating that the standardization is applicable for 12 months or more AFTER final development in a given field, never mind several.

Generally, though, we shouldn't fear context. We should embrace it.

--
=======
Science -- Sealed, Delivered.

The format is the least important issue by EmbeddedJanitor · 2009-02-20 18:27 · Score: 1

It is an almost trivial exercise to convert one format to another.

What is a lot harder is knowing how the data sets were measured and whether it is valid to combine them with data sets measured in other ways.

At least half the Global Warming bun-fight is about the validity of comparison between different data sets and the same goes for pretty much any non-trivial data sets.

--
Engineering is the art of compromise.

Well Science excluding Maths and the Hard Sciences by Secret+Rabbit · 2009-02-20 19:59 · Score: 1

Excluding experimental data, those fields don't really have the problem that this guy is talking about. Perhaps someone should give him/her a lesson in the Scientific Method. Then maybe his/her words would reflect some rigour. Well, that and a link to the arXiv (http://arxiv.org/).

Why is this so? Because, these communities are so small, that just about everyone knows or knows of everyone else('s work). Of course, that's a slight hyperbole. BUT, /just/ a *slight* one.

This sort of project only really applies to the non-fundamental sciences. Not that it's not useful. Of course it'd be a good thing to get this going. But, we just have to be honest about its true scope. And of course it'd be nice if this guy would tone down the rhetoric. Coming off that naively idealistic only works against things.

noobs? by Anonymous Coward · 2009-02-20 21:43 · Score: 0

But they don't have the patience and 5 years to explain to aspiring noobs.

I guess you don't have PhD students then? You should try one - mine make me think hard about things I thought I already knew.

Re:Well Science excluding Maths and the Hard Scien by metageek · 2009-02-20 23:11 · Score: 0

This sort of project only really applies to the non-fundamental sciences.

And what are fundamental sciences?
I keep hearing this type of argument: (some) physicists think biology is not a fundamental science; (some) biologists think sociology is not a fundamental science... each science is fundamental to those who want to understand the phenomena that it deals with.

--
metageek

The purpose of papers... by Anonymous Coward · 2009-02-21 00:36 · Score: 0

IAAP (I am a physicist), and agree that the model of charging researchers to access their own papers is ridiculous and broken - I submit preprints to arxiv.org in addition to print journals (everyone needs citations).

Any researcher will tell you that writing papers is a giant pain - it takes a long time which we would rather spend running experiments/simulations.
Whether they are published in open or closed journals, papers do have a useful function: they summarise the important results and (should) clearly explain the caveats and errors.

What this guy seems to be advocating is that the raw data from experiments be openly available. I work on large experiments (tokamaks), where diagnostics are one-offs, built specifically for that experiment. The data is full of errors and subtleties which only those most familiar with the experiment can assess properly. For this reason any papers to be published externally are first thoroughly reviewed internally to ensure that the data has not been misinterpreted.

Whilst freely publishing the resulting papers is a Good Thing (TM), freely allowing access to the raw data is not.

like, ummm by Hognoxious · 2009-02-21 02:02 · Score: 1

Does anyone, -- I mean there's me obviously -- think that the way the structure of the articles doesn't, in the sense that it's sort of an exact word for word -- transcription of someone *speaking* -- is extremely jarring when you see it -- by that I mean in the written form?

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:like, ummm by ceoyoyo · 2009-02-21 07:36 · Score: 1

Yes. Hey, maybe we should all write our scientific papers that way!

Scientific data is neither free nor cheap... by w0mprat · 2009-02-21 02:09 · Score: 2, Insightful

Research data is typically large. In the mid-late 90s I recall a researcher planning to move 10 TB of data internationally. It wasn't exactly unprecedented either. The internet was simply not capable of such a transfer. Eventually they had to ship it on many disks.

The problem with such raw data, e.g. from a radio telescope, is that you need all of it; you can't really cut any out before it's even processed.

This is a lot less of an issue today with research networks all hooked into multi-gigabit pipes. But there are still very large datasets researchers are attempting to work with that are simply not cheap to handle.

I think this is a great idea, it's nice being able to share it but as far as the really sexy big research going on these days I don't see it being much of a point-click-download service!

--
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.

Re:Scientific data is neither free nor cheap... by ceoyoyo · 2009-02-21 07:37 · Score: 1

Grant applications in my field typically have at least one line item for "storage." It's not cheap.

arXiv!!!!! by anonymShit · 2009-02-21 02:44 · Score: 1

Finally somebody mentioned the arxiv.
By the way, it's quite funny to see all these guys telling somebody how to do his job better, mostly when they have absolutely no idea what they're talking about.
Some nice sentences from the article:
-"It's taken me some time to learn how to read them"... what!!??
-"Because you're trying to present what happened in the lab one day as some fundamental truth", hahaaha, that one is good.
-"So what we need to do is both think about the way that we write those papers, and the words and the tone and how that really keeps people out of science. It really reduces the number of scientists". Yea. From under which rock of another planet have they taken this guy? Keeps people out of science...yes, they see equations they don't understand, and don't want to make the needed effort. I can imagine the solution is that we throw science away and begin writing "easy papers" that any analphabet can understand. That would be progress!

Now seriously, change the patent system to reward theoreticians, not only experimentalists. And make the population less ignorant!!!

All research paid for by taxes should be free by Anonymous Coward · 2009-02-21 02:56 · Score: 0

Very simple:
Make a law that forces any tax funded research to end up in the public domain.
Problem solved.

Science, Truth. by Anonymous Coward · 2009-02-21 04:29 · Score: 0

We all benefit if policy is based on reality, rather than bad science or bad data. We all lose if our money is wasted based on bad science. And the policy should make everything public, as you don't know which data will affect you (and you might not be able to get the data you need for your project).

Recently outsiders have spotted bad Antarctic data and mistakes in Arctic ice data.

Scientific papers are not "compressed" by habbakuk · 2009-02-21 06:10 · Score: 1

From the article, regarding scientific literature: "Because you're trying to present what happened in the lab one day as some fundamental truth. And the reality is much more ambiguous. It's much more vague. But this is an artifact of the pre-network world. There was no other way to communicate this kind of knowledge other than to compress it."

A statement like this suggests that the speaker is either unfamiliar with the way scientific data is actually turned into papers, or inappropriately optimistic about the utility of making the data "available." It is true that scientific data can be voluminous, but the overwhelming majority of papers do not "compress" data. To stretch an inadequate analogy, scientific literature is much more akin to metadata. Imagine scientific data as a large set of digitized recordings of music, all jumbled about. The paper would represent the list of song title, artist, etc. that someone had to put together. The metadata is not so much a compression as a re-representation and categorization of the data.

As a neuroscientist responsible for sharing my results with the world, I've taken reasonable steps to ensure that all of the data used in my papers is freely available (under the Science Commons license, which I'm quite grateful to Wilbanks & co. for). Similarly, the code I wrote to extract meaningful parameters from the data and present them in an aesthetically pleasing way is also freely available. I maintain no illusions as to the utility of the database: nobody is really interested in recreating the figures in the paper from the original data, nor in reanalyzing the data. However, I do know that some of the insights I've presented have influenced those (few) that have read my papers and struggled to understand the ideas presented within.

There is nothing wrong with the idea that scientific data and biological materials ought to be readily available to those who would use them. But the notion that somehow the hard-won insights that come to those who spend years collecting and thinking about the data will somehow follow is fanciful at best. Peer-reviewed, editor-selected papers are not compressed versions that are easier to transmit, but rather the collected insights and interpretations that allow us confidence in the work we've done. So by all means, if Mr. Wilbanks can find people to pay for it, make it easy to disseminate data. Just don't be surprised to find that "decompressing" papers doesn't do all that much to advance knowledge.

--
Try to love the questions themselves -- Rilke

OT (?): Sig reply by NotQuiteReal · 2009-02-21 17:03 · Score: 1

Don't feed the trolls - when an AC says something stupid, let it slide.

What about when an AC says something smart?

--
This issue is a bit more complicated than you think.

Raw data is necessary for validation... by tchall · 2009-02-24 03:57 · Score: 1

What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data?

The original data is of paramount importance, software for processing and analysis not so much... Science requires the ability to independently redo experiments and analyze data... getting the same result IS the method of verification that makes the "Scientific Method" valid. Getting the same result using different tools for analysis is even better... Mann's "Hockey Stick" graph is one of the failures of that system since he either can't recall which data sources he used or lost the original data... (not a problem for him since random noise conveniently generates the same hook in the graph)

Slashdot Mirror

Freeing and Forgetting Data With Science Commons

114 comments