Freeing and Forgetting Data With Science Commons

Posted by Soulskill on Friday February 20, 2009 @02:59PM from the bringing-it-all-together dept.

blackbearnh writes "Scientific data can be both hard to get and expensive, even if your tax dollars paid for it. And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them. That's the argument that John Wilbanks makes in a recent interview on O'Reilly Radar, describing the problems that have led to the creation of the Science Commons project, which he heads. According to Wilbanks, scientific data should be easy to access, in common formats that make it easy to exchange, and free for use in research. He also wants to see standard licensing models for scientific patents, rather than the individually negotiated ones that now make research based on an existing patent so financially risky." Read on for the rest of blackbearnh's thoughts. "Wilbanks also points out that as the volume of data grows from new projects like the LHC and the new high-resolution cameras that may generate petabytes a day, we'll need to get better at determining what data to keep and what to throw away. 'We have to figure out how to deal with preservation and federation because our libraries have been able to hold books for hundreds and hundreds and hundreds of years. But persistence on the web is trivial. Right? The assumption is well, if it's meaningful, it'll be in the Google cache or the internet archives. But from a memory perspective, what do we need to keep in science? What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data? Is it the normalized data? Is it the software used to normalize the data? Is it the interpretation of the normalized data? Is it the software we use to interpret the normalization of the data? Is it the operating systems on which all of those ran? What about genome data?'"

I don't know! by blue+l0g1c · 2009-02-20 15:05 · Score: 2, Insightful

I was reading through the summary quickly and almost had a panic attack at the deluge of questions at the end. We get the point already!
What's most important to keep. by MoellerPlesset2 · 2009-02-20 15:11 · Score: 2, Insightful

What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.
It's an important and distinctive feature of Science that results are reproducible.
1. Re:What's most important to keep. by Anonymous Coward · 2009-02-20 15:13 · Score: 2, Insightful
  
  How can the results be reproducible if you don't keep the original data?
2. Re:What's most important to keep. by MoellerPlesset2 · 2009-02-20 15:27 · Score: 4, Insightful
  
  How can the results be reproducible if you don't keep the original data?
  The relevant results are supposed to be included in the paper, as well as the information necessary to reproduce the work. Most data doesn't fall into that category.
  
  To make an analogy the computer geeks here can relate to: All you need to reproduce the output of a program is the source code and parameters. You don't need the executable, the program's debug log, the compiler's object files, etc, etc.
  
  The point is you want to reproduce the general result. You don't usually want to reproduce the exact same experiment with the exact same conditions. Supposedly you already know what happens then.
3. Re:What's most important to keep. by repepo · 2009-02-20 15:28 · Score: 3, Interesting
  
  It is a basic assumption in science that given some set of conditions (or causes) you get the same effect. For this to happen it is important to properly record how to set up the conditions. This is the kind of thing that scientific papers describe (in principle at least!).
4. Re:What's most important to keep. by MoellerPlesset2 · 2009-02-20 15:39 · Score: 2, Interesting
  
  At what cost? Would you suggest discarding the data sets of nuclear bomb detonations since they are easily reproduced?
  Nobody said results are easily reproduced. But a-bomb tests are hardly representative of the vast majority of scientific results out there.
  
  How about other data sets that may need to be reinterpreted because of errors in the original processing?
  That's a scenario that only applies when the test is difficult to reproduce, and the results are limited by processing power rather than measurement accuracy. That's a relatively unusual scenario, since, first, most experiments are easier to reproduce than that, and second, methods and measurements improve over time. The much more common scenario is that it's more efficient to simply re-do the experiment with modern equipment and get both more accurate measurements and better processing.
5. Re:What's most important to keep. by mako1138 · 2009-02-20 15:47 · Score: 5, Insightful
  
  Let's say the LHC publishes its analysis, and then throws away the data. What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
  For a billion-dollar experiment like the LHC, that dataset is the prize. The dataset is the whole reason the LHC was built. Physicists will be combing the data for rare events and odd occurrences, many years down the road.
6. Re:What's most important to keep. by MoellerPlesset2 · 2009-02-20 16:14 · Score: 2, Insightful
  
  Let's say the LHC publishes its analysis [..]
  Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
  First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
  Second: Primary data, actual measurement results, are already kept, as a rule.
  Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
  Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.
  
  What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
  Truth is, you'd still have to rebuild the LHC then, because you didn't test your 'corrected' assumption against the actual machine to show that your 'corrected' results are valid. Until the actual experiment is re-done it'll remain an unanswered question.
7. Re:What's most important to keep. by TapeCutter · 2009-02-20 16:20 · Score: 2, Interesting
  
  "You don't usually want to reproduce the exact same experiment with the exact same conditions."
  
  That's right, I want an independent "someone else" to do that in order to make my original result more robust. If I were an academic I would rely on post-grads to take up that challenge; if they find a discrepancy, all the better, since you now have another question! To continue your software development analogy - you don't want the developer to be the ONLY tester.
  
  --
  And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
8. Re:What's most important to keep. by Mr+Z · 2009-02-20 16:24 · Score: 5, Interesting
  
  With a large and expensive dataset that can be mined many ways, yes, it makes sense to keep the raw data. This is actually pretty similar to the raw datasets that various online providers have published over the years for researchers to datamine. (AOL and Netflix come to mind.) Those data sets are large and hard to reproduce, and lend themselves to multiple experiments.
  But, there are other experiments where the experiment is larger than the data, and so keeping the raw data isn't quite so important as documenting the technique and conclusions. The Michelson-Morley interferometer experiments (to detect the 'ether'), the Millikan oil-drop experiment (which demonstrated quantized charges)... for both of these the experiment and technique were larger than the data, so the data collected doesn't matter so much.
  Thus, there's no simple "one size fits all" answer.
  When it comes to these ginormous data sets that were collected in the absence of any particular experiment or as the side effect of some experiment, their continued existence and maintenance is predicated on future parties constructing and executing experiments against the data. This is where your LHC comment fits.
  
  --
  Program Intellivision!
9. Re:What's most important to keep. by oneiros27 · 2009-02-20 16:52 · Score: 4, Insightful
  
  Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule. First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility. They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
  
  There are two types of science. What you're referring to is called 'Little Science' (not to be derogatory), but it's the type of thing that a small lab can do, with a reasonable amount of funding. And then there's what we call "Big Science" like the LHC, Hubble Space Telescope, Arecibo Observatory, Large Synoptic Survey Telescope, etc.
  
  Second: Primary data, actual measurement results, are already kept, as a rule.
  
  I wish. Well, okay, it might be kept, but the question is by who, and have they put it somewhere that people can analyze it?
  I was at the AGU last year, and there was someone from a solar observatory that I wasn't familiar with. As I do work for the Virtual Solar Observatory, I asked them if we could put up a web service to connect their repository to our federated search. They told me there was no repository for the observatory -- the data walks out the door with whoever the observer was.
  Then there's the issue of trying to tell from the published research exactly what the original data was. But then, I've been harping on the need for data citation for years now ... it's an issue that's starting to get noticed.
  
  Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
  
  For the type of data that I deal with, none of it is technically reproducible, because it's observations, not experiments. And that's precisely why it's important to save the data.
  
  Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.
  
  In your field, maybe. But we have folks who try to design systems to predict when events are going to happen and need training data. Others do long-term statistical analysis with years or decades of data at a time. Still others find a strange feature that hadn't previously been identified as important (eg, coronal dimmings) and want to go back through all of the data to try to identify other occurrences.
  
  --
  Build it, and they will come^Hplain.
10. Re:What's most important to keep. by Rockoon · 2009-02-21 01:19 · Score: 2, Informative
  
  On the subject of reproducibility, I am reminded of a situation with Wei-Chyung Wang, a climate scientist.
  
  He was involved in the paper Jones et al (1990), which is where the situation begins.
  
  After *17 YEARS* of requests, Jones FINALLY released some of the data used in Jones 1990 through demands under the terms of the U.K. Freedom of Information policy on publicly funded research.
  
  Wang himself is free from FOI requests because Wang is an American and operates in America, where FOI requests regarding publicly funded studies have no legal weight.
  
  The result of the eventual disclosure by Jones is that several researchers have concluded that Wang fabricated research steps: that some of the steps could not have been performed, then or even now, and that for many of the climate stations used in his work the existing station histories directly contradict Wang's stated assessments about his data set.
  
  Specifically, he claimed that only a few of these recording stations had been moved during the time-frame significant to the research, and that they were free from significant urbanization changes (the research was to measure the "Urban Heat Island" (UHI) Effect.) In short, Wang claimed that the station histories showed that they were largely "homogeneous."
  
  According to the DOE CAS study, in regards to the quality of Wang's other station data, "details regarding instrumentation, collection methods, changes in station location or observing times are not known." The CAS bills itself as the most comprehensive history of Chinese climate available to date. Note that Wang actually cited the CAS as one of the sources for his data.
  
  Essentially both Wang et al 1990 and Jones et al 1990 were fraudulent pieces of work that were never independently verified, and could not have been verified given both the straight out fraud and the failure to disclose the data set used.
  
  (Jones denies knowledge of Wang's fabrication of data.)
  
  Sparked by this controversy, new research specifically addressing the UHI based on the Chinese climate record paints an entirely different picture with regards to China, that the effect is in fact much more significant than concluded by Jones et al 1990.
  
  FULL DATA DISCLOSURE IS NEEDED.
  
  This is especially true in some areas of science, where all the big players not only know each other, BUT WORK, PUBLISH, AND PEER REVIEW TOGETHER.
  
  One specific small group of people is directly influencing global policies regarding climate change through their direct involvement with the IPCC, all the while hiding their own work and obstructing validation of their work.
  
  --
  "His name was James Damore."
11. Re:What's most important to keep. by jschen · 2009-02-21 06:04 · Score: 2, Informative
  
  How can the results be reproducible if you don't keep the original data?
  As others noted, there are cases where raw data is king, and others where raw data is virtually useless. LHC raw data will be invaluable. Raw data from genetic sequencing is a waste of time to keep. Why store huge graphics files when the only thing we will ever want from them is the sequence of a few letters? One must be able to distinguish between these two possibilities (and more subtle, less black and white cases, too), and there is no one size fits all solution.
  That said, you may be surprised how well really valuable data is stored by good principal investigators. I recently helped my PI re-digitize a prized result from 1988 (showing the first example of a synthetic enediyne compound cleaving DNA). The journal did not do a good job of scanning it, and it therefore was hard to interpret in the printed journal. So we dug up the original raw data (the original UV photograph of the DNA gel showing this result), which had been carefully filed away in our offsite storage location all these years, and re-digitized the image for a recent review article.
12. Re:What's most important to keep. by Patch86 · 2009-02-21 10:04 · Score: 2, Interesting
  
  5 Insightful?
  Seriously, read the OP again.
  "What's most important to keep is quite simple and obvious really: The results. The published papers, etc."
  He never suggested you throw out the results. No-one is going to throw out the results. Why would anybody throw out the results? Whichever body owns the equipment is bound to keep the results indefinitely, any papers they publish will include the results data (and be kept by the publishers), and copies will end up in all manner of libraries and file servers, duplicated all over the place.
  The most important things to keep from any experiment are 1) the results (no point in doing it if you don't keep the results) and 2) the methodology (if they don't know how you got the data, it's worthless). What you could throw away without too much harm is the analysis and interpretations, since you can always reanalyze and reinterpret (and any interpretations made now may prove wrong in the future anyhow). Even then, anything interesting is likely to be kept in the grand scheme of things anyway.
  The place which TFA is actually talking about is less dramatic, lower budget science. It's still important (it's the bread and butter of science and technology), but will be found in the vaults of far fewer publishers, libraries and web servers. And it's lower budget science where it's far easier to reproduce results, as in GP.
13. Re:What's most important to keep. by mako1138 · 2009-02-21 11:06 · Score: 2, Insightful
  
  You seem to be using "results" in a wider sense than "published papers". Yes, nobody is going to throw out papers. But the raw data from instruments? It is not clear whether those will be kept.
  You say that the analysis and interpretations can be thrown out, but those portions are precisely what go into published papers. And for small-scale science, it makes little sense to throw away anything at all.
14. Re:What's most important to keep. by mako1138 · 2009-02-21 11:36 · Score: 2, Informative
  
  Let's say the LHC publishes its analysis [..]
  Let's stop right there. There are no general lessons to be had from the LHC. It's an exception, not the rule.
  First: 99.9% of scientists are not working at LHC, or any other billion dollar, world-unique facility.
  They are working in ordinary labs, with ordinary equipment that's identical or similar to equipment in hundreds of other labs around the world.
  I admit that I jumped on the LHC as an extreme example. But even in an "ordinary" lab these days, you'll find some specialized and complex equipment. This is true for the cutting edge of any field.
  
  Second: Primary data, actual measurement results, are already kept, as a rule.
  
  As oneiros27 notes, this is not guaranteed, either by design or circumstance.
  
  Third: The vast majority of experiments are never ever reproduced to begin with. You're lucky enough to get cited, really. Most papers don't even get cited apart from by those who wrote them.
  Not sure what kind of point you're trying to make here.
  
  Fourth: Very little science is done by re-interpreting existing results. That only applies to the unique cases where the actual experiment can't be reproduced easily.
  It's not necessarily a matter of re-interpreting existing results. You may be adding an old dataset to a new dataset, and finding new results in the combined set, or finding a glimmer of something new in an old dataset. Even for "small" experiments, having somebody else's raw dataset can make your life a lot easier.
  
  What happens when five years later it's discovered that a flawed assumption was used in the analysis? Are we going to build another LHC any time soon, to verify the result?
  Truth is, you'd still have to rebuild the LHC then, because you didn't test your 'corrected' assumption against the actual machine to show that your 'corrected' results are valid. Until the actual experiment is re-done it'll remain an unanswered question.
  No, I am talking strictly about analysis. For example, the use of neural networks in particle/track finding has recently met greater acceptance in High Energy Physics. But what happens if, a few years down the road, evidence turns up that neural networks are fundamentally flawed? If you have kept the data, you can re-run the analysis with different methods. If you have thrown out the data, it's time to build a new LHC.
  Granted, High Energy Physics, with its requirements for large datasets in order to find extremely rare processes, is perhaps the only branch of science to require so much data. In HEP, we want to keep as much as possible, but there are realistic limits. In other fields, since there are no difficulties, why not keep everything?
15. Re:What's most important to keep. by mako1138 · 2009-02-21 11:55 · Score: 2, Insightful
  
  I agree that there is no simple answer, but I am uneasy with your "experiment is larger than the data" concept. Today we think of the Michelson-Morley and Millikan experiments as canonical and definitive investigations in Physics. But we do not often remember that each was preceded by a string of less-successful experiments, and followed by confirmations. It is the accumulation of a body of data that leads to the gradual acceptance of a physical concept.
  See chart:
  http://en.wikipedia.org/wiki/Michelson-Morley_experiment#The_most_famous_failed_experiment
Comment removed by account_deleted · 2009-02-20 15:15 · Score: 4, Informative

Comment removed based on user account deletion
Re:And the scientists goes mooo! by Vectronic · 2009-02-20 15:46 · Score: 2, Interesting

Although likely, not necessarily...
I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.
There are geniuses of all sorts, someone might be completely lost trying to understand it linguistically, but may find a fault in it instantly visually, or audibly.
However that is somewhat redundant, as the original (as it is now) can be converted into that by people, but a mandate saying it must contain X, Y and Z, will open it up to more people, quicker.
What's the goal, really? by Rostin · 2009-02-20 16:24 · Score: 4, Insightful

I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
It reminds me of the XKCD this morning...
1. Re:What's the goal, really? by onionlee · 2009-02-20 16:56 · Score: 2, Interesting
  
  agreed. most sciences that have been around for a long time and have developed their own specializations within them, such as physics, have specific journals that target their "demographics" (such as the journal of applied physics a, b, c, d, letters). anything outside of those journals has most likely been rejected by those journals and is irrelevant. furthermore, the relatively young sciences such as linguistics use (what i personally think is lame) a system of keywords so that anyone can easily find articles that they're interested in. truly, i have yet to find any researcher who has complained about this "problem".
2. Re:What's the goal, really? by Beetle+B. · 2009-02-20 18:13 · Score: 3, Insightful
  
  Typical comments from someone in the first world.
  First, just on the side, I know lots of people who got PhD's but did not really stay in research and academia. They still want to read papers, though, as they still maintain an interest.
  But the main benefit of opening up journal papers is for the rest of the world to benefit. Yes, if you have a very narrow perspective, you could just dismiss that as charity. If you're open minded, you'll realize that shutting out most of the world to scientific output means much less science globally, and much less benefits to you as a result.
  Imagine if all researchers in Japan published papers only in Japanese, and the journals had a copyright condition that prevented the content from ever being translated to another language, and you'll see what I mean. Whereas current journals require a lot of money for access, these ones also have a price: Just learn Japanese. It's not exactly promoting science.
  Then again, of course, journals do need a base amount of money to operate. It's just that companies like Elsevier charge so much more than is needed to make a profit.
  
  --
  Beetle B.
3. Re:What's the goal, really? by Grym · 2009-02-20 18:52 · Score: 2, Interesting
  
  In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
  I'm a poor medical student, but a medical student with--quite frequently--interdisciplinary ideas. I can't tell you the number of times I have been interested in pursuing a subject for independent research and have been stymied or effectively stopped in my tracks because of my lack of ability to pay or lack of online access to experimental data and results. You might think that modern science is highly specialized and, for most bleeding-edge topics, you're probably right. In these cases, the affected researchers can all afford the one or two subscriptions they need to stay up to date. However, in overlapping areas, non-specialists (or specialists in other fields) might have a unique perspective and possibly insightful findings to add. What harm could be done by letting them take a look?
  Take for example one of my hare-brained ideas. There is a disease called Pellagra, which is caused by diets deficient in certain amino acids. These amino acids are lacking in corn. In the United States, corn is, by far, the largest cash crop. Now, diets in the U.S. are varied enough to where modern Americans do not get Pellagra, but this isn't the case in developing nations, where Pellagra can sadly be endemic. So, my idea is this: why not introduce conservative substitutions into the genetic sequence of the gene encoding the major structural protein of corn (zein) in such a way as to make corn a (more) complete amino acid food source? By doing this, you'd be turning one of the world's most abundant and cheap foodstuffs into an effective cure for a common, debilitating disease.
  Now, to me, as an outsider to Agriculture, this seems like a rather basic idea. I was convinced that someone had to have tried something similar to this. But you'd be surprised. I have yet to find a single paper that has ever attempted such a thing. Almost all of them focus on crop yields or the use of zein in commercial products. Now, maybe (for reasons unbeknown to me) my idea is untenable such that people in the field have never given it a thought. But what if that isn't the case? What if the leaders of the field (or at least the emergent behavior of the scientists and scientific institutions) are pushing so hard in one direction that an obvious area for research or advancement was overlooked? Let's hope it's not the latter...
  Regardless, it's a travesty how petty scientific institutions are in this regard considering how often they talk to the public about high-minded ideals when extolling the virtues of public funding of Science. This information should be available to all: specialists and non-specialists alike.
  -Grym
  P.S. Oh yeah, and in case any of you were wondering: somebody already patented the general idea described in my post. So don't get any wild ideas about trying to use it to help the poor, now! (/facepalm)
4. Re:What's the goal, really? by smallfries · 2009-02-20 23:22 · Score: 3, Informative
  
  Trickle-down. Dissemination of knowledge.
  You don't know it yet (not meant as a jibe but it is something that clicks in after your PhD) but your primary function as a scientist is not to make discoveries. It is spreading knowledge. Sometimes that dissemination will occur in a narrow pool, through journal papers between specialists in that narrow pool of talent.
  This is not the primary goal of science, although it can seem like it when you are slogging away at learning your first specialisation well enough to get your doctorate. Occasionally a wave from that little pool will splash over the side - maybe someone will write a literature review that is read by a specialist in another field. A new idea will be found - after all sometimes we know the result before we know the context that it will be applied to.
  The pools get bigger as you move further downstream. Journal articles pass into conference publications, then into workshops. Less detail but carried through a wider audience. Then after a time, when the surface seems to have become still, textbooks are written and the knowledge is passed on to another generation. We tend to stick around and help them find the experience to use it as well. This is why all PhD students have an advisor to point out the best swimming areas.
  That was the long detailed answer to your question. The simple version is that you don't know who your target audience is yet. And limiting it to people in institutions that pay enormous access fees every year is not science. As a data-point - a lot of European institutes don't bother with IEEE fees. They run to about £50k/year which simply isn't worth it. As a consequence results published in IEEE venues are cited less in Europe. So even amongst the elite access walls have an effect.
  
  --
  Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Is storage an issue? by blue+l0g1c · 2009-02-20 16:34 · Score: 2, Interesting

Data storage is something we've gotten very good at and we've made it very cheap. A Petabyte a day is not as staggering as it was even five years ago.
Science is hard - news at 11 by jstott · 2009-02-20 17:36 · Score: 2, Insightful

And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them.

I know that this is a real shock to you humanities majors, but science is hard. And yes, for the record, I do have degrees in both [physics and philosophy, or will as of this May — and the physics was by far the harder of the two].
Here's another shocker. If you think the papers are hard to read, you should see the amount of work that went into processing the data until it's ready to be written up in an academic journal. Ol' Tom Edison wasn't joking when he said it's "1% inspiration and 99% perspiration." If you think seeing the raw data is going to magically make everything clear, well, I'm sorry, the real world just doesn't work that way. Finally, if you think professional scientists are going to trust random data they downloaded off the web of unknown provenance, well, I'm sorry but that isn't going to happen either. I spend enough time fixing my own problems; I certainly don't have time to waste fixing other people's data for them.
-JS

--
Vanity of vanities, all is vanity...
Re:Again with the IP by wisty · 2009-02-20 17:54 · Score: 3, Insightful

There is a rumor that Newton meant it as an insult to Hooke. Newton had refined Descartes' wave theory, while Hooke had backed the corpuscular theory. Also, Hooke was a short man.
Re:And the scientists goes mooo! by wisty · 2009-02-20 17:57 · Score: 2, Insightful

Why should science be more complex than necessary? For every String Theory area (where complexity is unavoidable) there are plenty of theories like economics, which just rely on weird jargon to fence out the interlopers.
Re:And the scientists goes mooo! by Fallingcow · 2009-02-20 18:24 · Score: 2, Interesting

I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.
Don't count on that being at all helpful.
Take the math articles on Wikipedia: I can read one about a topic I already understand and have no idea what the hell they're talking about in entire sections. It's 100% useless for learning new material in that field, even if it's not far beyond your current level of understanding. Good luck if you start on an article far down a branch of mathematics--assuming they bother to tell you the source of the notation in that article, it'll take you a half-dozen more articles to find anything that sort-of translates some of it for you.
Some sort of mouseover tool-tip hint thing or a simple glossary is all I ask, but I think the people writing that stuff don't even realize how opaque it is to people who majored in something other than math.
Re:not results- grant dollars by smallfries · 2009-02-20 23:09 · Score: 4, Insightful

What incentive does a massive industry have to solve cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless- their training and knowledge is so focused, so narrow- they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions also would be useless. Billions of dollars of materials, equipment, supplies, chemicals- gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.
That's an old argument and although it sounds reasonable it is completely unsound. An industry does not function as a single cohesive entity with wants and desires. It is composed of many different individuals with their own wants and desires.
I know enough academics to say for certain that if any one of those individuals could discover a cure that would put their entire employer out of business then they would leap at the chance. The fame that would follow would make another job easy enough to get, and the recognition is what they're really in it for anyway.

--
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Re:Doesn't work yet: by janwedekind · 2009-02-21 00:58 · Score: 2, Informative

Actually IEEE allows you to make your paper available on the internet at *one* location. However the material must not be reprinted/republished without permission from the IEEE. They also don't allow making your work part of another world-wide indexed collection. That's still far from perfect but at least it allows you to make your work accessible on your homepage or your university's Digital Commons repository. I don't know what the future plans of IEEE are.
Scientific data is neither free nor cheap... by w0mprat · 2009-02-21 02:09 · Score: 2, Insightful

Research data is typically large. In the mid-late 90s I recall a researcher planning to move 10 TB of data internationally. It wasn't exactly unprecedented either. The internet was simply not capable of such a transfer. Eventually they had to ship it on many disks.

The problem with such raw data, e.g. from a radio telescope, is that you need all of it; you can't really cut any out before it's even processed.

This is a lot less of an issue today with research networks all hooked into multi-gigabit pipes. But there are still very large datasets researchers are attempting to work with that are simply not cheap to handle.

I think this is a great idea, and it's nice being able to share data, but as far as the really sexy big research going on these days goes, I don't see it being much of a point-click-download service!

--
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
Re:not results- grant dollars by Bowling+Moses · 2009-02-21 06:46 · Score: 2, Informative

I've been doing research in the biological sciences for 12 years now, including some work that was at least tangentially related to human health. I am not in it for the paycheck; if that's all I wanted, my friends and I joke that we'd go to KFC School of Business Management and be assistant managers at fast food restaurants making more than we do in science. I, and the majority of the people I know, don't want to be professors either. It's extremely rare for a professor to actually do any lab work themselves, but if you ask they'll tell you they miss it. Besides, there are 300 people applying for each professorship at a decent university. Then if you are unlucky enough to get the job, you have to successfully fight in a viciously competitive funding environment to get tenure and not lose your mind or your liver in the process. It's actually hard enough to keep a job in academic science, period. My boss and I are applying for grants. Hers are in part to keep my position funded; she's got one out and is writing a second. I've got one out, and am applying for two or possibly three more. Contrary to what you wrote, my grants are largely my ideas and my writing, and should I get funded, it is my money, not the boss's. However, science funding is so obscenely bad (most grants have ~5% success rate, the best one I'm applying for has ~25%) that I'm also going to look for a new job, with the boss's full knowledge and support, even though we'd both very much like me to stick around for another couple years and get our proposed butt-kicking science done.

So why do it if there's nothing but nonstop stress, Burger King assistant manager pay, and institutionalized job insecurity? I get to solve problems. I get to figure things out. I get to do things (sometimes, not often, but sometimes) that nobody has ever done before, see things nobody else has ever seen before. Work in a small way on projects that could impact millions of people's lives. I'll never be famous, which is fine with me. I'll never be rich, which, well, I can tolerate. I might not ever have job security...which okay, I'll admit is seriously grinding down my enthusiasm and idealism. But the things I've gotten to do--even paid a pittance to do--I wouldn't trade. Catching jellyfish off the docks in Oregon. Turned loose on a billion dollar synchrotron, unsupervised at 3 am to understand how an enzyme known to be a virulence factor in several diseases functions at an atomic level. Making radioactively labeled mosquitoes to understand lipid trafficking, working with cell culture (It's a cell from an insect's midgut...that under laboratory conditions can endlessly propagate itself. How cool! And here's what I'm going to do with it...), genetically engineering fluorescent organisms, using high-throughput screening to find new drug lead compounds. A lot of hard work, but sometimes that's damn good fun. Plus along the way you get to understand phenomena on a level that most people don't even know exists. I'm of course not claiming god-king knowledge here, but I could spend a long time talking about the terrible beauty of host:pathogen and vector:pathogen relationships for example, or protein structure, or anything else I've studied a while, just like any other scientist. That's fun too, although not cool in most of society. But my mom still thinks I'm cool. Ok, no, she doesn't.

If you expect to get rich and famous doing science, no wonder your post seems bitter. It isn't going to happen, and it isn't the right reason to do science in the first place. Those pie-in-the-sky ideals are.