Freeing and Forgetting Data With Science Commons
blackbearnh writes "Scientific data can be both hard to get and expensive, even if your tax dollars paid for it. And if you do pay the big bucks to a publisher for access to a scientific paper, there's no assurance that you'll be able to read it, unless you've spent your life learning to decipher them. That's the argument that John Wilbanks makes in a recent interview on O'Reilly Radar, describing the problems that have led to the creation of the Science Commons project, which he heads. According to Wilbanks, scientific data should be easy to access, in common formats that make it easy to exchange, and free for use in research. He also wants to see standard licensing models for scientific patents, rather than the individually negotiated ones that now make research based on an existing patent so financially risky."
Read on for the rest of blackbearnh's thoughts.
"Wilbanks also points out that as the volume of data grows from new projects like the LHC and the new high-resolution cameras that may generate petabytes a day, we'll need to get better at determining what data to keep and what to throw away. 'We have to figure out how to deal with preservation and federation because our libraries have been able to hold books for hundreds and hundreds and hundreds of years. But persistence on the web is trivial. Right? The assumption is well, if it's meaningful, it'll be in the Google cache or the internet archives. But from a memory perspective, what do we need to keep in science? What matters? Is it the raw data? Is it the processed data? Is it the software used to process the data? Is it the normalized data? Is it the software used to normalize the data? Is it the interpretation of the normalized data? Is it the software we use to interpret the normalization of the data? Is it the operating systems on which all of those ran? What about genome data?'"
I was reading through the summary quickly and almost had a panic attack at the deluge of questions at the end. We get the point already!
What's most important to keep is quite simple and obvious really:
The results. The published papers, etc.
It's an important and distinctive feature of Science that results are reproducible.
Although likely, not necessarily...
I'd be happy with a Wiki-Style, where the actual article can be as complex (in the know) as desired, but with a glossary of sorts.
There are geniuses of all sorts, someone might be completely lost trying to understand it linguistically, but may find a fault in it instantly visually, or audibly.
However that is somewhat redundant, as the original (as it is now) can be converted into that by people, but a mandate saying it must contain X, Y and Z, will open it up to more people, quicker.
I'm a working scientist (ok, PhD student), so I read journal articles pretty often. I can understand the rub in principle, but let's say that we come up with some way for all scientific data to be freely shared. So what? In almost all cases, the only people who actually benefit from access to particular data are a small handful of specialists. Could someone explain to me why this is a real problem and not just something that people with too much time on their hands (and who would never actually read, let alone understand, real research results) get worked up about?
It reminds me of the XKCD this morning...
Data storage is something we've gotten very good at and we've made it very cheap. A Petabyte a day is not as staggering as it was even five years ago.
I know that this is a real shock to you humanities majors, but science is hard. And yes, for the record, I do have degrees in both [physics and philosophy, or will as of this May — and the physics was by far the harder of the two].
Here's another shocker. If you think the papers are hard to read, you should see the amount of work that went into processing the data until it's ready to be written up in an academic journal. Ol' Tom Edison wasn't joking when he said it's "1% inspiration and 99% perspiration." If you think seeing the raw data is going to magically make everything clear, well, I'm sorry, the real world just doesn't work that way. Finally, if you think professional scientists are going to trust random data they downloaded off the web of unknown provenance, well, I'm sorry but that isn't going to happen either. I spend enough time fixing my own problems; I certainly don't have time to waste fixing other people's data for them.
-JS
Vanity of vanities, all is vanity...
There is a rumor that Newton meant it as an insult to Hooke. Newton had refined Descartes' wave theory, while Hooke had backed the corpuscular theory. Also, Hooke was a short man.
Why should science be more complex than necessary? For every String Theory area (where complexity is unavoidable) there are plenty of theories like economics, which just rely on weird jargon to fence out the interlopers.
Don't count on that being at all helpful.
Take the math articles on Wikipedia: I can read one about a topic I already understand and have no idea what the hell they're talking about in entire sections. It's 100% useless for learning new material in that field, even if it's not far beyond your current level of understanding. Good luck if you start on an article far down a branch of mathematics--assuming they bother to tell you the source of the notation in that article, it'll take you a half-dozen more articles to find anything that sort-of translates some of it for you.
Some sort of mouseover tool-tip hint thing or a simple glossary is all I ask, but I think the people writing that stuff don't even realize how opaque it is to people who majored in something other than math.
What incentive does a massive industry have to solve cancer, when it would put them out of business? Tens of thousands of people have dedicated most of their adult lives, usually to studying specific mechanisms and biological functions so narrow that if cancer were cured tomorrow, they would be useless- their training and knowledge is so focused, so narrow- they cannot compete with the existing population of researchers in other biomedical fields. Journals which charge big bucks for subscriptions also would be useless. Billions of dollars of materials, equipment, supplies, chemicals- gone. "Centers", hospitals, colleges, universities which each rake in hundreds of millions of dollars in private, government, and non-profit sourced money would be useless.
That's an old argument and although it sounds reasonable it is completely unsound. An industry does not function as a single cohesive entity with wants and desires. It is composed of many different individuals with their own wants and desires.
I know enough academics to say for certain that if any one of those individuals could discover a cure that would put their entire employer out of business then they would leap at the chance. The fame that would follow would make another job easy enough to get, and the recognition is what they're really in it for anyway.
Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
Actually IEEE allows you to make your paper available on the internet at *one* location. However the material must not be reprinted/republished without permission from the IEEE. They also don't allow making your work part of another world-wide indexed collection. That's still far from perfect but at least it allows you to make your work accessible on your homepage or your university's Digital Commons repository. I don't know what the future plans of IEEE are.
Research data is typically large. In the mid-late 90s I recall a researcher planning to move 10 TB of data internationally. It wasn't exactly unprecedented either. The internet was simply not capable of such a transfer. Eventually they had to ship it on many disks.
The problem with such raw data, e.g. from a radio telescope, is that you need all of it; you can't really cut any out before it's even processed.
This is a lot less of an issue today with research networks all hooked into multi-gigabit pipes. But there are still very large datasets researchers are attempting to work with that are simply not cheap to handle.
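To put rough numbers on the 10 TB example, here is a back-of-the-envelope sketch of ideal transfer times. The link speeds in the table are illustrative assumptions, not figures from the comment, and the calculation ignores protocol overhead, congestion, and retransmits, so real transfers would be slower.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link running at the given rate.

    efficiency is the fraction of the nominal rate actually achieved
    (1.0 means the ideal, overhead-free case).
    """
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400  # seconds per day


TB = 10**12          # decimal terabyte
dataset = 10 * TB    # the 10 TB dataset mentioned above

# Hypothetical link speeds chosen for illustration only.
links = {
    "1.5 Mbit/s": 1.5e6,
    "45 Mbit/s": 45e6,
    "1 Gbit/s": 1e9,
    "10 Gbit/s": 10e9,
}

for name, rate in links.items():
    print(f"{name}: {transfer_days(dataset, rate):.2f} days")
```

Under these assumptions the slowest link needs well over a year, which is why shipping disks made sense, while a dedicated 1 Gbit/s path gets the same data across in a little under a day.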
I think this is a great idea, it's nice being able to share it but as far as the really sexy big research going on these days I don't see it being much of a point-click-download service!
After logging in slashdot still does not take you back to the page you were on. It's been that way for 20 years.
I've been doing research in the biological sciences for 12 years now, including some work that was at least tangentially related to human health. I am not in it for the paycheck--if that's all I wanted, my friends and I joke that we'd go to KFC School of Business Management and be assistant managers at fast food restaurants making more than we do in science. I, and the majority of the people I know, don't want to be professors either. It's extremely rare for a professor to actually do any lab work themselves, but if you ask they'll tell you they miss it. Besides, there are 300 people applying for each professorship at a decent university. Then if you are unlucky enough to get the job, you have to successfully fight in a viciously competitive funding environment to get tenure and not lose your mind or your liver in the process. It's actually hard enough to keep a job in academic science, period. My boss and I are applying for grants. Hers are in part to keep my position funded; she's got one out and is writing a second. I've got one out, and am applying for two or possibly three more. Contrary to what you wrote, my grants are largely my ideas and my writing, and should I get funded, it is my money, not the boss's. However, science funding is so obscenely bad (most grants have ~5% success rate, the best one I'm applying for has ~25%) that I'm also going to look for a new job, with the boss's full knowledge and support, even though we'd both very much like me to stick around for another couple years and get our proposed butt kicking science done.
So why do it if there's nothing but nonstop stress, Burger King assistant manager pay, and institutionalized job insecurity? I get to solve problems. I get to figure things out. I get to do things (sometimes, not often, but sometimes) that nobody has ever done before, see things nobody else has ever seen before. Work in a small way on projects that could impact millions of people's lives. I'll never be famous, which is fine with me. I'll never be rich, which, well, I can tolerate. I might not ever have job security...which okay, I'll admit is seriously grinding down my enthusiasm and idealism. But the things I've gotten to do--even paid a pittance to do--I wouldn't trade. Catching jellyfish off the docks in Oregon. Turned loose on a billion dollar synchrotron, unsupervised at 3 am to understand how an enzyme known to be a virulence factor in several diseases functions at an atomic level. Making radioactively labeled mosquitoes to understand lipid trafficking, working with cell culture (It's a cell from an insect's midgut...that under laboratory conditions can endlessly propagate itself. How cool! And here's what I'm going to do with it...), genetically engineering fluorescent organisms, using high-throughput screening to find new drug lead compounds. A lot of hard work, but sometimes that's damn good fun. Plus along the way you get to understand phenomena on a level that most people don't even know exists. I'm of course not claiming god-king knowledge here, but I could spend a long time talking about the terrible beauty of host:pathogen and vector:pathogen relationships for example, or protein structure, or anything else I've studied a while, just like any other scientist. That's fun too, although not cool in most of society. But my mom still thinks I'm cool. Ok, no, she doesn't.
If you expect to get rich and famous doing science, no wonder your post seems bitter. It isn't going to happen and isn't a right reason to do science in the first place. Those pie-in-the-sky ideals are.