Researchers Pull Out of Talks With Publishers On Text-Mining
ananyo writes "Disagreement between scientists and publishers has grown on a thorny issue: how to make it easier for computer programs to extract facts and data from online research papers. On 22 May, researchers, librarians and others pulled out of European Commission talks on how to encourage the techniques, known as text mining and data mining. The withdrawal has effectively ended the contentious discussions, although a formal abandonment can be decided only after a commission review in July. Scientists have chafed for years at limitations on computer-aided research. They would like to use computer programs to crawl over thousands or millions of articles and other online research content, extracting data to build up databases or to pick out patterns such as associations between genes and diseases. But in many parts of the world, including Europe (though perhaps not in the U.S. — the situation is unclear), this sort of use currently requires permission from the content's copyright owner. Even if an institution has paid to access a journal, its academics do not necessarily have permission to mine the text."
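For anyone wondering what "picking out associations between genes and diseases" looks like in practice, here is a minimal sketch of the simplest version: counting how often a gene name and a disease name turn up in the same document. Everything in it is an assumption made for illustration. The term lists, the sample sentences, and the function name `cooccurrences` are all made up, and real pipelines use proper entity recognition and far larger corpora.

```python
from collections import Counter
from itertools import product
import re

# Hypothetical toy vocabularies; a real system would use curated ontologies.
GENES = {"BRCA1", "TP53", "CFTR"}
DISEASES = {"breast cancer", "cystic fibrosis", "lung cancer"}


def cooccurrences(documents):
    """Count, per (gene, disease) pair, the documents that mention both."""
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Match gene symbols on word boundaries to avoid partial hits.
        genes = {
            g for g in GENES
            if re.search(r"\b" + re.escape(g.lower()) + r"\b", lowered)
        }
        # Plain substring match for multi-word disease names.
        diseases = {d for d in DISEASES if d in lowered}
        # Each pair is counted at most once per document.
        for pair in product(sorted(genes), sorted(diseases)):
            counts[pair] += 1
    return counts


# Made-up sample sentences standing in for article text.
SAMPLE_DOCS = [
    "Mutations in BRCA1 are linked to hereditary breast cancer.",
    "We studied TP53 and BRCA1 expression in breast cancer tissue.",
    "CFTR variants cause cystic fibrosis.",
]

if __name__ == "__main__":
    for (gene, disease), n in cooccurrences(SAMPLE_DOCS).most_common():
        print(gene, disease, n)
```

Co-occurrence is only a crude signal, but it shows why miners want the full text of many papers at once rather than one article at a time.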
Text and data mining have a long history in the media, going back to the 80s.
Some videos are available on YouTube. European agencies are ahead of US-based ones:
Text mining
Data mining
...professionals are starting to take notice.
The "we're just replacing menial labor, so now those people are freed up to do more valuable knowledge-driven work" argument doesn't work so well when it's the knowledge-driven work itself that is being co-opted.
It's the researchers' work, their livelihood. That Google (replace with Big Data Corp. of your preference) can get very rich short-term by vacuuming up the mental effort of the planet and slapping ads onto it does not mean it is a viable economic model for the rest of us.
Buy the content you want to index, put it in a database and search it at your leisure.
What? You want it for free? Get a life.
Was in the tea I own lube, beverage,
The people who do the science and write the papers produce the content. Yet somehow the publisher controls how it gets used thereafter.
Everyone is so damned beholden to copyright that it more or less constrains how you do anything.
And they wonder why people are pushing for open access -- it's time to cut the buggy whip makers out of the equation.
If you took public money to do this, it should be open. If you want it to be locked down and proprietary, don't publish.
Lost at C:>. Found at C.
Even if an institution has paid to access a journal, its academics do not necessarily have permission to mine the text.
I thought copyright only concerned the rights to distribute and make copies of an original work. Since when did "distribution" and "making copies" get extended to what you can do with what you have obtained legally?
Fuck Corporations
Fuck "Intellectual Property"
Pulling out is not an effective method of prevention.
You're scientists. Just do it. Facts are facts. They aren't subject to extortion schemes like copyright.
Don't ask publishers ahead of time. It will put bad ideas in their heads. Conduct your research, then let them prove later that you need their permission. Not the other way around.
As an NLP Bioinformatics guy, I believe the real crime Aaron Swartz committed was being in the news.
He isn't the first to have that dataset and he won't be the last.
We write papers using massive NLP scans of publications rather routinely.
Most of the time, the papers are downloaded from PubMed (publicly funded), so publishers can't even complain about bandwidth costs, etc.
For anyone who didn't know already, most subscription publishers don't **DO** anything.
They are only slightly better than patent trolls, and in some cases, worse.
academic culture and the academic generation gap.
Hiring and tenure still involve large percentages of faculty that "came up" under the old system, and don't see the problem (don't have time to see the problem) that has emerged in academic publishing culture over the last couple of decades in particular. They don't see work published outside of the big name journals/publishers as "serious" or "academic" for the moment. So young academics wanting to build a career continue to support them and publish in them, as a pragmatic career-building move.
But young academics by and large (at least in my wing of the social sciences) are incredibly jaded about academic publishing and are absolutely willing to shift the culture away from publishing with big journal mills; they just have to get hired, get tenure, and become "the academics of the world" first. Then, as they begin to be the ones making the hiring and tenure decisions, you can bet that as they consider the next crop of youngsters, they won't place the same premium on Springer, Elsevier, et al. journals.
The publishing mills are not long for the world, and they know it, which is why they're all trying to expand/reshape their product lines, business models, etc. away from straight print content licensing and toward academic SaaS and other similar offerings.
STOP . AMERICA . NOW
One of the concerns (read: lame excuses) given by the publisher side of this is fear that large-scale downloads will cripple their web servers. Private torrent trackers for scientific work are the obvious solution. With university and institutional seeds, this solution would be efficient, equitable and fast.
.: Semper Absurda
Come on. The description of research methods, procedures, tests and results in scientific papers exists for the betterment of humankind, not to make the people who own it rich. Get rich by Making Stuff, not by exerting a monopolist's control on Knowledge.
How hard is this? All research and results conducted by higher ed should be available for free and the costs rolled into the tax base.
This is as basic as it gets. Roads, bridges, security and advances in knowledge.
It has been done. Look at Dublin Core, OAI, and DuraSpace for open-source solutions.
So the discussion concerns:
Publishers who block Content
Data miners doing research.
At least that's clear then.