Slashdot Mirror


Researchers Pull Out of Talks With Publishers On Text-Mining

ananyo writes "Disagreement between scientists and publishers has grown on a thorny issue: how to make it easier for computer programs to extract facts and data from online research papers. On 22 May, researchers, librarians and others pulled out of European Commission talks on how to encourage the techniques, known as text mining and data mining. The withdrawal has effectively ended the contentious discussions, although a formal abandonment can be decided only after a commission review in July. Scientists have chafed for years at limitations on computer-aided research. They would like to use computer programs to crawl over thousands or millions of articles and other online research content, extracting data to build up databases or to pick out patterns such as associations between genes and diseases. But in many parts of the world, including Europe (though perhaps not in the U.S. — the situation is unclear), this sort of use currently requires permission from the content's copyright owner. Even if an institution has paid to access a journal, its academics do not necessarily have permission to mine the text."

17 of 67 comments

  1. Re:Well, this is simple. by Anonymous Coward · · Score: 2, Informative

    The researchers are protesting that they are not being allowed to mine the content that they have already paid to get. They are not arguing that the content should be available for free.

  2. Sad ... by gstoddart · · Score: 5, Insightful

    The people who do the science and write the papers produce the content. Yet somehow the publisher controls how it gets used thereafter.

    Everyone is so damned beholden to copyright that it more or less constrains how you do anything.

    And they wonder why people are pushing for open access -- it's time to cut the buggy whip makers out of the equation.

    If you took public money to do this, it should be open. If you want it to be locked down and proprietary, don't publish.

    --
    Lost at C:>. Found at C.
    1. Re:Sad ... by rsborg · · Score: 4, Insightful

      Everyone is so damned beholden to copyright that it more or less constrains how you do anything.

      This is not just a failure of copyright, this is an institutional failure where the "publisher" gets to control the entire scientific debate and profit on all ingress and egress of data. Copyright is just the weapon the publisher is brandishing to force even more people to pay them.

      How is this even tenable long-term? What curation do these journals provide? Why are they regarded as anything more than leeches?

      --
      Make sure everyone's vote counts: Verified Voting
    2. Re:Sad ... by Charliemopps · · Score: 2

      If you want it to be locked down and proprietary, don't publish.

      While I agree with you mostly, one of the biggest problems they have (especially in medicine) is unpublished papers.
      Watch this: http://www.ted.com/talks/ben_goldacre_what_doctors_don_t_know_about_the_drugs_they_prescribe.html
      Over 100,000 people were killed in the United States due to 1 paper that went unpublished.

    3. Re:Sad ... by Anonymous Coward · · Score: 3, Insightful

      It's a free market.

      No it isn't. When your livelihood requires X papers to be published in a small set of predetermined circulars, you cannot simply stick a PDF on cesspits like arXiv.org.

    4. Re:Sad ... by Trepidity · · Score: 2

      It damages the literature beyond access to papers as well, since publishers sometimes use copyright to interfere with papers themselves. I was forced to remove a screenshot from one paper because the publisher's official position on fair-use was extremely narrow and would not allow screenshots. Perhaps this is simply due to risk-aversion: it's easier to just restrict fair-use than worry about how close to the line to get. But a more cynical person might suspect it's in the publisher's own interests to push generally anti-fair-use positions.

      Film-studies scholars have been struggling with this for years: including a still image from a film in a discussion of it is obviously fair-use, but a surprising number of publishers disagree.

    5. Re:Sad ... by dwsobw · · Score: 2

      But there is a simple solution. At my Institute (in Germany), we simply do not publish in journals or at conferences where we have to give publishers the exclusive rights to the paper. Either they accept that we remove that clause from the forms we have to sign, or we do not publish with them. It is fairly simple. Even Springer seems to go along most of the time.

  3. Re:Well, this is simple. by wierd_w · · Score: 4, Insightful

    Translation: Invent the wheel many many times! Don't you DARE share the data on wheels with others without first getting permission to replicate data from the spoke makers, and rim makers!

    Fuck off AC. Look at the internet as a model of how unfettered data proliferation prevents biases from dominating information use. (What's that, Barbra Streisand? That picture of your beach house is STILL on the internet? Fancy that!) Allowing researchers to share and vet each of these databases you want them to all make independently is EXACTLY how this technology should be used, BECAUSE it prevents useful data from being hushed up, or forgotten, and gives that data its due. The scientists that created the data want the data shared. The scientists that want the data, want it shared.

    The only group that does NOT want the data shared, is the publishing industry, because if the data leaves their grimy little fingers, they can't charge rent.

    That's the real issue here.

  4. Re:Now that it's moving up the cognitive chain... by ewanm89 · · Score: 3, Insightful

    Name a journal that has paid a researcher to publish a paper. I'll tell you: there isn't one. Researchers have to pay a "submission fee" to have their paper even considered; if it is accepted, copyright is often signed over to the journal; then they have to subscribe to the journal to read it. In fact, the only thing the actual publisher pays for in this whole mess is the paper and ink to print the thing. I'm going to guess this is just another nail in the coffin of traditional academic journals as the researchers start taking more of their papers elsewhere for publishing.

  5. Re:Well, this is simple. by K. S. Kyosuke · · Score: 2

    Knowledge mining does not extract content. Knowledge mining extracts facts. Text mining analyzes and classifies documents, clusters them into groups, and tries to support further knowledge mining. About the only activity that I can think of that could qualify as potentially violating copyright is summarizing. Content may be protected by copyright laws, but facts can't be, so your comment isn't very relevant. I really wonder how the publishers argue that copyright applies to this. Where I live (.cz), copyright not only explicitly applies to making copies; in fact, I believe there's a clause explicitly allowing "using the work in scientific research", if the use of parts of the text "doesn't exceed the extent necessary for the intended goal". One could argue that mining facts, entities, and their relationships qualifies under this, and that it expressly doesn't qualify as making copies of a work perceived as an authored text.

    --
    Ezekiel 23:20
  6. Re:Now that it's moving up the cognitive chain... by interkin3tic · · Score: 4, Insightful

    You've failed to read the summary and now the reply you were replying to. The researchers are the ones who want data mining, the publishers do not, at least not without being paid more money. Not for adding any value either, just for slightly modifying the copyright. On data that shouldn't be theirs.

    Give the researchers a few years with the current trends, when it becomes clearer that if nobody associated with their work is getting paid for it, they won't be either.

    The researchers are paid with grants, they're not paid directly through publishing. If I publish a paper in Nature, it gets included in text mining, and people cite it from the text mine, that benefits me EVEN if no one ever actually reads the paper. If zero people pay for access to my article, that doesn't matter to me. If a billion people pay $30 to see my article, that doesn't matter to me. It matters only to the publisher.

    And data mining can't replace most researchers doing benchwork. Barring AI, data mining is not going to come up with brilliant theories or insights, and barring robots, data mining is not going to do benchwork.

    Publishers have a lot to fear from this, not researchers.

  7. Pulling out by ArtemaOne · · Score: 4, Funny

    Pulling out is not an effective method of prevention.

  8. Re:Now that it's moving up the cognitive chain... by reve_etrange · · Score: 2

    It's definitely another point against subscription-fee journals (the "traditional" label is ambiguous; are open-access journals with standard review structures "traditional"?).

    That aside, I want to clarify: subscription-fee journals do not charge a submission fee, although they often charge for extra pages or color figures. The price then is signing over copyright on your work. In contrast, the open-access journals do charge a (quite large) submission fee. PLoS One for example charges $1,350. The OA journal then lets you keep copyright while licensing under Creative Commons or similar.

    --
    .: Semper Absurda :.
  9. Re:Now that it's moving up the cognitive chain... by interkin3tic · · Score: 2

    Let X = grants from the government. Plugging that into your text:

    Grants will cease to exist when all value accrues to the companies mining and distributing the data by the most efficient means possible, such as Google... If X is a university or government agency providing funding or a grant, the economic process remains the same. Ultimately there is no reason for any value to accrue to any other entity, if the answer for Big Data Corp. is always "wait for somebody to provide content for free, or mine it, and slap an ad on it".

    That doesn't make any sense. The government is not looking for a direct return on the grant, particularly not in the form of publication income. The incentive for the government agency to provide the grants is still there: to advance science. Whether it ends up open access, published by Elsevier or others, or packaged into Google's data, the research is still being done.

    That's my point, as expressed. Frankly your evaluation of its correspondence with this particular case is of marginal interest to me. :p

    So you're just interested in spouting general economic principles and ignoring whether or not they apply to the topic of conversation? Because in this case, they don't. What you're suggesting is nonsense, but you don't care to hear that it's nonsense? Well then you'll never learn.

  10. Re:Well, this is simple. by reve_etrange · · Score: 2

    You are ignorant of how scientific publishing works. The publishers are the freeloaders. Scientists did the research, wrote the papers, edited and peer reviewed them on a volunteer basis and, indeed, typeset the final print versions.

    The large scientific publishers are parasites who abuse their oligopoly powers to extract rents on the labor of the scientist.

    --
    .: Semper Absurda :.
  11. Aaron Swartz did a routine thing, journals knew it by hyperfl0w · · Score: 2

    As an NLP Bioinformatics guy, I believe the real crime Aaron Swartz committed was being in the news.

    He isn't the first to have that dataset and he won't be the last.
    We write papers using massive NLP scans of publications rather routinely.

    Most of the time, the papers are downloaded from PubMed (publicly funded) so they can't even complain about bandwidth costs, etc.

    For anyone who didn't know already, most subscription Publishers don't **DO** anything.
    They are only slightly better than patent trolls, and in some cases, worse.

  12. and 5) by aussersterne · · Score: 2

    academic culture and the academic generation gap.

    Hiring and tenure still involve large percentages of faculty that "came up" under the old system, and don't see the problem (don't have time to see the problem) that has emerged in academic publishing culture over the last couple of decades in particular. They don't see work published outside of the big name journals/publishers as "serious" or "academic" for the moment. So young academics wanting to build a career continue to support them and publish in them, as a pragmatic career-building move.

    But young academics by and large (at least in my wing of the social sciences) are incredibly jaded about academic publishing and are absolutely willing to shift the culture away from publishing with big journal mills; they just have to get hired, get tenure, and become "the academics of the world" first. Then, as they begin to be the ones making the hiring and tenure decisions, you can bet that as they consider the next crop of youngsters, they won't place the same premium on Springer, Elsevier, et al. journals.

    The publishing mills are not long for the world, and they know it, which is why they're all trying to expand/reshape their product lines, business models, etc. away from straight print content licensing and toward academic SaaS and other similar offerings.

    --
    STOP . AMERICA . NOW