Slashdot Mirror


Study of Massive Preprint Archive Hints At the Geography of Plagiarism

sciencehabit writes with this excerpt from Science Insider: New analyses of the hundreds of thousands of technical manuscripts submitted to arXiv, the repository of digital preprint articles, are offering some intriguing insights into the consequences (and geography) of scientific plagiarism. It appears that copying text from other papers is more common in some nations than others, but the outcome is generally the same for authors who copy extensively: Their papers don't get cited much. The system attempts to rule out certain kinds of innocent copying: "It's a fairly sophisticated machine learning logistic classifier," says arXiv founder Paul Ginsparg, a physicist at Cornell University. "It has special ways of detecting block quotes, italicized text, text in quotation marks, as well as statements of mathematical theorems, to avoid false positives."

53 comments

  1. Play the game by hessian · · Score: 1

    The game is to find a unique angle to approach your research that's essentially clickbait, then produce some results, and figure out some way you can claim victory and go home.

    If you're just doing this to get on to the next stage, it makes sense to plagiarize and get it out of the way. You can get to the nice fat yearly income that way without having to know much of anything.

    Do we have a quality of scientists problem because science is such an esteemed (and often well-paid, in private practice at least) career that people who should not be scientists are trying to be scientists?

    1. Re:Play the game by CaptainDork · · Score: 1

      The game is to find a unique angle to approach your research that's essentially clickbait, then produce some results, and figure out some way you can claim victory and go home.

      If you're just doing this to get on to the next stage, it makes sense to plagiarize and get it out of the way. You can get to the nice fat yearly income that way without having to know much of anything.

      Do we have a quality of scientists problem because science is such an esteemed (and often well-paid, in private practice at least) career that people who should not be scientists are trying to be scientists?

      © CaptainDork November 16, 1960

      You bastard/bitch as applies.

      --
      It little behooves the best of us to comment on the rest of us.
  2. who cares about plagiarism by Anonymous Coward · · Score: 0, Interesting

    Why does anyone need 'credit' for ideas? I think that recognizing a good idea is more important than boiling down credit to a single group or entity.

    We need to divest ourselves of the mind virus that is intellectual property. The idea of credit is just another lump on that intellectual property turd.

    1. Re:who cares about plagiarism by PvtVoid · · Score: 4, Insightful

      Why does anyone need 'credit' for ideas?

      Because it allows funding agencies, university tenure committees, etc. to determine which people are contributing useful new science to the world, and which people are dead wood sucking at the teat of an academic salary without creating anything useful to anybody.

    2. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      So you are saying that the only reason that people do anything is for recognition or money?

      Are you?

    3. Re:who cares about plagiarism by ShanghaiBill · · Score: 1

      Why does anyone need 'credit' for ideas?

      Because "credit" leads to funding, pay raises, job security, and social status.

      I think that recognizing a good idea is more important than boiling down credit to a single group or entity.

      The problem is that research dollars go to a "group or entity" not to an idea. If you want to get rid of credit and recognition, then you need to propose an alternative way to provide incentives for productive research.

    4. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      I disagree. I think that the notion of credit creates an incentive to get credit, not to produce something worthwhile.

    5. Re: who cares about plagiarism by Anonymous Coward · · Score: 0

      I'll play along... So, how do you recognize something that is "worthwhile"?

      I'll suggest that we give it "credit".

    6. Re: who cares about plagiarism by Anonymous Coward · · Score: 0

      You recognize a good idea by familiarizing yourself with it and hopefully using it. Why would there be a need for 'atta boys', platitudes, and tribute?

    7. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      Yup. I enjoy doing my job, but if they quit paying me, or if somebody else offers to pay me more, I'm out.

    8. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      As would I (OP). That being said, if your job is to produce research, the credit is most likely going to your company, and not you. And if they are producing that research, they probably have a way to monetize it.

    9. Re:who cares about plagiarism by GuB-42 · · Score: 1

      So you are saying that the only reason that people do anything is for recognition or money?

      Are you?

      It is not the only reason, but in large part, yes.
      Yeah, some people genuinely love their jobs, and unfortunately, it looks like they are a minority. And even those who love their jobs wouldn't do it if they couldn't earn enough money from it.
      And useful work that's done for neither recognition nor money... yes, it happens, in the same way that a coin can land on its edge. We may do small things out of pure generosity but science is no small thing. It requires time and skill and I believe it is normal for scientists to expect something in return.

    10. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      Expect something in return?

      Why isn't it payback for all the previous research that was the foundation of their new research?

      'If I have seen further, it is by standing on the shoulders of giants.'

    11. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      Because unless you plagiarize an entire work, you're only including a small snippet of the analysis and logic of an argument or experiment. Without citation, it becomes much more difficult to assess and critique that argument or experiment. Scholarship is supposed to be rigorous, because your work isn't meant to be thrown away: good work should be something that people will regularly go back to, and that means it needs to be as transparent and verifiable as possible. Humanity would get nowhere if we had to re-verify every experiment or argument from scratch.

      There are other reasons, particularly the need to assess credibility. Aristotelian logic has limitations in meatspace. And then there are fuzzy moral issues like identity, which is the only aspect you appear to have considered.

    12. Re:who cares about plagiarism by PvtVoid · · Score: 2

      So you are saying that the only reason that people do anything is for recognition or money?

      Are you?

      No, I am saying that the people who have an interest in assigning credit for work are the people who provide funding and jobs, because they don't want to provide either funding or jobs to people who are not actually creating new ideas. These are also the people who pay for journal subscriptions, fund conferences and professional societies, and confer degrees.

      As far as the people who do the research are concerned, very few of them would be able to continue doing research in the absence of funding. Do you think lab equipment, office space, and staff are free?

    13. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      I disagree that you need to consider the source when evaluating an idea. If anything, that can only create a negative effect (Oh it's Hitler's idea, so let's throw it away without further thought), or even worse (It's Jesus's idea, to consider any others would be heresy).

      Scholarship is rigorous, and that is why the idea of credit is absolutely unnecessary for the advancement of scholarly endeavors. It needs to be about the advancement of ideas, not the hero-worship fame-whoring we have now.

      If anything, the need for credit is a race to the bottom. Find the research that produces the most credit with the least effort.

    14. Re:who cares about plagiarism by Capt.Albatross · · Score: 1

      So you are saying that the only reason that people do anything is for recognition or money?

      Are you?

      No, clearly not - that would be an unjustified extrapolation of an unwarranted generalization of a simplistic reading of the point under discussion.

    15. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      I can't see very far at all. It's something to do with all these fucking dwarves on my shoulders.

    16. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      Because sometimes the original source of data, information or conclusions contains errors. If you copy their paper and they are wrong, so are you. But if you reference where you got your information or data from, then you can pass the blame back to them.

    17. Re: who cares about plagiarism by bzipitidoo · · Score: 3, Interesting

      The idea of credit is just another lump on that intellectual property turd.

      Let's be clear on what plagiarism is. It's deliberately and knowingly claiming authorship of the work of others. It's lying about who created a work.

      Plagiarism and intellectual property need not have anything to do with each other. The people who argue that copyright prevents plagiarism are either confused or trying to scrape up another justification to keep copyright. I think copyright should be abolished. I also think that, whether or not copyright exists, plagiarism will still be undesirable, and that we can detect and punish those who do it. You don't see grade school students who are caught committing plagiarism being beaten over the head with a copyright lawsuit; you see them punished with a failing grade, and perhaps detention.

      Having said that, we don't want to get too extreme about plagiarism and start seeing it everywhere. Duplicate chess problems, in which someone honestly creates essentially the same problem that someone else did, maybe 100 years ago, are so common that there's a term for it: anticipation. Chess has been around for centuries, and it is getting harder to find original and novel concepts. Anticipation may become a problem in many other areas as they mature. George Harrison famously committed "subconscious" copyright infringement (plagiarism, really) with My Sweet Lord; how should that be handled? The day will come (it may already be here) when every possible short melody has been composed. What about ghostwriting, should that be accepted? We also don't want people bogged down trying to give due credit for everything. Otherwise, a research paper would have to credit the Phoenicians for inventing the alphabet, lots of Greeks for various elementary mathematical concepts, the Babylonians for the base-60 time system we still use today, and maybe the Egyptians for papyrus, if the research is indeed printed on actual paper.

      --
      Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
    18. Re: who cares about plagiarism by Anonymous Coward · · Score: 0

      The incentive for plagiarism is the credit. Remove the incentive for credit, and there is no point in plagiarism.

      I tried to resolve the issue of intellectual property (copyright, authorship among other things) and credit. Ultimately credit was being used as a tool to keep intellectual property. I had to throw out credit. It'd be nice if a great idea could be attributed to someone for some recognition, but ultimately, that was a distant second to the good idea - which is what is important.

      Every idea is based on other ideas. Ideas that existed before we did. Sometimes the best ideas are just a reshaping of an existing idea. What's important is the idea. Not the who (credit).

    19. Re:who cares about plagiarism by bigman2003 · · Score: 2

      Oh yeah baby- money makes the (academic) world go round.

      I work in academia. You've never seen a researcher drop a project that "is his/her life's passion" as fast as when the money dries up.

      I do IT for these people. As soon as that grant is done, you might as well pull the plug. Otherwise it becomes MY project- because they have moved on.

      I've shit-canned websites with tons of good info that receive millions of page views per year, because the researcher doesn't care about it anymore. And since it is not my name on the paper, I can't take any responsibility for it.

      I can say with about 99% certainty that the only reason those projects were started was recognition or money. And even the recognition part means nothing once the academic has a few years of work under their belt, because at that point the only recognition they care about would be academic journals.

      --
      No reason to lie.
    20. Re:who cares about plagiarism by Anonymous Coward · · Score: 0

      If I wasn't an anonymous coward, I'd give you all my modpoints.

  3. Something's very off. by Anonymous Coward · · Score: 0

    Anglo-Saxon types might be quick to say, "Oh look, what a surprise, more plagiarism in the 2nd and 3rd worlds!" but there is something very fucking serious going on if five per cent of submissions involve plagiarism... and I'm going to conjecture that the "very fucking serious" thing going on is a technocratic one: the plagiarism detection software is inadequate (a technology I have found to be universally shit wherever used) and has probably received more testing from people familiar with European languages.

    1. Re:Something's very off. by CaptainDork · · Score: 1

      Most angled saxophone players are plagiarist bastards who perform covers without paying into the ASCAP, BMI, SOCAN, or PRS group and stuff.

      --
      It little behooves the best of us to comment on the rest of us.
    2. Re:Something's very off. by peon_a-z,A-Z,0-9$_+! · · Score: 1

      However, when everything is published in English (no matter the country), does your point matter?

  4. Political context? by Empiric · · Score: 1

    For example, more than 20% (38 of 186) of authors who submitted papers from Bulgaria were flagged, more than eight times the proportion from New Zealand (five of 207). In Japan, about 6% (269 of 4759) of submitting authors were flagged, compared with over 15% (164 out of 1054) from Iran.

    I suspect that the ratio in countries where the motivation could -literally- be publish or perish, will be consistently higher than those where the saying is figurative.
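    For what it's worth, the article's percentages check out. A quick Python sketch; the dictionary just restates the four counts quoted above (flagged authors, submitting authors), and nothing else about the study is assumed:

```python
# Counts quoted in the Science Insider piece: (flagged authors, submitting authors).
counts = {
    "Bulgaria": (38, 186),
    "New Zealand": (5, 207),
    "Japan": (269, 4759),
    "Iran": (164, 1054),
}


def flag_rate(flagged: int, total: int) -> float:
    """Share of a country's submitting authors that the classifier flagged."""
    return flagged / total


rates = {country: flag_rate(*pair) for country, pair in counts.items()}

for country, rate in rates.items():
    print(f"{country:12s} {rate:6.1%}")

# The "more than eight times" comparison is just a ratio of two rates.
bulgaria_vs_nz = rates["Bulgaria"] / rates["New Zealand"]
print(f"Bulgaria vs New Zealand: {bulgaria_vs_nz:.1f}x")
```

    That works out to roughly 20.4%, 2.4%, 5.7% and 15.6%, with Bulgaria at about 8.5 times New Zealand's rate, so "more than 20%", "more than eight times", "about 6%" and "over 15%" all hold up. Keep in mind that New Zealand's figure rests on just five flagged authors.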

    --
    ~ Whence do you come, slayer of men, or where are you going, conqueror of space?
    1. Re:Political context? by phantomfive · · Score: 1
      I suspect that the ratio in countries where the motivation could -literally- be publish or perish, will be consistently higher than those where the saying is figurative.

      Interestingly, using the word 'perish' to mean, 'lose one's job' is also figurative.

      --
      "First they came for the slanderers and i said nothing."
  5. For the countries... by jellomizer · · Score: 1

    I wonder how many of the people flagged for plagiarism in countries with a low rate are originally from countries with higher rates.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  6. i've read this article before by Anonymous Coward · · Score: 0

    it was printed in Newsweek 17 years ago, verbatim.

  7. I have studied the issue extensively by Anonymous Coward · · Score: 4, Funny

    And, I have found that copying text from other papers is more common in some nations than others, but the outcome is generally the same for authors who copy extensively: Their papers don't get cited much.

    1. Re:I have studied the issue extensively by ColdWetDog · · Score: 0

      And, I have found that copying text from other papers is more common in some nations than others, but the outcome is generally the same for authors who copy extensively: Their papers don't get cited much.

      Funny. That's exactly what TFA said.

      It's almost as if you plagiarized it.

      --
      Faster! Faster! Faster would be better!
  8. Study pinpoints "lazy" authors too by Bearhouse · · Score: 1

    about one in 16 arXiv authors were found to have copied long phrases and sentences from their own previously published work

    OK, sometimes quoting your own work may be legit, but this sounds more like simple boilerplate cut-and-paste.

    1. Re:Study pinpoints "lazy" authors too by starless · · Score: 1

      I work a lot with data from astronomy satellites. A lot of the first steps of the analysis, and the descriptions of the spacecraft and its instruments, are nearly the same from paper to paper of mine. (And similarly for other people doing similar work.) This results in a lot of near (and sometimes exact) duplication of text. However, I believe this is still valid and necessary. The heart of the paper, i.e. the new results and conclusions, does still differ of course!

  9. But... but... 'no such thing as race'... LOL by Anonymous Coward · · Score: 0

    "copying text from other papers is more common in some nations than others"

    I presume they mean "plagiarism is more common in some nations than others", but hey, they are talking to Americans here...

    Let me guess which nations had the most dishonest people in them. India, perchance? China? We can leave Africa out (I know it's a continent), they don't 'do' tests there...LOL.

  10. Egypt must have a lot of Vice Presidents by Anonymous Coward · · Score: 1
  11. Irony...this paper will now be... by Anonymous Coward · · Score: 0

    the most massively plagiarized paper in the history of the universe.

  12. Seems more like a relationship with quantity by medv4380 · · Score: 1

    I don't have their whole data set, but they did put up 10 or so on their nice little map. It seems more like the fewer papers a country has, the higher the percentage of plagiarism. However, the US has so many papers in this study that it should be divided into smaller regions.

  13. Gaming the system by Anonymous Coward · · Score: 2, Informative

    I wonder how much these disparities are due to western researchers knowing how to game the system. Some 10 years ago I received a warning related to "self-plagiarism" because I had copied the definition of a problem from one of my previous papers (one column, the rest of the paper was completely new). Since then, I know I have to change the text of the problem definition between two papers, even if it is the same. In the meantime, I have seen people submit the same work to two different conferences after changing just the wording of the papers (or the presentation), and not being charged with plagiarism (especially if they are well-established in the field). Actual plagiarism I have only seen in one paper with Chinese authors. So, presuming most plagiarism is in fact self-plagiarism, I wonder how pertinent the results are.

    1. Re:Gaming the system by tlhIngan · · Score: 1

      Some 10 years ago I received a warning related to "self-plagiarism" because I had copied the definition of a problem from one of my previous papers (one column, the rest of the paper was completely new). Since then, I know I have to change the text of the problem definition between two papers, even if it is the same.

      So why not just quote yourself then? I mean, self-plagiarism is just like plagiarism (except you're presenting existing ideas as new, rather than others' ideas as yours).

      Is it too hard to cite oneself? Is it frowned upon? Or does it just not seem like plagiarism when you're the one doing it to yourself?

  14. Plagiarism detection programs: by Anonymous Coward · · Score: 0

    Eurocentric, every last one of 'em

  15. Moral of the story by phantomfive · · Score: 2

    If you're going to plagiarize, don't upload your paper to arXiv.

    --
    "First they came for the slanderers and i said nothing."
  16. Correlations anyone? by Anonymous Coward · · Score: 0

    I'm too lazy to do the math, but I bet there's a correlation with Transparency International's Corruption Perceptions Index. Causation is also an exercise left for the reader, but I'd guess it's a cultural thing.

    1. Re:Correlations anyone? by CronoCloud · · Score: 2

      Looks similar to the Media/Game piracy maps I've seen too.

      Anglophones plagiarize/pirate the least, followed by western Europe and Japan, Middle east/Eastern Europe/Third world plagiarizes/pirates the most.

      Considering that we've read that in some countries they hold seminars/meetings to tell their business people who come to America NOT to try the bribery/graft thing that's usual in their countries, especially not with law enforcement, it probably is cultural.

      I'm not saying that there isn't bribery in America, but that it's not as much a cultural thing. We put people in PMITA federal prison for what would be considered penny-ante expected stuff in most countries.

  17. Return of Soviet Union? by Anonymous Coward · · Score: 0

    I could not tell whether the political map represents the past or the future, as a bunch of countries are shown as part of Russia.

    1. Re:Return of Soviet Union? by Anonymous Coward · · Score: 0

      Well, Czechoslovakia returns as well!

  18. Some countries' education systems reward parroting by hey! · · Score: 5, Interesting

    Some countries place a high premium on memorizing and repeating back the teacher's words. These countries still produce their share of good and bad engineers, but they're sometimes bad in unrecognizable ways.

    I once hired a software engineer from a third world country who had an encyclopedic knowledge of design patterns. You could name any pattern in the GoF *Design Patterns* book and he could reel off the UML without hesitation and give a convincing sounding explanation of how the pattern worked. But when I started inspecting his code, I quickly realized he had no understanding of what any of it meant. It was just pictures and words he'd memorized, an impressive and prodigious feat, but ultimately useless to me.

    Now I should say I've hired some very good software engineers from this country; it's not that they don't make good engineers over there. For most people the discipline to absorb a lot of information yields many benefits. But this guy was an outlier; he managed to get a master's degree over there in a subject he had no practical understanding of whatsoever.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  19. Doesn't look right to me... by djupedal · · Score: 1

    Breakdown by region doesn't mean anything. IMHE, India should be a top offender. Maybe the fails in the US are all from Indian-born and Bulgarian-born authors... we don't know.

    1. Re:Doesn't look right to me... by junkgoof · · Score: 1

      They are, I think, from the stats.

      --
      You got me into this! You were the ideologue! I'm only a poor assassin! - Twenty evocations, Bruce Sterling
  20. Come on now ... by cwarrior · · Score: 1

    I saw this same exact post over on reddit yesterday, but it was posted by a different user ...

  21. standard recitations at the beginning by Anonymous Coward · · Score: 0

    This.
    You either write a paper which can be understood on its own (describing the instrument and spacecraft in summary) or you have to reference some other paper, in some other journal, which may not be as readily available. A lot of the really good "here's how the spacecraft works, its mission plan, all the instruments" papers get presented at conferences by the engineers who developed the equipment and ran the mission. And conference papers are a lot harder to come by than journal papers for a variety of reasons.

    Behind all this is the fact that the employers and evaluators of those engineers are not as driven by "peer-reviewed journal pubs" as they are by "delivered on time and within budget", and they're really not wild about "I need to be kept on the charge number for 2 years while I shepherd the paper through the review process at the journal". Make no mistake, in modern business, there is no "spare time on nights and weekends" to get your papers published. Nope, your night and weekend time is going to be spent trying to deliver on time and within budget for the next spacecraft. The manager of the engineer delivering that astronomy instrument doesn't care a rodent's fuzzy behind whether a paper is ever published.

    So, if you're a scientist publishing that paper about your new findings, you'd better put a canned description of the instrument and mission in your paper.

  22. The geographical presentation is flawed by Required+Snark · · Score: 2
    There is an intrinsic problem with the map presentation: it ignores the relative number of papers from each country. This can lead to a distorted perception for countries with a small number of papers in the data set.

    To quote the article: "It shows only the incidence of flagged authors for the 57 nations with at least 100 submitted papers, to minimize distortion from small sample sizes." If a country has a total number of papers in the hundreds, the number of authors is probably also low. Therefore, a small number of authors who routinely plagiarize can have a major effect.

    It's analogous to a small town with a very low crime rate. All it takes is a few significant incidents to cause a huge jump in the statistics.
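    To put a number on the small-town effect, here is a rough Python sketch using the Wilson score interval for a proportion. That choice of interval is mine; the article doesn't say how arXiv handled uncertainty beyond the 100-paper cutoff. The counts are the Bulgaria and Japan figures from the article (38 of 186 and 269 of 4759 flagged authors), and z = 1.96 assumes a conventional 95% level.

```python
import math


def wilson_interval(flagged: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = flagged / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Counts quoted in the article: (flagged authors, submitting authors).
samples = {"Bulgaria": (38, 186), "Japan": (269, 4759)}

for country, (k, n) in samples.items():
    low, high = wilson_interval(k, n)
    print(f"{country:9s} {k}/{n}: {low:.1%} to {high:.1%} (width {high - low:.1%})")
```

    On these counts the Bulgarian interval spans roughly 15% to 27%, while Japan's is only about 1.3 percentage points wide (roughly 5.0% to 6.3%). A handful of authors either way moves the small country's rate a lot; the same handful barely registers for the large one.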

    For comparison, it would be interesting to see the rates for other kinds of text reuse. From the article:

    After filtering out review articles and legitimate quoting, about one in 16 arXiv authors were found to have copied long phrases and sentences from their own previously published work that add up to about the same amount of text as this entire article.

    For comparison it would be useful to see the percentage of this reuse displayed on another map. I have a strong suspicion that countries that look good on the presented map would not look nearly as good by this measure.

    --
    Why is Snark Required?