Slashdot Mirror


Study of Massive Preprint Archive Hints At the Geography of Plagiarism

sciencehabit writes with this excerpt from Science Insider: New analyses of the hundreds of thousands of technical manuscripts submitted to arXiv, the repository of digital preprint articles, are offering some intriguing insights into the consequences, and geography, of scientific plagiarism. It appears that copying text from other papers is more common in some nations than others, but the outcome is generally the same for authors who copy extensively: Their papers don't get cited much. The system attempts to rule out certain kinds of innocent copying: "It's a fairly sophisticated machine learning logistic classifier," says arXiv founder Paul Ginsparg, a physicist at Cornell University. "It has special ways of detecting block quotes, italicized text, text in quotation marks, as well as statements of mathematical theorems, to avoid false positives."

10 of 53 comments

  1. Re:who cares about plagiarism by PvtVoid · · Score: 4, Insightful

    Why does anyone need 'credit' for ideas?

    Because it allows funding agencies, university tenure committees, etc. to determine which people are contributing useful new science to the world, and which people are dead wood sucking at the teat of an academic salary without creating anything useful to anybody.

  2. I have studied the issue extensively by Anonymous Coward · · Score: 4, Funny

    And, I have found that copying text from other papers is more common in some nations than others, but the outcome is generally the same for authors who copy extensively: Their papers don't get cited much.

  3. Gaming the system by Anonymous Coward · · Score: 2, Informative

    I wonder how much these disparities are due to Western researchers knowing how to game the system. Some 10 years ago I received a warning related to "self-plagiarism" because I had copied the definition of a problem from one of my previous papers (one column, the rest of the paper was completely new). Since then, I know I have to change the text of the problem definition between two papers, even if it is the same. In the meantime, I have seen people submit the same work to two different conferences after changing just the wording of the papers (or the presentation), and not being charged with plagiarism (especially if they are well-established in the field). Actual plagiarism I have only seen in one paper with Chinese authors. So, presuming most plagiarism is in fact self-plagiarism, I wonder how pertinent the results are.

  4. Moral of the story by phantomfive · · Score: 2

    If you're going to plagiarize, don't upload your paper to arXiv.

    --
    "First they came for the slanderers and i said nothing."
  5. Re:Correlations anyone? by CronoCloud · · Score: 2

    Looks similar to the Media/Game piracy maps I've seen too.

    Anglophones plagiarize/pirate the least, followed by Western Europe and Japan; the Middle East/Eastern Europe/Third World plagiarize/pirate the most.

    Considering that we've read that in some countries they have seminars/meetings to tell their business people who come to America NOT to try to use the bribery/graft thing that's usual in their countries, especially not with law enforcement... it probably is cultural.

    I'm not saying that there isn't bribery in America, but that it's not as much a cultural thing. We put people in PMITA federal prison for what would be considered penny-ante expected stuff in most countries.

  6. Some countries' education systems reward parroting by hey! · · Score: 5, Interesting

    Some countries place a high premium on memorizing and repeating back the teacher's words. These countries still produce their share of good and bad engineers, but they're sometimes bad in unrecognizable ways.

    I once hired a software engineer from a third world country who had an encyclopedic knowledge of design patterns. You could name any pattern in the GoF *Design Patterns* book and he could reel off the UML without hesitation and give a convincing-sounding explanation of how the pattern worked. But when I started inspecting his code, I quickly realized he had no understanding of what any of it meant. It was just pictures and words he'd memorized, an impressive and prodigious feat, but ultimately useless to me.

    Now I should say I've hired some very good software engineers from this country; it's not that they don't make good engineers over there. For most people the discipline to absorb a lot of information yields many benefits. But this guy was an outlier; he managed to get a master's degree over there in a subject he had no practical understanding of whatsoever.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  7. Re:who cares about plagiarism by PvtVoid · · Score: 2

    So you are saying that the only reason that people do anything is for recognition or money?

    Are you?

    No, I am saying that the people who have an interest in assigning credit for work are the people who provide funding and jobs, because they don't want to provide either funding or jobs to people who are not actually creating new ideas. These are also the people who pay for journal subscriptions, fund conferences and professional societies, and confer degrees.

    As far as the people who do the research are concerned, very few of them would be able to continue doing research in the absence of funding. Do you think lab equipment, office space, and staff are free?

  8. Re: who cares about plagiarism by bzipitidoo · · Score: 3, Interesting

    The idea of credit is just another lump on that intellectual property turd.

    Let's be clear on what plagiarism is. It's deliberately and knowingly claiming authorship of the work of others. It's lying about who created a work.

    Plagiarism and intellectual property need not have anything to do with each other. The people who argue that copyright prevents plagiarism are either confused, or trying to scrape up another justification to keep copyright. I think copyright should be abolished. And, whether or not copyright exists, plagiarism will still be undesirable, and we can detect and punish those who do it. You don't see grade school students who are caught committing plagiarism being beaten over the head with a copyright lawsuit; you see them punished with a failing grade, and perhaps detention.

    Having said that, we don't want to get too extreme about plagiarism and start seeing it everywhere. Duplicate chess problems, in which someone honestly creates essentially the same problem that someone else did, maybe 100 years ago, are so common that there's a term for it: anticipation. Chess has been around for centuries, and it is getting harder to find original and novel concepts. Anticipation may become a problem in many other areas as they mature. George Harrison famously committed "subconscious" copyright infringement (plagiarism really) with My Sweet Lord; how should that be handled? The day will come, may already be here, when every possible short melody has been composed. What about ghostwriting, should that be accepted? We also don't want people bogged down trying to give due credit for everything. Otherwise, a research paper would have to credit the Phoenicians for inventing the alphabet, lots of Greeks for various elementary mathematical concepts, the Babylonians for the base 60 time system we still use today, and maybe the Egyptians for papyrus, if the research is indeed printed on actual paper.

    --
    Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
  9. The geographical presentation is flawed by Required+Snark · · Score: 2

    There is an intrinsic problem with the map presentation: it ignores the relative number of papers from each country. This can lead to a distorted perception for countries with a small number of papers in the data set.

    To quote the article: "It shows only the incidence of flagged authors for the 57 nations with at least 100 submitted papers, to minimize distortion from small sample sizes." If a country has a total number of papers in the hundreds, it implies the number of authors is also low. Therefore, a small number of authors who routinely plagiarize can have a major effect.

    It's analogous to a small town with a very low crime rate. All it takes is a few significant incidents to cause a huge jump in the statistics.
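
    To put rough numbers on that, here is a minimal Python sketch. The author counts are made up for illustration and are not figures from the study; flagged_rate is just a hypothetical helper name. It shows how the same handful of flagged authors moves the rate very differently depending on the size of the pool.

```python
def flagged_rate(flagged_authors: int, total_authors: int) -> float:
    """Return the fraction of a country's authors that were flagged."""
    if total_authors <= 0:
        raise ValueError("total_authors must be positive")
    return flagged_authors / total_authors


# Hypothetical numbers, NOT taken from the arXiv study:
# the same 5 flagged authors in a small and a large author pool.
FLAGGED = 5
small_pool = flagged_rate(FLAGGED, 50)
large_pool = flagged_rate(FLAGGED, 5000)

print(f"small pool: {small_pool:.1%}")
print(f"large pool: {large_pool:.1%}")
```

    With those assumed counts, five habitual copiers are a tenth of the small pool and a rounding error in the large one.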

    For comparison, it would be interesting to see the rates for other kinds of text reuse. From the article:

    After filtering out review articles and legitimate quoting, about one in 16 arXiv authors were found to have copied long phrases and sentences from their own previously published work that add up to about the same amount of text as this entire article.

    For comparison it would be useful to see the percentage of this reuse displayed on another map. I have a strong suspicion that countries that look good on the presented map would not look nearly as good by this measure.

    --
    Why is Snark Required?
  10. Re:who cares about plagiarism by bigman2003 · · Score: 2

    Oh yeah baby- money makes the (academic) world go round.

    I work in academia. You've never seen a researcher drop a project that "is his/her life's passion" as fast as when the money dries up.

    I do IT for these people. As soon as that grant is done, you might as well pull the plug. Otherwise it becomes MY project- because they have moved on.

    I've shit-canned websites with tons of good info that receive millions of page views per year, because the researcher doesn't care about it anymore. And since it is not my name on the paper, I can't take any responsibility for it.

    I can say with about 99% certainty that the only reason those projects were started was recognition or money. And even the recognition part means nothing once the academic has a few years of work under their belt. Because at that point the only recognition they care about would be academic journals.

    --
    No reason to lie.