Competition Seeks Best Approaches To Detecting Plagiarism
marpot writes "Does your school/university check your homework/theses for plagiarism? Nowadays, probably yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we are conducting a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism that contains plagiarism with varying degrees of obfuscation, as well as translation plagiarism from Spanish or German source documents. We employed a random plagiarist that attempts to obfuscate its plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation."
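The summary doesn't say how the random plagiarist is implemented. The Python sketch below is only a rough illustration that applies a random sequence of the four word-level operations it names. Several details are assumptions and not the competition's actual generator: the function name `obfuscate`, treating "shuffle" as a swap of two adjacent words, and the replacement vocabulary.

```python
import random

def obfuscate(words, n_ops, vocabulary, rng):
    """Hypothetical random plagiarist: apply n_ops random word-level edits.

    Works on a copy, so the caller's list is left unchanged. An operation
    that can't apply (e.g. a swap on a one-word text) is skipped.
    """
    out = list(words)
    for _ in range(n_ops):
        op = rng.choice(["shuffle", "delete", "insert", "replace"])
        if op == "shuffle" and len(out) >= 2:
            # Assumed meaning of "shuffling": swap two adjacent words.
            i = rng.randrange(len(out) - 1)
            out[i], out[i + 1] = out[i + 1], out[i]
        elif op == "delete" and len(out) > 1:
            # Never delete the last remaining word.
            del out[rng.randrange(len(out))]
        elif op == "insert":
            # Insert a vocabulary word at any position, including the end.
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocabulary))
        elif op == "replace" and out:
            out[rng.randrange(len(out))] = rng.choice(vocabulary)
    return out

if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a given corpus can be regenerated
    src = "the quick brown fox jumps over the lazy dog".split()
    print(" ".join(obfuscate(src, 3, ["cat", "runs", "big"], rng)))
```

Raising `n_ops` corresponds roughly to the "varying degrees of obfuscation" in the corpus. Every added word comes from a fixed vocabulary, so heavier settings produce text that drifts toward word salad. That drift is the realism concern the comments below raise.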
Now, I understand that plagiarism is common among the weakest of undergrad writers; but "machine translation from Spanish or German source documents" and "random text operations" seem like unrealistic experimental stimuli.
In order to be a success, a plagiarized paper has to survive scrutiny by automated systems, if any are deployed, and human graders, if any are paying attention. Machine translation and text mangling should trivially defeat automated systems, at least any that aren't cranked well into World o' false positives territory; but would they pass human scrutiny? Even if they did, handing in something produced by machine translation and text mangling would probably earn you a referral to "Remedial English 101 For Life".
Simply using words would not constitute plagiarism. You just can't allow students to use words that somebody else has used before.
For more information on this technique, please read my recent paper, Clickous Verandim Redundo Berata Quizzomandus.
He's getting rather old, but he's a good mouse.
Just imagine everyone's surprise when all the entrants turn in the exact same process.
If brevity is the soul of wit, then how does one explain Twitter?
A plagiarised paper just smells bad. It is characterized by shifts in voice and writing style, and by sudden ignorance of the critical points raised earlier. The same author who can't write a grammatically correct sentence one moment is throwing down complex constructions the next. The harder part is identifying the source of the plagiarism. For undergraduate papers, even the harder part is trivial. After all, the point of plagiarism is that the author is too lazy to write anything original.
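The "shifts in voice and writing style" described above are what intrinsic plagiarism detection tries to measure without consulting any source corpus. The sketch below is deliberately crude and assumes average sentence length is a usable style signal. It scores each paragraph, then flags paragraphs whose z-score exceeds a threshold. The feature, the function names, and the threshold of 1.5 are all hypothetical choices. Real stylometric detectors combine many more features.

```python
import re
from statistics import mean, pstdev

def mean_sentence_length(paragraph):
    """Average number of words per sentence, splitting on . ! ?"""
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    if not sentences:
        return 0.0
    return mean(len(s.split()) for s in sentences)

def flag_style_shifts(paragraphs, threshold=1.5):
    """Return indices of paragraphs whose style score is an outlier.

    The threshold is an assumed value and would need tuning on real papers.
    """
    if len(paragraphs) < 2:
        return []
    scores = [mean_sentence_length(p) for p in paragraphs]
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # every paragraph looks alike, so nothing stands out
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > threshold]

if __name__ == "__main__":
    paper = [
        "Dogs bark. Cats sleep.",
        "Birds sing. Fish swim.",
        "Cows moo. Pigs eat.",
        "The multifaceted socioeconomic ramifications of industrialization "
        "engendered profound transformations across every stratum of society.",
    ]
    print(flag_style_shifts(paper))  # prints [3]
```

With a population standard deviation, no paragraph can sit more than sqrt(k - 1) standard deviations from the mean of k paragraphs. A short paper therefore needs a lower threshold than a long one, which is one reason a human reader's "smell test" still matters.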
For academics (professors), the situation isn't all that different. Plagiarism is usually a mix of stupidity, laziness and pressure to get stuff done. It usually happens when big, popularizing authors try to rip off obscure ones (go back twenty years à la Mr. Ambrose, or pick something in a different language, preferably Italian), or when someone needs a book in an obscure field and tries to pirate something really obscure.
Even so, if a plagiarist has enemies who give a damn, they can find the source fairly fast. So why construct a test for the most obfuscated cases, when a plagiarist clever enough to obfuscate could simply write something original and sufficiently clever?
The tools are fairly good, but, in my experience, they'll always report 3-7% or so of your paper as plagiarized, just because it's pretty difficult to write about _anything_ without unknowingly using previously written words. I would _hope_ that anyone who would pursue disciplinary action from such a tool's results would at least take a look to see if the sections being flagged are consequential.
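Even a naive matcher reproduces that kind of baseline, because stock phrases recur everywhere. The sketch below assumes overlap is measured as the fraction of a submission's word trigrams that also appear in any source text. Commercial tools don't publish their exact method, so `overlap_fraction` and n = 3 are illustrative choices only.

```python
import re

def word_ngrams(text, n):
    """Lowercased word n-grams; punctuation is discarded."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def overlap_fraction(submission, sources, n=3):
    """Fraction of the submission's n-grams found in any source text."""
    grams = word_ngrams(submission, n)
    if not grams:
        return 0.0  # text too short to form a single n-gram
    seen = set()
    for src in sources:
        seen.update(word_ngrams(src, n))
    hits = sum(1 for g in grams if g in seen)
    return hits / len(grams)

if __name__ == "__main__":
    essay = "On the other hand, the results were surprising."
    corpus = ["On the other hand, we disagree."]
    print(overlap_fraction(essay, corpus))  # prints 0.3333333333333333
```

In the example, the stock phrase "on the other hand" alone accounts for a third of the essay's trigrams. That is why a flagged percentage says little until someone checks whether the matched passages are consequential.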
I have no idea how good they are with catching paraphrasing, though... it strikes me that the semi-intelligent plagiarizers would be doing that more than a straight copy and paste. There's also the "acceptable vs unacceptable" distinction to be made.
Plausible conjecture should not be misrepresented as proof positive.
... use the same system the US Patent Office uses for finding prior art.
On second thought, scratch that idea.
Have gnu, will travel.
And that is why I always change the font and margins on papers that I plagiarize...
My wife teaches for Phoenix. Probably 90% of the plagiarism she sees is from students copying and pasting whole papers word-for-word from random cheat sites. Occasionally she'll get someone who fails to properly quote sources, but that's very much the minority. For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating. They're just hoping they get away with it.
Plagiarism is a symptom of professors only being involved in the last step: reviewing the final product.
Require the students to submit multiple drafts. Meet with them for 15 minutes each and discuss their thought processes on the ongoing paper. You'll get better final products, teach people not to procrastinate, and smoke-out people who have no involvement in their "own work."
What, can't do that because you have 60 students in a class? Well, there's part of the problem too.
We're trying to find a technology solution to a problem with less student-teacher interaction. Typical!
Seriously, the humanities are in trouble. With over 6 billion people on the planet, it's extremely difficult to have an original thought. This sets the stage for endless repetition. Add to that the fact that teaching the humanities usually means imparting a teacher's single interpretation of the source material to the students. When it comes to writing a paper, the students do the natural thing and parrot back to the teacher what they've heard, knowing that's the only way to get a good grade. The resulting combination is deadly.
The papers are all going to be similar from the beginning, because it's a rare instructor who actually encourages dissenting opinions (and that fault in teaching is a whole other discussion of its own). Then the papers are going to be similar because there really are only so many defensible ways to interpret the source material. And finally, the papers are highly likely to be similar to at least one other paper written about the subject, when every paper ever written on the subject is considered (exactly what the plagiarism sites attempt to do).
I think the problem this competition is trying to solve is intractable in the face of the current educational system. It's gotten to the point where, if the software considers a large enough number of sources, even the instructor's own papers are going to look like plagiarism.
Hell, look at the Slashdot comment system. A million people read the front page, but only a few thousand post comments. Thousands more are content to simply moderate the comments, and face it, comments they agree with are more likely to be modded up, one way or another. Then compare the modded comments. We get a lot of duplicate or near duplicate thought, and hence near duplicate comments on every article. Why? Because when you get enough people together in one place, discussing the same subject in writing, there are only so many viewpoints and only so many comments that won't get modded down for being of the "cubic what?" variety.
Time to go back to grading on spelling and grammar. We've reached the end of the grading on ideas road. Coherency of presentation is all we have left. (One could argue it's all we ever had.)
For the most part, the cheaters aren't all that bright, nor do they try to hide their cheating.
How would you know? The best cheaters won't be caught, but that doesn't mean they're not cheaters.
I teach physics at a community college, and although I don't assign the kind of term papers you'd see in an English course, I do grade homework, lab writeups, and exams, and plagiarism is an issue that comes up. My school's policy is that the only punishment the professor can give for cheating is to assign a zero on that particular assignment. This is, in my opinion, almost no punishment at all; typically the reason people cheat is because they know they're going to fail, so assigning an F isn't a punishment, it's more like assigning the grade that the student actually earned. The school's administration tells us that this policy is the way it is because of a recent legal decision in California. Before this rule was imposed on us, my policy had been to give the student an F in the course if it was a serious case of cheating. In any case, my school, like most community colleges, has an extremely late drop deadline (the 14th week of the semester), so, e.g., if I give a student an F on an exam for cheating on the exam, the student will typically just drop the course, resulting in no penalty on his transcript other than a W, which will not affect his GPA.
My school does provide a process where the professor can file a form to report academic misconduct. The form is then supposed to be followed up on by the dean, filed somewhere, and referred to later if the student shows a repeating pattern of cheating. Theoretically the student can be expelled, but never on the first offense. My experience is that this process doesn't actually seem to work, because the administrators involved aren't interested in spending the time and meeting with angry students. The threat hanging over the heads of the profs and deans is always that the parents will sue. Avoiding lawsuits is always the administration's top priority, far higher than education.
The long and the short of it is that when a student makes a calculated decision to risk cheating, he's usually doing it based on a realistic assessment that the consequences of getting caught are extremely mild.
There is absolutely no way, at least at my school, that a student would ever be expelled for plagiarism. To get expelled, you would have to physically attack someone. You seem to be imagining a situation in which the professor and/or the school punishes the student just because a particular piece of software flashes a message on the screen saying "plagiarized." I can't believe that anyone would ever do that. Of course you're going to look at the text that matched, and see whether you really believe that it looks like it was plagiarized.
No, most professors do not have grad students to do this. I work at a community college. No grad students. My wife teaches at Cal State LA. They have grad students, but the grad students don't work as TAs or graders; the professors have to grade 100% of the written work.
I don't think anyone does trust such a decision to a program. They use the program as a first step.
Find free books.
And the Postmodernism Generator?
You don't have to write much of anything at all. Would you get a good grade? Fuck no. Would they FLUNK YOU FOR IT? Fuck no. Because it's graded by untenured faculty who have to curry favour with students, or it's graded by Grad Assistants who don't give a shit, and why should they?
Oh, look, a paper by Cindy Bleethstain. She's a fucking idiot. Let's see. Hmmmm. Yup. Incomprehensible bullshit, as usual. Give her a C+ because some of it is intelligible and kind of funny.
Oh, look another paper by Guido LeDouchebag. Bottlecaps are smarter than this turnip. Hmmm. Yup. More incomprehensible bullshit. C+. At least he finally discovered the spellchecker.
THAT'S what it is often like, unfortunately.
I read the paper, and if there is a passage that is noticeably different in tone, I'll copy and paste a section into Google and see where they pulled it from. 9 times out of 10, it's a direct lift from a web page, unattributed. I send it back and tell them "Footnotes, please. Also, automatic single-grade loss, right off the top."
If it comes back still broken, then I nail 'em for plagiarism. It's a big deal, and requires paperwork I don't like to fill out...
So far I've only had one student with the cojones to not bother fixing their attributions, and he got crucified by the Ethics board. He was an arrogant little prick, too.
RS
Shoes for Industry. Shoes for the Dead.