Competition Seeks Best Approaches To Detecting Plagiarism
marpot writes "Does your school or university check your homework and theses for plagiarism? Nowadays, probably yes, but are they doing it properly? Little is known about plagiarism detection accuracy, which is why we are running a competition on plagiarism detection, sponsored by Yahoo! We have set up a corpus of artificial plagiarism which contains plagiarism with varying degrees of obfuscation, and translation plagiarism from Spanish or German source documents. A random plagiarist was employed that attempts to obfuscate its plagiarism with random sequences of text operations, e.g., shuffling, deleting, inserting, or replacing a word. Translated plagiarism is created using machine translation."
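The "random plagiarist" in the summary could look roughly like the sketch below. This is only an illustration under assumptions: the function name `obfuscate`, the `vocabulary` parameter, and the uniform choice among the four operations are hypothetical, and nothing here is taken from the competition's actual corpus generator.

```python
import random

def obfuscate(words, n_ops, vocabulary, rng):
    """Apply n_ops random word-level operations to a copy of `words`.

    Hypothetical sketch: the operations mirror the ones named in the
    summary (shuffle, delete, insert, replace); how the real corpus
    picks and weights them is not stated in the source.
    """
    words = list(words)  # work on a copy so the source passage is untouched
    for _ in range(n_ops):
        op = rng.choice(["shuffle", "delete", "insert", "replace"])
        if op == "shuffle":
            rng.shuffle(words)
        elif op == "delete" and len(words) > 1:
            # never delete the last remaining word
            del words[rng.randrange(len(words))]
        elif op == "insert":
            words.insert(rng.randrange(len(words) + 1), rng.choice(vocabulary))
        elif op == "replace" and words:
            words[rng.randrange(len(words))] = rng.choice(vocabulary)
    return words

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    source = "little is known about plagiarism detection accuracy".split()
    print(" ".join(obfuscate(source, 4, ["lorem", "ipsum", "dolor"], rng)))
```

Raising `n_ops` corresponds loosely to the "varying degrees of obfuscation" mentioned in the summary: the more operations applied, the less the output resembles the source passage.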
I think the hardest plagiarism to spot is the kind where you copy the main idea but put everything into your own words. The main reason is that understanding semantics is still an open problem in AI.
Simply using words would not constitute plagiarism. You just can't allow students to use words that somebody else has used before.
For more information on this technique, please read my recent paper, Clickous Verandim Redundo Berata Quizzomandus.
He's getting rather old, but he's a good mouse.
Just imagine everyone's surprise when all the entrants turn in the exact same process.
If brevity is the soul of wit, then how does one explain Twitter?
... use the same system the US Patent Office uses for finding prior art.
On second thought, scratch that idea.
Have gnu, will travel.
Calculate an MD5 hash of the paper; if it matches the MD5 of another, it's plagiarized.
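Taken literally, the MD5 suggestion above might be sketched like this. The helper names `md5_of` and `exact_duplicates` are made up for illustration; only `hashlib.md5` comes from Python's standard library.

```python
import hashlib

def md5_of(text):
    """Return the hex MD5 digest of a paper's text (UTF-8 encoded)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def exact_duplicates(papers):
    """Given {name: text}, return (earlier, later) pairs with identical digests."""
    seen = {}      # digest -> name of the first paper with that digest
    matches = []
    for name, text in papers.items():
        digest = md5_of(text)
        if digest in seen:
            matches.append((seen[digest], name))
        else:
            seen[digest] = name
    return matches

if __name__ == "__main__":
    papers = {
        "alice": "Little is known about plagiarism detection accuracy.",
        "bob": "Little is known about plagiarism detection accuracy.",
        "carol": "Little is known about plagiarism detection accuracy!",
    }
    print(exact_duplicates(papers))
```

This only catches byte-for-byte copies. Changing a single character produces a different digest, so the shuffled, edited, or translated passages described in the summary would pass straight through.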
Shit, by the time I came back to the keyboard after writing this post and not hitting submit, there were 30 other posts that said the same thing. I must be a plagiarist.... Damnit.