Software Finds Plagiarism In Research
shmG writes "Researchers from the Virginia Bioinformatics Institute have created a seek-and-destroy program — for plagiarism. Called ET Blast, it's designed to find plagiarism in scientific papers. It does a full-text analysis, and then looks for similar publications in several databases. 'We have better literature,' Garner said. 'There are abstracts and full papers, and a database called Crisp, where you compare stuff to every grant the NIH gets. It's compared to any research that's been funded.'"
If you resubmit your own work, it's not plagiarism.
I can't blame the submitter for this one. The article itself uses the term "search and destroy" early on, yet says absolutely nothing about destroying anything.
I once had an English teacher who said, "If you have more than five consecutive words matching a source without a citation, then it's plagiarism." Perhaps that's how freshman writing assignments are graded, but it's silly when applied to scientific papers. Pick up any math paper on number theory, and you're bound to find the sentence "Let p be an odd prime number." without citation, but that would hardly qualify as plagiarism. Yet syntactic matching appears to be exactly what this program is doing.
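To make the objection concrete, here is a rough sketch of that teacher's rule as code. This is only an illustration of naive consecutive-word matching, not how ET Blast actually works (the article doesn't say). The function names and the two sample sentences are made up for the example, and "more than five consecutive words" is read as runs of six words.

```python
def word_ngrams(text, n=6):
    """Return the set of n-word runs in text, lowercased, edge punctuation stripped."""
    words = [w.strip('.,;:"\'()').lower() for w in text.split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_runs(a, b, n=6):
    """Return the n-word runs that appear in both texts."""
    return word_ngrams(a, n) & word_ngrams(b, n)


# Two hypothetical, unrelated number theory papers.
paper_a = "Let p be an odd prime number. We show that the sum vanishes."
paper_b = ("Throughout, let p be an odd prime number. "
           "Our main result concerns quadratic residues.")

flagged = shared_runs(paper_a, paper_b)
for run in sorted(flagged):
    print(" ".join(run))
```

Both papers get flagged on the stock phrase alone, even though nothing of substance was copied.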
What constitutes "plagiarism" in a scientific paper is very different from plagiarism in journalism or English literature. In scientific writing, it is expected that authors will use the same flat, impersonal style and repeat definitions and the results of others to save the reader the time of having to look them up. So simple pattern matching between science papers will result in a great many false positives. In science (and math) writing, what matters is the new result which the author is claiming. It seems to me that it would be nearly impossible for a computer program to tell a copied result from a repeated stock phrase.