Fake Scientific Paper Detector

Posted by ryuzaki0 on Tuesday April 25, 2006 @08:22AM from the paper-unnoticed-amidst-conference-white-noise dept.

moon_monkey writes "Ever wondered whether a scientific paper was actually written by a robot? A new program developed by researchers at Indiana University promises to tell you one way or the other. It was actually developed in response to a prank by MIT researchers who generated a paper from random bits of text and got it accepted for a conference."

Re:That's good and all by visgoth · 2006-04-25 08:34 · Score: 4, Informative

Oh, I'm sure the work of monkeys is quite easily identifiable.

--
My patience is infinite, my time is not.
Only works for scientific papers by gurps_npc · 2006-04-25 08:35 · Score: 4, Informative

If you try to use it on any human-written NON-scientific paper, such as Lincoln's Gettysburg Address, it almost always considers it false.
I suspect that it is looking for the conventional thinking with conventional word structure. As such, it is NOT a good idea i

--
excitingthingstodo.blogspot.com
1. Re:Only works for scientific papers by nasor · 2006-04-25 09:07 · Score: 4, Informative
  
  No, it doesn't even seem to work on scientific papers. I submitted four papers from the latest issue of Inorganic Chemistry and it thought 2 out of 4 were false:
  
  Inauthentic: Assembly of a Heterobinuclear 2-D Network: A Rare Example of Endo- and Exocyclic Coordination of PdII/AgI in a Single Macrocycle.
  
  Inauthentic: Pyrazolate-Bridging Dinucleating Ligands Containing Hydrogen-Bond Donors: Synthesis and Structure of Their Cobalt Analogues
  
  Authentic: Manganese Complexes of 1,3,5-Triaza-7-phosphaadamantane (PTA): The First Nitrogen-Bound Transition-Metal Complex of PTA
  
  Authentic: Structure, Luminescence, and Adsorption Properties of Two Chiral Microporous Metal-Organic Frameworks
  
  Based on this (small) sampling, the program doesn't appear to do any better than if it were to guess randomly. I wonder if this thing is even supposed to work, or if it just returns a random result based on a hash of the paper or something?
I am in awe by DingerX · 2006-04-25 09:02 · Score: 4, Informative

So I go there, and I start shoving it text from my hard drive. I try:

A) Text of an article (Philosophy) I (native English speaker) wrote in Italian: 98.5 Authentic.
B) Text of an article I wrote in English (History): 87.8
C) Text of an article (History) written in French by a native French speaker and translated into English: 93.2
D) Critical edition of a 14th-century Latin text (Theology): 97.7 Authentic.
E) Documentation to a Field Artillery Simulation: 95.3
F) A completely bogus narrative for a monastic order that doesn't exist, written in a style that mimics A)-C): 16.8% Inauthentic

So in this case, we have a human-written document that has only superficial meaning, is written as a "fake scientific paper", and registers as such.

And yes, I did read the "purpose" of the page; I know it's not supposed to detect it.

And yet it does, decisively.
Read the Paper - Looks at Repetition by Constantine+Evans · 2006-04-25 11:43 · Score: 3, Informative

Read the paper listed in the menu of the website. The system essentially compresses the text with different window sizes and then looks at the compression factors. In other words, it only looks for repetition of strings. This is absurdly easy to fool, and the MIT generator could easily be fixed to pass this filter. For example, try entering an arbitrary text once (your post, for example). Note that it fails. Then append a few copies of the same text and run that through. Your post, when run once, is too short. With two copies, it is rejected at 41.2%. With three, it passes at 93%. There is a window of repetition required in order to pass: papers that do not repeat enough are classified as fake, as are papers that repeat too much (try entering twenty copies of your post).

It should be relatively simple to make a random paper generator that always passes this test with a higher probability than human-written papers.
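
To make the "compress with different window sizes, then compare compression factors" idea concrete, here is a rough Python sketch. It is not the Indiana University code: the use of zlib, the window sizes, and the function names compression_factor and factor_profile are all my own assumptions, and it does not reproduce the site's scores or thresholds. It only shows why appending copies of a text changes the numbers such a filter would see.

```python
import zlib


def compression_factor(text: str, wbits: int) -> float:
    """Return original size / compressed size for one zlib window size.

    wbits is the base-2 log of the DEFLATE window (9 = 512 bytes,
    15 = 32 KiB). Using zlib here is an assumption for illustration.
    """
    data = text.encode("utf-8")
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    compressed = compressor.compress(data) + compressor.flush()
    return len(data) / len(compressed)


def factor_profile(text: str, window_bits=(9, 12, 15)) -> dict:
    """Compression factor per window size (hypothetical feature vector)."""
    return {w: compression_factor(text, w) for w in window_bits}


if __name__ == "__main__":
    post = (
        "The system essentially compresses the text with different window "
        "sizes, and then looks at the compression factors. "
    ) * 4
    for copies in (1, 2, 3, 20):
        profile = factor_profile(post * copies)
        print(copies, {w: round(f, 2) for w, f in profile.items()})
```

A large window can see each appended copy as one long repeat, so the factor climbs with every copy, while a window smaller than the text cannot reach back to the previous copy. A generator that tunes how often it repeats itself could presumably steer these factors into whatever range the classifier accepts.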