Slashdot Mirror


Fake Scientific Paper Detector

moon_monkey writes "Ever wondered whether a scientific paper was actually written by a robot? A new program developed by researchers at Indiana University promises to tell you one way or the other. It was actually developed in response to a prank by MIT researchers who generated a paper from random bits of text and got it accepted for a conference."

47 of 277 comments

  1. Yes! by stupidfoo · · Score: 4, Funny

    I am always wondering what those damn robots are up to!

    1. Re:Yes! by Krakhan · · Score: 2, Funny

      ROBOT HOUSE!!!

    2. Re:Yes! by Schemat1c · · Score: 2, Funny

      I am always wondering what those damn robots are up to!

      They use old people's medicine for fuel.

      --

      "Nobody knows the age of the human race, but everybody agrees that it is old enough to know better." - Unknown
    3. Re:Yes! by Ruff_ilb · · Score: 2, Interesting

      They did; the board that accepted the MIT paper, not consisting of specialists in the field, was likely confused by the pseudo-scientific gibberish they encountered. By mastering the methodology for the typical unification of access points and redundancy, the MIT students were able to effectively enter the scientific conference.

      --
      http://www.TheGamerNation.com/Forums
    4. Re:Yes! by MarkChovain · · Score: 2, Funny

      I for one, welcome our paper writing robot overlords.

      I for one am a paper writing robot overlord, you insensitive clod! I for one welcome our new video game consoles. They are called "hands". Shouldn't it be something like this will ever happen then you will see that they bring things out in managable increments. Sure it is a biggish program, but many lone hackers have written one in under one person/year.

  2. That's good and all by XxtraLarGe · · Score: 4, Funny

    but I wonder if it can tell if a paper was written by a million monkeys pounding on typewriters?

    --
    Taking guns away from the 99% gives the 1% 100% of the power.
    1. Re:That's good and all by denverradiosucks · · Score: 4, Funny

      Obligatory Simpsons Quote

      Monkeys typing on a typewriter as Mr. Burns is working on the next great American novel:

      Burns: This is a thousand monkeys working at a thousand typewriters. Soon they'll have written the greatest novel known to man.
      (monkey smoking cigar typing on a typewriter)
      Burns: Let's see. It was the best of times, it was the BLURST of times! You stupid monkey! (Smacks monkey upside his head)

    2. Re:That's good and all by visgoth · · Score: 4, Informative

      Oh, I'm sure the work of monkeys is quite easily identifiable.

      --
      My patience is infinite, my time is not.
    3. Re:That's good and all by Rakshasa Taisab · · Score: 2, Funny

      I kinda enjoy getting mod points, it would be sad if they replaced that feature.

      --
      - These characters were randomly selected.
    4. Re:That's good and all by iNetRunner · · Score: 3, Funny

      Seems like it would be easier to develop a program that automatically detects /. dupes.. but no.

      *At least the million /. pounding monkeys detect it..*

      --
      Store with salt
  3. Turing test? by Nesetril · · Score: 5, Insightful

    so can a robot write a paper and then decide whether the paper was written by a robot (itself)?

    --
    Jesus said to his disciples: "If you don't have a sword, sell your cloak and buy one" - Luke 22:36
    1. Re:Turing test? by ironring2006 · · Score: 2, Informative
      Speaking of Turing, this showed up in the references for the automatic paper that I generated:

      Turing, A., Wilkes, M. V., Nehru, B., Wang, F. Z., Subramanian, L., Zhao, W., Beaman, N. A., Turcotte, B. A., and Wu, V. Refining consistent hashing and 16 bit architectures with SandyEos. Journal of Efficient, Highly-Available Communication 1 (Apr. 2002), 50-62.
      Glad to see he's still contributing to the field from the grave!
  4. Testing... by OakDragon · · Score: 2, Interesting
    "We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."

    RESULTS: FAKE

    Yep, it works!

  5. A USEFUL application... by Flimzy · · Score: 2, Funny

    When will MIT modify this technology to filter all the spam from my mailbox?

  6. Discrimination by hsmith · · Score: 5, Funny

    I hope the ACLU will ensure that discrimination against metal people will not be allowed to continue.

    1. Re:Discrimination by Iron Condor · · Score: 3, Funny
      That is people of metal, you biologist

      I think the preferred term is "Ferro-Americans".

      --
      We're all born with nothing.
      If you die in debt, you're ahead.
  7. An interesting experiment by fm6 · · Score: 4, Funny

    Has anybody fed Dvorak's latest column to this program? I've often wondered if he actually writes his columns, or just generates verbiage at random.

    1. Re:An interesting experiment by irregular_hero · · Score: 5, Funny

      "This text had been classified as
      INAUTHENTIC
      with a 24.9% chance of being authentic text"

      No kidding.

    2. Re:An interesting experiment by Anonymous Coward · · Score: 3, Funny

      Yep, I tried that too.
      I also tried another article from ABC News about meat eaters contributing to global warming (http://abcnews.go.com/Technology/story?id=1856817&page=1). It was inauthentic/28.8%.

      Looks like they have a crafty team of robots there at abc :)

    3. Re:An interesting experiment by Ontain · · Score: 2, Informative

      that's not surprising. i did a few articles and they come up in the 20ish percent range. this detector isn't very good.

    4. Re:An interesting experiment by FhnuZoag · · Score: 2, Funny

      I liked the vast global robot conspiracy explanation better.

    5. Re:An interesting experiment by jacquems · · Score: 2, Funny

      I tested it on the text from the Time Cube index page, and it was rated as AUTHENTIC with a 95.3% chance of being an authentic paper.

  8. Sadly, It appears that I am a robot. by cbelt3 · · Score: 3, Interesting

    I've taken a long posting that I wrote on my blog and dropped it into the site. And I am Inauthentic. Now I understand the "Bladerunner Moment" comment in the article. I shall begin to surround myself with oddly colored polaroids and snapshots of theoretically implanted ancestors.

    The nice thing is that we've finally settled the argument if machines can be made to drink beer and like it !

  9. See what it says about slashdot by Locke2005 · · Score: 2, Funny
    According to the program, the comments to this article are rated as follows:

    This text had been classified as INAUTHENTIC with a 32.2% chance of being authentic text

    Bearing in mind that text over 50% chance will be classified as authentic, this adds credence to the theory that slashdot comments are generated by monkeys randomly typing on keyboards.

    --
    I've abandoned my search for truth; now I'm just looking for some useful delusions.
  10. Self defeating? by benhocking · · Score: 5, Funny

    It seems like it wouldn't be too difficult to modify the MIT program to use this new anti-robot robot to write papers that this anti-robot robot would not be able to detect. Ideally, this would be done with a learning algorithm (so that it could easily be extended to other anti-robot robot programs), but reverse-engineering the anti-robot robot (by humans) should also provide a solution.

    Now that Indiana U has thrown down the gauntlet, I wouldn't be surprised if MIT responds. Hopefully it will result in an even better paper-writing robot. Ideally, it will lead to dissertation-writing robots. :)

    --
    Ben Hocking
    Need a professional organizer?
    1. Re:Self defeating? by cp.tar · · Score: 4, Interesting

      I recently had to check out an essay-grading robot for my Introduction to Natural Language Processing class.

      I'd fed it the introduction of a randomly generated essay. It got a 4/5 on all counts.

      I figure, if teachers are going to use robots to grade essays, we should use robots to create them in the first place.

      --
      Ignore this signature. By order.
    2. Re:Self defeating? by mctk · · Score: 5, Funny

      Eventually my students won't have to write papers and I won't have to grade them! Think of the potential application of this technology towards education!

      --
      Paul Grosfield - the quicker picker upper.
    3. Re:Self defeating? by BraksDad · · Score: 3, Insightful

      Maybe after a string of anti robot robots, MIT would come up with a robot that would generate a real scientific paper!

      next comes your anti robot robot
      then the anti anti robot robot robot
      and of course the anti anti anti robot robot robot robot
      and the anti anti anti anti robot robot robot robot robot
      ...
      I could go on since cut and paste is so easy ;-)

      Perhaps it would be a million anti's followed by a million and one robots before something useful came out of such an exercise, but wouldn't it be cool to witness?

      --
      Slowly waving my hand - "This is not the sig you are looking for."
    4. Re:Self defeating? by mctk · · Score: 2, Funny
      Only if their re-writing robots are designed intelligently...

      Okay, actually I just wanted to comment that I love the sig.

      --
      Paul Grosfield - the quicker picker upper.
    5. Re:Self defeating? by Frumious Wombat · · Score: 2, Interesting

      Personally, I'd be more interested in modifying this for Fraud Detection. The robot looks over your data and text, and decides, "Sorry Dave, a leap of faith has occurred here." Presumably, at that point the robot locks you out of your lab.

      This could lead to a whole series of literary robots: The Too Many Coincidences in Fiction Detector, The Humanities Thesis Verbiage Reducer, The This Movie Is Going to Suck No Matter Who Acts In/Directs It Detector, and so forth.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
  11. The program is a failure. by im_thatoneguy · · Score: 2, Interesting

    Apparently I'm on average 49% artificial, based on school papers I wrote. I dub thee program: a failure.

  12. Only works for scientific papers by gurps_npc · · Score: 4, Informative
    If you try to use it on any human-written NON-scientific paper, such as Lincoln's Gettysburg Address, it almost always considers it false.

    I suspect that it is looking for the conventional thinking with conventional word structure. As such, it is NOT a good idea i

    --
    excitingthingstodo.blogspot.com
    1. Re:Only works for scientific papers by nasor · · Score: 4, Informative

      No, it doesn't even seem to work on scientific papers. I submitted four papers from the latest issue of Inorganic Chemistry and it thought 2 out of 4 were false:

      Inauthentic: Assembly of a Heterobinuclear 2-D Network: A Rare Example of Endo- and Exocyclic Coordination of PdII/AgI in a Single Macrocycle.

      Inauthentic: Pyrazolate-Bridging Dinucleating Ligands Containing Hydrogen-Bond Donors: Synthesis and Structure of Their Cobalt Analogues

      Authentic: Manganese Complexes of 1,3,5-Triaza-7-phosphaadamantane (PTA): The First Nitrogen-Bound Transition-Metal Complex of PTA

      Authentic: Structure, Luminescence, and Adsorption Properties of Two Chiral Microporous Metal-Organic Frameworks

      Based on this (small) sampling, the program doesn't appear to do any better than if it were to guess randomly. I wonder if this thing is even supposed to work, or if it just returns a random result based on a hash of the paper or something?

  13. Re:Typos by dlakelan · · Score: 2, Informative

    Do robots make typos? Do they make the same typos each time, or different ones?

    Based on the Slashdot articles that get posted, I would say YES.

    Actually it's pretty easy to add random convincing misspellings to text, you could use a database from something like usenet, and a spell checker to map misspelled words to their real counterparts, and then have a straightforward algorithm for replacing some set of words with misspellings, and you could tune that for consistency. It would be easier than many other aspects of faking papers.

    --
    ((lambda (x) (x x)) (lambda (x) (x x))) http://www.endpointcomputing.com a scientific approach to custom computing.
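
    The misspelling idea in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the MISSPELLINGS table is a tiny made-up stand-in for a database mined from something like usenet, and the names add_typos, rate and seed are hypothetical. Each word gets one remembered choice per document, which is one simple way to "tune for consistency".

```python
import random

# Hypothetical stand-in for a mined database that maps correctly
# spelled words to misspellings seen in the wild.
MISSPELLINGS = {
    "the": ["teh"],
    "receive": ["recieve"],
    "separate": ["seperate"],
    "definitely": ["definately"],
}


def add_typos(text, rate=0.3, seed=None, table=MISSPELLINGS):
    """Replace some known words with misspellings, consistently per word."""
    rng = random.Random(seed)
    chosen = {}  # word -> spelling picked for this whole document
    out = []
    for token in text.split(" "):
        key = token.lower()
        if key not in table:
            # Unknown words (and words with attached punctuation) pass through.
            out.append(token)
            continue
        if key not in chosen:
            # Decide once per word whether this "author" misspells it.
            if rng.random() < rate:
                chosen[key] = rng.choice(table[key])
            else:
                chosen[key] = key
        replacement = chosen[key]
        if replacement != key and token[0].isupper():
            replacement = replacement.capitalize()
        out.append(replacement if replacement != key else token)
    return " ".join(out)


if __name__ == "__main__":
    print(add_typos("I will definitely receive the paper", rate=1.0, seed=0))
```

    Deciding once per word, rather than once per occurrence, makes the output look like one writer with fixed bad habits instead of random noise.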
  14. Ah.... by BaronSprite · · Score: 2, Funny

    Maybe slashdot can start running it on their links for "cold fusion in 1 year!".......

  15. Re:Typos by brian0918 · · Score: 4, Funny

    E-mail spambots have been making typos for years.

  16. Re:How about . . . by mypalmike · · Score: 2, Funny

    Hey, if you don't like 1-ply you can always fold it in half.

    And if you don't like 2-ply, you can separate the sheets. Keep in mind that this works best before you wipe.

    --
    There are 0x40000000 types of people: those who understand 32-bit IEEE 754 floating point, and those who don't.
  17. I am in awe by DingerX · · Score: 4, Informative

    So I go there, and I start shoving it text from my hard drive. I try:

    A) Text of an article (Philosophy) I (native English speaker) wrote in Italian: 98.5 Authentic.
    B) Text of an article I wrote in English (History): 87.8
    C) Text of an article (History) written in French by a native French speaker and translated into English: 93.2
    D) Critical edition of a 14th-century Latin text (Theology): 97.7 Authentic.
    E) Documentation to a Field Artillery Simulation: 95.3
    F) A completely bogus narrative for a monastic order that doesn't exist, written in a style that mimics A)-C): 16.8% Inauthentic

    So in this case, we have a human written document that has superficial meaning, but is written as a "fake scientific paper", and registering as such.

    And yes, I did read the "purpose" of the page; I know it's not supposed to detect it.


    And yet it does, decisively.

  18. Can fool it by duplicating first page by currivan · · Score: 2, Interesting

    Duplicating the first half of the sample fake paper after the end of the footnotes makes it go from inauthentic (17%) all the way up to 91% authentic. It seems to be looking for long-range n-gram repetition, but it doesn't have a ceiling on frequency or length of the repeated text.

    It shouldn't be hard to compare the distribution of n-gram recurrence rates (or distances between recurrences) to the observed distribution for actual papers. Something like a KL divergence would capture deviations in either direction.
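
    A minimal sketch of that comparison, assuming word bigrams, log2-spaced distance buckets and add-one smoothing (all of these are illustrative choices, not anything the detector is known to do). The function names are hypothetical.

```python
import math
from collections import Counter


def recurrence_distances(words, n=2):
    """Distance (in words) between each n-gram and its previous occurrence."""
    last_seen = {}
    distances = []
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return distances


def distance_histogram(distances, n_buckets=16, smoothing=1.0):
    """Smoothed probability distribution over log2-spaced distance buckets."""
    counts = Counter(min(int(math.log2(d)), n_buckets - 1) for d in distances)
    total = len(distances) + smoothing * n_buckets
    return [(counts[b] + smoothing) / total for b in range(n_buckets)]


def kl_divergence(p, q):
    """KL(p || q) in nats; q must be nonzero wherever p is nonzero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # 'reference' would come from a corpus of real papers; toy text here.
    reference = "we show that the model fits the data and the model is stable".split()
    candidate = "the model the model the model the model".split()
    p = distance_histogram(recurrence_distances(candidate))
    q = distance_histogram(recurrence_distances(reference))
    print(kl_divergence(p, q))
```

    Because the divergence grows when the candidate repeats either too little or too much relative to the reference, padding a paper with a duplicated first page would raise the score rather than hide it.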

  19. Re:It Caught Mine by Em Adespoton · · Score: 2, Interesting

    This raises a question... how do Wikipedia articles fare? --I'd guess that they should be at least *somewhat* scientific....

  20. What it says about anything by Pi_0's don't shower · · Score: 2, Informative

    I just finished writing a scientific paper for publication. Apparently, this filter is very reliant on using long-term pattern recognition. When I fed this application my introduction only, it told me my work was INAUTHENTIC with a 35% chance of authenticity. When I fed it the first two sections, it said it was AUTHENTIC with a 66% chance of authenticity. And finally, when I fed it the entire paper, it said it was AUTHENTIC at the 87% level.

    So apparently, all you need to do to beat this filter is insert the same buzzwords and phrases at many different points in a long article, and you should be fine.

  21. President Bush's Biography by b0wl0fud0n · · Score: 2, Funny
    I'm in awe too. I put in George Bush's biography from the whitehouse.gov website and got
    This text had been classified as
    INAUTHENTIC
    with a 27.3% chance of being authentic text
    I'm amazed too! It works!
  22. Read the Paper - Looks at Repetition by Constantine Evans · · Score: 3, Informative

    Read the paper listed in the menu of the website. The system essentially compresses the text with different window sizes, and then looks at the compression factors. In other words, it is only looking for repetition of strings. This is absurdly easy to fool, and the MIT generator could be easily fixed to pass this filter. For example, try entering a random text once (your post, for example). Note that it fails. Then append a few copies of the same text, and run that through. Your post, when run once, is too short. When run with two copies, it is rejected at 41.2%. When run with three, it passes with 93%. There is a window of repetition level required in order to pass - papers that do not repeat enough are classified as fake, as well as papers that repeat too much (try entering twenty copies of your post).

    It should be relatively simple to make a random paper generator that always passes this test with a higher probability than human-written papers.

  23. Re:That's EASY! by Hal_Porter · · Score: 5, Funny

    I, for one, peruse the blogosphere. On my Powerbook, wearing a black turtle neck and beret. Stroking my goatee thoughtfully. Sipping a latté in a café

    If I could just find a way to recharge my PowerBook from your hatred, I could stop carrying this ugly power adaptor.

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  24. False positives by macklin01 · · Score: 2, Interesting

    Hmmm, it's an interesting idea, but it seems to give a lot of false positives. (So naturally, it will detect fake papers, if it thinks every paper is fake.)

    First thing I tried was some pages on computational oncology website, in particular, my cancer primer, which I wrote in not a short time. Everything I fed was determined to be inauthentic. Perhaps I just write like a robot. :-) I figured that perhaps the detector was more primed for real papers, so I figured it wasn't too big of a deal.

    So, next I tried my most recent research paper, and it, too, was determined to be inauthentic, and in fact with less authenticity than my website. So much for the theory of being primed for scientific papers only. This thing is starting to look pretty bogus to me ... but an interesting idea, nonetheless. -- Paul

    --
    OpenSource.MathCancer.org: open source comp bio
  25. Trying Wikipedia articles by Animats · · Score: 4, Interesting
    I've been trying my own papers and articles from Wikipedia. My own papers all score around 90%. Wikipedia articles that I consider good ones seem to score in the 80% range. Badly written fancruft scores very low.

    Some variant on this thing might be useful as a new article filter in Wikipedia. We need more automation over there to stem the flow of incoming dreck.

  26. Re:That's EASY! by Unski · · Score: 3, Funny

    Sir I regret to inform you that you are a ruffian. I for one sit not in a place so vile and common as a 'café', examining the flawed writings of others, but in a temple constructed purely out of my supercilious transcendent superiority. I consume nothing so plebeian as 'The Internet' but rather a rasterized, marked-up and projected form of my own rigorous, peerless stream of consciousness (with blue aqueous scroll-bars). I have no need for facial hair or indeed any of your corporeal trappings and hence know not the joy of stroking a 'goatee'.

    And now I must mod you as Troll, for surely you must know that the PowerBook, created on the Seventh day, is immaculate in its design and conception and therefore the only possibility is that you seek to trifle with the emotions of our brethren, in crudely ascribing to Our Power Adapters the property of ugliness. If you were truly one of Us you would know that Steve created all in his own beautiful image.

    btw you haven't got a couple of rubber feet for an ibook going spare have you mate?