Slashdot Mirror


Hoax-Detecting Software Spots Fake Papers

sciencehabit writes: In 2005, three computer science Ph.D. students at the Massachusetts Institute of Technology created a program to generate nonsensical computer science research papers. The goal was "to expose the lack of peer review at low-quality conferences that essentially scam researchers with publication and conference fees." The program — dubbed SCIgen — soon found users across the globe, and before long its automatically generated creations were being accepted by scientific conferences and published in purportedly peer-reviewed journals. But SCIgen may have finally met its match. Academic publisher Springer this week is releasing SciDetect, an open-source program to automatically detect automatically generated papers. SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance but is easily spotted as nonsense by a human reader.
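To illustrate the "context-free grammar" idea mentioned in the summary, here is a minimal Python sketch. The rules and vocabulary below are invented for illustration and are not SCIgen's actual rule set, and the helper names `expand` and `generate_sentence` are hypothetical. Each symbol is expanded by picking one of its productions at random without regard to the surrounding words, which is why the output is grammatical locally but carries no overall meaning.

```python
import random

# Toy grammar: hypothetical rules for illustration only, not SCIgen's real grammar.
# Keys are nonterminals; each maps to a list of alternative productions.
GRAMMAR = {
    "SENTENCE": [["NP", "VP", "."]],
    "NP": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "VP": [["VERB", "NP"]],
    "ADJ": [["scalable"], ["Bayesian"], ["trainable"]],
    "NOUN": [["framework"], ["methodology"], ["heuristic"]],
    "VERB": [["synthesizes"], ["refutes"], ["emulates"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal words."""
    if symbol not in GRAMMAR:
        # Anything without a rule is a terminal and is emitted as-is.
        return [symbol]
    # Context-free: the choice depends only on the symbol, not its neighbours.
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Return one randomly generated sentence; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    # The last token is the full stop; attach it without a leading space.
    text = " ".join(words[:-1]) + words[-1]
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    for i in range(3):
        print(generate_sentence(seed=i))
```

A detector does not need to understand such text. In principle it only has to notice that the vocabulary and phrasing are drawn from a small, fixed set of rules.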

35 of 61 comments

  1. Results? by tulcod · · Score: 1

    So? Surely, after coding this up, the first thing any scientist would do is scan, at the very least, all of arXiv, and see what comes out as fake? I mean I have seen my fair share of papers that might as well have been generated by SCIgen and the like.

    1. Re:Results? by I'm+not+god+any+more · · Score: 5, Funny

      1. The first thing SCIgen should do is to incorporate SciDetect, to make sure that their random papers pass the SciDetect test.
      2. SciDetect should then improve their algorithms, and SCIgen should again take a snapshot of the SciDetect source code and incorporate it.
      3. Run this loop a few times and what we'll have is some serious papers.
      4. Profit!!!

    2. Re:Results? by phantomfive · · Score: 3, Interesting

      Of all the problems you might find at arXiv, I don't think "auto-generated papers going undetected" is one of their problems.

      ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.

      --
      "First they came for the slanderers and i said nothing."
    3. Re:Results? by ckatko · · Score: 1

      Just because there's a way to scan papers (to help you trick the system) doesn't mean everyone is going to use it. The smart ones will, but that doesn't mean plenty of stupid people won't.

      Just because a tool can't stop every bad guy doesn't mean it's useless. Even a professional will miss some. It's about reducing the numbers that get through.

    4. Re:Results? by zerro · · Score: 1

      Is there such a thing as a Turing Race?!

    5. Re:Results? by Em+Adespoton · · Score: 1

      Well why not automate the process? SCIgen should just subscribe to the SciDetect source repo, and auto-update its copy when the trunk updates. SciDetect should then subscribe to the SCIgen source repo, and ensure that it detects any newly missed sets.

      Leave this system alone for a while, and we won't need to write articles anymore, as SCIgen should do a better job of producing insightful but unintelligible drivel than you'd get from any peer-reviewed journal -- and it would detect itself to boot!

    6. Re:Results? by i.r.id10t · · Score: 2

      Sorta like turnitin.com's business model. Require students to give you their content in order to get a grade, and scrape the web for text content. Sell lookups of newly submitted content against that content archive back to educational institutions. Then start up a pre-processing service for students to check their submissions against first before they submit to the teacher for a grade.

      --
      Don't blame me, I voted for Kodos
    7. Re:Results? by Z00L00K · · Score: 1

      There is now!

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    8. Re:Results? by buchner.johannes · · Score: 1

      ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.

      Actually, each arXiv section has an editor who screens the papers, checking whether they have reasonable content. Unfortunately, legitimate papers are sometimes withheld for several weeks, and the arXiv administration does not respond reliably to emails (being understaffed and having many submissions). So arXiv is no longer just a pre-print server where everyone can upload; it has turned into an opaque, half-peer-reviewed journal that scientists read every day.

      --
      NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
  2. Got it... by Anonymous Coward · · Score: 4, Funny

    Software detecting papers written by software -- in the dark.

  3. Chicken chicken Chicken? by Irate+Engineer · · Score: 3, Funny
    --

    Left MS Windows for Linux Mint and never looked back!

    Vote for Bernie in 2016!

    1. Re:Chicken chicken Chicken? by Anonymous Coward · · Score: 1

      Don't forget the author's presentation on this article:
      https://www.youtube.com/watch?v=yL_-1d9OSdk

    2. Re:Chicken chicken Chicken? by PolygamousRanchKid+ · · Score: 1

      No, I think it was Buffalo, and not Chicken: http://en.wikipedia.org/wiki/B...

      --
      Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
    3. Re:Chicken chicken Chicken? by sumdumass · · Score: 1

      Let us not forget about the issues with mailing lists either.

      http://www.scs.stanford.edu/~d...

      http://www.vox.com/2014/11/21/...

  4. Evil tech? by Anonymous Coward · · Score: 5, Interesting

    The purpose of the scam papers was to expose scam journals.
    The purpose of this new software seems to be to allow scam journals to continue scamming.
    So it's an evil software, that should not have been developed, right?

    I mean, if you were doing actual peer review, none of this would pass even a half-sentient peer's inspection.

    1. Re:Evil tech? by pla · · Score: 2

      I mean, if you were doing actual peer review, none of this would pass even a half-sentient peer's inspection.

      This, so much this!

      Seriously - If I don't do my job and my boss catches me playing online poker all day, should I attach a response to my HR writeup explaining that I have addressed my deficiency by rearranging my cube to make it harder for others to see my screen???


      The problem here has nothing to do with people submitting fake papers, Springer. Rather, you need to stop hiring fake editors.

  5. It is too much trouble to fix the problem by Attila+Dimedici · · Score: 4, Interesting

    Springer reveals that they are not interested in fixing the problem revealed by SCIgen, they just want to prevent that software from demonstrating that they have not fixed it. They aren't going to change the review process to ensure that they no longer publish papers which are nonsense. No, they developed software to eliminate those papers which were generated by other software.

    --
    The truth is that all men having power ought to be mistrusted. James Madison
  6. Interesting Response by Roger+W+Moore · · Score: 4, Insightful

    arXiv is not peer reviewed. What I found interesting though was the response of the publisher: write a program to detect fake papers. Even the most simplistic peer review - i.e. reading the paper - would immediately catch these papers. If they need to write a program to catch fake papers then their peer review model is essentially worthless and frankly a journal that poor is no better, and likely worse, than arXiv: at least arXiv doesn't pretend to have peer review.

  7. Authentic Frontier Gibberish by Registered+Coward+v2 · · Score: 2

    So a program designed to write fake papers to unmask sham journals and conferences gets used to write fake papers to prop up sham degrees? Somewhat ironic, although in fairness to the authors of the paper-writing program, they never intended it to be used in such a manner. It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified. A more subtle point in the article is that claimed publications from some countries, such as China, should be viewed with suspicion.

    As a side note, the sham conference industry is interesting. I periodically get, via LinkedIn, invites to attend an "important conference" and speak and get a "prestigious award" based on my "outstanding accomplishments and renowned expertise" in my field. Funny how, when I send them my speaking fee requirements, they never get back to me, nor do they mail me the award as I request if I am unable to make the conference.

    --
    I'm a consultant - I convert gibberish into cash-flow.
    1. Re:Authentic Frontier Gibberish by Anonymous Coward · · Score: 1

      It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified.

      Sorry, I don't buy it. It only takes what, 2 seconds or less for an actual human to detect a phoney paper like chicken chicken chicken. I don't care how "inconvenient" it is to Springer, if I am paying for a subscription to a peer reviewed magazine I expect the papers presented in that magazine to actually be peer reviewed.

  8. How naive.... by Anonymous Coward · · Score: 1

    Publishing houses have 1000's of "peer reviewed" journals to print. They don't have time or actual experts to read them, that is the job of the peers that buy the journal.

  9. Lazy professors by magarity · · Score: 1

    "SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance"

    This is great for students who have lazy professors. Write a good introduction on page 1, a good conclusion on page 52, and use SCIgen on pages 2-51.

  10. Trace Buster Buster to bust his shit by geoskd · · Score: 1

    an open-source program to automatically detect automatically generated papers.

    Just wait till I bust out my Trace Buster Buster.

    --
    I wish I had a good sig, but all the good ones are copyrighted
  11. I don't care about hoax papers by aaaaaaargh! · · Score: 2

    What bothers me is that in the humanities there are whole communities and sub-disciplines in which there is barely any real peer reviewing. These are small niche areas in which everyone knows everyone and basically the whole research is based on invited contributions and papers that are not properly blind peer reviewed - they are cursorily scanned by colleagues who know who wrote the article. In such a field there are about 5-10 journals in total and the authors jump back and forth between them. Most of them are unable to publish articles in top journals of the discipline as a whole. I personally know professors who have built a whole career on the basis of quoting themselves and by doing light editorial work. I know a cross-disciplinary field of study in the humanities that is entirely dominated by two professors, all the rest are scholars of them, and each of them wrote around 40 books, always on the same topic, and all of them more or less repeating the same two pseudo-competing themes over and over.

    It's pretty sad to see these people recognized as experts when at the same time in other fields there is hard work and real progress.

  12. Well done Springer by nmpg · · Score: 1

    From this I read that Springer, instead of promoting measures to ensure real peer review and avoid these scam conferences, actually builds a program that helps these scam conferences. Well done.

  13. Good for Springer by damn_registrars · · Score: 1

    At least they have done something to warrant their publication costs. I figured the charges were just all going to the CEO, now we see that some very small part of them went to hire a CSci intern for a few weeks.

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
    1. Re:Good for Springer by Z00L00K · · Score: 1

      Raise the stakes and detect lying politicians.

      It may be easier to detect when they are speaking the truth however.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  14. Perhaps there should be fewer papers by Karmashock · · Score: 1

    The biggest source of these fake papers appears to be PhD papers. And given that we're producing more PhDs than ever before, maybe we should reform the way we do that. Because in requiring that they actually discover or examine something new, the chances are that they're going to lie about something.

    If we had fewer PhDs maybe they wouldn't do that so much. But the issue is that there are so many papers that no one can read them. And that means trying to audit this stuff is impractical.

    The solution of having robots audit the papers is interesting but for that to be really effective, I think the papers need to be optimized for that sort of scan.

    Less prose for example outside of the abstract. More data, more equations, more graphs... more things an expert system could take apart.

    Here is my real beef with this idea... I'm pretty sure I could cheat just as easily with this system as I could right now. I think the only thing that would change is that the people reading my work would feel less of a need to check my work and just trust the robot. But if I know how to fool the robot then I automatically win.

    Robots are stupid, guys. I've never met a machine I couldn't outsmart.

    --
    I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
    1. Re:Perhaps there should be fewer papers by Karmashock · · Score: 1

      Stupid insults from an Anonymous Coward? Shocking.

    2. Re:Perhaps there should be fewer papers by Karmashock · · Score: 1

      In what way does my statement trivialize the process?

      BE SPECIFIC. SAY "WHY".

      Then you say something is 100 percent bullshit but don't say why that is either.

      Absent "why" you have no argument and therefore your post is a NULL statement.

      Why am I wrong?

      Why do people have such a fundamental difficulty with making a rational statement? It is baffling to me.

    3. Re:Perhaps there should be fewer papers by Karmashock · · Score: 1

      Okay, so you open with a silly attempt to browbeat me on the grounds that my posts are often not grammatically correct... on an internet forum.

      And on that basis you attempt to justify the statement that I am out of my depth in all issues... I mean, you say I don't proofread but you need to think over your arguments a bit more, sport. This crap is sad.

      And then you say I am emotionally breaking down? On what basis? I assume your mind reading powers.

      Your post was either logically unsustainable such as your first two statements or was likely projection on your last point.

      Either way... You've abandoned any attempt to remain on topic and have fallen entirely into insults.

      That's a win for me. You say "I" broke down... if you abandoned any attempt to sustain your point and have fallen to calling me a poopy head... I win.

      Good game.

    4. Re:Perhaps there should be fewer papers by Karmashock · · Score: 1

      I actually do know how AC works. You chose to not use your fake name on the site because you're a weasel or too lazy to log on.

    5. Re:Perhaps there should be fewer papers by Karmashock · · Score: 1

      I didn't trivialize anything.

      You're pushing a strawman and I'm tired of indulging your deceit.

      We're done.

    6. Re:Perhaps there should be fewer papers by Karmashock · · Score: 1

      It exists only in your mind.

      I have nothing but respect for those that go through a PhD program and I have nothing but respect for the education and the disciplines involved... so long as the people involved in them have respect for them as well. There are examples of fraud and I have no respect for them.

      The mere fact that I am arguing against you so strenuously here proves that you misunderstood my intentions. If I did feel that way, then I would agree with your position... right? And yet I don't... which means that clearly wasn't the message I was trying to send which means you're wrong.

      I know I know... you like strawmen... but they're logically unsupportable so that's just too bad.

      In any case, you're not interested in a constructive discussion but in some little emotional crusade. You are neither informed nor interesting. And lacking both I can't see why anyone would want to talk to you about anything.

      Good day.

  15. Peer review? by jbmartin6 · · Score: 1

    The existence of this tool is admitting that these papers aren't peer reviewed. Wouldn't it be simpler to just admit that and stop committing fraud?

    --
    This posting is provided 'AS IS' without warranty of any kind, implied or otherwise.