Hoax-Detecting Software Spots Fake Papers
sciencehabit writes: In 2005, three computer science Ph.D. students at the Massachusetts Institute of Technology created a program to generate nonsensical computer science research papers. The goal was "to expose the lack of peer review at low-quality conferences that essentially scam researchers with publication and conference fees." The program — dubbed SCIgen — soon found users across the globe, and before long its automatically generated creations were being accepted by scientific conferences and published in purportedly peer-reviewed journals. But SCIgen may have finally met its match. Academic publisher Springer this week is releasing SciDetect, an open-source program to automatically detect automatically generated papers. SCIgen uses a "context-free grammar" to create word salad that looks like reasonable text from a distance but is easily spotted as nonsense by a human reader.
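For anyone wondering what generating text from a "context-free grammar" looks like in practice, here is a minimal sketch. The grammar, the word lists, and the function names (`GRAMMAR`, `expand`, `generate_sentence`) are all made up for illustration and are not taken from SCIgen itself; the real tool presumably uses a far larger rule set. The idea is just that each symbol is replaced by a randomly chosen production until only plain words remain.

```python
import random

# Hypothetical toy grammar for illustration only; these are NOT SCIgen's rules.
# Keys are non-terminal symbols; each maps to a list of possible productions.
GRAMMAR = {
    "SENTENCE": [["NP", "VP", "."]],
    "NP": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "VP": [["VERB", "NP"], ["VERB", "NP", "in", "NP"]],
    "ADJ": [["scalable"], ["Bayesian"], ["decentralized"]],
    "NOUN": [["framework"], ["methodology"], ["heuristic"]],
    "VERB": [["refines"], ["emulates"], ["synthesizes"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a flat list of terminal words."""
    if symbol not in GRAMMAR:
        # Anything without a rule is a terminal: emit it as-is.
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Generate one grammatical but meaningless sentence."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    # Attach the final period to the last word and capitalize the first letter.
    text = " ".join(words[:-1]) + words[-1]
    return text[0].upper() + text[1:]


if __name__ == "__main__":
    for i in range(3):
        print(generate_sentence(seed=i))
```

Every sentence this produces parses fine and says nothing, which is why it looks plausible at a glance and falls apart as soon as a human actually reads it.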
Software detecting papers written by software -- in the dark.
Chicken chicken, (chicken) chicken?
https://www.improbable.com/airchives/paperair/volume12/v12i5/chicken-12-5.pdf
Left MS Windows for Linux Mint and never looked back!
Vote for Bernie in 2016!
The purpose of the scam papers was to expose scam journals.
The purpose of this new software seems to be to allow scam journals to continue scamming.
So it's an evil software, that should not have been developed, right?
I mean, if you were doing actual peer review, none of this would pass even a half-sentient peer's inspection.
1. The first thing SCIgen should do is to incorporate SciDetect, to make sure that their random papers pass the SciDetect test.
2. SciDetect should then improve their algorithms, and SCIgen should again take a snapshot of SciDetect source code and incorporate it.
3. Run this loop a few times and what we'll have is some serious papers.
4. Profit!!!
Springer reveals that they are not interested in fixing the problem revealed by SCIgen; they just want to prevent that software from demonstrating that they have not fixed it. They aren't going to change the review process to ensure that they no longer publish papers which are nonsense. No, they developed software to eliminate those papers which were generated by other software.
The truth is that all men having power ought to be mistrusted. James Madison
arXiv is not peer reviewed. What I found interesting though was the response of the publisher: write a program to detect fake papers. Even the most simplistic peer review, i.e. reading the paper, would immediately catch these papers. If they need to write a program to catch fake papers then their peer review model is essentially worthless, and frankly a journal that poor is no better, and likely worse, than arXiv: at least arXiv doesn't pretend to have peer review.
So a program designed to write fake papers to unmask sham journals and conferences gets used to write fake papers to prop up sham degrees? Somewhat ironic, although in fairness to the authors of the paper-writing program, they never intended it to be used in such a manner. It would seem, as Springer acknowledged, that they should do a good peer review, which would eliminate the need to run papers through a hoax detector unless they started getting so many fake papers that their peer review process was overwhelmed. In that case, a first run through a program would be justified. A more subtle point in the article is that claimed publications from some countries, such as China, should be viewed with suspicion.
As a side note, the sham conference industry is interesting. I periodically get, via LinkedIn, invites to attend an "important conference" and speak and get a "prestigious award" based on my "outstanding accomplishments and renowned expertise" in my field. Funny how, when I send them my speaking fee requirements, they never get back to me, nor do they mail me the award as I request if I am unable to make the conference.
I'm a consultant - I convert gibberish into cash-flow.
Of all the problems you might find at arXiv, I don't think "auto-generated papers going undetected" is one of their problems.
ArXiv's problem is recognizing when human-written, realistic sounding papers are actually BS.
"First they came for the slanderers and I said nothing."
What bothers me is that in the humanities there are whole communities and sub-disciplines in which there is barely any real peer reviewing. These are small niche areas in which everyone knows everyone and basically the whole research is based on invited contributions and papers that are not properly blind peer reviewed - they are cursorily scanned by colleagues who know who wrote the article. In such a field there are about 5-10 journals in total and the authors jump back and forth between them. Most of them are unable to publish articles in top journals of the discipline as a whole. I personally know professors who have built a whole career on the basis of quoting themselves and by doing light editorial work. I know a cross-disciplinary field of study in the humanities that is entirely dominated by two professors, all the rest are scholars of them, and each of them wrote around 40 books, always on the same topic, and all of them more or less repeating the same two pseudo-competing themes over and over.
It's pretty sad to see these people recognized as experts when at the same time in other fields there is hard work and real progress.
Sorta like turnitin.com's business model. Require students to give you their content in order to get a grade, and scrape the web for text content. Sell lookups of newly submitted content against that content archive back to educational institutions. Then start up a pre-processing service for students to check their submissions against first before they submit to the teacher for a grade.
Don't blame me, I voted for Kodos