Last month, Richard Van Noorden at Nature reported that two major publishers were removing some 120 papers from their subscription services after it was revealed that the papers were computer-generated.
The papers involved were not submitted to any of the publishers’ journals; rather, they were included in the subscription services after being featured at various conferences, mostly in China.
The incident wasn’t the first instance of fake, computer-generated papers being published. Cyril Labbé, the researcher who detected the fake papers, had previously seeded Google Scholar with over 100 fake papers, turning a fictitious researcher into one of the most-cited scientists in the world. Hoax papers go back at least to the 1990s, when physicist Alan Sokal published a parody paper in a journal as a means of sparking a debate on the merits of “cultural studies”.
Many of the computer-generated papers come from a program called SCIgen, an online generator that, given nothing more than the names of the authors, creates a fake computer-science paper designed to appear convincing. First launched in 2005, SCIgen has been used by countless researchers to expose conferences and journals with inadequate quality control.
But the use of SCIgen has become so prevalent that a SCIgen detector has also been built: a site where one can upload a suspicious paper and see whether it closely matches SCIgen’s output.
However, such a detector should not be necessary. Even a cursory glance at an automatically generated paper is enough to tell that it is fake. The sentences rarely make sense, and the generated charts are both nonsensical and irrelevant to the subject at hand. One doesn’t need to be an expert on computers to spot the problems.
The issue instead is poor quality control by a relative handful of conferences and journals. Considering that a publication’s editorial process is, in many cases, the only line of defense keeping a work from appearing in large catalogues of research, a bad paper that slips through the editorial process in one place can wind up in the major online databases, including those hosted by prominent publishers.
But what’s more disturbing is not how it happens, but why. In most of the cases where the papers were published, they were submitted to journals or conferences that required a publication fee, often a large one. While publication fees are common among legitimate journals, there’s a vast chasm between journals that charge a fee to publish legitimately reviewed and accepted works and those that will simply publish anything that pays the toll.
The latter is an outright scam, both on the scientific community and on the students and researchers who submit to them. We discussed these predatory journals back in May of last year, but the recent incidents of “gibberish” papers have shown not only that these journals (and conferences) are doing well, but that the poor-quality work they contain is bleeding into other, more trusted resources.
Unfortunately, as Navin Kabra pointed out when he successfully published a generated paper, it can be very difficult to tell the legitimate journals and conferences from the predatory ones. Predatory publications can boast impressive-sounding credentials and make excellent promises that they simply never keep.
So while there certainly is a war on gibberish, and some will try to use generated papers to get published, the more serious problem is predatory journals and conferences, a problem that has been around for years.
After all, it’s highly unlikely that a computer-generated paper would make it past a legitimate editorial review process. As such, if we can find a way to stop predatory journals and conferences, we can also stop gibberish from finding its way into reputable databases.
Topics: Current Events