Adopting a Culture of Transparency

When a journal retracts a published study, the research doesn’t vanish; it gets flagged as unreliable. In 2016, 1,405 of approximately 2 million published papers were retracted, about two-thirds of them because of misconduct by the papers' authors.

In 2000, publishers retracted 36 of the roughly 1 million scientific papers published worldwide that year. In 2016, when about twice as many papers were published worldwide, 1,405 papers were retracted. About two-thirds of the retractions resulted from misconduct by the papers' authors, says Ivan Oransky, co-founder of Retraction Watch, a blog run by the Center for Scientific Integrity, a nonprofit academic watchdog.

After a recent analysis of nearly 1,000 papers published in the journal Molecular and Cellular Biology, Elisabeth Bik, a microbiologist and scientific editor at uBiome.com, says a simple extrapolation from the number of papers retracted for "inappropriate image duplication" suggests that some 35,000 similar papers in the literature are good candidates for retraction for that reason. Simple errors are usually addressed by a correction in the journal involved, while retractions are usually reserved for cases in which deliberate manipulation is suspected, Bik says.

A crime wave in the lab? The collapse of science publishing standards? The result of sloppy peer review? Much of the tidal wave of retractions comes from increased scrutiny of published papers by scientific sleuths using new automated error-detection tools. Even so, Oransky notes, retractions occur in fewer than one-tenth of 1% of the two million scientific papers published worldwide every year.

"Our statistics tell you about retracted papers, not about problematic papers that maybe should be retracted," Oransky says. "So is the problem worsening? We don't really know. It may be that we are now paying more attention to all the problems, so you could argue that an increase in retractions is a good thing."

One of the data sleuths "paying attention" is James Heathers, a self-described "scientist, author, and scalawag" at the Computational Behavioral Science Lab of Northeastern University. Two years ago, he co-authored a paper introducing GRIM (Granularity-Related Inconsistency of Means), a test that checks whether a mean reported in a published psychology paper is arithmetically possible, given the sample size, when the underlying data are integers such as survey responses.
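The arithmetic behind GRIM is simple enough to sketch. If n people each give a whole-number answer, their mean must be some integer total divided by n, so only certain values can round to a reported two-decimal mean. The function below is a minimal illustration under stated assumptions: the name grim_consistent, the items parameter for multi-item scales, and the choice to accept ties under either rounding convention are all ours, not taken from the GRIM paper.

```python
import math


def grim_consistent(reported_mean: float, n: int, decimals: int = 2, items: int = 1) -> bool:
    """GRIM-style check (hypothetical helper): could `reported_mean`, rounded to
    `decimals` places, be the mean of n integer responses (summed over `items` items)?"""
    steps = n * items                          # the true mean must be a multiple of 1/steps
    tol = 0.5 * 10 ** (-decimals) + 1e-9       # half a unit in the last reported digit
    centre = reported_mean * steps
    # Only integer totals near reported_mean * steps can round to the reported value.
    for total in range(math.floor(centre) - 1, math.ceil(centre) + 2):
        if abs(total / steps - reported_mean) <= tol:
            return True
    return False


print(grim_consistent(5.19, 28))  # False: 145/28 = 5.179 and 146/28 = 5.214
print(grim_consistent(5.18, 28))  # True: 145/28 rounds to 5.18
```

The check only has teeth when the sample is small relative to the reporting precision: with two decimals and 100 or more responses, every mean is attainable, which is why such tests suit the small samples common in psychology.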

This year, Heathers co-authored a paper that took GRIM several steps further. Called SPRITE, for Sample Parameter Reconstruction via Iterative TEchniques, the new method uses a heuristic algorithm to reconstruct plausible raw data behind a paper from just a few basic summary statistics: the mean, the standard deviation, the sample size, and the lower and upper bounds of the item values. It works only when the data are bounded integers, as in a survey where people choose answers from 1 to 5. It tells the user when it can find no data set that could have produced the reported mean and standard deviation, or when solutions are possible but unlikely.
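To make the idea concrete, here is a rough sketch of a SPRITE-like search, assuming the reported standard deviation is the sample (n-1) version. It starts from the most evenly spread integers that match the mean, then repeatedly picks two values at random and nudges one up and the other down by 1, which leaves the mean untouched while moving the standard deviation toward its target. The function name, iteration limit, and tolerance rule are hypothetical; this is not the published SPRITE code.

```python
import random
import statistics


def sprite_sketch(mean, sd, n, lo, hi, decimals=2, max_iter=50_000, seed=0):
    """Hypothetical SPRITE-like search for one data set of n (>= 2) integers in
    [lo, hi] whose mean and sample SD round to the reported values. Returns a
    sorted list, or None if the search gives up; None does not prove impossibility."""
    rng = random.Random(seed)
    tol = 0.5 * 10 ** (-decimals) + 1e-9        # half a unit in the last reported digit
    total = round(mean * n)                      # integer sum implied by the reported mean
    if abs(total / n - mean) > tol or not lo * n <= total <= hi * n:
        return None                              # no integer data can give this mean
    base, extra = divmod(total, n)
    data = [base + 1] * extra + [base] * (n - extra)  # most even split: smallest possible SD
    for _ in range(max_iter):
        current = statistics.stdev(data)
        if abs(current - sd) <= tol:
            return sorted(data)
        i, j = rng.sample(range(n), 2)
        if data[i] < data[j]:
            i, j = j, i                          # make data[i] the larger of the pair
        if current < sd:
            # Widen: push the larger value up and the smaller one down; the sum is unchanged.
            if data[i] < hi and data[j] > lo:
                data[i] += 1
                data[j] -= 1
        elif data[i] > data[j]:
            # Narrow: pull the pair toward each other; the sum is unchanged.
            data[i] -= 1
            data[j] += 1
    return None


print(sprite_sketch(3.00, 1.41, n=10, lo=1, hi=5))
```

Because the search is random and capped, a None result means only "nothing found," which mirrors the approximate nature of a heuristic method.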

SPRITE is useful in the very common cases in which authors have not published, or will not publish, the raw data behind a paper. For example, in the 2012 paper "Attractive Names Sustain Increased Vegetable Intake in Schools," Brian Wansink, a prominent food scientist at the Food and Brand Lab of Cornell University, examined whether giving mundane foods such as carrots and broccoli more exciting names would lead schoolchildren to eat more of them. Heathers applied SPRITE to the means and standard deviations Wansink reported, and found that one of the very young children would have had to eat 60 or more carrots at a sitting for the reported data to be accurate: possible if the carrots were really tiny, but unlikely.

Further study of the paper, which Wansink first corrected and the publisher then retracted, turned up a pattern of inconsistencies. "It seems something went badly wrong with this paper. We don't know what exactly: the data collection, the analysis, the measurement technique, or all of the above," Heathers writes in a recent paper about SPRITE.

Since then, numerous papers by Wansink have been corrected or retracted, some after being flagged by SPRITE, and Cornell is now investigating Wansink's work.

Meanwhile, Sean Wilner and Katherine Wood, Ph.D. students in informatics at the University of Illinois at Urbana-Champaign, have developed another way to reconstruct the hidden data behind a scientific study. Complete Recovery of Values In Diophantine Systems (CORVIDS) expresses the reported statistics as a system of polynomial equations whose unknowns are restricted to integers, like the answers in a study based on survey results. It finds all combinations of data that solve the equations, that is, every raw data set the experimenter could have used to compute the reported means and standard deviations. Like SPRITE, it can spotlight shoddy or suspicious research by finding impossible values or strangely skewed data, which it displays as a set of histograms.
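The goal of finding every integer data set consistent with the reported statistics can be illustrated by brute force on a small problem. The sketch below is not CORVIDS's Diophantine solver; it simply enumerates every possible multiset of n answers on a bounded scale and keeps those whose mean and sample standard deviation round to the reported values. The function name and the rounding tolerance are assumptions made for illustration.

```python
from itertools import combinations_with_replacement
import statistics


def all_integer_solutions(mean, sd, n, lo, hi, decimals=2):
    """Hypothetical brute-force stand-in: return every multiset of n (>= 2) integers
    in [lo, hi] whose mean and sample SD round to the reported values. An empty list
    means no such data set exists under this tolerance, not merely that none was found."""
    tol = 0.5 * 10 ** (-decimals) + 1e-9
    solutions = []
    # combinations_with_replacement yields each multiset exactly once, in sorted order.
    for data in combinations_with_replacement(range(lo, hi + 1), n):
        if abs(sum(data) / n - mean) <= tol and abs(statistics.stdev(data) - sd) <= tol:
            solutions.append(data)
    return solutions


solutions = all_integer_solutions(3.00, 1.41, n=10, lo=1, hi=5)
print(len(solutions))
```

Plotting each solution as a histogram would give a rough analogue of the displays described above, showing at a glance whether every possible data set is strangely skewed.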

CORVIDS and SPRITE are similar in concept. SPRITE is approximate, using a heuristic algorithm and random sampling to converge on possible solutions. CORVIDS's deterministic algorithm will always find every possible solution given the summary statistics, or will mathematically guarantee that no solution is possible. The trade-off is that CORVIDS is more computationally demanding, and so may become impractical to use on large problems.
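A quick count shows why exhaustive approaches get expensive. The number of distinct data sets of n answers on a k-point scale is the "stars and bars" count C(n + k - 1, k - 1). CORVIDS does not necessarily search this naive space, so the figures below are only a rough feel for how fast the problem grows; the helper name is hypothetical.

```python
import math


def multiset_count(n: int, k: int) -> int:
    """Number of distinct data sets (multisets) of n answers on a k-point scale."""
    return math.comb(n + k - 1, k - 1)


print(multiset_count(10, 5))    # 1001
print(multiset_count(100, 7))   # 1705904746
```

Ten answers on a five-point scale allow about a thousand data sets, while 100 answers on a seven-point scale allow roughly 1.7 billion, which is where a randomized search like SPRITE's becomes attractive.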

CORVIDS and SPRITE are part of a growing array of software tools for finding errors, both deliberate and accidental, in scientific studies. Statcheck scans PDF and other standard-format documents and checks whether reported p-values, a commonly used measure of statistical significance, are consistent with the test statistics and degrees of freedom reported alongside them. Other software examines digital images, often western blots, which detect specific proteins in a sample, looking for signs of duplication or other manipulation. Some image-data watchdogs blow up and enhance suspect images in Photoshop to look for signs of tampering, and journal editors increasingly use a variety of plagiarism-detection software packages to look for text duplicated without attribution.
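The core of a statcheck-style consistency test is to recompute the p-value from the reported test statistic and compare it with the p-value printed in the paper, allowing for rounding. Statcheck itself is an R package that handles t, F, r, chi-square, and z results; the Python sketch below handles only APA-style z tests, and its regular expression, function names, and rounding rules are our own assumptions rather than statcheck's code.

```python
import math
import re

# Hypothetical pattern for APA-style z-test reports such as "z = 2.10, p = .036".
Z_RESULT = re.compile(r"\b[zZ]\s*=\s*(-?\d+(?:\.\d+)?)\s*,\s*p\s*([<>=])\s*(0?\.\d+)")


def decimals_of(number_text):
    """Count digits after the decimal point in a reported number."""
    return len(number_text.split(".")[1]) if "." in number_text else 0


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_z_results(text):
    """Return (matched text, consistent?) for each z-test report found in text."""
    results = []
    for match in Z_RESULT.finditer(text):
        z_text, op, p_text = match.groups()
        z = abs(float(z_text))
        h = 0.5 * 10 ** (-decimals_of(z_text))     # rounding slack in the reported z
        p_low = two_tailed_p(z + h)                 # smallest p the unrounded z allows
        p_high = two_tailed_p(max(z - h, 0.0))      # largest p the unrounded z allows
        p_rep = float(p_text)
        if op == "=":
            slack = 0.5 * 10 ** (-decimals_of(p_text))
            ok = p_low - slack <= p_rep <= p_high + slack
        elif op == "<":
            ok = p_low < p_rep
        else:  # ">"
            ok = p_high > p_rep
        results.append((match.group(0), ok))
    return results


report = "The effect was reliable (z = 1.96, p = .05) but the follow-up was not (z = 1.00, p = .03)."
for snippet, ok in check_z_results(report):
    print(snippet, "->", "consistent" if ok else "INCONSISTENT")
```

In this example the first result passes, while the second is flagged, because z = 1.00 corresponds to a two-tailed p of about .32, not .03.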

The increased scrutiny of scientific papers, both manual and automated, and the increased press attention given to retractions in recent years have not been enthusiastically received by some researchers, journals, and funding institutions. Even so, that scrutiny is having a beneficial effect, at least in the field of psychology, says the University of Illinois' Wood.

"It's now much more common that you will make your methods public, your code public, and even your raw data public," Wood says. "We are adopting a culture of transparency."

Gary Anthes is a technology writer and editor based in Arlington, VA, USA.