Opinion

Repeat, Reproduce, Replicate

The pressure to publish versus the will to defend scientific claims.


Dear KV,

Even though I am not in academia, I try to keep up with the research in computing science, mostly by reading the abstracts of papers in the SIGs I was a member of in college. From time to time, I find an interesting paper that contains something I can apply in my day job. Sometimes I am even lucky enough to find that the researchers have posted their code to their web page or a service like GitHub. But more often than not, when I try the code, I find it has many problems. Sometimes it does not build at all and appears to be abandonware. There are even times when it builds but does not operate the way the paper indicated. Occasionally, I have emailed the researchers, only to find they are graduate students who have moved on to other work and either do not reply or, when they do, only shrug me off and wish me luck.

It seems that if researchers have gone to the trouble of posting code, it ought to actually work, right?

Irreproducible

Dear Irreproducible,

As I am sure you know from previous installments of the Kode Vicious column, I too try to follow the research in my area, which broadly covers “systems”—also known as those unsexy bits of software that enable applications to use computing hardware and the occasional network. Down here in the sub-sub-basement of computing science, we try to improve the world by applying the scientific method. So, I am always happy for the occasional missive that floats down from the ivory towers of those who have managed to convince program committees that their work has merit.

It may shock you to know that most conferences and publishing venues do not require researchers to submit their experimental data or systems in order to be allowed to publish their results. I am told this is now changing. In fact, ACM introduced a badging system for software artifacts back in 2020 (see https://bit.ly/3LQsKsd).

While the badging system is a step in the right direction (albeit with an annoyingly silly set of three R’s: repeatability, reproducibility, and replicability, which are hard enough for native English speakers to differentiate, never mind for those of our colleagues who did not start out life speaking English), it is not a requirement for publication, and herein lies one of the problems.

A hallmark of the scientific method over the past several hundred years—and the thing that differentiates science from belief or faith—is that other people must be able to independently replicate the result of an experiment. In computing science, we do not take this seriously enough. In fact, if you talk to some researchers about this, they will chuckle and point out that what gets published might be based on a graduate student finally getting their code to run once and produce a graph.

To say that these are shifting sands on which to build up a body of scientific knowledge is an understatement. In a world that depends, day in and day out, on the results of experiments in computing science, it qualifies as a dangerous outrage. Do you want the algorithm that determines when and how hard to apply your car's brakes to be one that was adopted on the basis of one lucky run of test code?

There are several reasons for this disconnect between research and the rest of us, and they include concerns outside the realm of computing science—issues related to economics and politics, for example. But in the end, it all comes down to a fundamental mismatch of incentives. In the academic world, the incentives revolve largely around “publish or perish”—a well-worn phrase that even those outside of academia surely know. The people who produce the research are graded not so much on how well their ideas work—although some ideas that win a following do end up propelling careers—but on how many papers they have had accepted by prestigious journals and conferences.

This pressure to publish, along with the fact that the field of computing is now one of the most lucrative in the world, has twisted things such that people are publishing at any cost. This, in fact, has led to a huge amount of academic chicanery, such as paper mills, where prior research is mixed and matched to yield seemingly new results that might get published somewhere, even if not in the top-tier journals. In some fields, like medicine, this pressure has become so intense that great reams of research have been torn up after they were found to be based on faulty or even fraudulent data.

Another challenge confronting reproducible results in computing science is the very speed with which the field changes. Finding a computer that is largely comparable to the one used just five years earlier to produce a result can prove challenging. And finding a system with the exact same configuration of memory, disk, bus, and CPU is sure to be even more difficult. KV doubts that conferences will anytime soon require researchers to hand over their hardware as well as their software in order to submit a paper. But I have to admit this is an amusing thought.
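Short of shipping the hardware along with the paper, the least an experimenter can do is record what a result was produced on. Here is a minimal sketch, not taken from the column and with an illustrative file name and set of fields, of how a few lines of Python might capture the machine, operating system, and code revision behind a run and store them next to the results:

# A minimal sketch (not from the column): record the environment that
# produced a result, so that a later attempt at reproduction at least
# knows what it is trying to match. File name and fields are illustrative.
import json
import platform
import subprocess
from datetime import datetime, timezone

def capture_environment():
    """Collect basic hardware and software details for an experiment log."""
    env = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "machine": platform.machine(),        # e.g., x86_64 or arm64
        "processor": platform.processor(),    # CPU string, where the OS reports one
        "os": platform.platform(),            # OS name, release, and version
        "python": platform.python_version(),
    }
    try:
        # Record the exact revision of the experiment code, if run inside a Git repository.
        env["code_revision"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        env["code_revision"] = "unknown"
    return env

if __name__ == "__main__":
    with open("experiment-environment.json", "w") as out:
        json.dump(capture_environment(), out, indent=2)

It will not make a five-year-old result any easier to rerun, but it does remove the guesswork about which machine and which version of the code produced the graph.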

KV likes to look at physics as a gold standard in the sciences. I am sure some angry physicists will now send me missives to tell me I am dead wrong to believe this (and some of those folks make bombs, so I really should watch what I say). Still, I asked a physicist of my acquaintance—one with a long career and many published papers and books—what he thought of a recent statistic indicating that, in my area, systems, only 24% of the research a group attempted to reproduce proved to be reproducible (see https://bit.ly/3WKxaaF). Once he had stopped laughing, my friend mentioned two names, Fleischmann and Pons, which I then had to look up. These are the guys who claimed to have achieved “cold fusion” and now are infamous for having gotten that all wrong.

All of which is to say that if computing science wants to be a real science, and not just one in name, we are going to have to pause, take stock, and think about how we encourage (or, on my angrier days, force) people to defend their scientific claims with reproducible results. Since most researchers have a cadre of graduate students working for them, it might be good training to have first-year students reproduce the results of recent, award-winning papers, both as a learning experience and, if they find errors, as a way to have their own early papers to publish. While the problems of hardware and software moving quickly are undeniable, the falling cost of computing hardware actually argues in favor of reproducing results.

Unless a result relies on a specific hardware trick, such as a proprietary accelerator or a modified instruction set, it should be possible for one group to reproduce the results of another. Unlike the physicists, we do not have to build a second Large Hadron Collider to verify the results of the first. We have millions of similar, and sometimes identical, devices on which to reproduce our results. All that is required is the will to do so.

KV

Related articles

Arrogance in Business Planning
Paul Vixie
https://queue.acm.org/detail.cfm?id=2008216

Databases of Discovery
James Ostell
https://queue.acm.org/detail.cfm?id=1059806

Above the Line, Below the Line
Richard I. Cook, M.D.
https://queue.acm.org/detail.cfm?id=3380777
