Opinion
Computing Profession Viewpoint

We Need to Automate the Declaration of Conflicts of Interest

Leveraging existing data sources to improve the declaration and management of authorship conflicts of interest.

Over the last 70 years of computer science research, our handling of conflicts of interest has changed very little. Each paper's corresponding author must still manually declare all their co-authors' conflicts of interest, even though they probably know little about their most senior co-authors' recent activities. As top-tier conference program committees grow past 500 members, many with common, easily confusable names, PC chairs with thousands of reviews to assign cannot possibly double-check corresponding authors' manual declarations against their papers' assigned reviewers. Nor can reviewers reliably catch unreported conflicts. Audits at three recent top-tier venues across several areas of computer science each uncovered more than 100 troubling instances: at the first venue, pairs of recent co-authors failed to declare their conflict of interest; at the second, reviewers were assigned to recent co-authors' submissions; and at the third, reviewers actually reviewed submissions written by prior co-authors, from any year. Even the concept of a conflict deserves closer scrutiny: an audit at yet another recent top-tier venue found more than 100 cases in which prior co-authors from any year reviewed each other's submissions.

These are issues of scale. Seventy years of exponential growth have turned our little village into a metropolis, and our handling of conflicts of interest (conflicts for short) has not kept pace with our community's growth. But we computer scientists are experts at scaling up! We have already addressed issues of scale in many other aspects of our review processes, with enhancements such as double-blind review, multiple submission deadlines, opportunities for revision and rebuttal, and online submission and review management systems.

It is time for our venues to leverage existing data sources to improve the declaration and management of conflicts, as follows.

  1. Uniquely identify all authors in bibliographic data sources, as well as all authors, reviewers, and meta-reviewers in manuscript management systems. (Meta-reviewers are those who manage the review process, such as editors-in-chief and program committee chairs.)

Duplicate names already make it impossible to unambiguously identify by name those involved in the review process, and even make it difficult for conference organizers to ensure they are inviting the right people to join their program committees. Fortunately, authenticated ORCIDs exist for exactly this purpose, and we should require their use.

  2. Disallow changes in the author list after submission. Conflict declarations are based on the author list at the time of submission; subsequent changes may introduce new conflicts not considered during reviewer assignment.
  3. Require automated reporting of all observable conflicts. PC chairs can use a service that identifies all conflicts observable in publicly available information on co-authorships, institutional affiliations, and advisor relationships, as explained below.
  4. Require authors to self-report only non-observable conflicts, such as new employers, new collaborations, family members, and friends.
  5. Automatically audit self-reports in retrospect and share the results with the venue's sponsor or publisher, which should have the power to examine all data it considers relevant and to impose appropriate sanctions for serious violations.
  6. Use an independent and conflict-free committee to select best papers.
  7. Consider the use of a more sophisticated definition of conflict of interest, as explained below.
  8. Involve the community and our professional societies as needed, as discussed below.

To see how an automated conflict reporting service for manuscript management systems can work, consider the traditional definition of a conflict: two people have a conflict if they wrote a paper together in the past two years, are at the same institution, are close relatives or friends, were advisor and advisee, or worked together closely on a project in the past two years. Bibliographic databases such as Google Scholar and DBLP implicitly provide a graph of the relevant co-authorship relationships, and can also be mined with high accuracy to identify advisor-advisee relationships.2 DBLP already uses data-driven disambiguation of individuals and associates authors with ORCIDs and employers; see, for example, how DBLP handles its 218 different Wei Wangs. Authenticated employer information (including unique IDs for institutions) and educational affiliations are also available directly from the ORCID service, and perhaps authenticated advisor information eventually as well.
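To make the two-year co-authorship rule concrete, here is a minimal sketch in Python. It assumes the publication records have already been fetched from a bibliographic source such as DBLP; the record format and the author IDs are hypothetical placeholders, not DBLP's actual schema.

```python
from itertools import combinations

RECENT_YEARS = 2  # the traditional co-authorship window

def recent_coauthor_conflicts(publications, current_year):
    """Return pairs of unique author IDs (e.g., ORCIDs) who
    co-authored a paper within the last RECENT_YEARS years.

    Each record is assumed to be a dict with a publication year
    and a set of author IDs (a placeholder format).
    """
    conflicts = set()
    for pub in publications:
        if current_year - pub["year"] <= RECENT_YEARS:
            for a, b in combinations(sorted(pub["authors"]), 2):
                conflicts.add((a, b))
    return conflicts

# Hypothetical usage: two authors who shared a 2024 paper
# are in conflict for a 2025 submission deadline.
pubs = [{"year": 2024, "authors": {"0000-0002-1825-0097",
                                   "0000-0001-5109-3700"}}]
print(recent_coauthor_conflicts(pubs, 2025))
```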


The conflict service takes as input: for each paper, the set of (uniquely identified) authors; the set of reviewers and meta-reviewers, also uniquely identified; and a menu-style specification of the venue's conflict policy. For each paper, the conflict service returns the paper's conflicts, that is, all reviewers and meta-reviewers who have an observable conflict with an author of the paper, along with an explanation of the source of the conflict. These conflicts must be added to the self-reports in the submission system, after which conference organizers can use any method of assigning papers to reviewers, for example, manually, based on bids, or using the Toronto Paper Matching Service.1 As usual, the assignment algorithm will automatically avoid all review assignments that involve a conflict. Note that the conflict service need not learn anything about a venue's submissions beyond the set of all authors.
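The sketch below illustrates one plausible shape for that interface, assuming the observable relationships have already been mined from public data; all names and the data layout are hypothetical, not a prescribed API. Note that the function sees only author and reviewer IDs, never the submissions themselves.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Conflict:
    reviewer_id: str  # reviewer or meta-reviewer, uniquely identified
    author_id: str    # the conflicting author
    source: str       # explanation, e.g., "co-authored a 2024 paper"

def conflicts_for_paper(authors, reviewers, observable_relations):
    """Return all observable conflicts for one paper.

    `observable_relations` maps an unordered pair of IDs to the
    evidence string mined from public co-authorship, affiliation,
    and advising data (a hypothetical representation).
    """
    return [Conflict(r, a, observable_relations[frozenset((r, a))])
            for r in reviewers for a in authors
            if frozenset((r, a)) in observable_relations]

# Hypothetical usage:
relations = {frozenset(("rev-17", "auth-03")): "co-authors, DBLP, 2024"}
print(conflicts_for_paper({"auth-03"}, {"rev-17", "rev-22"}, relations))
```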

Two standalone beta versions of conflict services, driven by DBLP data, are already available to PC chairs; one requires authors and reviewers to be uniquely identified, and the other requires only reviewers to be uniquely identified.3 In the longer run, we recommend that outcalls to a conflict service be directly supported by manuscript management systems, so that the system can automatically invoke the conflict service to augment self-reported conflicts before reviewers are assigned. We also recommend that the authors of reviewer-assignment algorithms extend them to avoid additional, more subtle biases in the review process by ensuring diversity of institutions, localities, and countries of origin. Computer science research is now a global enterprise, and we should take advantage of that diversity throughout the review process.

Villagers might not need to lock their doors, but metropolis dwellers would be foolish not to. As village life gave way to the anonymity of the big city, our community has had to establish ethics committees and codes of ethics, as well as policies on plagiarism, authorship, sexual harassment, and more. Automated reporting of observable conflicts will greatly reduce the big-city crimes of impersonating others and deliberately underreporting conflicts. Automated audits will offer a further deterrent once the conflict service is integrated into submission systems: the system can automatically recompute the observable conflicts some months after the submission deadline and compare them to those stored in the system. At a minimum, missing self-reports should result in a warning letter.
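Such a retrospective audit can be as simple as a set difference, as in this sketch (the pair representation is hypothetical): re-run the conflict service once new publications have become visible, and flag every observable conflict that never appeared among the stored self-reports.

```python
def audit_self_reports(self_reported, recomputed):
    """Both arguments are sets of (author_id, reviewer_id) pairs;
    `recomputed` comes from re-running the conflict service some
    months after the submission deadline. Anything in the result
    was observable but never self-reported."""
    return recomputed - self_reported

# Hypothetical usage:
declared = {("auth-03", "rev-17")}
observed = {("auth-03", "rev-17"), ("auth-08", "rev-22")}
print(audit_self_reports(declared, observed))  # grounds for a warning letter
```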

Currently, conflicts are all-or-nothing: today two recent co-authors absolutely cannot review each other's papers, but maybe tomorrow they absolutely can. Big-city life demands a more nuanced definition that recognizes all the shades of gray, so let us acknowledge that conflicts differ in their severity, drop the binary definition of conflict, and define a (degree of) conflict as a real number in [0, 1] computed by a formula specified by the publication venue (the aforementioned menu-style specification). Then we can differentiate between the severity of a conflict and a venue's publicized threshold for automatically disqualifying a reviewer, which will legitimately differ between venues (for example, a workshop versus a top-tier conference). The conflict service described here can easily support such venue-specific cutoff scores and real-valued functions for computing conflicts, making it easy for venues to define and experiment with more sophisticated measures.
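A menu-style policy could then be as simple as a handful of venue-chosen parameters. The sketch below is one hypothetical encoding; the weights, decay rate, and cutoff are illustrative numbers, not values taken from any actual venue.

```python
# A hypothetical menu-style conflict policy for one venue.
POLICY = {
    "weights": {            # per-tie base strength in [0, 1]
        "coauthor": 1.0,
        "same_institution": 0.9,
        "advisor": 1.0,
        "shared_project": 0.8,
    },
    "decay_per_year": 0.5,  # how quickly a tie fades with age
    "cutoff": 0.4,          # disqualification threshold; a workshop might
                            # publicize a higher cutoff than a top-tier venue
}

def disqualified(conflict_score: float, policy: dict) -> bool:
    """A reviewer is automatically disqualified once the venue's
    publicized cutoff is reached."""
    return conflict_score >= policy["cutoff"]
```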


We also need to recognize that multiple co-authorships indicate a stronger tie: a dozen papers co-authored five years ago may pose as much of a conflict as a single paper co-authored last year. Further, conflicts can have multiple contributing facets, for example, same institution, same city, or a highly overlapping set of prior co-authors. We can weight each type of tie between researchers according to its strength, model the fading of ties over time as a continuous function, and devise a method to gracefully combine multiple weighted and faded factors into an overall conflict score, corresponding to our best estimate of the chance that two people cannot impartially review each other's work.
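One plausible formula, sketched below, weights each tie, fades it exponentially with age, and combines the factors with a noisy-OR-style complement product so that many weak ties accumulate into a strong one. None of these choices are prescribed here; the sketch merely shows the scheme is easy to implement and tune.

```python
import math

def conflict_score(ties, weights, decay_per_year, current_year):
    """Combine weighted, time-faded ties into a score in [0, 1].

    `ties` is a list of (kind, year) pairs, one entry per tie, so a
    dozen co-authored papers contribute a dozen factors. Each factor
    independently reduces the probability of "no conflict"
    (a noisy-OR combination; one plausible choice, not the only one).
    """
    no_conflict = 1.0
    for kind, year in ties:
        strength = weights.get(kind, 0.0) * math.exp(
            -decay_per_year * (current_year - year))
        no_conflict *= 1.0 - min(strength, 1.0)
    return 1.0 - no_conflict

# A dozen papers from five years ago vs. one paper from last year:
w, decay = {"coauthor": 1.0}, 0.5
print(conflict_score([("coauthor", 2020)] * 12, w, decay, 2025))  # ~0.64
print(conflict_score([("coauthor", 2024)], w, decay, 2025))       # ~0.61
```

With these illustrative parameters, the dozen old papers score about as high as the single recent one, which is exactly the behavior argued for above.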

The prototypes mentioned above show that one can already build useful standalone conflict services that rely on readily available data. But we will need greater community involvement to reach the ultimate solution. Beyond the steps outlined above, which each venue can take today, we advocate four steps at the community level.

  1. To reach a solution suitable for all of computer science, our community will need to provide coordination and funding for infrastructure construction. This could come from the ACM Publications Board, the SIG Governing Board, the IEEE Technical Activities Board, ACM and/or IEEE as a whole, or even a computing-wide consortium that includes non-profit societies and for-profit publishers.
  2. To expand the definition of a conflict and devise the infrastructure to support that definition, we may need input from experts on the social issues of privacy and security; the technical issues of data collection, organization, and maintenance; the policy issues inherent in defining conflict broadly yet specifically; and the administrative issues in long-term maintenance and evolution of a conflict service.
  3. We should encourage research into relevant topics, including definitions of conflict, scalable algorithms to identify conflicts, and sources and methods for handling suspected false positives.
  4. Once they are in place, we should share our community's metrics, mechanisms, and infrastructure with the global research enterprise, including other scientific disciplines and the National Academies of interested countries.

Life in the big city poses new threats and challenges, but we can leverage the metropolis's great infrastructure to address those problems. By taking advantage of existing datasets, services, and mining algorithms, we can eliminate almost all the tedium of declaring and managing conflicts, with the pleasant side effect of reducing the metropolitan crime rate. With those measures in place, we can move on to develop a more nuanced understanding of what constitutes a conflict of interest.
