
Communications of the ACM

Viewpoint

We Need to Automate the Declaration of Conflicts of Interest



Over the last 70 years of computer science research, our handling of conflicts of interest has changed very little. Each paper's corresponding author must still manually declare all their co-authors' conflicts of interest, even though they probably know little about their most senior co-authors' recent activities. As top-tier conference program committees grow past 500 members, many with common, easily confusable names, PC chairs with thousands of reviews to assign cannot possibly double-check corresponding authors' manual declarations against their papers' assigned reviewers. Nor can reviewers reliably catch unreported conflicts. Audits at recent top-tier venues across several areas of computer science each uncovered more than 100 violations: at the first venue, pairs of recent co-authors failed to declare their conflicts of interest; at the second, reviewers were assigned submissions written by recent co-authors. Even the concept of a conflict deserves closer scrutiny: an audit at yet another recent top-tier venue found more than 100 cases in which prior co-authors from any year reviewed each other's submissions.

These are issues of scale. Seventy years of exponential growth have turned our little village into a metropolis, and our handling of conflicts of interest (conflicts for short) has not kept pace with our community's growth. But we computer scientists are experts at scaling up! We have already addressed issues of scale in many other aspects of our review processes, through enhancements such as double-blind review, multiple submission deadlines, opportunities for revision and rebuttal, and online submission and review management systems.

It is time for our venues to leverage existing data sources to improve the declaration and management of conflicts, as follows.

  1. Uniquely identify all authors in bibliographic data sources, as well as all authors, reviewers, and meta-reviewers in manuscript management systems. (Meta-reviewers are those who manage the review process, such as editors-in-chief and program committee chairs.)

Duplicate names already make it impossible to unambiguously identify by name those involved in the review process, and even make it difficult for conference organizers to ensure they are inviting the right people to join their program committees. Fortunately, authenticated ORCIDsa exist for exactly this purpose, and we should require their use.

  2. Disallow changes in the author list after submission. Conflict declarations are based on the author list at the time of submission; subsequent changes may introduce new conflicts not considered during reviewer assignment.
  3. Require automated reporting of all observable conflicts. PC chairs can use a service that identifies all conflicts observable in publicly available information on co-authorships, institutional affiliations, and advisor relationships, as explained below.
  4. Require authors to self-report only non-observable conflicts, such as new employers, new collaborations, family members, and friends.
  5. Automatically audit self-reports in retrospect and share the results with the venue's sponsor or publisher, which should have the power to examine all data it considers relevant and to impose appropriate sanctions for serious violations.
  6. Use an independent and conflict-of-interest-free committee to select best papers.
  7. Consider the use of a more sophisticated definition of conflict of interest, as explained below.
  8. Involve the community and our professional societies as needed, as discussed below.

To see how an automated conflict reporting service for manuscript management systems can work, consider the traditional definition of a conflict: two people have a conflict if they wrote a paper together in the past two years, are at the same institution, are close relatives or friends, were advisor and advisee, or worked together closely on a project in the past two years. Bibliographic databases such as Google Scholarb and DBLPc implicitly provide a graph of the relevant co-authorship relationships, and can also be mined with high accuracy to identify advisor-advisee relationships.2 DBLP already uses data-driven disambiguationd of individuals and associates authors with ORCIDs and employers; see, for example, how DBLP handles its 218 different Wei Wangs.e Authenticated employer information (including unique IDs for institutions) and educational affiliations are also available directly from the ORCID service, and perhaps authenticated advisor information eventually as well.
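As a concrete illustration (a sketch of ours, not the interface of any existing service), the traditional binary test could be implemented over such mined data roughly as follows; the data structures and names are hypothetical, and assume authors have already been resolved to unique IDs:

    from typing import Dict, Set, Tuple

    def canonical(a: str, b: str) -> Tuple[str, str]:
        # Order the pair so (a, b) and (b, a) index the same tie.
        return (a, b) if a <= b else (b, a)

    def has_conflict(a: str, b: str, year: int,
                     coauthor_years: Dict[Tuple[str, str], Set[int]],
                     institution: Dict[str, str],
                     advisor_pairs: Set[Tuple[str, str]]) -> bool:
        # True if a and b co-authored a paper in the past two years, share
        # an institution, or were advisor and advisee. Family and friendship
        # ties are not observable and must still be self-reported.
        pair = canonical(a, b)
        recent = any(year - y <= 2 for y in coauthor_years.get(pair, set()))
        same_inst = a in institution and institution[a] == institution.get(b)
        return recent or same_inst or pair in advisor_pairs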


The conflict service's input is: for each paper, the set of (uniquely identified) authors; the set of reviewers and meta-reviewers, also uniquely identified; and a menu-style specification of the venue's conflict policy. For each paper, the conflict service returns the paper's conflicts, that is, all reviewers and meta-reviewers who have an observable conflict with an author of the paper, along with an explanation of the source of the conflict. These conflicts must be added to the self-reports in the submission system, after which conference organizers can use any method of assigning papers to reviewers, for example, manually, based on bids, or using the Toronto Paper Matching Service.1 As usual, the assignment algorithm will automatically avoid all review assignments that involve a conflict. Note that the conflict service need not learn anything about a venue's submissions, beyond the set of all authors.
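In code, the service boundary might look like the following sketch; the Conflict record, the Policy object, and all names are hypothetical illustrations of the inputs and outputs just described:

    from dataclasses import dataclass
    from typing import Dict, List, Set

    @dataclass
    class Conflict:
        reviewer_id: str  # reviewer or meta-reviewer with the conflict
        author_id: str    # the conflicting author
        source: str       # explanation, e.g., "co-authored paper, 2019"

    class Policy:
        # Stands in for the venue's menu-style conflict policy.
        def is_conflict(self, reviewer: str, author: str) -> bool: ...
        def explain(self, reviewer: str, author: str) -> str: ...

    def find_conflicts(paper_authors: Dict[str, Set[str]],  # paper ID -> author IDs
                       reviewers: Set[str],                 # reviewer and meta-reviewer IDs
                       policy: Policy) -> Dict[str, List[Conflict]]:
        # For each paper, return every reviewer with an observable conflict
        # and its source. The service learns nothing about a submission
        # beyond its set of authors.
        return {paper: [Conflict(r, a, policy.explain(r, a))
                        for r in reviewers for a in authors
                        if policy.is_conflict(r, a)]
                for paper, authors in paper_authors.items()}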

Two standalone beta versions of conflict services are already available to PC chairs, driven by DBLP data; onef requires authors and reviewers to be uniquely identified, and the otherg requires only reviewers to be uniquely identified.3 In the longer run, we recommend that outcalls to a conflict service be directly supported by manuscript management systems, so that the system can automatically invoke the conflict service to augment self-reports of conflicts before reviewers are assigned. We also recommend that the authors of reviewer assignment algorithms extend them to avoid additional, more subtle biases in the review process, by ensuring diversity of institutions, localities, and countries of origin. Computer science research is now a global enterprise, and we should take advantage of that diversity throughout the review process.

Villagers might not need to lock their doors, but metropolis dwellers would be foolish not to. As village life slowly gave way to the anonymity of the big city, our community has had to establish ethics committees and codes of ethics, policies on plagiarism, authorship, sexual harassment, and so on. Automated reporting of observable conflicts will greatly reduce the big-city crimes of impersonating others and deliberately underreporting conflicts. Automated audits will offer a further deterrent once the conflict service is integrated into submission systems: the system can automatically recompute the observable conflicts some months after the submission deadline and compare them to those stored in the system. At a minimum, missing self-reports should result in a warning letter.
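The audit itself is a simple set difference. A minimal sketch, assuming the submission system stored each paper's declared conflicts at the deadline (all names hypothetical):

    from typing import Dict, Set

    def audit(declared: Dict[str, Set[str]],    # paper ID -> conflicts declared at submission
              recomputed: Dict[str, Set[str]]   # paper ID -> observable conflicts, recomputed later
              ) -> Dict[str, Set[str]]:
        # Return each paper's observable conflicts that were never declared;
        # a non-empty result should at minimum trigger a warning letter.
        missing = {paper: recomputed[paper] - declared.get(paper, set())
                   for paper in recomputed}
        return {paper: ids for paper, ids in missing.items() if ids}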

Currently, conflicts are all-or-nothing: today two recent co-authors absolutely cannot review each other's papers, but maybe tomorrow they absolutely can. Big-city life demands a more nuanced definition that recognizes all the shades of gray, so let us acknowledge that conflicts differ in their severity, drop the binary definition of conflict, and define a (degree of) conflict as a real number in [0, 1] computed by a formula specified by the publication venue (the aforementioned menu-style specification). Then we can differentiate between the severity of a conflict and a venue's publicized threshold for automatically disqualifying a reviewer, which will legitimately differ between venues (for example, a workshop versus a top-tier conference). The conflict service described here can easily support such venue-specific cutoff scores and real-valued functions for computing conflicts, making it easy for venues to define and experiment with more sophisticated measures.
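Under such a definition, automatic disqualification reduces to comparing a score against the venue's publicized cutoff. A minimal sketch, with purely illustrative numbers:

    def disqualified(conflict_score: float, venue_cutoff: float) -> bool:
        # A reviewer is automatically disqualified once the conflict score
        # reaches the venue's publicized threshold.
        return conflict_score >= venue_cutoff

    disqualified(0.4, venue_cutoff=0.6)  # False: a workshop may tolerate this tie
    disqualified(0.4, venue_cutoff=0.3)  # True: a top-tier conference may not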


We also need to recognize that multiple co-authorships indicate a stronger tie. A dozen papers co-authored five years ago may pose as much of a conflict as a single paper co-authored last year, because those dozen papers indicate a very strong tie. Further, conflicts can have multiple contributing facets, for example, same institution, same city, or a highly overlapping set of prior co-authors. We can weight each type of tie between two researchers according to its strength, model the fading of ties over time as a continuous function, and devise a method to gracefully combine multiple weighted and faded factors into an overall conflict score, corresponding to our best estimate of the chance that the two cannot impartially review each other's work.
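One plausible instantiation, offered only as a sketch: weight each tie type, fade each tie exponentially with age, and combine the factors with a noisy-or so that many weaker or older ties accumulate. The weights, half-life, and combination rule below are illustrative assumptions, not a prescribed formula:

    WEIGHTS = {"co-authorship": 0.9, "same-institution": 0.7,
               "same-city": 0.3, "overlapping-co-authors": 0.4}
    HALF_LIFE_YEARS = 2.0  # a tie's contribution halves every two years

    def fade(years_ago: float) -> float:
        # Continuous exponential fading of a tie over time.
        return 0.5 ** (years_ago / HALF_LIFE_YEARS)

    def conflict_score(ties) -> float:
        # Noisy-or combination of (tie_type, years_ago) factors, so many
        # weaker or older ties accumulate into a strong overall conflict.
        p_no_conflict = 1.0
        for tie_type, years_ago in ties:
            p_no_conflict *= 1.0 - WEIGHTS[tie_type] * fade(years_ago)
        return 1.0 - p_no_conflict

    conflict_score([("co-authorship", 5.0)] * 12)  # ~0.88: a dozen five-year-old papers
    conflict_score([("co-authorship", 1.0)])       # ~0.64: one paper from last year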

The prototypes mentioned here show that one can already build useful standalone conflict services that rely on readily available data. But we will need greater community involvement to reach the ultimate solution. Beyond the steps outlined above, which each venue can take today, we advocate four steps at the community level.

  1. To reach a solution suitable for all of computer science, our community will need to provide coordination and funding for infrastructure construction. This could come from the ACM Publications Board, the SIG Governing Board, the IEEE Technical Activities Board, ACM and/or IEEE as a whole, or even a computing-wide consortium that includes non-profit societies and for-profit publishers.
  2. To expand the definition of a conflict and devise the infrastructure to support that definition, we may need input from experts on the social issues of privacy and security; the technical issues of data collection, organization, and maintenance; the policy issues inherent in defining conflict broadly yet specifically; and the administrative issues in long-term maintenance and evolution of a conflict service.
  3. We should encourage research into relevant topics, including definitions of conflict, scalable algorithms to identify conflicts, and sources and methods for handling suspected false positives.
  4. Once they are in place, we should share our community's metrics, mechanisms, and infrastructure with the global research enterprise, including other scientific disciplines and the National Academies of interested countries.

Life in the big city poses new threats and challenges, but we can leverage the metropolis's great infrastructure to address those problems. By taking advantage of existing datasets, services, and mining algorithms, we can eliminate almost all the tedium of declaring and managing conflicts, with the pleasant side effect of reducing the metropolitan crime rate. With those measures in place, we can move on to develop a more nuanced understanding of what constitutes a conflict of interest.


References

1. Charlin, L. and Zemel, R.S. The Toronto paper matching system: An automated paper-reviewer assignment system. In Proceedings of the International Conference on Machine Learning (ICML) 2013.

2. Wang, C. et al. Mining advisor-advisee relationships from research publication networks. In Proceedings of the 16th ACM Conference on Knowledge Discovery and Data Mining (KDD), 2010.

3. Wu, S. PISTIS: A conflict of interest declaration and detection system for peer review management. In Proceedings of the 2018 ACM SIGMOD/PODS Conference, 2018.


Authors

Richard T. Snodgrass (rts@email.arizona.edu) is a Professor and Galileo Scholar at the University of Arizona, Tucson, AZ, USA. He is an ACM Fellow, has served as editor-in-chief of ACM TODS and as chair of ACM SIGMOD and the ACM Publications Board, and was founding co-chair of the ACM History Committee.

Marianne Winslett (winslett@illinois.edu) is a research professor emerita at the University of Illinois, Urbana IL, USA. She is an ACM Fellow and has served as a coeditor-in-chief of ACM TWEB, as an officer of SIGMOD and SIGART, on the steering committees of ACM CIKM and ACM CCS, and on the editorial boards of ACM TODS, ACM TISSEC, ACM TWEB, IEEE TKDE, and the VLDB Journal.


Footnotes

a. The Open Researcher and Contributor ID (ORCID) is an international non-profit initiative to uniquely identify scientific and other academic authors; see https://orcid.org

b. See https://scholar.google.com

c. See https://dblp.org

d. See https://dblp.uni-trier.de/faq/17334571.html

e. See https://dblp.uni-trier.de/pers/hd/w/wang:wei

f. See https://github.com/ebina1/conflict-of-interest

g. See https://www.ntu.edu.sg/home/assourav/research/DARE/closet.html


Copyright held by authors.
Request permission to (re)publish from the owner/author



Comments


Sushil Jajodia

I wholeheartedly agree; it's high time that this is done. Perhaps NSF will see fit to make this happen.

