CLOSET: Data-Driven COI Detection and Management in Peer-Review Venues

Our peer review process aims to ensure that our published results truly advance the state of the art. Our professional societies (for example, ACM, IEEE, and the VLDB Endowment) have worked hard to ensure that the review and selection processes are fair and continue to fulfill this intended purpose. Our community has introduced various refinements, such as single- and double-anonymous reviews, automated matching of reviewers to topics and to individual papers, multiple submission deadlines, opportunities for revision and rebuttal, a “review quality week” to manually ensure review quality, and a variety of guidelines for reviewer selection and manuscript handling. These refinements have greatly helped to improve the quality of refereed publications.

Yet throughout all these refinements to review processes, the operational definition of conflicts of interest (COI) and the means of declaring and handling them have changed little.^1,2 Venues typically consider a COI to exist if two people co-authored a paper in the (recent) past, are at the same institution currently or were in the near past, are close relatives or friends, are advisor and (former) advisee, or worked together closely on a project in the (recent) past. Conference venues often require authors to self-report COIs with all potential reviewers, that is, the members of the program committee (PC). Because conference PCs can easily exceed 200 members and thousands of reviews must be assigned, PC chairs cannot realistically double-check these manual COI declarations. Consequently, there is a great need for a framework that can automatically check whether all conflicts were reported. Such a check is important in any case, but it is vital if anyone should ever try to circumvent the system by deliberately under-reporting COIs.^1,2
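
The bibliographic parts of such a definition (co-authorship and a shared institution within a time window) lend themselves to an automated check. The following is a minimal sketch under stated assumptions, not CLOSET's implementation: the `CoiPolicy` and `Person` types, the two-year default windows, and the `conflict_reasons` function are all hypothetical. Relationships such as family ties, friendship, or advising are not covered, since they are not directly recorded in this kind of data.

```python
# Hypothetical sketch of a venue-specific COI policy check (not CLOSET's code).
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CoiPolicy:
    """Hypothetical policy: look-back windows, in years, chosen per venue."""
    coauthor_window_years: int = 2
    institution_window_years: int = 2


@dataclass
class Person:
    name: str
    # Co-author name -> years in which the two published together.
    coauthor_years: dict[str, list[int]] = field(default_factory=dict)
    # Institution name -> last year this person was affiliated with it.
    affiliations: dict[str, int] = field(default_factory=dict)


def conflict_reasons(a: Person, b: Person, policy: CoiPolicy,
                     current_year: int) -> list[str]:
    """Return the policy rules under which a and b conflict (empty if none)."""
    reasons = []
    # Look at both records, since one side's bibliography may be incomplete.
    joint_years = a.coauthor_years.get(b.name, []) + b.coauthor_years.get(a.name, [])
    if any(current_year - y <= policy.coauthor_window_years for y in joint_years):
        reasons.append("co-authorship")
    # Simplification: both were at a shared institution within the window;
    # whether their tenures actually overlapped is not checked.
    cutoff = current_year - policy.institution_window_years
    for inst in a.affiliations.keys() & b.affiliations.keys():
        if min(a.affiliations[inst], b.affiliations[inst]) >= cutoff:
            reasons.append(f"institution ({inst})")
            break
    return reasons


if __name__ == "__main__":
    alice = Person("Alice", {"Bob": [2022]}, {"Example University": 2024})
    bob = Person("Bob", {}, {"Example University": 2023})
    print(conflict_reasons(alice, bob, CoiPolicy(), current_year=2024))
    # ['co-authorship', 'institution (Example University)']
```

Keeping the windows in a separate policy object reflects the point that each venue sets its own rules; the same data can yield different conflicts under different policies.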

CLOSET is a novel data-driven system, designed and developed at Nanyang Technological University (NTU), for detecting and reporting unreported COIs and COI violations, if any, in a peer-reviewed venue.^a The accompanying figure depicts the framework of CLOSET. Given a set of bibliographic data sources (for example, DBLP) and the set of reviewers (initially) assigned to each paper by the review management system (RMS; for example, Microsoft’s CMT or EasyChair) hosting a specific venue, CLOSET first takes a semiautomated approach to prepare venue-specific data for COI management by integrating, cleaning, and storing the relevant data. Next, it deploys an efficient COI-detection technique that automatically finds various types of COI.

Figure. CLOSET framework.

The current version of CLOSET detects co-authorship-based and institution-based unreported COIs and COI violations according to a venue-specific COI policy. It also detects potential cases of submarine COIs^1 between author-reviewer pairs for submissions, that is, COIs arising from potential bias between two people even when they have no past co-authorship, co-worker, or family relationship. Finally, these results are presented to PC chairs in a variety of user-friendly reports (for example, structured tables, charts, and network views) with natural-language explanations to aid their decision making. Note that CLOSET is orthogonal to the “secret sauce” an RMS uses to assign reviewers to papers. CLOSET's results can also be uploaded to an RMS to facilitate efficient COI-free reviewer reassignment. Currently, CLOSET has successfully interfaced with CMT to this end.
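
Once conflicting author-reviewer pairs have been detected, separating unreported COIs from COI violations is a matter of set arithmetic. The sketch below is illustrative only: the function name `audit_assignments` and its input layout are hypothetical rather than CLOSET's or CMT's data model, and it assumes that a COI violation means a reviewer assigned to a paper despite a declared or detected conflict with one of its authors.

```python
# Hypothetical sketch: classify detected conflicts per submission.
from collections.abc import Iterable


def audit_assignments(
    papers: dict[str, list[str]],        # paper id -> author names
    pc_members: Iterable[str],           # PC member (reviewer) names
    detected: set[tuple[str, str]],      # (author, reviewer) pairs found in the data
    declared: dict[str, set[str]],       # paper id -> PC members declared as conflicts
    assignments: dict[str, set[str]],    # paper id -> PC members assigned to review
) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (unreported COIs, COI violations), each keyed by paper id."""
    pc = list(pc_members)  # materialize so it can be scanned once per paper
    unreported: dict[str, set[str]] = {}
    violations: dict[str, set[str]] = {}
    for pid, authors in papers.items():
        # PC members who conflict with at least one author of this paper.
        conflicted = {r for r in pc if any((a, r) in detected for a in authors)}
        reported = declared.get(pid, set())
        # Conflicts found in the data that the authors did not declare.
        missing = conflicted - reported
        if missing:
            unreported[pid] = missing
        # Assumed definition: an assigned reviewer with any known conflict.
        bad = assignments.get(pid, set()) & (conflicted | reported)
        if bad:
            violations[pid] = bad
    return unreported, violations


if __name__ == "__main__":
    unreported, violations = audit_assignments(
        papers={"P1": ["Alice"], "P2": ["Dan"]},
        pc_members=["Bob", "Eve"],
        detected={("Alice", "Bob"), ("Dan", "Eve")},
        declared={"P2": {"Eve"}},
        assignments={"P1": {"Bob"}, "P2": {"Bob"}},
    )
    print(unreported, violations)  # {'P1': {'Bob'}} {'P1': {'Bob'}}
```

The share of submissions with at least one unreported COI is then simply `len(unreported) / len(papers)` (0.5 in this toy example).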

At first glance, the COI detection and management problem may seem straightforward: we can simply check whether an author’s name appears in a reviewer’s co-author list in any bibliographic data source, or whether an author-specified domain name is identical to the reviewer’s (indicating that they are co-workers). While this may unearth some unreported COIs, our investigation with real-world data reveals that such a strategy can miss many valid COI cases because of the characteristics of RMS data (for example, unstructured, dirty, or missing data and homonymous names).
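
A minimal sketch of this naive strategy shows how exact matching breaks on real-world data. The function names, person names, and e-mail domains below are hypothetical examples chosen for illustration.

```python
# Hypothetical sketch of the naive exact-match checks.
def naive_coauthor_conflict(author: str, reviewer_coauthors: set[str]) -> bool:
    # Exact string match against the reviewer's co-author list.
    return author in reviewer_coauthors


def email_domain(email: str) -> str:
    # Everything after the last "@", lowercased.
    return email.rsplit("@", 1)[-1].lower()


def naive_coworker_conflict(author_email: str, reviewer_email: str) -> bool:
    # Identical e-mail domains are taken to mean "same institution".
    return email_domain(author_email) == email_domain(reviewer_email)


if __name__ == "__main__":
    # Spelling variant of the same person: conflict missed.
    print(naive_coauthor_conflict("Jurgen Muller", {"Jürgen Müller"}))   # False
    # Department subdomain vs. university domain: conflict missed.
    print(naive_coworker_conflict("a@cs.example.edu", "b@example.edu"))  # False
    # Shared webmail provider: spurious conflict.
    print(naive_coworker_conflict("a@gmail.com", "b@gmail.com"))         # True
```

The first two cases are the kind of misses described above; the third shows that dirty data can also produce false alarms.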

Consequently, CLOSET has a built-in data-cleaning mechanism and an efficient homonymous-name detection mechanism to tackle these challenges. In particular, the explanation-generation component of CLOSET annotates potential homonymous-name cases in the results with explanations for PC chairs to investigate. Furthermore, automatically discovering submarine COIs is an intriguing problem. To this end, CLOSET leverages a novel algorithm grounded in network science, data analytics, and social psychology theories (for example, social influence theory and social impact theory).
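
Part of this cleaning can be illustrated with name normalization. The sketch below is a simplified stand-in, not CLOSET's homonym-detection mechanism: `normalize_name` and `potential_homonyms` are hypothetical, and the only external convention it relies on is DBLP's practice of appending a four-digit number (for example, "Wei Wang 0001") to distinguish authors who share a name.

```python
# Hypothetical sketch: normalize names and flag groups for human review.
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Map spelling variants of a name to one comparison key."""
    # Decompose accented characters and drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop a DBLP-style four-digit disambiguation suffix ("Wei Wang 0001").
    plain = re.sub(r"\s+\d{4}$", "", plain.strip())
    # Treat dots, hyphens, and runs of whitespace as single spaces; ignore case.
    return re.sub(r"[\s.\-]+", " ", plain).strip().casefold()


def potential_homonyms(names: list[str]) -> dict[str, set[str]]:
    """Group distinct name strings that share a normalized key.

    A group may be spelling variants of one person or several different
    people, so each group is flagged for a human to check, not auto-resolved.
    """
    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(normalize_name(name), set()).add(name)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    names = ["Wei Wang 0001", "Wei Wang 0002", "Jürgen Müller",
             "Jurgen Muller", "Ada Lovelace"]
    for key, variants in potential_homonyms(names).items():
        print(key, sorted(variants))
    # wei wang ['Wei Wang 0001', 'Wei Wang 0002']
    # jurgen muller ['Jurgen Muller', 'Jürgen Müller']
```

Flagging rather than merging matches the approach described above, where potential homonymous-name cases are annotated for PC chairs to investigate.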

To date, CLOSET has been deployed in more than 18 venues. These deployments revealed that the COI management mechanisms in existing industrial-strength RMSs can often be inadequate for their intended purpose. For all deployed venues, CLOSET has detected a significant number of unreported COIs (on average, at least 25% of the submissions in a venue have unreported COIs) and COI violations that evaded the detection mechanisms of the hosting RMS. Specifically, it has made a significant impact on how the review process is managed at premium data management venues such as SIGMOD and VLDB. It has influenced changes to the specifications and implementation of long-standing COI policies and to the management of COIs in these venues.

CLOSET is now the default tool in these venues. It has also inspired several community members to build on it (for example, richer visualization support by the University of Illinois Urbana-Champaign group led by Marianne Winslett, and a plug-in, led by Christel Baier, for ingesting data from several theoretical computer science venues). In summary, CLOSET is a successful example of an end-to-end, data-driven system that originated at a Southeast Asian university, is built on a solid foundation of network science, data analytics, and social psychology theories, and is used in practice. We are not aware of any other real-world deployed system with features similar to CLOSET's.