
Directions of Technical Innovation for Regulatable AI Systems

Public sector AI procurement checklists can help guide efforts to create regulatable AI systems.

[Illustration: robots examining checklists]

As AI systems become more advanced and integrated into our lives, there has been a corresponding urgency to ensure that they align with social values and norms and that their benefits significantly outweigh any potential harms. In response to this imperative, legal and regulatory bodies globally are engaged in a concerted effort to develop comprehensive AI regulations.

The increasing size, generality, opaqueness, and closed nature of present-day AI systems, however, pose significant challenges to effective regulation. Even when requirements can be articulated, it remains uncertain whether and how we can verify an AI system’s compliance with these standards: A requirement that cannot be checked will not provide effective protection. If we believe that AI systems should be regulated, then AI systems must be designed to be regulatable. In this article, we consider the following question: What advances in AI systems are needed for them to be effectively regulated?

We explore the answer through the lens of public sector AI procurement checklists, which offer a pragmatic perspective on the broader challenges of regulatable AI systems. The public sector uses procurement checklists to ensure its purchased products align with its organizational needs and values. In the context of AI procurement checklists, most pertinent to us are the items related to technical criteria, such as ensuring data quality, privacy, fairness, and appropriate monitoring and oversight. The desiderata distilled in these checklists are comprehensive, and are also reflected in still nascent efforts to regulate AI in the private sector. Furthermore, public sector procurement checklists are among the more developed AI regulations, some having gone through several rounds of refinement. We emphasize that we focus on public sector procurement checklists to make our discussion concrete; the technical innovations needed to satisfy those checklists are relevant to the wider discourse on creating regulatable AI systems.

Key Insights

  • We assess two public sector procurement checklists to identify innovation directions for creating regulatable AI systems.

  • We identify that while we have many existing metrics for evaluating data and model quality, there are multiple gaps. For example, new methods are needed to connect data quality and objective functions to outcomes, to identify what is possible with limited data and model access, to monitor continuously learning agents, to balance transparency and privacy, and to enhance effective human+AI interaction.

  • Moreover, many high-level decisions within these criteria inherently require an interdisciplinary approach.

Specifically, we closely examine the technical criteria from two existing procurement checklists: the World Economic Forum’s AI Procurement in a Box (WEF)45 and the Canadian Directive on Automated Decision-Making (CDADM).13 Both of these checklists have gone through extensive review. The ideation process for the CDADM began in 2016 with input from scholars, civil society advocates, and governmental officials; it has been thrice reviewed and amended since it was enacted in 2019, establishing the groundwork for the broader 2022 Artificial Intelligence and Data Act (AIDA) and a recent guide on the use of generative AI. The guidelines in the WEF’s AI Procurement in a Box were created with involvement from 200 stakeholders across government, academia, and industry; the efficacy and applicability of these guidelines have been validated through two pilot studies conducted in the U.K. and Brazil. In our review, we found these checklists to be fairly comprehensive, with perhaps the only technical area lacking sufficient attention being the human-computer interaction (HCI) elements of how the AI would be integrated into its intended environment.

The remainder of this article is organized into sections aligned with the criteria in these two checklists. The structure highlights some differences in how AI researchers and policymakers approach the same problems. For example, there is no section on fairness, as the CDADM and WEF weave it into multiple criteria. In each section, we first summarize existing technical approaches that could be used to construct AI systems that meet those criteria, and then identify areas in which current work is insufficient for vetting AI systems in the increasing breadth of contexts in which they are now being used. In such cases, additional technical innovation in AI would assist in the creation of AI systems that can be more easily regulated. Finally, we briefly outline aspects of these criteria that may seem technical but actually require interdisciplinary approaches to vet.

Throughout this exercise, we assume no concerns about expertise; that is, we presume there are sufficiently qualified AI and domain experts to review whether the AI meets the checklist criteria. Our concern is to identify to what extent experts can currently vet AI systems against these criteria, and provide a (noncomprehensive, but concrete) list of directions for technical innovation to bridge the gap toward regulatable AI systems. If AI systems can be verified against these checklists, then we will have made significant progress toward creating regulatable AI systems in general.

Inputs of the Model: (Pretraining) Data Checks

Training data characteristics have a large influence on the behavior of an AI system. For this reason, the CDADM requires that training data is “tested for unintended data biases and other factors that may unfairly impact the outcomes” as well as that the data is “relevant, accurate, up-to-date.” Similarly, the WEF asks if “relevant data” is available for the project, specifically requiring the data to “fit the criterion of fairness,” “be representative of the population that the AI solution will address,” as well as be “reasonably recent.” The CDADM further requires that the data used by the AI system is “traceable [fingerprinting], protected and accessed appropriately, and lawfully collected, used, retained, and disposed.” The technical questions underlying these criteria have to do with data documentation procedures that expose potential risks in areas such as fairness, generalization, and privacy.

What we know how to do.  We have proxies for checking many properties in these criteria (for example, data privacy, label quality, feature selection, fairness, and so on) using exploratory data analysis. As examples, we can inspect the annotation process and check inter-annotator agreement to get an idea of label quality.5 We can also measure (and correct for) imbalance in data if we are given group labels that segment the dataset. We have techniques for identifying influential points,6 outliers,7 and mislabeled points,11 which may cause models to exhibit poor performance or bias. Further, there exist standards for reporting dataset information.8
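As a concrete (and deliberately simplified) illustration of such exploratory checks, the sketch below computes inter-annotator agreement, class balance, and outlier flags for a hypothetical tabular dataset using scikit-learn. The data, the two annotators, and the 1% contamination threshold are illustrative assumptions, not recommendations drawn from the checklists.

```python
# A minimal sketch of exploratory data checks; dataset, annotators, and
# thresholds below are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                 # hypothetical feature matrix
labels_a = rng.integers(0, 2, size=1000)       # labels from annotator A
labels_b = np.where(rng.random(1000) < 0.9,    # annotator B agrees ~90% of the time
                    labels_a, 1 - labels_a)

# 1. Label quality: inter-annotator agreement (Cohen's kappa).
print(f"inter-annotator kappa: {cohen_kappa_score(labels_a, labels_b):.2f}")

# 2. Representativeness: class balance (group balance is checked the same way).
counts = np.bincount(labels_a)
print("class proportions:", counts / counts.sum())

# 3. Points that may distort training: unsupervised outlier flagging.
flags = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
print("flagged outliers:", int((flags == -1).sum()))
```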

Directions requiring additional AI innovation: 

  • Metrics and generalizability. More work is needed to connect data metrics with impact on outcomes. For example, if a traffic-image dataset has a certain annotator disagreement score, what does that imply for an autonomous vehicle with a vision system trained on that data? Metrics along these lines are needed to capture which applications a dataset may be safely used for.

  • Data quality checks in the context of pretrained models. Given the prevalence of pretrained models and the (currently) limited transparency about their training data, can we develop data checks that rely solely on accessing the model, or do we require disclosure of certain information about the training data? Do checks for fine-tuning data differ from checks for pretraining data?

  • Unstructured data. For unstructured data such as images or social media messages, existing standards (such as those above) focus on reporting the statistics of the metadata. In our traffic-images example, is it sufficient to provide information about, for example, where the images were collected? Or should we also be providing information about the statistics of the image data itself?

Areas that require interdisciplinary engagement.  The specific metrics that would enable meaningful inference about the quality of the data will depend on the application. Questions around bias and fairness are also inherently multifaceted and use-case dependent. Determining how data collection respects copyright, obtains appropriate consent (opt-in vs. opt-out), and avoids misrepresentation of or detriment to the owner requires legal and social science input.15 Privacy tensions—what data is retained, what statistics are made public, what kind of access is granted to trusted auditors—must also be resolved within the broader sociotechnical context.

Outputs of the Model: (Post-hoc) System Monitoring

Once a system is deployed, it is essential to monitor its operations. The CDADM mandates the development of processes to monitor the AI system’s outcomes and “verify compliance with institutional and program legislation…on a scheduled basis.” It requires the publication of information on the system’s effectiveness and efficiency. The WEF emphasizes the need for “systematic and continuous risk monitoring during every stage of the AI solution’s life cycle” as “[t]esting the model on an ongoing basis is necessary to maintain its accuracy.” The WEF also advocates for the establishment of end-to-end auditability. Technical questions associated with these criteria include how to monitor performance (related to measuring performance more generally) and identify drift and anomalies that warrant attention.

What we know how to do.  Given a specific metric, it is relatively easy to put monitoring in place. Methods exist that establish distributions for “normal operation” and flag anomalous values during actual operation.20 These techniques can be employed to detect shifts in top features, inputs, outputs, model confidences, calibrations, and fairness metrics. When the causal structure of the environment is known, monitoring checks can specifically identify new confounders and mediators. In reinforcement learning (RL) settings, we can monitor whether reward distributions have changed. More generally, many best practices exist for testing AI systems.38 For example, AI developers should test their systems with multiple external datasets, and results should be stratified by task difficulty and subpopulations of interest.
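To make the monitoring idea concrete, the following sketch compares a reference window of feature values against a live window using a per-feature two-sample Kolmogorov-Smirnov test. The simulated data, the window sizes, and the significance level are illustrative assumptions; in practice such a check would be one component of a broader monitoring pipeline.

```python
# A minimal sketch of post-deployment drift monitoring; data, window sizes,
# and alpha are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
reference = rng.normal(size=(5000, 4))   # feature values logged at deployment time
live = rng.normal(size=(500, 4))         # recent feature values in operation
live[:, 2] += 0.5                        # simulate drift in one feature

alpha = 0.01                             # per-feature significance level
for j in range(reference.shape[1]):
    stat, p = ks_2samp(reference[:, j], live[:, j])
    if p < alpha:
        print(f"feature {j}: possible drift (KS statistic {stat:.3f}, p-value {p:.1e})")
```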

Directions requiring additional AI innovation: 

  • Monitoring many metrics. Monitoring multiple metrics increases the risk of false positives, overwhelming engineers. How can we monitor many metrics efficiently while not incorrectly flagging too many cases for review? Relatedly, once in operation, what data should be gathered so that we can check additional metrics in the future? For example, consider an autonomous vehicle that has a very large number of machine learning (ML) components, processing data at fast rates: What should be stored? These questions remain despite advances in MLOps.

  • Certification of use cases. How can we certify—or at least provide appropriate confidence for—use cases for which an AI system is supposed to work well, for example, where an autonomous vehicle should be able to drive safely? Certifying neural networks for safety-critical systems is an active research area,21 as are attempts to define appropriate levels of certification. Innovation to prevent model misuse is also needed.

  • Correcting models after deployment. There exists some work on correcting deployed models in a way that does not require retraining end to end, including unlearning and fine-tuning.27 But more work remains to be done, especially for AI systems with many interacting parts.

  • Identifying relevant distribution shift. Distribution shift can take many forms: input distributions, the relationship between inputs and outputs, and rewards (objectives) may all change. For example, in newer cars, acceleration and popular colors may differ. Can we identify relevant and irrelevant shifts (for example, along the lines of Chuang et al.14)? If shifts occur in an uninterpretable embedding space, how can we explain them?

  • Monitoring online learning agents. Beyond major adverse effects, how can we identify more subtle issues, such as initial signs of catastrophic forgetting, cheating, and other harms that occur while the agent continues to perform well on its reward metric? We need more work on identifying unintended consequences early.41

Areas that require interdisciplinary engagement.  There will always be a need to make high-level decisions regarding what should be monitored in a specific context and what safety assurances or guarantees are required, for example in healthcare.18 It is crucial to translate the monitored metrics into meaningful implications that enable people to make informed decisions within the broader sociotechnical system.

Inspecting the Model: Global Explanations for Model Validation

Global explanations describe a model as a whole and are often useful for inspection or oversight. Both the CDADM and WEF require documentation that details the workings of different components in the AI system, the training data utilized, and any known biases. The CDADM notes that qualified experts should review the AI system before it goes into production. The WEF emphasizes that AI decision making should be made “as transparent as possible,” including that developers should be able to explain the AI system for public scrutiny and allow independent audits.

What we know how to do.  We can build inherently interpretable models (for example, generalized additive models, decision trees, rule-based models) for tabular and structured data.34 We have some tools for interpreting neural networks in terms of human-understandable components, such as circuits44 or even natural language.9 We can partially explain neural networks, for example, by visualizing weights or computing concept-activation vectors.35
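As a minimal example of an inherently interpretable model on tabular data, the sketch below fits a shallow decision tree and prints its complete decision logic, which can serve as a global explanation for review. The dataset and the depth limit are illustrative assumptions.

```python
# A minimal sketch of an inherently interpretable model whose full decision
# logic can be inspected; the dataset and depth limit are hypothetical choices.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The entire model, rendered as human-readable rules, is the global explanation.
print(export_text(tree, feature_names=list(data.feature_names)))
```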

Directions requiring additional AI innovation: 

  • Inherently interpretable models for more data types. Building interpretable models for non-tabular data (for example, images or audio) is still nascent. Learning concepts on top of the input dimensions may be useful.

  • “Openboxing” large models. Can we build interactive, hierarchical, and semantically aligned views of large models, such that these models are to some extent inherently interpretable? For example, a traffic-image classifier recognizing objects by multiplying object templates with transformation matrices would be inherently more explainable than another model without such hierarchical structure. Can we allow users to explore explanations at different levels of fidelity for different contexts? Techniques exist but have limitations.29

  • Checking value alignment. Whether it is criminal justice, benefits allocations, or autonomous driving, AI systems are increasingly used in situations that require value judgments. How do we elicit and encode societal and individual values in diverse situations? What metrics can effectively measure value alignment? How do we make this mapping transparent so that others can understand the value choices made (for example, the drivers of other cars next to the autonomous vehicle)? Advancing existing work12 is needed for our increasing use cases.

Areas that require interdisciplinary engagement.  There are open questions about what information to offer, and to whom. For example, releasing the code and environment may allow some stakeholders to answer their questions directly.32 Providing an explanation broadens who can inspect the model, including users and domain experts; however, what information to release, how to extract it, and how often the information should be updated during the model’s life cycle will depend on the context. We will also need mechanisms for requesting more information about a model as new concerns emerge.

Inspecting the Model: Local Explanations About Individual Decisions

Local explanations aim to inform how a specific decision is made. In some cases, it may be sufficient to provide a “meaningful explanation” (CDADM and WEF) that describes “how and why a model performed in the way it did,” including “exactly how a machine-learning model has arrived at a result” (WEF). Technically, this might involve exposing what training data or features contribute the most to a specific output. In other cases, it may be desirable to provide “applicable recourse options” (CDADM), giving users actionable ways to change the decision.26

What we know how to do.  Given a distance notion, there are many techniques for providing local explanations. Specifically, we can find the closest point that leads to a desired output.23 This is a counterfactual, and can help users determine what features set nearby points apart. It also lays the foundation for algorithmic recourse.26
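The sketch below illustrates this counterfactual idea in its simplest form: given a model, a pool of candidate points, and a (here, Euclidean) distance metric, it returns the closest candidate that receives the desired prediction. The synthetic data, the logistic regression model, and the choice of distance are illustrative assumptions; as discussed next, choosing the distance metric is itself an open problem.

```python
# A minimal sketch of a nearest-counterfactual search; the model, candidate
# pool, and Euclidean distance are hypothetical choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # hypothetical decision rule
model = LogisticRegression().fit(X, y)

def nearest_counterfactual(x, candidates, desired_label):
    """Closest candidate (by Euclidean distance) that the model maps to desired_label."""
    pool = candidates[model.predict(candidates) == desired_label]
    return pool[np.argmin(np.linalg.norm(pool - x, axis=1))]

x = X[0]                                          # instance whose decision we want to flip
cf = nearest_counterfactual(x, X, desired_label=1 - model.predict(x[None])[0])
print("features that would need to change:", np.round(cf - x, 2))
```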

Directions requiring additional AI innovation: 

  • Defining distance metrics. As noted earlier, local explanations rely heavily on notions of nearby data. It can be difficult to adjudicate which correlations in the data should be preserved and which should not. For example, if there are correlations between the kind of sign and the geographic location in a traffic-image dataset, should those correlations be retained in the distance metric? Some work exists on using human input to define the appropriate distance metric for the purposes of explanation and recourse,26 but more work is needed.

  • Data without interpretable dimensions. The challenges of choosing distance metrics become more complex when the data dimensions are not interpretable, such as pixels in the traffic images described earlier. What is a meaningful explanation in this case? Does it take the form of other images in the dataset? Should the input first be summarized into interpretable concepts?22 Similar issues arise with text and time series.

  • Provenance adjudication. We may want to know whether a particular input was used in a particular way to create a particular output (for example, in copyright disputes). Beyond small models, methods for establishing such provenance remain nascent.43

  • Trade-offs between explainability and privacy/security. Releasing information for audit or recourse may allow bad actors access to private information or to game the system.37 For example, explanations in the form of training samples, like those of the traffic images, may allow actors to not only learn how to trick the autonomous vehicle but also learn about other elements of those images (that are not road signs). Advancing existing work42 is necessary to understand the resulting dynamics.

Areas that require interdisciplinary engagement.  The biggest question in these criteria is what makes an explanation “meaningful.”39 This definition will depend on the context of the task: What is a meaningful explanation for a loan denial may not be the same for a medical error. In some contexts, a single recourse may be sufficient; in others, it may be appropriate to provide multiple options. Also, recourse generated from a local explanation may not always be the right way to assist a user unhappy with a decision. For example, suppose someone believes a voice-based COVID test is in error about their disease status. Rather than providing an explanation of the voice features used to make the decision, the appropriate recourse may be to allow that person to take a traditional COVID test instead.

Designing the Model: Objective Design

All AI systems require converting a general goal (for example, drive safely) into precise, mathematical terms. However, an incorrect conversion will result in the AI behaving in unintended ways. The WEF recommends formulating the problem in a technology-agnostic way to prioritize the “development of a clear problem statement” instead of “detailing the specifics of a solution.” By emphasizing challenges and opportunities, higher-priority issues and alternative (non-AI) solutions may emerge. The WEF also underscores the importance of collaboration with peers and market partners during the objective design process, to ensure that true goals are addressed rather than just treating symptoms.

What we know how to do.  In some cases, it is possible to disaggregate a complex task into simpler components. For example, we might evaluate an autonomous vehicle for its ability to identify and forecast the trajectories of other objects in its environment, and a planner’s ability to make safe decisions given this information. Algorithms for multi-objective optimization can find a Pareto front of options corresponding to different trade-offs between desiderata.36 There is also recent work in inferring what objectives are truly desired given observed reward functions.3
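As a small illustration of the multi-objective perspective, the sketch below identifies the Pareto front among candidate models scored on two competing desiderata (both to be minimized). The candidate scores are randomly generated stand-ins for real evaluation results.

```python
# A minimal sketch of selecting Pareto-optimal candidates under two competing
# objectives; the scores are random stand-ins for real evaluation results.
import numpy as np

rng = np.random.default_rng(3)
# Each row: (error, fairness gap) for one candidate model; lower is better on both.
scores = rng.random((50, 2))

def pareto_front(points):
    """Indices of points not dominated by any other point (minimization)."""
    front = []
    for i, p in enumerate(points):
        dominators = np.all(points <= p, axis=1) & np.any(points < p, axis=1)
        if not dominators.any():
            front.append(i)
    return front

print("non-dominated candidates:", pareto_front(scores))
```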

Directions requiring additional AI innovation: 

  • Metrics for metrics: Measuring match to goals. What are the measures that can be used to determine whether some technical objective matches our policy goals? Objective and reward design are relatively well-studied in some domains, such as RL,24 but unsolved for many more situations—from autonomous vehicles to email text completion—in which we see AI systems used today. Further, our goals may be multifaceted; the objective must not only be faithful to our goal but also transparent in how it is faithful.

  • Robustness to various objectives. Further research is necessary to create agents that excel across a range of objectives. In RL, this research can strengthen the robustness of learned policies when objectives are not perfectly specified.30 It also applies to language models, which are trained to perform next token prediction but asked to perform tasks with various constraints and objectives and in many different worlds.1 

  • Computational constraints for more robust objectives. Related to the above, are there computational constraints (for example, Lipschitz, sparsity, uncertainty) that might prevent the agent from overfitting to an imperfectly specified technical objective in ways that tend to align with broader policy goals?

  • Understanding connections between objectives and learned model behavior. Can we efficiently explain how changes in the objective function affect model behavior? Conversely, can we explain policies in terms of compatible reward functions? Can we efficiently identify where two reward functions may result in different policies in human-understandable terms? Some prior works19 try to answer this; however, more analysis will help refine the reward function to better match the intended objectives. A further task is disentangling the influences of objective and training data on model behavior. For example, the mix of possibly conflicting beliefs in a text corpus will influence how language models trained on it behave, though all have the same objective.1 

  • Inferring goals from observed behavior. We may have examples of decisions that we know align with the true goal (for example, safe driving behavior). However, the inverse problem of inferring rewards from behavior is not identifiable. Advances in inverse RL3 are needed to ensure that a learned reward aligns with the true goals.

Areas that require interdisciplinary engagement.  Designing objective functions involves evaluating their relevance, feasibility, and consistency with broader policy goals. Often, even policy goals can be vague, complicating the development and validation of AI systems. For example, in criminal justice applications of AI, the definitions and metrics for crime can be ambiguous, and the data used may not accurately represent the judges’ true goals.25

Designing the Model: Privacy

Bad actors may exploit transparency about the data, code, and model to identify private information about individuals. According to the CDADM, “[D]ata used and generated by the automated decision system” must be “protected and accessed appropriately.” The WEF also highlights the need for safeguarding data integrity, regardless of whether it is sensitive or anonymized, as unintended disclosure can still enable significant harms. Assessing the privacy requirements of different datasets is essential in determining appropriate protection levels. The WEF also encourages technological innovations that make “less intrusive use of data” or that achieve similar outcomes with “less sensitive datasets.”

What we know how to do.  Differential privacy17 is a widely accepted theoretical notion of privacy. In settings where this notion of privacy is appropriate, we have differentially private algorithms that can calculate statistical properties of data, train machine learning models, and generate synthetic data. Other privacy notions also exist; choosing which to use in a particular setting remains an open question.
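As a concrete illustration of the basic machinery, the sketch below releases a simple counting query using the Laplace mechanism, the building block behind many differentially private computations. The synthetic data, the query, and the privacy parameter epsilon are illustrative assumptions.

```python
# A minimal sketch of the Laplace mechanism for a counting query; the data
# and epsilon are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
incomes = rng.lognormal(mean=10, sigma=1, size=10_000)  # hypothetical sensitive attribute

def dp_count_above(data, threshold, epsilon):
    """epsilon-DP count; a counting query has sensitivity 1 (one person changes it by at most 1)."""
    true_count = float(np.sum(data > threshold))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

print("noisy count above threshold (epsilon = 0.1):",
      round(dp_count_above(incomes, 50_000, epsilon=0.1)))
```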

Directions requiring additional AI innovation: 

  • Better trade-offs between privacy and performance. In general, differentially private models have worse predictive performance than non-private models. How can that gap be closed? Can we ensure models remain private even with many queries and in conjunction with public data? What can we maximally expose about a model and training data? Can we precisely state what cannot be exposed, for example, an omitted long tail?

  • Creating and assessing privacy definitions. How can we define privacy appropriately and meaningfully for different types of data, such as trajectories or text? What do current privacy definitions achieve on this data?

  • Privacy via minimal data collection. Can we collect different information for different individuals to provide the AI system with the minimal necessary data required for making accurate predictions?

  • Private generative models. Existing work focuses on classification. Questions remain when it comes to the privacy of large generative models: Can we prevent a generative model from replicating training data? Is there a difference between a private generative model and adding noise to data? Are empirical methods to ensure privacy, for example via reinforcement learning with human feedback, sufficient?

  • Effective unlearning. In some cases, we may wish to allow people to remove the influence of their data on the model after training. Unlearning methods are still nascent, especially for generative models.27

Areas that require interdisciplinary engagement.  Current private models still allow third parties to infer private information via access to additional publicly available data. We need to develop new notions of privacy for this setting. Broader discussion is also needed around what to do if privacy guarantees sacrifice predictive performance, especially if the sacrifice primarily affects underrepresented groups.4 More generally, the appropriate definition of privacy, and how strict the privacy guarantee must be (for example, via hyperparameter settings), will depend on the setting.

Interacting with the Model: Human + AI Systems

AI regulations consistently highlight the importance of human involvement in automated decision making. The CDADM mandates the inclusion of “specific human intervention points” and that “the final decision must be made by a human” for high-risk decisions. It also requires contingency plans in case the AI system becomes unavailable. The primary objective is for the human decision maker to scrutinize AI systems, assume responsibility for the ultimate decision, and intervene in emergencies and system failures. We also consider the case of learning from human input.

What we know how to do.  There has been significant work on learning from humans. We can apply methods such as imitation learning and reinforcement learning from human feedback28 to orient the model based on expert control or learn human intentions. Active learning techniques can be used to proactively ask for information from humans to improve a model.33 We also have methods for humans to take the initiative to correct an agent. While methods for uncertainty quantification continue to improve, current methods are reasonable for flagging uncertain inputs for human inspection.
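As a simple illustration of that last point, the sketch below flags the most uncertain inputs of a classifier for human review using predictive entropy. The synthetic data, the model, and the 5% review budget are illustrative assumptions, and richer uncertainty estimates would typically be preferred in practice.

```python
# A minimal sketch of flagging uncertain predictions for human review via
# predictive entropy; the data, model, and 5% review budget are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])

probs = model.predict_proba(X[1500:])
entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)   # higher = less certain

threshold = np.quantile(entropy, 0.95)                      # route the top 5% to a human
flagged = np.where(entropy >= threshold)[0]
print(f"{len(flagged)} of {len(entropy)} inputs flagged for human review")
```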

Directions requiring additional AI innovation: 

  • HCI methods for avoiding cognitive biases. Humans have many cognitive biases and limitations. If a system behaves correctly most of the time, people may start to over-rely on it. Confirmation bias can accompany backward reasoning (people finding ways to justify a given decision) but can be mitigated by forward reasoning (looking at the evidence).10 Bias can also come from imperfect information fusion; for example, if a human inspects the input data and then views an AI prediction based on the same input data, they may falsely believe that the AI prediction is a new, independent piece of information. Appropriate human+AI interaction can help mitigate these biases.

  • Semantic alignment and shared mental models. We need shared mental models between the human and the agent, along with methods to ground the terms they share.2 Some works use models of humans to facilitate interaction, including modeling a person’s latent states such as cognitive workload and emotions.31 But it remains an open question how to develop and validate these methods for increasing numbers of human+AI use cases.

  • Time-constrained settings. How can we design safe and effective hand-offs between AI systems and humans in time-constrained settings, such as AI-assisted driving in emergencies?

  • Test-time validation of large surface models. Models with large output surfaces (for example, LLMs) will be difficult to evaluate via prospective metrics; we need methods to assist people in their validation at task time.16 

  • Evaluation and design of realistic human-in-the-loop systems. Testing various forms of shared control (for example, an AI-assisted driving system) requires significant training and risk. How can we make it more efficient?

Areas that require interdisciplinary engagement.  Human+AI decision making is highly interdisciplinary, involving social and cognitive science, psychology, and other related fields. Fortunately, HCI research has existing connections to these fields.40 Furthermore, adoption of new tools into workplaces is well studied in design, human factors research, management, and operations science. Whether, how, and which humans to include in the loop, as well as how AI systems should respond to inappropriate, slow, or absent human input will require interdisciplinary efforts.

Conclusion

In reviewing the technical criteria in two regulatory frameworks—the Canadian Directive on Automated Decision-Making and the World Economic Forum’s AI Procurement in a Box—we found that while we have many existing metrics for evaluating data and model quality, there are also many technical gaps. We conclude that advancing these concrete areas of research using an interdisciplinary approach would improve our ability to vet AI systems and create truly regulatable AI.

Acknowledgments

The authors thank Andrew Ross, Siddharth Swaroop, Rishav Chourasia, Himabindu Lakkaraju, and Brian Lim, as well as all participants of NUS Responsible, Regulatable AI Working Group 2022–2023, including Limsoon Wong, Angela Yao, Suparna Ghanvatkar, and Davin Choo.

More Online

To view the complete list of references for this article, please visit https://dx.doi.org/10.1145/3653670 and click on Supplemental Material.

    References

    • 1. Andreas, J. Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022. Y. Goldberg, Z. Kozareva, and Y. Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, U.A.E. (Dec. 2022), 5769–5779; DOI: 10.18653/v1/2022.findings-emnlp.423.
    • 2. Andrews, R.W., Lilly, J.M., Srivastava, D., and Feigh, K.M. The role of shared mental models in human-AI teams: A theoretical review. Theoretical Issues in Ergonomics Science 24, 2 (2023), 129–175; DOI: 10.1080/1463922X.2022.2061080.
    • 3. Arora, S. and Doshi, P. A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence 297 (2021), 103500.
    • 4. Bagdasaryan, E., Poursaeed, O., and Shmatikov, V. Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems 32 (2019).
    • 5. Bayerl, P.S. and Paul, K.I. What determines inter-coder agreement in manual annotations? A meta-analytic investigation. Computational Linguistics 37, 4 (2011), 699–725.
    • 6. Belsley, D.A., Kuh, E., and Welsch, R.E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, (2005).
    • 7. Ben-Gal, I. Outlier detection. In Data Mining and Knowledge Discovery Handbook. Springer, (2005), 131–146.
    • 8. Bender, E.M. and Friedman, B. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Trans. of the Association for Computational Linguistics 6 (2018), 587–604.
    • 9. Bills, S. et al. Language models can explain neurons in language models. OpenAI. (May 9, 2023); https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
    • 10. Bondi, E. et al. Role of human-AI interaction in selective prediction. In Proceedings of the AAAI Conf. on Artificial Intelligence, (2022).
    • 11. Brodley, C.E. and Friedl, M.A. Identifying mislabeled training data. J. of Artificial Intelligence Research 11 (1999), 131–167.
    • 12. Brown, D.S., Schneider, J., Dragan, A., and Niekum, S. Value alignment verification. In Proceedings of PMLR 139 (2021), 1105–1115.
    • 13. Government of Canada. Directive on Automated Decision-Making, 2019; https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=32592
    • 14. Chuang, C.-Y., Torralba, A., and Jegelka, S. Estimating generalization under distribution shifts via domain-invariant representations. In Proceedings of the Intern. Conf. on Machine Learning, PMLR 119 (2020), 1984–1994.
    • 15. de Man, Y. et al. Opt-in and opt-out consent procedures for the reuse of routinely recorded health data in scientific research and their consequences for consent rate and consent bias: Systematic review. J. of Medical Internet Research 25 (2023), e42131.
    • 16. Doshi-Velez, F. and Glassman, E. Contextual evaluation of AI: A new gold standard. Working Paper, (2023); https://glassmanlab.seas.harvard.edu/papers/alt_CHI_Benchmarks_are_not_enough_8p.pdf
    • 17. Dwork, C. Differential privacy. In Proceedings of Automata, Languages and Programming: 33rd Intern. Colloquium. Springer, (2006), 1–12.
    • 18. Feng, J. et al. Clinical artificial intelligence quality improvement: Towards continual monitoring and updating of AI algorithms in healthcare. npj Digital Medicine 5, 1 (2022), 66.
    • 19. Gajcin, J. et al. Contrastive explanations for comparing preferences of reinforcement learning. In Proceedings of AAAI Conf. on Artificial Intelligence, (2022).
    • 20. Gama, J. et al. A survey on concept drift adaptation. ACM Computing Surveys 46, 4 (2014), 1–37.
    • 21. Gehr, T. et al. AI2: Safety and robustness certification of neural networks with abstract interpretation. In Proceedings of 2018 IEEE Symp. on Security and Privacy. IEEE, (2018), 3–18.
    • 22. Ghorbani, A., Wexler, J., Zou, J.Y., and Kim, B. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems 32 (2019).
    • 23. Guidotti, R. Counterfactual explanations and how to find them: Literature review and benchmarking. Data Mining and Knowledge Discovery (2022), 1–55.
    • 24. Hadfield-Menell, D. et al. Inverse reward design. Advances in Neural Information Processing Systems 30 (2017).
    • 25. Isaac, W.S. Hope, hype, and fear: The promise and potential pitfalls of artificial intelligence in criminal justice. Ohio St. J. Crim. L. 15 (2017), 543.
    • 26. Karimi, A.-H., Schölkopf, B., and Valera, I. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conf. on Fairness, Accountability, and Transparency. ACM, (2021), 353–362.
    • 27. Krishna, S., Ma, J., and Lakkaraju, H. Towards bridging the gaps between the right to explanation and the right to be forgotten. In Proceedings of the 40th Intern. Conf. on Machine Learning. JMLR.org, (2023).
    • 28. MacGlashan, J. et al. Interactive learning from policy-dependent human feedback. In Proceedings of the 34th Intern. Conf. on Machine Learning, PMLR 70 (2017), 2285–2294.
    • 29. Molnar, C. Interpretable Machine Learning. Lulu.com, (2020).
    • 30. Moos, J. et al. Robust reinforcement learning: A review of foundations and recent advances. Machine Learning and Knowledge Extraction 4, 1 (2022), 276–315.
    • 31. Ong, D.C., Zaki, J., and Goodman, N.D. Computational models of emotion inference in theory of mind: A review and roadmap. Topics in Cognitive Science 11, 2 (2019), 338–357.
    • 32. Pineau, J. et al. Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program). J. of Machine Learning Research 22, 1 (2021), 7459–7478.
    • 33. Ren, P. et al. A survey of deep active learning. ACM Computing Surveys 54, 9 (2021), 1–40.
    • 34. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.
    • 35. Samek, W. and Müller, K.-R. Towards explainable artificial intelligence. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer, (2019), 5–22.
    • 36. Sawaragi, Y., Nakayama, H., and Tanino, T. Theory of Multiobjective Optimization. Elsevier, (1985).
    • 37. Shokri, R., Strobel, M., and Zick, Y. On the privacy risks of model explanations. In Proceedings of the 2021 AAAI/ACM Conf. on AI, Ethics, and Society. ACM, (2021), 231–241.
    • 38. Smith, A.L. et al. Artificial Intelligence and Software Testing. BCS, The Chartered Institute for IT, (2022).
    • 39. Sosa, D. Meaningful explanation. Philosophical Issues 8 (1997), 351–356.
    • 40. Sundar, S.S. Rise of machine agency: A framework for studying the psychology of human–AI interaction (HAII). J. of Computer-Mediated Communication 25, 1 (2020), 74–88.
    • 41. Suresh, H. and Guttag, J.V. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002, (2019).
    • 42. Tsirtsis, S. and Gomez Rodriguez, M. Decisions, counterfactual explanations and strategic behavior. Advances in Neural Information Processing Systems 33 (2020), 16749–16760.
    • 43. Vyas, N., Kakade, S.M., and Barak, B. On provable copyright protection for generative models. In Proceedings of the 40th Intern. Conf. on Machine Learning, PMLR 202 (2023), 35277–35299.
    • 44. Wang, K.R. et al. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. In Proceedings of Intern. Conf. on Learning Representations, (2023).
    • 45. World Economic Forum. AI Procurement in a Box. Technical report, World Economic Forum, (2020); https://www.weforum.org/reports/ai-procurement-in-a-box/
