
Directions of Technical Innovation for Regulatable AI Systems

Public sector AI procurement checklists can help guide efforts to create regulatable AI systems.

[Illustration: robots examining checklists]

As AI systems become more advanced and integrated into our lives, there has been a corresponding urgency to ensure that they align with social values and norms and that their benefits significantly outweigh any potential harms. In response to this imperative, legal and regulatory bodies globally are engaged in a concerted effort to develop comprehensive AI regulations.

The increasing size, generality, opaqueness, and closed nature of present-day AI systems, however, pose significant challenges to effective regulation. Even when requirements can be articulated, it remains uncertain whether and how we can verify an AI system’s compliance with these standards: A requirement that cannot be checked will not provide effective protection. If we believe that AI systems should be regulated, then AI systems must be designed to be regulatable. In this article, we consider the following question: What advances in AI systems are needed for them to be effectively regulated?

We explore the answer through the lens of public sector AI procurement checklists, which offer a pragmatic perspective on the broader challenges of regulatable AI systems. The public sector uses procurement checklists to ensure its purchased products align with its organizational needs and values. In the context of AI procurement checklists, most pertinent to us are the items related to technical criteria, such as ensuring data quality, privacy, fairness, and appropriate monitoring and oversight. The desiderata distilled in these checklists are comprehensive, and are also reflected in still nascent efforts to regulate AI in the private sector. Furthermore, public sector procurement checklists are among the more developed AI regulations, some having gone through several rounds of refinement. We emphasize that we focus on public sector procurement checklists to make our discussion concrete; the technical innovations needed to satisfy those checklists are relevant to the wider discourse on creating regulatable AI systems.

Key Insights

  • We assess two public sector procurement checklists to identify innovation directions for creating regulatable AI systems.

  • We identify that while we have many existing metrics for evaluating data and model quality, there are multiple gaps. For example, new methods are needed to connect data quality and objective functions to outcomes, to identify what is possible with limited data and model access, to monitor continuously learning agents, to balance transparency and privacy, and to enhance effective human+AI interaction.

  • Moreover, many high-level decisions within these criteria inherently require an interdisciplinary approach.

Specifically, we closely examine the technical criteria from two existing procurement checklists: the World Economic Forum’s AI Procurement in a Box (WEF)45 and the Canadian Directive on Automated Decision-Making (CDADM).13 Both of these checklists have gone through extensive review. The ideation process for the CDADM began in 2016 with input from scholars, civil society advocates, and governmental officials; it has been thrice reviewed and amended since it was enacted in 2019, establishing the groundwork for the broader 2022 Artificial Intelligence and Data Act (AIDA) and a recent guide on the use of generative AI. The guidelines in the WEF’s AI Procurement in a Box were created with involvement from 200 stakeholders across government, academia, and industry; the efficacy and applicability of these guidelines have been validated through two pilot studies conducted in the U.K. and Brazil. In our review, we found these checklists to be fairly comprehensive, with perhaps the only technical area lacking sufficient attention being the human-computer interaction (HCI) elements of how the AI would be integrated into its intended environment.

The remainder of this article is organized into sections aligned with the criteria in these two checklists. The structure highlights some differences in how AI researchers and policymakers approach the same problems. For example, there is no section on fairness, as the CDADM and WEF weave it into multiple criteria. In each section, we first summarize existing technical approaches that could be used to construct AI systems that meet those criteria, and then identify areas in which current work is insufficient for vetting AI systems in the increasing breadth of contexts in which they are now being used. In such cases, additional technical innovation in AI would assist in the creation of AI systems that can be more easily regulated. Finally, we briefly outline aspects of these criteria that may seem technical but actually require interdisciplinary approaches to vet.

Throughout this exercise, we assume no concerns about expertise; that is, we presume there are sufficiently qualified AI and domain experts to review whether the AI meets the checklist criteria. Our concern is to identify to what extent experts can currently vet AI systems against these criteria, and provide a (noncomprehensive, but concrete) list of directions for technical innovation to bridge the gap toward regulatable AI systems. If AI systems can be verified against these checklists, then we will have made significant progress toward creating regulatable AI systems in general.

Inputs of the Model: (Pretraining) Data Checks

Training data characteristics have a large influence on the behavior of an AI system. For this reason, the CDADM requires that training data is “tested for unintended data biases and other factors that may unfairly impact the outcomes” as well as that the data is “relevant, accurate, up-to-date.” Similarly, the WEF asks if “relevant data” is available for the project, specifically requiring the data to “fit the criterion of fairness,” “be representative of the population that the AI solution will address,” as well as be “reasonably recent.” The CDADM further requires that the data used by the AI system is “traceable [fingerprinting], protected and accessed appropriately, and lawfully collected, used, retained, and disposed.” The technical questions underlying these criteria have to do with data documentation procedures that expose potential risks in areas such as fairness, generalization, and privacy.

What we know how to do.  We have proxies for checking many properties in these criteria (for example, data privacy, label quality, feature selection, fairness, and so on) using exploratory data analysis. As examples, we can inspect the annotation process and check inter-annotator agreement to get an idea of label quality.5 We can also measure (and correct for) imbalance in data if we are given group labels that segment the dataset. We have techniques for identifying influential points,6 outliers,7 and mislabeled points,11 which may cause models to exhibit poor performance or bias. Further, there exist standards for reporting dataset information.8
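As a concrete (and deliberately simplified) illustration of such exploratory checks, the sketch below computes inter-annotator agreement, class balance, and outlier flags for a hypothetical tabular dataset using scikit-learn. The data, the two annotators, and the 1% contamination threshold are illustrative assumptions, not recommendations drawn from the checklists.

```python
# A minimal sketch of exploratory data checks; dataset, annotators, and
# thresholds below are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                 # hypothetical feature matrix
labels_a = rng.integers(0, 2, size=1000)       # labels from annotator A
labels_b = np.where(rng.random(1000) < 0.9,    # annotator B agrees ~90% of the time
                    labels_a, 1 - labels_a)

# 1. Label quality: inter-annotator agreement (Cohen's kappa).
print(f"inter-annotator kappa: {cohen_kappa_score(labels_a, labels_b):.2f}")

# 2. Representativeness: class balance (group balance is checked the same way).
counts = np.bincount(labels_a)
print("class proportions:", counts / counts.sum())

# 3. Points that may distort training: unsupervised outlier flagging.
flags = IsolationForest(contamination=0.01, random_state=0).fit_predict(X)
print("flagged outliers:", int((flags == -1).sum()))
```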

Directions requiring additional AI innovation: 

  • Metrics and generalizability. More work is needed to connect data metrics with impact on outcomes. For example, if a traffic-image dataset has a certain annotator disagreement score, what does that imply for an autonomous vehicle with a vision system trained on that data? Metrics along these lines are needed to capture which applications a dataset may be safely used for.

  • Data quality checks in the context of pretrained models. Given the prevalence of pretrained models and the (currently) limited transparency about their training data, can we develop data checks that rely solely on accessing the model, or do we require disclosure of certain information about the training data? Do checks for fine-tuning data differ from checks for pretraining data?

  • Unstructured data. For unstructured data such as images or social media messages, existing standards (such as those above) focus on reporting the statistics of the metadata. In our traffic-images example, is it sufficient to provide information about, for example, where the images were collected? Or should we also be providing information about the statistics of the image data itself?

Areas that require interdisciplinary engagement.  The specific metrics that would enable meaningful inference about the quality of the data will depend on the application. Questions around bias and fairness are also inherently multifaceted and use-case dependent. Determining how data collection respects copyright, obtains appropriate consent (opt-in vs. opt-out), and avoids misrepresentation of or detriment to the owner requires legal and social science input.15 Privacy tensions—what data is retained, what statistics are made public, what kind of access is granted to trusted auditors—must also be resolved within the broader sociotechnical context.

Outputs of the Model: (Post-hoc) System Monitoring

Once a system is deployed, it is essential to monitor its operations. The CDADM mandates the development of processes to monitor the AI system’s outcomes and “verify compliance with institutional and program legislation…on a scheduled basis.” It requires the publication of information on the system’s effectiveness and efficiency. The WEF emphasizes the need for “systematic and continuous risk monitoring during every stage of the AI solution’s life cycle” as “[t]esting the model on an ongoing basis is necessary to maintain its accuracy.” The WEF also advocates for the establishment of end-to-end auditability. Technical questions associated with these criteria include how to monitor performance (related to measuring performance more generally) and identify drift and anomalies that warrant attention.

What we know how to do.  Given a specific metric, it is relatively easy to put monitoring in place. Methods exist that establish distributions for “normal operation” and flag anomalous values during actual operation.20 These techniques can be employed to detect shifts in top features, inputs, outputs, model confidences, calibrations, and fairness metrics. When the causal structure of the environment is known, monitoring checks can specifically identify new confounders and mediators. In reinforcement learning (RL) settings, we can monitor whether reward distributions have changed. More generally, many best practices exist for testing AI systems.38 For example, AI developers should test their systems with multiple external datasets, and results should be stratified by task difficulty and subpopulations of interest.
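To make the monitoring idea concrete, the following sketch compares a reference window of feature values against a live window using a per-feature two-sample Kolmogorov-Smirnov test. The simulated data, the window sizes, and the significance level are illustrative assumptions; in practice such a check would be one component of a broader monitoring pipeline.

```python
# A minimal sketch of post-deployment drift monitoring; data, window sizes,
# and alpha are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
reference = rng.normal(size=(5000, 4))   # feature values logged at deployment time
live = rng.normal(size=(500, 4))         # recent feature values in operation
live[:, 2] += 0.5                        # simulate drift in one feature

alpha = 0.01                             # per-feature significance level
for j in range(reference.shape[1]):
    stat, p = ks_2samp(reference[:, j], live[:, j])
    if p < alpha:
        print(f"feature {j}: possible drift (KS statistic {stat:.3f}, p-value {p:.1e})")
```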

Directions requiring additional AI innovation: 

  • Monitoring many metrics. Monitoring multiple metrics increases the risk of false positives, overwhelming engineers. How can we monitor many metrics efficiently while not incorrectly flagging too many cases for review? Relatedly, once in operation, what data should be gathered so that we can check additional metrics in the future? For example, consider an autonomous vehicle that has a very large number of machine learning (ML) components, processing data at fast rates: What should be stored? These questions remain despite advances in MLOps.

  • Certification of use cases. How can we certify—or at least provide appropriate confidence for—use cases for which an AI system is supposed to work well, for example, where an autonomous vehicle should be able to drive safely? Certifying neural networks for safety-critical systems is an active research area,21 as are attempts to define appropriate levels of certification. Innovation to prevent model misuse is also needed.

  • Correcting models after deployment. There exists some work on correcting deployed models in a way that does not require retraining end to end, including unlearning and fine-tuning.27 But more work remains to be done, especially for AI systems with many interacting parts.

  • Identifying relevant distribution shift. Distribution shift can take many forms: input distributions, the relationship between inputs and outputs, and rewards (objectives) may all change. For example, in newer cars, acceleration and popular colors may differ. Can we identify relevant and irrelevant shifts (for example, along the lines of Chuang et al.14)? If shifts occur in an uninterpretable embedding space, how can we explain them?

  • Monitoring online learning agents. Beyond major adverse effects, how can we identify more subtle issues, such as initial signs of catastrophic forgetting, cheating, and other harms that occur while the agent continues to perform well on its reward metric? We need more work on identifying unintended consequences early.41

Areas that require interdisciplinary engagement.  There will always be a need to make high-level decisions regarding what should be monitored in a specific context and what safety assurances or guarantees are required, for example in healthcare.18 It is crucial to translate the monitored metrics into meaningful implications that enable people to make informed decisions within the broader sociotechnical system.

Inspecting the Model: Global Explanations for Model Validation

Global explanations describe a model as a whole and are often useful for inspection or oversight. Both the CDADM and WEF require documentation that details the workings of different components in the AI system, the training data utilized, and any known biases. The CDADM notes that qualified experts should review the AI system before it goes into production. The WEF emphasizes that AI decision making should be made “as transparent as possible,” including that developers should be able to explain the AI system for public scrutiny and allow independent audits.

What we know how to do.  We can build inherently interpretable models (for example, generalized additive models, decision trees, rule-based models) for tabular and structured data.34 We have some tools for interpreting neural networks in terms of human-understandable components, such as circuits44 or even natural language.9 We can partially explain neural networks, for example, by visualizing weights or computing concept-activation vectors.35
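As a minimal example of an inherently interpretable model on tabular data, the sketch below fits a shallow decision tree and prints its complete decision logic, which can serve as a global explanation for review. The dataset and the depth limit are illustrative assumptions.

```python
# A minimal sketch of an inherently interpretable model whose full decision
# logic can be inspected; the dataset and depth limit are hypothetical choices.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The entire model, rendered as human-readable rules, is the global explanation.
print(export_text(tree, feature_names=list(data.feature_names)))
```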

Directions requiring additional AI innovation: 

  • Inherently interpretable models for more data types. Building interpretable models for non-tabular data (for example, images or audio) is still nascent. Learning concepts on top of the input dimensions may be useful.

  • “Openboxing” large models. Can we build interactive, hierarchical, and semantically aligned views of large models, such that these models are to some extent inherently interpretable? For example, a traffic-image classifier recognizing objects by multiplying object templates with transformation matrices would be inherently more explainable than another model without such hierarchical structure. Can we allow users to explore explanations at different levels of fidelity for different contexts? Techniques exist but have limitations.29

  • Checking value alignment. Whether it is criminal justice, benefits allocations, or autonomous driving, AI systems are increasingly used in situations that require value judgments. How do we elicit and encode societal and individual values in diverse situations? What metrics can effectively measure value alignment? How do we make this mapping transparent so that others can understand the value choices made (for example, the drivers of other cars next to the autonomous vehicle)? Advancing existing work12 is needed for our increasing use cases.

Areas that require interdisciplinary engagement.  There are open questions about what information to offer, and to whom. For example, releasing the code and environment may allow some stakeholders to answer their questions directly.32 Providing an explanation broadens who can inspect the model, including users and domain experts; however, what information to release, how to extract it, and how often the information should be updated during the model’s life cycle will depend on the context. We will also need mechanisms for requesting more information about a model as new concerns emerge.

Inspecting the Model: Local Explanations About Individual Decisions

Local explanations aim to inform how a specific decision is made. In some cases, it may be sufficient to provide a “meaningful explanation” (CDADM and WEF) that describes “how and why a model performed in the way it did,” including “exactly how a machine-learning model has arrived at a result” (WEF). Technically, this might involve exposing what training data or features contribute the most to a specific output. In other cases, it may be desirable to provide “applicable recourse options” (CDADM), giving users actionable ways to change the decision.26

What we know how to do.  Given a distance notion, there are many techniques for providing local explanations. Specifically, we can find the closest point that leads to a desired output.23 This is a counterfactual, and can help users determine what features set nearby points apart. It also lays the foundation for algorithmic recourse.26
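The sketch below illustrates this counterfactual idea in its simplest form: given a model, a pool of candidate points, and a (here, Euclidean) distance metric, it returns the closest candidate that receives the desired prediction. The synthetic data, the logistic regression model, and the choice of distance are illustrative assumptions; as discussed next, choosing the distance metric is itself an open problem.

```python
# A minimal sketch of a nearest-counterfactual search; the model, candidate
# pool, and Euclidean distance are hypothetical choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # hypothetical decision rule
model = LogisticRegression().fit(X, y)

def nearest_counterfactual(x, candidates, desired_label):
    """Closest candidate (by Euclidean distance) that the model maps to desired_label."""
    pool = candidates[model.predict(candidates) == desired_label]
    return pool[np.argmin(np.linalg.norm(pool - x, axis=1))]

x = X[0]                                          # instance whose decision we want to flip
cf = nearest_counterfactual(x, X, desired_label=1 - model.predict(x[None])[0])
print("features that would need to change:", np.round(cf - x, 2))
```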

Directions requiring additional AI innovation: 

  • Defining distance metrics. As noted earlier, local explanations rely heavily on notions of nearby data. It can be difficult to adjudicate which correlations in the data should be preserved and which should not. For example, if there are correlations between the kind of sign and the geographic location in a traffic-image dataset, should those correlations be retained in the distance metric? Some work exists on using human input to define the appropriate distance metric for the purposes of explanation and recourse,26 but more work is needed.

  • Data without interpretable dimensions. The challenges of choosing distance metrics become more complex when the data dimensions are not interpretable, such as pixels in the traffic images described earlier. What is a meaningful explanation in this case? Does it take the form of other images in the dataset? Should the input first be summarized into interpretable concepts?22 Similar issues arise with text and time series.

  • Provenance adjudication. We may want to know whether a particular input was used in a particular way to create a particular output (for example, in copyright disputes). Beyond small models, methods for establishing such provenance remain nascent.43

  • Trade-offs between explainability and privacy/security. Releasing information for audit or recourse may allow bad actors access to private information or to game the system.37 For example, explanations in the form of training samples, like those of the traffic images, may allow actors to not only learn how to trick the autonomous vehicle but also learn about other elements of those images (that are not road signs). Advancing existing work42 is necessary to understand the resulting dynamics.

Areas that require interdisciplinary engagement.  The biggest question in these criteria is what makes an explanation “meaningful.”39 This definition will depend on the context of the task: What is a meaningful explanation for a loan denial may not be the same for a medical error. In some contexts, a single recourse may be sufficient; in others, it may be appropriate to provide multiple options. Also, recourse generated from a local explanation may not always be the right way to assist a user unhappy with a decision. For example, suppose someone believes a voice-based COVID test is in error about their disease status. Rather than providing an explanation of the voice features used to make the decision, the appropriate recourse may be to allow that person to take a traditional COVID test instead.

Designing the Model: Objective Design

All AI systems require converting a general goal (for example, drive safely) into precise, mathematical terms. However, an incorrect conversion will result in the AI behaving in unintended ways. The WEF recommends formulating the problem in a technology-agnostic way to prioritize the “development of a clear problem statement” instead of “detailing the specifics of a solution.” By emphasizing challenges and opportunities, higher-priority issues and alternative (non-AI) solutions may emerge. The WEF also underscores the importance of collaboration with peers and market partners during the objective design process, to ensure that true goals are addressed rather than just treating symptoms.

What we know how to do.  In some cases, it is possible to disaggregate a complex task into simpler components. For example, we might evaluate an autonomous vehicle for its ability to identify and forecast the trajectories of other objects in its environment, and a planner’s ability to make safe decisions given this information. Algorithms for multi-objective optimization can find a Pareto front of options corresponding to different trade-offs between desiderata.36 There is also recent work in inferring what objectives are truly desired given observed reward functions.3
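As a small illustration of the multi-objective perspective, the sketch below identifies the Pareto front among candidate models scored on two competing desiderata (both to be minimized). The candidate scores are randomly generated stand-ins for real evaluation results.

```python
# A minimal sketch of selecting Pareto-optimal candidates under two competing
# objectives; the scores are random stand-ins for real evaluation results.
import numpy as np

rng = np.random.default_rng(3)
# Each row: (error, fairness gap) for one candidate model; lower is better on both.
scores = rng.random((50, 2))

def pareto_front(points):
    """Indices of points not dominated by any other point (minimization)."""
    front = []
    for i, p in enumerate(points):
        dominators = np.all(points <= p, axis=1) & np.any(points < p, axis=1)
        if not dominators.any():
            front.append(i)
    return front

print("non-dominated candidates:", pareto_front(scores))
```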

Directions requiring additional AI innovation: 

  • Metrics for metrics: Measuring match to goals. What are the measures that can be used to determine whether some technical objective matches our policy goals? Objective and reward design are relatively well-studied in some domains, such as RL,24 but unsolved for many more situations—from autonomous vehicles to email text completion—in which we see AI systems used today. Further, our goals may be multifaceted; the objective must not only be faithful to our goal but also transparent in how it is faithful.

  • Robustness to various objectives. Further research is necessary to create agents that excel across a range of objectives. In RL, this research can strengthen the robustness of learned policies when objectives are not perfectly specified.30 It also applies to language models, which are trained to perform next token prediction but asked to perform tasks with various constraints and objectives and in many different worlds.1 

  • Computational constraints for more robust objectives. Related to the above, are there computational constraints (for example, Lipschitz, sparsity, uncertainty) that might prevent the agent from overfitting to an imperfectly specified technical objective in ways that tend to align with broader policy goals?

  • Understanding connections between objectives and learned model behavior. Can we efficiently explain how changes in the objective function affect model behavior? Conversely, can we explain policies in terms of compatible reward functions? Can we efficiently identify where two reward functions may result in different policies in human-understandable terms? Some prior works19 try to answer this; however, more analysis will help refine the reward function to better match the intended objectives. A further task is disentangling the influences of objective and training data on model behavior. For example, the mix of possibly conflicting beliefs in a text corpus will influence how language models trained on it behave, though all have the same objective.1 

  • Inferring goals from observed behavior. We may have examples of decisions that we know align with the true goal (for example, safe driving behavior). However, the inverse problem of inferring rewards from behavior is not identifiable. Advances in inverse RL3 are needed to ensure that a learned reward aligns with the true goals.

Areas that require interdisciplinary engagement.  Designing objective functions involves evaluating their relevance, feasibility, and consistency with broader policy goals. Often, even policy goals can be vague, complicating the development and validation of AI systems. For example, in criminal justice applications of AI, the definitions and metrics for crime can be ambiguous, and the data used may not accurately represent the judges’ true goals.25

Designing the Model: Privacy

Bad actors may exploit transparency about the data, code, and model to identify private information about individuals. According to the CDADM, “[D]ata used and generated by the automated decision system” must be “protected and accessed appropriately.” The WEF also highlights the need for safeguarding data integrity, regardless of whether it is sensitive or anonymized, as unintended disclosure can still enable significant harms. Assessing the privacy requirements of different datasets is essential in determining appropriate protection levels. The WEF also encourages technological innovations that make “less intrusive use of data” or that achieve similar outcomes with “less sensitive datasets.”

What we know how to do.  Differential privacy17 is a widely accepted theoretical notion of privacy. In settings where this notion of privacy is appropriate, we have differentially private algorithms that can calculate statistical properties of data, train machine learning models, and generate synthetic data. Other privacy notions also exist; choosing which to use in a particular setting remains an open question.
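As a concrete illustration of the basic machinery, the sketch below releases a simple counting query using the Laplace mechanism, the building block behind many differentially private computations. The synthetic data, the query, and the privacy parameter epsilon are illustrative assumptions.

```python
# A minimal sketch of the Laplace mechanism for a counting query; the data
# and epsilon are hypothetical.
import numpy as np

rng = np.random.default_rng(4)
incomes = rng.lognormal(mean=10, sigma=1, size=10_000)  # hypothetical sensitive attribute

def dp_count_above(data, threshold, epsilon):
    """epsilon-DP count; a counting query has sensitivity 1 (one person changes it by at most 1)."""
    true_count = float(np.sum(data > threshold))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

print("noisy count above threshold (epsilon = 0.1):",
      round(dp_count_above(incomes, 50_000, epsilon=0.1)))
```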

Directions requiring additional AI innovation: 

  • Better trade-offs between privacy and performance. In general, differentially private models have worse predictive performance than non-private models. How can that gap be closed? Can we ensure models remain private even with many queries and in conjunction with public data? What can we maximally expose about a model and training data? Can we precisely state what cannot be exposed, for example, an omitted long tail?

  • Creating and assessing privacy definitions. How can we define privacy appropriately and meaningfully for different types of data, such as trajectories or text? What do current privacy definitions achieve on this data?

  • Privacy via minimal data collection. Can we collect different information for different individuals to provide the AI system with the minimal necessary data required for making accurate predictions?

  • Private generative models. Existing work focuses on classification. Questions remain when it comes to the privacy of large generative models: Can we prevent a generative model from replicating training data? Is there a difference between a private generative model and adding noise to data? Are empirical methods to ensure privacy, for example via reinforcement learning with human feedback, sufficient?

  • Effective unlearning. In some cases, we may wish to allow people to remove the influence of their data on the model after training. Unlearning methods are still nascent, especially for generative models.27

Areas that require interdisciplinary engagement.  Current private models still allow third parties to infer private information via access to additional publicly available data. We need to develop new notions of privacy for this setting. Broader discussion is also needed around what to do if privacy guarantees sacrifice predictive performance, especially if the sacrifice primarily affects underrepresented groups.4 More generally, the appropriate definition of privacy, and how strict the privacy guarantee must be (for example, via hyperparameter settings), will depend on the setting.

Interacting with the Model: Human + AI Systems

AI regulations consistently highlight the importance of human involvement in automated decision making. The CDADM mandates the inclusion of “specific human intervention points” and that “the final decision must be made by a human” for high-risk decisions. It also requires contingency plans in case the AI system becomes unavailable. The primary objective is for the human decision maker to scrutinize AI systems, assume responsibility for the ultimate decision, and intervene in emergencies and system failures. We also consider the case of learning from human input.

What we know how to do.  There has been significant work on learning from humans. We can apply methods such as imitation learning and reinforcement learning from human feedback28 to orient the model based on expert control or learn human intentions. Active learning techniques can be used to proactively ask for information from humans to improve a model.33 We also have methods for humans to take the initiative to correct an agent. While methods for uncertainty quantification continue to improve, current methods are reasonable for flagging uncertain inputs for human inspection.
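As a simple illustration of that last point, the sketch below flags the most uncertain inputs of a classifier for human review using predictive entropy. The synthetic data, the model, and the 5% review budget are illustrative assumptions, and richer uncertainty estimates would typically be preferred in practice.

```python
# A minimal sketch of flagging uncertain predictions for human review via
# predictive entropy; the data, model, and 5% review budget are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X[:1500], y[:1500])

probs = model.predict_proba(X[1500:])
entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)   # higher = less certain

threshold = np.quantile(entropy, 0.95)                      # route the top 5% to a human
flagged = np.where(entropy >= threshold)[0]
print(f"{len(flagged)} of {len(entropy)} inputs flagged for human review")
```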

Directions requiring additional AI innovation: 

  • HCI methods for avoiding cognitive biases. Humans have many cognitive biases and limitations. If a system behaves correctly most of the time, people may start to over-rely on it. Confirmation bias can accompany backward reasoning (people finding ways to justify a given decision) but can be mitigated by forward reasoning (looking at the evidence).10 Bias can also come from imperfect information fusion; for example, if a human inspects the input data and then views an AI prediction based on the same input data, they may falsely believe that the AI prediction is a new, independent piece of information. Appropriate human+AI interaction can help mitigate these biases.

  • Semantic alignment and shared mental models. We need shared mental models between the human and the agent, along with methods to ground the terms they share.2 Some works use models of humans to facilitate interaction, including modeling a person’s latent states such as cognitive workload and emotions.31 But it remains an open question how to develop and validate these methods for increasing numbers of human+AI use cases.

  • Time-constrained settings. How can we design safe and effective hand-offs between AI systems and humans in time-constrained settings, such as AI-assisted driving in emergencies?

  • Test-time validation of large surface models. Models with large output surfaces (for example, LLMs) will be difficult to evaluate via prospective metrics; we need methods to assist people in their validation at task time.16 

  • Evaluation and design of realistic human-in-the-loop systems. Testing various forms of shared control (for example, an AI-assisted driving system) requires significant training and risk. How can we make it more efficient?

Areas that require interdisciplinary engagement.  Human+AI decision making is highly interdisciplinary, involving social and cognitive science, psychology, and other related fields. Fortunately, HCI research has existing connections to these fields.40 Furthermore, adoption of new tools into workplaces is well studied in design, human factors research, management, and operations science. Whether, how, and which humans to include in the loop, as well as how AI systems should respond to inappropriate, slow, or absent human input will require interdisciplinary efforts.

Conclusion

In reviewing the technical criteria in two regulatory frameworks—the Canadian Directive on Automated Decision-Making and the World Economic Forum’s AI Procurement in a Box—we found that while we have many existing metrics for evaluating data and model quality, there are also many technical gaps. We conclude that advancing these concrete areas of research using an interdisciplinary approach would improve our ability to vet AI systems and create truly regulatable AI.

Acknowledgments

The authors thank Andrew Ross, Siddharth Swaroop, Rishav Chourasia, Himabindu Lakkaraju, and Brian Lim, as well as all participants of NUS Responsible, Regulatable AI Working Group 2022–2023, including Limsoon Wong, Angela Yao, Suparna Ghanvatkar, and Davin Choo.

More Online

To view the complete list of references for this article, please visit https://dx.doi.org/10.1145/3653670 and click on Supplemental Material.

    References

    • 1. Andreas, J. Language models as agent models. In Findings of the Association for Computational Linguistics: EMNLP 2022. Y. Goldberg, Z. Kozareva, and Y. Zhang (Eds.). Association for Computational Linguistics, Abu Dhabi, U.A.E. (Dec. 2022), 5769–5779; DOI: 10.18653/v1/2022.findings-emnlp.423.
    • 2. Andrews, R.W., Lilly, J.M., Srivastava, D., and Feigh, K.M. The role of shared mental models in human-AI teams: A theoretical review. Theoretical Issues in Ergonomics Science 24, 2 (2023), 129–175; DOI: 10.1080/1463922X.2022.2061080.
    • 3. Arora, S. and Doshi, P. A survey of inverse reinforcement learning: Challenges, methods and progress. Artificial Intelligence 297 (2021), 103500.
    • 4. Bagdasaryan, E., Poursaeed, O., and Shmatikov, V. Differential privacy has disparate impact on model accuracy. Advances in Neural Information Processing Systems 32 (2019).
    • 5. Bayerl, P.S. and Paul, K.I. What determines inter-coder agreement in manual annotations? A meta-analytic investigation. Computational Linguistics 37, 4 (2011), 699–725.
    • 6. Belsley, D.A., Kuh, E., and Welsch, R.E. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, (2005).
    • 7. Ben-Gal, I. Outlier detection. In Data Mining and Knowledge Discovery Handbook. Springer, (2005), 131–146.
    • 8. Bender, E.M. and Friedman, B. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Trans. of the Association for Computational Linguistics 6 (2018), 587–604.
    • 9. Bills, S. et al. Language models can explain neurons in language models. OpenAI. (May 9, 2023); https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
    • 10. Bondi, E. et al. Role of human-AI interaction in selective prediction. In Proceedings of the AAAI Conf. on Artificial Intelligence, (2022).
    • 11. Brodley, C.E. and Friedl, M.A. Identifying mislabeled training data. J. of Artificial Intelligence Research 11 (1999), 131–167.
    • 12. Brown, D.S., Schneider, J., Dragan, A., and Niekum, S. Value alignment verification. In Proceedings of PMLR 139 (2021), 1105–1115.
    • 13. Government of Canada. Directive on Automated Decision-Making, 2019; https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=32592
    • 14. Chuang, C.-Y., Torralba, A., and Jegelka, S. Estimating generalization under distribution shifts via domain-invariant representations. In Proceedings of the Intern. Conf. on Machine Learning, PMLR 119 (2020), 1984–1994.
    • 15. de Man, Y. et al. Opt-in and opt-out consent procedures for the reuse of routinely recorded health data in scientific research and their consequences for consent rate and consent bias: Systematic review. J. of Medical Internet Research 25 (2023), e42131.
    • 16. Doshi-Velez, F. and Glassman, E. Contextual evaluation of AI: A new gold standard. Working Paper, (2023); https://glassmanlab.seas.harvard.edu/papers/alt_CHI_Benchmarks_are_not_enough_8p.pdf
    • 17. Dwork, C. Differential privacy. In Proceedings of Automata, Languages and Programming: 33rd Intern. Colloquium. Springer, (2006), 1–12.
    • 18. Feng, J. et al. Clinical artificial intelligence quality improvement: Towards continual monitoring and updating of AI algorithms in healthcare. npj Digital Medicine 5, 1 (2022), 66.
    • 19. Gajcin, J. et al. Contrastive explanations for comparing preferences of reinforcement learning. In Proceedings of AAAI Conf. on Artificial Intelligence, (2022).
    • 20. Gama, J. et al. A survey on concept drift adaptation. ACM Computing Surveys 46, 4 (2014), 1–37.
    • 21. Gehr, T. et al. AI2: Safety and robustness certification of neural networks with abstract interpretation. In Proceedings of 2018 IEEE Symp. on Security and Privacy. IEEE, (2018), 3–18.
    • 22. Ghorbani, A., Wexler, J., Zou, J.Y., and Kim, B. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems 32 (2019).
    • 23. Guidotti, R. Counterfactual explanations and how to find them: Literature review and benchmarking. Data Mining and Knowledge Discovery (2022), 1–55.
    • 24. Hadfield-Menell, D. et al. Inverse reward design. Advances in Neural Information Processing Systems 30 (2017).
    • 25. Isaac, W.S. Hope, hype, and fear: The promise and potential pitfalls of artificial intelligence in criminal justice. Ohio St. J. Crim. L. 15 (2017), 543.
    • 26. Karimi, A.-H., Schölkopf, B., and Valera, I. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conf. on Fairness, Accountability, and Transparency. ACM, (2021), 353–362.
    • 27. Krishna, S., Ma, J., and Lakkaraju, H. Towards bridging the gaps between the right to explanation and the right to be forgotten. In Proceedings of the 40th Intern. Conf. on Machine Learning. JMLR.org, (2023).
    • 28. MacGlashan, J. et al. Interactive learning from policy-dependent human feedback. In Proceedings of the 34th Intern. Conf. on Machine Learning, PMLR 70 (2017), 2285–2294.
    • 29. Molnar, C. Interpretable Machine Learning. Lulu.com, (2020).
    • 30. Moos, J. et al. Robust reinforcement learning: A review of foundations and recent advances. Machine Learning and Knowledge Extraction 4, 1 (2022), 276–315.
    • 31. Ong, D.C., Zaki, J., and Goodman, N.D. Computational models of emotion inference in theory of mind: A review and roadmap. Topics in Cognitive Science 11, 2 (2019), 338–357.
    • 32. Pineau, J. et al. Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program). J. of Machine Learning Research 22, 1 (2021), 7459–7478.
    • 33. Ren, P. et al. A survey of deep active learning. ACM Computing Surveys 54, 9 (2021), 1–40.
    • 34. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.
    • 35. Samek, W. and Müller, K.-R. Towards explainable artificial intelligence. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer, (2019), 5–22.
    • 36. Sawaragi, Y., Nakayama, H., and Tanino, T. Theory of Multiobjective Optimization. Elsevier, (1985).
    • 37. Shokri, R., Strobel, M., and Zick, Y. On the privacy risks of model explanations. In Proceedings of the 2021 AAAI/ACM Conf. on AI, Ethics, and Society. ACM, (2021), 231–241.
    • 38. Smith, A.L. et al. Artificial Intelligence and Software Testing. BCS, The Chartered Institute for IT, (2022).
    • 39. Sosa, D. Meaningful explanation. Philosophical Issues 8 (1997), 351–356.
    • 40. Sundar, S.S. Rise of machine agency: A framework for studying the psychology of human–AI interaction (HAII). J. of Computer-Mediated Communication 25, 1 (2020), 74–88.
    • 41. Suresh, H. and Guttag, J.V. A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002, (2019).
    • 42. Tsirtsis, S. and Gomez Rodriguez, M. Decisions, counterfactual explanations and strategic behavior. Advances in Neural Information Processing Systems 33 (2020), 16749–16760.
    • 43. Vyas, N., Kakade, S.M., and Barak, B. On provable copyright protection for generative models. In Proceedings of the 40th Intern. Conf. on Machine Learning, PMLR 202 (2023), 35277–35299.
    • 44. Wang, K.R. et al. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. In Proceedings of Intern. Conf. on Learning Representations, (2023).
    • 45. World Economic Forum. AI Procurement in a Box. Technical report, World Economic Forum, (2020); https://www.weforum.org/reports/ai-procurement-in-a-box/
