Research and Advances

Humble AI

An effort to bring artificial intelligence into better alignment with our moral aims and finally realize the vision of superior decision making through AI.


One of the central uses of artificial intelligence (AI) is to make predictions. The ability to learn statistical relationships within enormous datasets enables AI, given a set of current conditions or features, to predict future outcomes, often with exceptional accuracy. Increasingly, AI is being used to make predictions about individual human behavior in the form of risk assessments. Algorithms are used to estimate the likelihood that an individual will fully repay a loan, appear at a bail hearing, or safeguard children. These predictions are used to guide decisions about whether vital opportunities (to access credit, to await trial at home rather than while incarcerated, or to retain custody) are extended or withdrawn.


Key Insights

  • Decisions based on AI-generated predictions of a person’s future behavior are naturally interpreted by decision subjects as assessments of their trustworthiness.
  • Minimizing risk by denying opportunities to seemingly untrustworthy individuals increases the risk of trustworthy individuals receiving negative decisions.
  • Experiences of being wrongly distrusted by AI contribute to public distrust of AI.
  • Humble AI calls for AI developers and deployers to appreciate and mitigate these harmful effects.
  • This approach will help to align the application of AI with human values and promote public trust.

An adverse decision—for instance, a denial of credit based on a prediction of probable loan default—has negative consequences for the decision subject, both in the near term and into the quite distant future (see the sidebar on credit scoring for an example). In an ideal world, such decisions would be made on the basis of a person’s individual character, on their trustworthiness. But forecasting behavior is not tantamount to assessing trustworthiness. The latter task requires understanding reasons, motivations, circumstances, and the presence or absence of morally excusing conditions.9 Although a behavioral prediction is not the same as an evaluation of moral character, it may well be experienced that way. Humans are highly sensitive to whether others perceive them as trustworthy.25 A decision to withhold an opportunity on the basis that a person is “too risky” is naturally interpreted as a derogation of character. This can lead to insult, injury, demoralization, and marginalization.

Creators of AI systems can rather easily understand the costs of false positives, where people are incorrectly predicted to carry out the desired behavior. These costs include loan defaults that reduce profitability, criminal suspects at large who compromise public safety, and abusive caretakers who provide unsafe care to vulnerable parties. Seeking to avoid these costs—a stance elsewhere described as precautionary decision making—is consistent with a natural human tendency toward loss aversion, and it can be the optimal strategy when the costs of false positives are much greater than the costs of false negatives.28 That said, the costs of false negatives are often much harder to see and thus to appreciate fully. How much is lost, for example, by withholding credit from those who would actually repay their loans, by denying bail to those who would dutifully show up at a hearing, or by removing children from responsible caretakers? We argue in what follows that the consequences of false negatives are widely underestimated, contributing to cascading harms that animate widespread public distrust of AI. As a corrective measure, we offer the notion of Humble AI. We propose that this humble stance can align AI systems with moral aims of respect and inclusion as well as contribute to broader public trust in AI.
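The asymmetry can be made concrete with the standard expected-cost decision rule. The sketch below is an illustration under simplifying assumptions, not a method from this article: it assumes the model outputs a calibrated probability p that a person will behave as hoped (for example, repay a loan) and that both kinds of error can be priced on a common scale. The function names and cost figures are hypothetical.

```python
# Hypothetical illustration: the costs below are made-up numbers, not data.

def grant_threshold(cost_fp: float, cost_fn: float) -> float:
    """Minimum predicted probability of the desired behavior at which
    granting has a lower expected cost than denying.

    Granting risks a false positive with probability (1 - p), costing cost_fp.
    Denying risks a false negative with probability p, costing cost_fn.
    Grant when (1 - p) * cost_fp < p * cost_fn, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp < 0 or cost_fn < 0 or cost_fp + cost_fn == 0:
        raise ValueError("costs must be non-negative and not both zero")
    return cost_fp / (cost_fp + cost_fn)


def decide(p: float, cost_fp: float, cost_fn: float) -> str:
    """Return 'grant' or 'deny' for a calibrated probability p."""
    return "grant" if p > grant_threshold(cost_fp, cost_fn) else "deny"


# A default is judged nine times as costly as turning away a good borrower.
print(grant_threshold(9, 1))  # 0.9
# If the harm of a false negative is recognized as three times larger,
# the bar for being trusted drops.
print(grant_threshold(9, 3))  # 0.75
print(decide(0.8, 9, 1), decide(0.8, 9, 3))  # deny grant
```

Because the threshold depends only on the ratio of the two costs, underestimating the cost of false negatives raises the bar for everyone, including the many applicants who would have repaid.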


Distrustful AI

In social science literature, trust is usually understood as a willingness to be vulnerable to the harm that would occur if the trusted party acts in a way that is untrustworthy, and distrust is conceived as a “retreat from vulnerability.”8 AI systems do not have mental states, such as beliefs (about a person’s trustworthiness), affective attitudes (fear, anxiety, and so on), or intentions (to approach or to withdraw). Nonetheless, these mental states, attitudes, and intentions may still be inferred by a human receiving a decision, not least because they can quite reasonably be ascribed to the humans who develop, own, and deploy AI systems. So, while AI is not capable of thinking or feeling in ways humans do when they trust or distrust, those who create and deploy AI influence the system’s tendency to accept or to avoid the risk of decision subjects acting in untrustworthy ways. For this reason, we find it apt to describe AI tuned to avoid the risk of untrustworthy behavior as characteristically distrustful. As we discuss here, this distrustful stance has consequences quite apart from whether decisions are statistically “fair.”


Distrust contributes to misidentifying untrustworthiness. High decision accuracy can be characterized as some near-optimal combination of true positives and true negatives. Taking again the example of granting a loan, the optimal case would have every granted loan eventually repaid (true positives) and every rejected loan correctly classified as a future default (true negatives). From the perspective of the humans deploying this optimal system, it has correctly distinguished the trustworthy from the untrustworthy. In contrast, AIs inclined toward distrust are not just more likely to identify the untrustworthy; they are also more likely to misidentify people as untrustworthy.
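A small worked example shows the trade-off. The scores and outcomes below are invented for illustration, and `confusion` is a hypothetical helper. Raising the threshold from 0.60 to 0.85 eliminates both false positives, but at the price of three would-be repayers being labeled untrustworthy.

```python
# Hypothetical model scores and true outcomes (1 = would repay, 0 = would default).
scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50]
outcomes = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]


def confusion(scores, outcomes, threshold):
    """Tally decisions when everyone scoring at or above `threshold` is trusted."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, would_repay in zip(scores, outcomes):
        if score >= threshold:
            counts["TP" if would_repay else "FP"] += 1  # trusted
        else:
            counts["FN" if would_repay else "TN"] += 1  # distrusted
    return counts


print(confusion(scores, outcomes, 0.60))  # {'TP': 6, 'FP': 2, 'TN': 2, 'FN': 0}
print(confusion(scores, outcomes, 0.85))  # {'TP': 3, 'FP': 0, 'TN': 4, 'FN': 3}
```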

There are a couple of reasons why:

Amplification of weak signals. When a machine-learning classifier based on correlations rather than causal phenomena operates in a regime with high costs of false positives, the decision threshold gets pushed toward the tail of the likelihood functions where weak correlations have undue influence (see Figure 1). Here, decisions are more likely to be based on spurious correlations of the target variable with irrelevant features.

Figure 1. The cost of false positives and false negatives.
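The weak-signal effect can be illustrated with a toy simulation. This is an assumption-laden sketch, not a reproduction of Figure 1: it assumes a score that is the sum of a genuinely predictive signal (`strong`, standard normal) and an irrelevant feature the model has picked up by chance (`weak`, normal with a small spread), and it measures how often people whose genuine signal clears the threshold are nonetheless denied because of the irrelevant feature. The function name and parameter values are hypothetical.

```python
import random


def spurious_denial_rate(threshold, n=200_000, weak_sd=0.3, seed=0):
    """Share of 'deserving' people (strong signal above threshold) who are
    denied only because the irrelevant weak feature drags their score down.

    Hypothetical model: score = strong + weak, with strong ~ N(0, 1) standing
    in for a real signal and weak ~ N(0, weak_sd) for a spurious correlation.
    """
    rng = random.Random(seed)
    deserving = wrongly_denied = 0
    for _ in range(n):
        strong = rng.gauss(0.0, 1.0)
        weak = rng.gauss(0.0, weak_sd)
        if strong > threshold:  # the genuine signal clears the bar
            deserving += 1
            if strong + weak <= threshold:  # ...but the spurious feature sinks the score
                wrongly_denied += 1
    return wrongly_denied / deserving


# Thresholds further into the tail correspond to a more precautionary stance.
for t in (0.0, 1.0, 2.0):
    print(t, round(spurious_denial_rate(t), 3))
```

Under these assumptions the rate rises as the threshold moves into the tail: fewer people clear a stringent bar, so a larger share of those who do sit close enough to it for an irrelevant feature to push them back under.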

Decreased responsiveness to evidence of misrecognition. In addition to these false negatives being more likely, they are also less likely to be detected as false. When focused on lowering the costs of false positives, AI creators are inclined to interpret the identification of a high proportion of “untrustworthy” individuals as validating their model rather than indicating a need for additional training or adjustment of the decision threshold.

Therefore, distrustful AI is more likely to lead to denial of opportunities for individuals who actually deserve them, inflicting unnecessary harm in ways that can, quite understandably, increase the public’s distrust of AI.

The inertia of distrust. It is a known phenomenon of interpersonal relationships that distrust tends to be self-reinforcing,8,18 fueled by “distrust-philic”14 emotional responses of both parties. Interestingly, here we see that AIs, entirely lacking in affect, are also implicated in what can be seen as distorted reasoning that reinforces untrustworthiness classifications. There are three components of this distortion:

Fundamental attribution error. Credit scores are widely used indicators because they offer an easily legible and seemingly objective measure of a person’s global trustworthiness. When an AI uses a credit score as a feature in domains other than creditworthiness, it assumes that the score conveys relatively stable information about the person’s disposition and that this information is useful in predicting behavior across a wide range of contexts. In social psychology, the tendency to overemphasize dispositional over situational explanations of behavior is known as the fundamental attribution error.13 When the output of one system is used as a feature for another, whatever situational information may have influenced the first system is at least diluted, if not eliminated, in the second. This leads to a systematic overemphasis on disposition, which ultimately means that a person who is wrongly categorized as untrustworthy by one AI has a greater chance of being wrongly categorized as untrustworthy by other AIs.

Asymmetrical feedback. When a person is trusted, they are given an opportunity to carry out the task they are trusted to do, and usually they are highly motivated to demonstrate their trustworthiness.1,19,22 The resulting behavior generates new data that serves as “confirming” feedback to the trustor as they continually recalibrate trust.29 An individual’s tendency to meet commitments or to fall short of them will, over time, influence AI’s classifications of trustworthiness. In contrast, the distrusted lack the opportunity to become reclassified because they do not have a chance to demonstrate how they would have responded were they to have been trusted. Distrustful strategies—retreating, withdrawing, avoiding reliance—lead to systematic under-trusting10 by reducing information about people’s trust-responsiveness that is needed to recalibrate misplaced distrust.
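The asymmetry in feedback is easy to see in how training data accumulates. In this hypothetical sketch (names and data invented for illustration), outcomes are recorded only for people who were trusted; those below the threshold enter the log with no outcome at all.

```python
# Hypothetical applicants: (model score, whether they would in fact repay).
applicants = [
    (0.92, True), (0.81, True), (0.77, False), (0.64, True),
    (0.58, True), (0.41, False), (0.37, True),
]


def collect_feedback(applicants, threshold):
    """Return the log a lender would accumulate: outcomes for the trusted only."""
    log = []
    for score, would_repay in applicants:
        if score >= threshold:
            log.append((score, would_repay))  # trusted: behavior is observed
        else:
            log.append((score, None))  # distrusted: no chance to show otherwise
    return log


THRESHOLD = 0.7
log = collect_feedback(applicants, THRESHOLD)
observed = [outcome for _, outcome in log if outcome is not None]
# Ground truth the system never sees: trustworthy people it turned away.
hidden_trustworthy = sum(
    1 for score, would_repay in applicants if score < THRESHOLD and would_repay
)
print(observed)            # [True, True, False]
print(hidden_trustworthy)  # 3
```

Three of the four people below the threshold would have repaid, yet none of them can ever appear as a positive example, so retraining on this log alone has no way to correct their classification.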

Reliance on proxies. In contrast to a person who is distrusted, a trusted person not only receives a favorable decision, but that decision often creates opportunities characteristic of “trustworthy” individuals. Consider the lasting impacts of a decision regarding eligibility for rented accommodation: A trusted person is granted an apartment in an affluent neighborhood. A distrusted one is denied the same accommodation and takes an apartment in a less affluent neighborhood, which may also have a higher crime rate. In a future decision about these two individuals, the one with a “better” postal code is more likely to be seen as trustworthy23 to the extent that postal code is used as a feature in other AIs. Decisions based on proxies that are treated as evidence of untrustworthiness perpetuate long-term disadvantage by making it easier for the trusted, and harder for the distrusted, to be recognized as trustworthy.
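The proxy loop can be sketched in a few lines. This is a deliberately simplified, hypothetical model: the weight on postal code, the threshold, and the assumption that each decision alone determines where a person lives next are all illustrative choices, not empirical estimates.

```python
# Hypothetical parameters, chosen only for illustration.
AFFLUENT_BONUS = 0.15  # assumed weight the model places on an "affluent" postal code
THRESHOLD = 0.6


def score(record, affluent):
    """A person's record plus a postal-code proxy."""
    return record + (AFFLUENT_BONUS if affluent else 0.0)


def simulate(record, affluent, rounds=5):
    """Each approval keeps the person in an affluent area; each denial does not."""
    history = []
    for _ in range(rounds):
        approved = score(record, affluent) >= THRESHOLD
        history.append(approved)
        affluent = approved  # the decision determines the next postal code
    return history


# Two people with identical records who start in different neighborhoods.
print(simulate(0.5, affluent=True))   # [True, True, True, True, True]
print(simulate(0.5, affluent=False))  # [False, False, False, False, False]
```

The two people have identical records; only their starting neighborhood differs, and the model's own decisions keep that difference in place.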

Here we see that, as in interpersonal relationships, distrust can create pernicious spirals that are very difficult to escape. Distrust by AI feeds itself insofar as it produces feedback, both within a single system and across systems of systems, that prevents re-trusting those classified as untrustworthy. The difficulty distrusted individuals face in establishing their trustworthiness makes miscategorization by AI more consequential than it may first appear.

People distrust those who distrust them. Having established that distrustful AI increases the likelihood of individuals being labeled as untrustworthy, let’s now consider what it feels like to be on the receiving end of such labeling. Beyond the immediate consequences of an unfavorable decision, a decision that labels an individual as “too risky” (to receive a loan, to be granted bail, to retain custody) is a derogation of character which has important effects even when, or particularly when, it is unwarranted.

Resentment. Frustratingly, trustworthiness does not ipso facto engender trust. A perfectly legitimate response to one’s trustworthiness not being recognized is ill will toward the decision maker. When a person repeatedly experiences a mismatch between the evidence of trustworthiness required by AI and the evidence they can provide, this naturally breeds resentment for being made to play a game whose rules are tilted against them. Resentment typically leads to “obsessively replay[ing]” one’s injury, and also “cuts off search for possible mitigating factors or alternative explanations of it”,14 allowing distrust to build.

Demoralization. It is valuable to have one’s trustworthiness broadcast to other parties. As such, when a person is trusted, they have an incentive to respond to this trust so that this signal continues to be broadcast (this is known as the “trust-responsiveness mechanism”).22 In contrast, a person who is distrusted lacks any such incentive because the signals they send are less likely to receive uptake. It is possible that being repeatedly identified as untrustworthy by AIs may demoralize a person in ways that spill over into how they comport themselves in their daily lives. In turn, this behavior may produce signals that serve as input to other AIs, which are then more likely to identify untrustworthiness.

Just as trustworthiness is cultivated and reinforced by trust, so too is untrustworthiness cultivated and reinforced by distrust. The recursive pattern of trust- and distrust-responsiveness means knowledge that a person is widely distrusted, whether or not such distrust is merited, is (pro tanto) evidence that they are untrustworthy. Conversely, a distrusted person who has a grasp of the interpretive biasing that is a signature of distrustful attitudes has (pro tanto) evidence that innocent actions are likely to be misinterpreted. This person will not trust others to trust him. The mere anticipation of such misrecognition diminishes the motivation to be responsive to trust, generating a pernicious and self-reinforcing equilibrium.9

Alienation and contempt. When a person is distrusted for inscrutable reasons (as is the case with most AI systems), they may draw on other experiences of pain and injustice to construct what to them seems a plausible explanation.8 Individuals who cannot produce a sufficiently rich digital trail that matches expected patterns will find themselves unable to signal their trustworthiness and render it legible to AI systems. Those experiencing exclusion because they cannot satisfy AI systems may quite rightly feel contempt for the entire world represented by AI—the system of systems that circumscribe the new rules in society that appear to make it harder for certain individuals to succeed.

These mechanisms help explain how distrustful AI provokes reciprocated distrust by the public. As we see, there is an important affective dimension of trust and distrust—while one may reason about trust, emotions have a strong influence on that reasoning.14 Contempt, for example, is a “totalizing emotion…it focuses on the person as a whole rather than on some aspect of them.”14 This is important when we consider people’s broad distrust of AI to make any decision about them,27 as this reflects a collapsing of all AIs into a single, threatening entity. These emotions may also reduce the individual’s receptivity to the notion that AI could be trustworthy—for example, even if improvements are implemented to correct for the original misrecognition. Thus, we see that distrustful AI not only justifies distrust of AI, but also triggers affective feedback loops that intensify and entrench public distrust.


A Proposal to Emulate Humble Trust

To help address legitimate concerns about misrecognition by AI, we propose several affirmative measures inspired by the notion of “humble trust.”8 Underlying humble trust is an awareness of and a concern for the harm caused by misrecognition. This means balancing the aim to not trust the untrustworthy with the aim to avoid misrecognition of the trustworthy. The practice of humble trust entails8 (emphases added):

  • “Skepticism about the warrant of one’s own felt attitudes of trust and distrust.”
  • “Curiosity about who might be unexpectedly responsive to trust and in which contexts.”
  • “Commitment to abjure and to avoid distrust of the trustworthy.”

These principles have important implications for features, labels, costs, and thresholds of the decision functions in a machine-learning system.

Skepticism: Confidence and verification. It is sensible to want to determine with the greatest possible accuracy who is trustworthy and who is not, but in doing so, one must be aware of the limitations of statistical reasoning and be open to the possibility of getting it wrong. While machines can find usefully predictive relationships within large volumes of data, we know that AI interpretation, like that of humans, is susceptible to uncertainty and failure. It is not always clear which feature or combination of features is most predictive of the desired behavior, nor how the available data relates to those features.


Key to avoiding overestimating the predictive capability of machines is recognizing the information loss that occurs in selecting a set of features while ignoring others7 and the uncertainty that results.5 What is the system not seeing by focusing on what it is focusing on? Are the data and model uncertainty too large? An appropriately skeptical stance would be to assume the model is missing information that could be relevant, to not be satisfied that a metric alone tells the whole story, and to actively seek out non-traditional evidence of trustworthiness as features that allow people to show themselves more fully. An active feature acquisition approach proposed by Bakker et al. operationalizes skepticism exactly along these lines and achieves fairness for both groups and individuals by continuing to seek additional features about individuals as long as the AI remains too uncertain.2
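The idea of continuing to look while uncertain can be sketched as follows. This is not the algorithm of Bakker et al.; it is a simplified, hypothetical illustration that models each new piece of evidence as a 0/1 record of a met or missed commitment, keeps a Beta posterior over the person's reliability, and declines to decide while an approximate 95% interval still straddles the threshold. The function name and parameter values are assumptions.

```python
import math


def assess_with_acquisition(evidence_stream, threshold=0.7, z=1.96, max_items=50):
    """Keep acquiring evidence while the estimate is too uncertain to decide.

    Hypothetical model: each item is 1 (commitment met) or 0 (missed),
    summarized by a Beta(1 + met, 1 + missed) posterior. A decision is made
    only once a normal-approximation interval around the posterior mean lies
    entirely above or entirely below the threshold.
    Returns (decision, number_of_items_used).
    """
    met = missed = 0
    for item in evidence_stream:
        if item:
            met += 1
        else:
            missed += 1
        a, b = 1 + met, 1 + missed
        mean = a / (a + b)
        sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))  # Beta std. dev.
        if mean - z * sd > threshold:
            return "trust", met + missed
        if mean + z * sd < threshold:
            return "distrust", met + missed
        if met + missed >= max_items:
            break
    return "undecided", met + missed


print(assess_with_acquisition([1] * 20))  # ('trust', 8)
print(assess_with_acquisition([1, 1]))    # ('undecided', 2)
```

A return value of "undecided" is meant to signal that more evidence should be sought, not that the person should be denied.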

Those who build and deploy AI systems must not lose track of the crucial distinction between predicting behavior and assessing trustworthiness. Being trustworthy is a matter of responsiveness to being counted on and doggedness in meeting commitments. Such qualities are not necessarily derivable from a person’s behaviors, which are influenced also by circumstance and opportunity. Being able to predict whether a person will meet a commitment does not imply an understanding of how easy or difficult it will be for that person, or what mitigating or excusing conditions should be considered.9

It is also worth noting that being forthright about such limitations does not necessarily, and should not, lead to reduced trust. In the wise words of Onora O’Neill, “Speaking truthfully does not damage trust, it creates a climate for trust.”21

Curiosity: Trust-responsiveness. Part of being open to the possibility of having gotten a decision wrong is being curious about what might have happened if a different decision had been made. This means creating opportunities for the AI to learn about the trust-responsiveness of people who fall below the decision threshold: in practice, extending trust to those who might betray it and seeing whether the expectation of betrayal is fulfilled. This challenge, known as the “selective labels problem,” has been addressed in principled ways in the decision-making and machine-learning literature. For example, Wei balances the costs of learning against future benefit through a partially observed Markov decision process that shifts a classifier’s decision threshold to progressively more stringent positions as it observes more people who would normally have been classified as untrustworthy.29 This may be seen as a way of conducting safe exploration, wherein the system exhibits curiosity up to a point that does not induce undue harms.20 It may also be seen as involving satisficing behavior, a decision strategy that aims for a satisfactory result rather than the one that would be optimal if curiosity were not a consideration.30
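A much simpler stand-in for the principled policies cited above is epsilon-style exploration, in which a small, fixed fraction of below-threshold cases is trusted anyway so that their outcomes can be observed. The sketch below assumes that scheme rather than Wei's POMDP policy; the function name and the rates used are hypothetical.

```python
import random


def decide_with_exploration(score, threshold, explore_rate, rng):
    """Trust everyone at or above the threshold; below it, extend trust with
    probability `explore_rate` so the system can observe how people respond.

    Plain epsilon-style exploration, assumed here for illustration only.
    """
    if score >= threshold:
        return True
    return rng.random() < explore_rate


rng = random.Random(42)
scores = [i / 100 for i in range(100)]
decisions = [decide_with_exploration(s, 0.7, 0.1, rng) for s in scores]
explored = sum(1 for s, granted in zip(scores, decisions) if s < 0.7 and granted)
below = sum(1 for s in scores if s < 0.7)
print(f"{explored} of {below} below-threshold applicants were trusted")
```

The exploration rate caps how much risk the system takes on in order to learn; outcomes from the explored cases would then need to be fed back into training for the curiosity to pay off.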

Commitment: Investing in identifying and supporting trustworthiness. A common economic objective in deploying AI to make forecasts about human behavior is lowering the costs of making a decision. Eliminating humans from the decision process is tempting for this very reason. But AIs can do more harm than good when this sort of efficiency is pursued to the exclusion of other values, such as quality, fairness, and social inclusion. Commitment is exemplified by doing something even when it is tempting not to; so a commitment to avoiding distrust of the trustworthy means adjusting the model or the wider decision-making process, even when those changes reduce the overall efficiency.

An embodiment of such commitment is the establishment of an institutional process (such as an AI Ethics review board) to carefully consider the costs of false negatives along with false positives, better aligning each AI system with core values of fairness and social inclusion. Another example of this commitment would be designing the AI to report when it is unsure and passing those decisions to humans who can be more deliberative, even if less efficient. Humans can “put themselves in the shoes of” decision subjects through empathy, identifying and evaluating mitigating and excusing conditions in a way that algorithms cannot.9
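Reporting uncertainty and deferring to humans resembles classification with a reject option, as studied by Bartlett and Wegkamp. The minimal sketch below assumes a calibrated probability p and a fixed band around the threshold within which a case is routed to a human reviewer; the threshold, the margin, and the function name are hypothetical.

```python
def decide_or_defer(p, threshold=0.7, margin=0.1):
    """Return 'grant', 'deny', or 'defer' for a predicted probability p.

    Cases within `margin` of the threshold are treated as too uncertain for
    the model alone and are passed to a human reviewer.
    """
    if abs(p - threshold) < margin:
        return "defer"
    return "grant" if p > threshold else "deny"


queue = [0.95, 0.74, 0.66, 0.31]
print([decide_or_defer(p) for p in queue])  # ['grant', 'defer', 'defer', 'deny']
```

Widening the margin sends more cases to humans, trading efficiency for deliberation, which is exactly the kind of commitment the text describes.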



The allure of decision-making efficiency is powerful.17 To an ever-greater extent, a person’s opportunities are circumscribed by AI-driven forecasts of their behavior. Consternation at this prospect is entirely reasonable. If we are serious about aligning AI with values of fairness and social inclusion, we must reorient our thinking about its appropriate use. AI is better suited to finding ways to cultivate and support trust-responsive behavior than to serving as an independent and objective arbiter of trustworthiness.

Humble trust does not imply trusting indiscriminately. Rather, it calls for developers and deployers of AI systems to be responsive to the effects of misplaced distrust and to manifest epistemic humility about the complex causes of human behavior. It further encourages them to look for new signals of trustworthiness, and to create opportunities for such signals to be generated in the future, thereby improving their ability to recognize the trustworthy. Finally, it suggests they look beyond the immediate efficiencies of decision making to consider the long-term harms (both to individuals and to AI-deploying institutions) of careless classifications.

For a business, misidentifying an individual as untrustworthy might mean losing a potentially profitable customer. Depending on the economic conditions, this may or may not impact an organization’s near-term bottom line. The case is clearer from a moral perspective. Humble trust has an essential role to play in realizing justice and social inclusion. Democratic public institutions, such as the criminal justice system and the social safety net, cannot afford to compromise on such values.

While many applications of AI pose risks of exacerbating unjust distributions of trust, and therefore of opportunity, AI also offers unique mechanisms for resolving this very problem. It is possible to calibrate decisions made by AI systems with tools that are unavailable to the calibration of our own psychologies. Human attitudes of trust and distrust can be altered indirectly, but they are not under a person’s direct voluntary control. This makes it difficult for human decision makers to adjust their personal attitudes of trust and distrust to align with moral aims. AI systems are different in this respect. The degree to which they are “willing to trust” can be directly manipulated by developers. The affirmative measures of the Humble AI approach promise to bring AI into better alignment with our moral aims so we may finally realize the vision of superior decision making through AI.



This work is partially funded by the SUNY-IBM Research Alliance under grant number AI2102, the ESRC funded grant BIAS: Responsible AI for Labour Market Equality (ES/T012382/1), and by the Data Science Institute at Lancaster University.



    1. Alfano, M. Friendship and the Structure of Trust (2016).

    2. Bakker, M.A. et al. Beyond reasonable doubt: Improving fairness in budget-constrained decision making using confidence thresholds. In Proceedings of the AAAI/ACM Conf. on AI, Ethics, and Society (July 2021), 346–356.

    3. Barassi, V. et al. David Graeber, bureaucratic violence, and the critique of surveillance capitalism. Annals of the Fondazione Luigi Einaudi 55 (2021), 237–254.

    4. Bartlett, P.L. and Wegkamp, M.H. Classification with a reject option using a hinge loss. J. of Machine Learning Research 9 (August 2008), 1823–1840.

    5. Bhatt, U. et al. Uncertainty as a form of transparency: Measuring, communicating, and using uncertainty. In Proceedings of the AAAI/ACM Conf. on AI, Ethics, and Society (July 2021), 401–413.

    6. Bu, Y. et al. Fair selective classification via sufficiency. In Proceedings of the Intern. Conf. on Machine Learning (July 2021), 6076–6086.

    7. Cover, T.M. and Thomas, J.A. Elements of Information Theory, Wiley (2012).

    8. D'Cruz, J. Humble trust. Philosophical Studies 176, 4 (2019), 933–953.

    9. D'Cruz, J.R. et al. The empathy gap: Why AI can forecast behavior but cannot assess trustworthiness. AAAI 2022 Fall Symp. Series.

    10. Fetchenhauer, D. and Dunning, D. Why so cynical? Asymmetric feedback underlies misguided skepticism regarding the trustworthiness of others. Psychological Science 21, 2 (2010), 189–193.

    11. Guo, C. et al. On calibration of modern neural networks. In Proceedings of the Intern. Conf. on Machine Learning (August 2017), 1321–1330.

    12. How security clearance credit check rules impact many military service members. Consolidated Credit (2018).

    13. Jones, E.E. and Harris, V.A. The attribution of attitudes. J. of Experimental Social Psychology 3, 1 (1967), 1–24.

    14. Jones, K. Trust, distrust, and affective looping. Philosophical Studies 176, 4 (2019), 955–968.

    15. Knowles, B. et al. The many facets of trust in AI: Formalizing the relation between trust and fairness, accountability, and transparency. arXiv preprint arXiv:2208.00681 (2022).

    16. Kurt, D. The side effects of bad credit. Investopedia (2021).

    17. Marks, P. Algorithmic hiring needs a human face. Communications of the ACM 65, 3 (2022), 17–19.

    18. McGeer, V. Developing trust. Philosophical Explorations 5, 1 (2002), 21–38.

    19. McGeer, V. Trust, hope and empowerment. Australasian J. of Philosophy 86, 2 (2008), 237–254.

    20. Moldovan, T.M. and Abbeel, P. Safe exploration in Markov decision processes. In Proceedings of the Intern. Conf. on Machine Learning (June 2012), 1451–1458.

    21. O'Neill, O. A question of trust. Lecture 2: Trust and Terror. The Reith Lectures, BBC (2002).

    22. Pettit, P. The cunning of trust. Philosophy & Public Affairs 24, 3 (1995), 202–225.

    23. Pope, D.G. and Sydnor, J.R. Implementing antidiscrimination policies in statistical profiling models. American Economic J.: Economic Policy 3, 3 (2011), 206–231.

    24. Public attitudes to data and AI: Tracker survey. Centre for Data Ethics and Innovation (2022).

    25. Slepian, M.L. and Ames, D.R. Internalized impressions: The link between apparent facial trustworthiness and deceptive behavior is mediated by targets' expectations of how they will be judged. Psychological Science 27, 2 (2016), 282–288.

    26. Speakman, S. et al. Three population covariate shift for mobile phone-based credit-scoring. In Proceedings of the ACM Conf. on Computing and Sustainable Societies (June 2018), 20.

    27. The public don't trust computer algorithms to make decisions about them, survey finds. BCS, The Chartered Institute for IT (2020).

    28. Varshney, L.R. and Varshney, K.R. Decision making with quantized priors leads to discrimination. Proceedings of the IEEE 105, 2 (2016), 241–255.

    29. Wei, D. Decision-making under selective labels: Optimal finite-domain policies and beyond. In Proceedings of the Intern. Conf. on Machine Learning (July 2021), 11035–11046.

    30. Wierzbicki, A.P. A mathematical basis for satisficing decision making. Mathematical Modelling 3, 5 (1982), 391–405.


Communications of the ACM (CACM) is now a fully Open Access publication.
