
Accepting AI’s Suggestions Is Harder When You Think

If AI is going to work as a brain extension, humans may need to do more thinking.

The computer scientists of the 1960s had high hopes for the way humans and intelligent machines would work with each other. In separate essays that date back more than half a century, Joseph Licklider and Douglas Engelbart described situations in which humans and artificial intelligence (AI) would exchange ideas in insightful dialogues of the kind practiced by the ancient Greek philosophers. Some of that optimism has not gone away, but today it is tempered not just by the reality of what AI can do, but also by a growing understanding of how humans themselves want to think.

Last year, a team led by Giuseppe Riva, professor of general and communications psychology at the Catholic University of Milan, described their hopes for how AI could become an extension of human cognition. “Throughout history, from writing and mathematics to calculators and computers, humans have developed tools that extend our cognitive capabilities beyond biological constraints,” Riva said.

AI takes the concept of augmentation further, Riva added, thanks to its ability to handle enormous quantities of data and, in principle, identify the patterns that will help humans the most.

Many examples of this kind of process now exist, from the summarization that chatbots perform on long documents and meeting notes through to scientific research. At Nvidia’s GTC conference in March, Patrick Hsu, assistant professor of bioengineering at the University of California at Berkeley, described how the Virtual Cell Atlas published by the Arc Institute relies on AI-based agents to build and curate a dataset of experiments on the reactions of some 300 million cells to different conditions.

The automated tools processed the data from many experiments through a standard pipeline to try to deliver greater consistency and clearer insights from experiments carried out in a vast range of labs. “It’s the kind of thing a human would never do. But these agents can just crank on the tasks 24/7,” Hsu says.
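
At that scale, the value comes less from any single analysis than from pushing every record through the same steps. The sketch below illustrates the general idea of such a standardization pipeline in Python; the field names, units, and conversions are hypothetical and are not the Arc Institute’s actual tooling.

```python
# A minimal illustrative sketch: records from different labs arrive in
# different shapes, and one standard pipeline maps them onto consistent
# fields and units before analysis. Field names and conversions here are
# hypothetical; this is not the Arc Institute's actual tooling.

RAW_RECORDS = [
    {"lab": "A", "cell_count": "1.2e6", "temp_C": 37.0},
    {"lab": "B", "cells": 950_000, "temp_F": 98.6},
]

def standardize(record: dict) -> dict:
    """Normalize one raw lab record into a shared schema."""
    cells = record.get("cell_count", record.get("cells"))
    temp_c = record.get("temp_C")
    if temp_c is None and "temp_F" in record:
        # Convert Fahrenheit readings so every record reports Celsius.
        temp_c = (record["temp_F"] - 32) * 5 / 9
    return {
        "lab": record["lab"],
        "cell_count": int(float(cells)),
        "temperature_c": round(temp_c, 1),
    }

if __name__ == "__main__":
    for raw in RAW_RECORDS:
        print(standardize(raw))
```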

A danger lies in how AI models could nudge humans into making poor decisions by disguising or hiding data that might change how users interpret the results. A decade ago, University of Cambridge professor of machine learning Neil Lawrence described the potential abuse of these effects as System Zero.

Riva argues his team’s work takes a different view of what could develop with their concept, which they have termed System 0. “Lawrence’s framework positions AI systems as intrusive mediators that operate beneath our awareness, potentially limiting human agency and narrowing our worldview,” he says.

Both concepts take their names from the model of human thinking developed by the late psychologist Daniel Kahneman. System 2 represents the human ability to reason about facts and come to a conclusion; System 1 represents more instinctive, heuristic reactions. Lawrence saw that more basic, “chimp-like” part of human cognition as the target for attack by System Zero. System 0 would, ideally, interact more with System 2, not least as a way of handling the biases that naturally emerge in many machine-learning systems. But without careful design, a concept like System 0 could still wind up delivering many of the unwanted effects of System Zero.

Even with technology designed to avoid biases and deception, the combination of human and machine can easily turn out to be less than the sum of its parts. Performance can worsen when the two work together, largely because humans often will not override the AI’s suggestion even when it is incorrect.

Some researchers point to the illusion of certainty that AI user interfaces can project as part of the problem. The software will often present a diagnosis as a clear outcome with no indication of confidence in the result, partly because the model itself has no way to calculate this. There is also a split in how humans react to the systems. Users with extensive experience in a field can wind up distrusting AI outputs too much, as shown in work led by Susanne Gaube, assistant professor of human factors at the U.K.’s University College London.

Intuitively, making the AI explain itself should lead to better results by making flaws in its analysis obvious to the user. Experiments have shown subtle differences in human behavior based on how the AI presents its explanations, but overall, presenting explanations does not work nearly as well as expected. The presence of explanations can even make the situation worse. A study presented at the 2021 Conference on Human Factors in Computing Systems (CHI) by researchers from the University of Washington and Microsoft found that adding explanations could increase the chances of a human accepting an answer, correct or not.

“Explanations from AI systems often create an illusion of understanding rather than triggering genuine critical thinking. Users tend to process these explanations through System 1 rather than engaging System 2,” Riva said.

The cognition gap seen in using AI echoes a growing body of research in psychology and is a major theme of the 2017 book The Enigma of Reason, written by French researchers Hugo Mercier and Dan Sperber. They pointed to experiments showing that, for many problems where humans are expected to use System 2 processes, they will, where possible, rely instead on the heuristics of System 1.

Research continues to show this effect. Last year, Louise David and colleagues from Radboud University analyzed more than 100 studies that used a workload-assessment method developed by NASA to look at how people approach tasks that demand concentration. They found that, across a large range of tasks and participants, mental effort is something people prefer to avoid.

To find out whether there are other ways to encourage human users to challenge the machine, Harvard University Ph.D. researcher Zana Buçinca and colleagues recruited 200 people on the Mechanical Turk crowdworking platform several years ago to see how they dealt with techniques the researchers call cognitive forcing functions. These are simple mechanisms intended to trigger the analysis that explanation seemingly does not. The forcing functions tend to take the form of obstructions that slow the human user down: a checklist, for example, or a timer that prevents the user from entering a response too quickly.
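
As an illustration, the Python sketch below implements two such obstructions: a short checklist that must be acknowledged and a hold-off timer that blocks acceptance of the suggestion until it expires. The checklist items, the hold-off period, and the function names are hypothetical; this is not the interface used in the Harvard study.

```python
import time

# Illustrative sketch of a "cognitive forcing function": the user cannot
# accept the AI's suggestion until a checklist has been acknowledged and a
# hold-off timer has expired. All names and wording here are hypothetical.

HOLD_OFF_SECONDS = 10

CHECKLIST = [
    "I have read the AI's suggestion in full.",
    "I have compared it against the underlying data.",
    "I can state one reason the suggestion could be wrong.",
]

def present_suggestion(suggestion: str) -> None:
    print(f"AI suggestion: {suggestion}")
    shown_at = time.monotonic()

    # Checklist gate: each item must be explicitly confirmed.
    for item in CHECKLIST:
        answer = input(f"Confirm: {item} [y/n] ").strip().lower()
        if answer != "y":
            print("Suggestion not accepted; please review before deciding.")
            return

    # Timer gate: block acceptance until the hold-off period has elapsed.
    remaining = HOLD_OFF_SECONDS - (time.monotonic() - shown_at)
    if remaining > 0:
        print(f"Hold on for {remaining:.0f} more seconds before deciding...")
        time.sleep(remaining)

    decision = input("Accept the AI's suggestion? [y/n] ").strip().lower()
    print("Accepted." if decision == "y" else "Overridden by user.")

if __name__ == "__main__":
    present_suggestion("Reorder test B before test A to cut runtime by 30%.")
```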

The interventions improved performance compared to explanations when the simulated AI provided deliberately wrong suggestions. But the increased friction came with a tradeoff in usability: according to Buçinca, participants disliked the automated interventions.

“In high-stakes domains like healthcare or legal judgments, professionals may be more willing to engage with friction-inducing mechanisms,” Riva says. “However, in consumer applications or routine decision-making, users typically abandon systems that impose cognitive demands, no matter how beneficial those demands might be for decision quality.”

Acceptability in high-stakes domains might also be helped by the types of users who work in them. In the Harvard work, those who benefited the most from the forcing functions were those with a greater tendency to engage reasoning skills in the first place and who seemed to show “a need for cognition,” in Buçinca’s words.

A larger experiment that tried to gauge users’ skepticism of ChatGPT’s answers found a similar effect. Researchers working at Beijing Normal University recruited 1,000 students to look at how they interacted with the chatbot. The students’ overall willingness to engage in effortful thinking proved more powerful than the desire to work out whether the AI’s claims on a particular subject were accurate.

But when looking at how individual users behave, how much they rely on the AI can depend heavily on the task at hand. For a paper to be presented at CHI 2025 at the end of April, Carnegie Mellon University Ph.D. student Hao-Ping Lee worked with a team at Microsoft Research in the U.K. to examine the different ways users think about what the AI does for them. The experiment asked users to assess their own thought processes when using AI for different purposes and under changing time pressures.

Pressures of time and lack of knowledge about the target subject both led to greater reliance on the AI, as did the users’ opinion of how important the task was to their overall job. A more subtle effect they found was a change in the way users deal with their tasks when engaging AI.

This shift in focus has shown up in the use of generative-AI programming assistants. With traditional coding, the emphasis is very much on choosing an algorithm that addresses the problem and writing code that performs it efficiently. In the “vibe coding” style that has apparently developed around greater use of programming assistants, the focus moves to finding and fixing mistakes in the AI’s output. In some ways, AI users become more like managers.

What may work is to make the interventions adapt to the user. In one example, the Harvard group built a system that asked probe questions to gauge how much reliance a user would place on the AI, and to decide when to intervene more. This could mean revealing information progressively to engage a chain of thought in the user.
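
A rough sense of how such an adaptive intervention might fit together is sketched below in Python: a single probe question produces a crude estimate of reliance, and higher estimated reliance triggers more progressive disclosure of the AI’s evidence before its answer. The probe, the scoring, and the disclosure steps are hypothetical illustrations, not the Harvard group’s actual design.

```python
# Illustrative sketch of an adaptive intervention: estimate reliance with a
# probe question, then reveal evidence progressively for users who appear to
# lean heavily on the AI. Scoring and disclosure steps are hypothetical.

from dataclasses import dataclass

@dataclass
class Suggestion:
    answer: str
    evidence: list[str]   # supporting points, revealed step by step

def probe_reliance() -> int:
    """Return a crude 0-2 reliance score from a single probe question."""
    reply = input("Before seeing the AI's answer, what is your own guess? ").strip()
    if not reply:
        return 2          # no independent guess offered: assume high reliance
    return 1 if len(reply) < 20 else 0

def present(suggestion: Suggestion) -> None:
    reliance = probe_reliance()
    if reliance == 0:
        # Low estimated reliance: show everything at once.
        print("AI answer:", suggestion.answer)
        for point in suggestion.evidence:
            print(" -", point)
        return
    # Higher reliance: reveal evidence first, one step at a time, answer last.
    for point in suggestion.evidence:
        input(f"Evidence: {point}  (press Enter to continue)")
    print("AI answer:", suggestion.answer)

if __name__ == "__main__":
    present(Suggestion(
        answer="The anomaly is most likely a sensor fault, not a real spike.",
        evidence=["Neighbouring sensors show no change.",
                  "The spike lasts exactly one sampling interval."],
    ))
```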

Another path that Riva sees as possibly more effective is to design user interfaces that show how confident the AI is in its own analysis, or where it lacks the data to answer. But such interfaces could be tough to design. Ideally, they would reflect the social cues that humans use with each other to convey doubt or uncertainty. “This doesn’t necessarily mean becoming more ‘human-like’ in a superficial sense, but rather developing communication patterns that signal reliability. The goal should be to create interfaces that leverage our existing cognitive capacities for trust calibration while accounting for the unique capabilities and limitations of AI systems,” Riva says.
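
One simple way to picture this is an interface layer that translates a confidence score into hedged wording, and says so explicitly when the system has nothing to go on. The Python sketch below illustrates the idea under the assumption that a calibrated confidence value is available, which, as noted above, is itself a challenge; the thresholds, phrasing, and example values are hypothetical and do not come from a published design.

```python
# Illustrative sketch: map a model's confidence onto hedged wording, and be
# explicit when no supporting data exists. Thresholds and phrasing are
# hypothetical, and a calibrated confidence score is assumed to be available.

def hedge(confidence: float | None) -> str:
    if confidence is None:
        return "I have no data to support an answer here."
    if confidence >= 0.9:
        return "I am fairly confident that"
    if confidence >= 0.6:
        return "It is likely, but not certain, that"
    return "I am unsure, but one possibility is that"

def render(claim: str, confidence: float | None) -> str:
    prefix = hedge(confidence)
    return prefix if confidence is None else f"{prefix} {claim}"

if __name__ == "__main__":
    print(render("the contract clause conflicts with clause 4.2", 0.93))
    print(render("the scan shows early-stage inflammation", 0.62))
    print(render("", None))
```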

If the use of generative AI continues along the path it has followed over the past few years, the need to design those effective interfaces will become increasingly urgent for the technology to be more of a help in the vein of Licklider’s vision than a hindrance to human thought.

Chris Edwards is a Surrey, U.K.-based writer who reports on electronics, IT, and synthetic biology.
