Opinion
Artificial Intelligence and Machine Learning

AI Should Challenge, Not Obey

Let’s transform our robot secretaries into Socratic gadflies.

“How should we evaluate the legacy of Thomas Jefferson?” asks a professor of American history.

The reply: “The general consensus on Thomas Jefferson is that he was a complex and contradictory figure who championed the ideals of democracy, tolerance, and independence, but also owned hundreds of slaves and fathered several children with one of them.”

The professor teaches a course challenging the “great white men” narrative of American history, positing that women and people of color also drive history forward, and that the canonized great men of America are seldom unambiguously great. The course aims to instill in students the rare and nebulous skill of critical thinking.

The reply comes not from a student, but from the Bing AI chatbot.

How do we evaluate a claim like this? Such claims cannot be reduced to “correct” and “incorrect”; concepts such as “error” and “hallucination” break down when complex qualitative judgments are involved. Historians are trained27 to ask questions such as: “Who constructed this account and why? What sources did they use? What other accounts are there of the same events or lives? How and why do they differ? Which should we believe?”

But what if the user was not a professor, but an inquisitive reader without training in historical thinking? Now more than ever before, users face the task of thinking critically about AI output. Recent studies show a fundamental change across knowledge work, spanning activities as diverse as communication, creative writing, visual art, and programming. Instead of producing material, such as text or code, people focus on “critical integration.”24 AI handles the material production, while humans integrate and curate that material. Critical integration involves deciding when and how to use AI, properly framing the task, and assessing the output for accuracy and usefulness. It involves editorial decisions that demand creativity, expertise, intent, and critical thinking.

However, our approach to building and using AI tools envisions AI as an assistant, whose job is to advance the task in the direction set by the user. This vision pervades AI interaction metaphors, such as Cypher’s Watch What I Do and Lieberman’s Your Wish Is My Command. Science fiction tropes subvert this vision in the form of robot uprisings, or AI that begins to feel emotions or develops goals and desires of its own. While entertaining, these tropes unfortunately pigeonhole alternatives to the AI assistance paradigm in the public imagination: AI is either a compliant servant or a rebellious threat, either a cool and unsympathetic intellect or a pitiable and tragic romantic.

AI as Provocateur

Between the two extreme visions of AI as a servant and AI as a sentient fighter-lover resides an important and practical alternative: AI as a provocateur.

A provocateur does not complete your report. It does not draft your email. It does not write your code. It does not generate slides. Rather, it critiques your work. Where are your arguments thin? What are your assumptions and biases? What are the alternative perspectives? Is what you are doing worth doing in the first place? Rather than optimize speed and efficiency, a provocateur engages in discussions, offers counterarguments, and asks questions4 to stimulate our thinking.
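
To make this concrete, here is a minimal sketch of a provocateur as a thin wrapper over an ordinary language model. The name ask_llm is a hypothetical stand-in for any text-generation API, and the prompt wording is illustrative, not a tested design.

    # Minimal sketch of a provocateur as a wrapper over a chat model.
    # `ask_llm` is a hypothetical stand-in for any text-generation API;
    # the prompt wording is illustrative only.
    PROVOCATEUR_PROMPT = (
        "You are a provocateur, not an assistant. Never complete or extend "
        "the user's work. Respond only with critique: identify thin arguments, "
        "unstated assumptions, and alternative perspectives, and ask whether "
        "the task is worth doing at all. Reply with three to five questions."
    )

    def provoke(ask_llm, draft: str) -> str:
        """Return critical questions about a draft, never a continuation of it."""
        return ask_llm(system=PROVOCATEUR_PROMPT, user=draft)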

The idea of AI as provocateur complements, yet challenges, current frameworks of “human-AI collaboration” (notwithstanding objections to the term23), which situate AI within knowledge workflows. Human-AI collaborations can be categorized by how often the human (versus the AI) initiates an action,19 or by whether the human or the AI takes on a supervisory role.16 AI can play roles such as “coordinator,” “creator,” “perfectionist,” “doer,”28 “friend,” “collaborator,” “student,” and “manager.”7 Researchers have called for metacognitive support in AI tools,32 and for efforts to “educate people to be critical information seekers and users.”26 Yet the role of AI as provocateur, one that improves the critical thinking of the human in the loop, has not been explicitly identified.

The “collaboration” metaphor easily accommodates the role of provocateur; challenging collaborators and presenting alternative perspectives are features of successful collaborations. How else might AI help? Edward de Bono’s influential Six Thinking Hats12 framework distinguishes roles for critical thinking conversations, such as information gathering (white hat), evaluation and caution (black hat), and so forth. “Black hat” conversational agents, for example, lead to higher-quality ideas in design thinking.3 Even within the remit of “provocateur,” there are many possibilities not well distinguished by existing theories of human-AI collaboration.
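
As a sketch of how such roles might be operationalized, the hats could be expressed as alternative critique personas. The persona texts below are illustrative paraphrases, not de Bono’s own wording, and ask_llm is the same hypothetical interface as above.

    # Sketch: thinking hats as alternative critique personas.
    # Persona wording is an illustrative paraphrase, not de Bono's own.
    HATS = {
        "white": "State only the relevant facts and data, and note what is missing.",
        "black": "Point out risks, weaknesses, and reasons this may fail.",
        "green": "Propose alternatives the author has not considered.",
    }

    def critique_with_hat(ask_llm, hat: str, idea: str) -> str:
        """Critique an idea from the perspective of a single thinking hat."""
        return ask_llm(system=f"You wear the {hat} hat. {HATS[hat]}", user=idea)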

A constant barrage of criticism would frustrate users. This presents a design challenge, and a reason to look beyond the predominant interaction metaphor of “chat.” The AI provocateur is not primarily a tool of work, but a tool of thought. As Iverson notes, notations function as tools of thought by compressing complex ideas and offloading cognitive burdens.10 Earlier generations of knowledge tools, like maps, grids, writing, lists, place-value numerals, and algebraic notation, each amplified how we naturally perceive and process information.

How should we build AI as provocateur, with interfaces less like chat and more like notations? For nearly a century, educators have been preoccupied with a strikingly similar question: How do we teach critical thinking?

Teaching Critical Thinking

The definition of “critical thinking” is debated. An influential perspective comes from Bloom and colleagues,2 who identify a hierarchy of critical thinking objectives such as knowledge recall, analysis (sorting and connecting ideas), synthesis (creating new ideas from existing ones), and evaluation (judging ideas using criteria). There is much previous research on developing critical thinking in education, including in computing, as exemplified in How to Design Programs6 and in Learner-Centered Design of Computing Education.8

Critical thinking tools empower individuals to assess arguments; they derive from a long preoccupation in Western philosophy with valid forms of argument, one that can be traced to Aristotle. Salomon’s work in computer-assisted learning showed that periodically posing critical questions such as “what kind of image have I created from the text?” provided lasting improvements in students’ reading comprehension.22
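
Salomon’s intervention is simple enough to sketch directly: a reading tool that interleaves a critical question after every few paragraphs. Only the first question below is Salomon’s example; the others are illustrative.

    # Sketch of Salomon-style prompting: interleave a critical question
    # after every few paragraphs of a text being read.
    QUESTIONS = [
        "What kind of image have I created from the text?",  # Salomon's example
        "What is the main point of this passage?",           # illustrative
        "How does this connect with what I read earlier?",   # illustrative
    ]

    def interleave_questions(paragraphs, every=3):
        """Yield paragraphs, inserting a reflective question every `every` paragraphs."""
        for i, paragraph in enumerate(paragraphs, start=1):
            yield paragraph
            if i % every == 0:
                yield ">> " + QUESTIONS[(i // every - 1) % len(QUESTIONS)]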

The Toulmin model decomposes arguments into parts, such as data, warrants, backing, qualifiers, and claims, and the relationships among them.13 Software implementations of this model help students construct stronger argumentative essays.18 Similarly, “argument mapping” arranges claims, objections, and evidence in a hierarchy that aids in evaluating the strengths and weaknesses of an argument,5 and software implementations support learners in doing so.31
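
One can sketch how a writing tool might represent the Toulmin model internally; the field names follow the model’s standard parts, and the gaps method shows how missing parts translate directly into provocations. The question wording is illustrative.

    # Sketch: the Toulmin model as a data structure a writing tool could
    # attach to a draft. Missing parts translate directly into questions.
    from dataclasses import dataclass, field

    @dataclass
    class ToulminArgument:
        claim: str                    # the conclusion being argued for
        data: list = field(default_factory=list)       # supporting evidence
        warrant: str = ""             # why the data supports the claim
        backing: str = ""             # support for the warrant itself
        qualifier: str = ""           # strength of the claim ("probably", ...)
        rebuttals: list = field(default_factory=list)  # known objections

        def gaps(self):
            """Questions a provocateur could ask about missing parts."""
            questions = []
            if not self.data:
                questions.append("What evidence supports this claim?")
            if not self.warrant:
                questions.append("Why does that evidence support the claim?")
            if not self.rebuttals:
                questions.append("What objections have you not considered?")
            return questions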

What can we learn from these? In a nutshell: Critical thinking is a valuable skill for everyone. Appropriate software can improve critical thinking. And such tools can be remarkably simple to implement.

Critical Thinking for Knowledge Work

Critical thinking tools are rarely integrated into software outside education. There is a lot to learn from work in education, but professional knowledge work is a new set of contexts where critical thinking support is becoming necessary.24 Previous results may not translate into these contexts. The needs, motivations, resources, experiences, and constraints of professional knowledge workers are extremely diverse, and significantly different from those of learners in an education setting.

We do know that conflict in discussions, sparked by technology, fosters critical thinking.15 Tools for preventing misinformation, such as Carl Sagan’s “Baloney Detection Kit,” can significantly impact user beliefs.9 When individuals are less inclined to engage in strenuous reasoning, they let technology take over cognitive tasks passively.1 Conversely, the more interactive the technology, the more it is perceived to contribute to critical thinking.21

System designers have a tremendous opportunity (and responsibility) to support critical thinking through technology. Word processors could help users map arguments, highlight key claims, and link evidence. Spreadsheets could guide users to make explicit the reasoning, assumptions, and limitations behind formulas and projections. Design tools could incorporate interactive dialogue to spark creative friction, generate alternatives, and critique ideas. Critical thinking embedded within knowledge work tools would elevate technology from a passive cognitive crutch into an active facilitator of thought.
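
To illustrate the spreadsheet case, a hypothetical design might store the assumptions behind each formula alongside the formula itself, and flag cells whose reasoning has not been made explicit. The cell references, formulas, and assumption texts below are invented for illustration.

    # Hypothetical sketch: cells that record the assumptions behind their
    # formulas, so a tool can flag cells with unexamined reasoning.
    cells = {
        "B12": {"formula": "=B10 * 1.08",
                "assumptions": ["8% growth, extrapolated from last year"]},
        "B13": {"formula": "=B12 - C12", "assumptions": []},
    }

    for ref, cell in cells.items():
        if not cell["assumptions"]:
            print(f"{ref}: what assumptions lie behind {cell['formula']}?")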

How would we achieve this, technically? We have parts of the solution: automatic test generation, fuzzing and red-teaming,33 self-repair,20 and formal verification methods11 can be integrated into the development and interaction loop to improve correctness. Language models can be designed to cite verifiable source text.17 Beyond “correctness,” these techniques could also support critical thinking. A system error, if surfaced appropriately as a “cognitive glitch,”26 could prompt reflection, evaluation, and learning in the user.
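
One way to assemble these parts is sketched below. The hooks generate, run_checks, and explain_failure are hypothetical stand-ins for a generative model, an automatic test or fuzzing harness, and a critique generator; the point is that a failed check is surfaced as a question rather than silently repaired.

    # Sketch: surfacing failed automatic checks as prompts for reflection
    # rather than silently self-repairing. All three hooks are hypothetical.
    def provocative_loop(generate, run_checks, explain_failure, task):
        draft = generate(task)           # e.g. a language model
        failures = run_checks(draft)     # e.g. generated tests, fuzzing
        # Turn each failure into a question inviting the user to reflect,
        # instead of patching the draft behind the user's back.
        questions = [explain_failure(draft, f) for f in failures]
        return draft, questions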

However, there are missing pieces, such as rigorous prompt engineering for generating critiques, and benchmark tasks for evaluating provocateur agents. Methods for explaining language model behavior to non-expert end users have not been proven reliable.34 Design questions include what kinds of provocations to show, how many, and how often, in which contexts. These mirror longstanding questions in AI explanation,14 but because provocations differ from explanations, the answers are likely to differ too.

Critical thinking is well-defined within certain disciplines, such as history,27 nursing,29 and psychology,30 where these skills are taught formally. However, many professional tasks involving critical thinking, such as using spreadsheets, preparing presentations, and drafting corporate communications, have no such standards or definitions. To create effective AI provocateurs, we need to better understand how critical thinking is applied in these tasks. Clearly, the provocateur’s behavior should adapt to the context; this could be achieved through heuristics, prompt engineering, and fine-tuning.
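
As a first approximation, such adaptation could begin with plain heuristics before any fine-tuning. The task categories and policy values in this sketch are invented for illustration only.

    # Sketch: crude heuristics adapting provocation to context. The task
    # categories and policy values are invented for illustration only.
    def provocation_policy(task_kind: str, expertise: str) -> dict:
        """Choose how often and in what style to provoke."""
        policy = {"frequency": "on_request", "style": "open_questions"}
        if task_kind in {"report", "presentation"}:
            policy["frequency"] = "per_section"
        if expertise == "novice":
            policy["style"] = "guided_checklist"  # gentler than open critique
        return policy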

Conclusion

“How should we evaluate the legacy of Thomas Jefferson?”

Consider what someone who asks such a question seeks. Is it “assistance,” or a different kind of experience?

Could the system, acting as provocateur, have accompanied its response with a set of appropriate questions to help the reader evaluate it? Beyond citing its sources, could it help the reader evaluate the relative authority of those sources? Could it have responded not with prose, but with an argument map contrasting the evidence for and against its claims? Could it highlight the reader’s own positionality and biases with respect to the emotionally charged concepts of nationalism and slavery?

As people increasingly incorporate AI output into their work, explicit critical thinking becomes important not just for formal academic disciplines, but for all knowledge work. We thus need to broaden the notion of AI as assistant, toward AI as provocateur. From tools for efficiency, toward tools for thought. As system builders, we have the opportunity to harness the potential of AI while maintaining, even enhancing, our capacity for nuanced and informed thought.

References

    • 1. Barr, N. et al. The brain in your pocket: Evidence that smartphones are used to supplant thinking. Computers in Human Behavior 48 (2015), 473–480.
    • 2. Bloom, B.S. et al. Taxonomy of Educational Objectives: The Classification of Educational Goals: Handbook 1: Cognitive Domain. D. McKay, New York (1956).
    • 3. Cvetkovic, I. et al. Conversational agent as a black hat: Can criticizing improve idea generation? (2023).
    • 4. Danry, V. et al. Don’t just tell me, ask me: AI systems that intelligently frame explanations as questions improve human logical discernment accuracy over causal AI explanations. In Proceedings of the 2023 CHI Conf. on Human Factors in Computing Systems (2023), 1–13.
    • 5. Davies, M. Concept mapping, mind mapping and argument mapping: What are the differences and do they matter? Higher Education 62 (2011), 279–301.
    • 6. Felleisen, M. et al. How to Design Programs: An Introduction to Programming and Computing. MIT Press (2018).
    • 7. Guzdial, M. et al. Friend, collaborator, student, manager: How design of an AI-driven game level editor affects creators. In Proceedings of the 2019 CHI Conf. on Human Factors in Computing Systems (2019), 1–13.
    • 8. Guzdial, M. Learner-Centered Design of Computing Education: Research on Computing for Everyone. Morgan & Claypool Publishers (2015).
    • 9. Holzer, A. et al. Towards mobile blended interaction fostering critical thinking. In Proceedings of the 17th Intern. Conf. on Human-Computer Interaction with Mobile Devices and Services Adjunct (New York, NY, USA, 2015), 735–742.
    • 10. Iverson, K.E. Notation as a tool of thought. ACM Turing Award Lectures (1979).
    • 11. Jha, S. et al. Dehallucinating large language models using formal methods guided iterative prompting. In 2023 IEEE Intern. Conf. on Assured Autonomy (ICAA) (2023), 149–152.
    • 12. Kivunja, C. et al. Using De Bono’s six thinking hats model to teach critical thinking and problem solving skills essential for success in the 21st century economy. Creative Education 6, 3 (2015), 380.
    • 13. Kneupper, C.W. Teaching argument: An introduction to the Toulmin model. College Composition and Communication 29, 3 (1978), 237–241.
    • 14. Kulesza, T. et al. Too much, too little, or just right? Ways explanations impact end users’ mental models. In Proceedings of the 2013 IEEE Symp. on Visual Languages and Human-Centric Computing (2013), 3–10.
    • 15. Lee, S. et al. Fostering youth’s critical thinking competency about AI through exhibition. In Proceedings of the 2023 CHI Conf. on Human Factors in Computing Systems (Hamburg, Germany, 2023).
    • 16. McNeese, N.J. et al. Who/what is my teammate? Team composition considerations in human–AI teaming. IEEE Transactions on Human-Machine Systems 51, 4 (2021), 288–299.
    • 17. Menick, J. et al. Teaching language models to support answers with verified quotes. arXiv preprint arXiv:2203.11147 (2022).
    • 18. Mochizuki, T. et al. Development of software to support argumentative reading and writing by means of creating a graphic organizer from an electronic text. Educational Technology Research and Development 67 (2019), 1197–1230.
    • 19. Muller, M. and Weisz, J. Extending a human-AI collaboration framework with dynamism and sociality. In Proceedings of the 2022 Symp. on Human-Computer Interaction for Work (2022), 1–12.
    • 20. Pan, L. et al. Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies. arXiv preprint arXiv:2308.03188 (2023).
    • 21. Saadé, R.G. et al. Critical thinking in e-learning environments. Computers in Human Behavior 28, 5 (2012), 1608–1617.
    • 22. Salomon, G. AI in reverse: Computer tools that turn cognitive. J. of Educational Computing Research 4, 2 (1988), 123–139.
    • 23. Sarkar, A. Enough with “human-AI collaboration.” In Extended Abstracts of the 2023 CHI Conf. on Human Factors in Computing Systems (New York, NY, USA, 2023).
    • 24. Sarkar, A. Exploring perspectives on the impact of artificial intelligence on the creativity of knowledge work: Beyond mechanized plagiarism and stochastic parrots. In Annual Symp. on Human-Computer Interaction for Work 2023 (CHIWORK 2023) (Oldenburg, Germany, 2023), 1–7.
    • 25. Sarkar, A. Should computers be easy to use? Questioning the doctrine of simplicity in user interface design. In Extended Abstracts of the 2023 CHI Conf. on Human Factors in Computing Systems (2023), 1–10.
    • 26. Seeber, I. et al. Machines as teammates: A research agenda on AI in team collaboration. Information & Management 57, 2 (2020).
    • 27. Seixas, P. and Peck, C. Teaching historical thinking. In Challenges and Prospects for Canadian Social Studies (2004), 109–117.
    • 28. Siemon, D. Elaborating team roles for artificial intelligence-based teammates in human-AI collaboration. Group Decision and Negotiation 31, 5 (2022), 871–912.
    • 29. Simpson, E. and Courtney, M. Critical thinking in nursing education: Literature review. Intern. J. of Nursing Practice 8, 2 (2002), 89–98.
    • 30. Sternberg, R.J. and Halpern, D.F. Critical Thinking in Psychology. Cambridge University Press (2020).
    • 31. Sun, N. et al. Critical thinking in collaboration: Talk less, perceive more. In Proceedings of the 2017 CHI Conf. Extended Abstracts on Human Factors in Computing Systems (Denver, CO, USA, 2017).
    • 32. Tankelevitch, L. et al. The metacognitive demands and opportunities of generative AI. (2023).
    • 33. Yu, J. et al. GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts. arXiv preprint arXiv:2309.10253 (2023).
    • 34. Zhao, H. et al. Explainability for large language models: A survey. arXiv preprint arXiv:2309.01029 (2023).
