Deepfake videos have become so convincing that we now depend on human fact-checkers to flag them for us (for instance, this Kari Lake deepfake). Researchers have also found that voice alone can be subtly manipulated to nudge people toward favored opinions, using vocal cues that include pitch shifting, tempo adjustments, stereotypical phonation, and alterations to speech patterns and vowel enunciation. Conversational agents can thus manipulate how persuasive the dialog in a voice user interface is, according to University of Luxembourg researchers in “Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern?,” a study presented at the 2024 ACM International Conference on Intelligent User Interfaces in Greenville, SC.
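Such manipulations are technically trivial to apply to synthesized speech. The following minimal sketch, which is not drawn from the Luxembourg study, shows two of the cues it names, pitch shifting and tempo adjustment, applied with the open-source librosa and soundfile libraries; the file names and parameter values are illustrative assumptions only.

```python
# A minimal sketch of vocal-cue manipulation of the kind described above,
# assuming the librosa and soundfile libraries; paths and values are hypothetical.
import librosa
import soundfile as sf

# Load a synthetic-voice clip (path is hypothetical).
y, sr = librosa.load("agent_prompt.wav", sr=None)

# Shift the pitch up by two semitones: small enough to sound natural,
# large enough to change how the voice is perceived.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=2.0)

# Slow the tempo slightly (rate < 1.0 stretches the audio), a cue often
# associated with perceived confidence and trustworthiness.
y_slowed = librosa.effects.time_stretch(y_pitched, rate=0.92)

# Write the altered prompt back out for the voice user interface to play.
sf.write("agent_prompt_altered.wav", y_slowed, sr)
```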
“Untrustworthy conversational agents present a significant threat for manipulation of the user,” according to artificial intelligence (AI) and machine learning (ML) futurist Philip Feldman of ASRC Federal, which supplies contract services to U.S. intelligence and defense agencies. Said Feldman, who was not involved in the Luxembourg research, “When employed by malicious actors, the voice alone can shape public opinion, sow discord, and undermine reputations. Programmers and designers need to recognize this potential weaponization and come up with methods for prevention, detection, and defense against such dark patterns used for manipulation.”
The Luxembourg researchers credit Colin Gray (Purdue University), Cristiana Santos (Utrecht University, the Netherlands), and Nataliia Bielova (Inria Centre at Université Côte d’Azur, France) with popularizing the “dark patterns” nomenclature in their analysis of its growing use, “Towards a Preliminary Ontology of Dark Patterns Knowledge,” presented at the ACM Conference on Human Factors in Computing Systems in 2023.
A dark pattern, as defined by University of Luxembourg professor Luis Leiva, is a “manipulative design method that constitutes a threat to user agency and freedom of choice.” Together with University of Luxembourg research associate Mateusz Dubiel and postdoctoral researcher Anastasia Sergeeva, Leiva tested the impact of synthetic voice cues on decision-making processes, concluding that modern synthetic voice cues have a direct impact on users’ choices.
Widespread Already
Conversational agents use voice user interfaces (VUIs), which have already become commonplace; Pew Research estimates as many as half of Americans use VUIs daily, with Amazon Alexa, Apple Siri, and Google Assistant accounting for more than 98% of the market. However, the new breed of large language model (LLM) chatbots abandons the clear acknowledgment that Alexa, Siri, and Assistant are computer algorithms speaking. Instead, chatbots often intentionally masquerade as people, encouraging users to attribute intellect, emotions, and intentionality to the information they provide. As a result, such chatbots could already be engaging in subtle, long-term adversarial actions that target unfriendly governments, organizations, or even the perception of religions, according to Feldman.
“Dark patterns” in computer-generated voices have emerged as a significant impediment to the ethical design of conversational agent technology. According to Leiva et al., designers of conversational agents versed in deceptive design are already manipulating the vocal cues of speech during decision-making conversations to subtly steer user choices toward, for instance, favored products, political candidates, and policies.
According to Steven Pinker, author of the book The Language Instinct, “Humans are so innately hardwired for language that they can no more suppress their ability to learn and use language than they can suppress the instinct to pull a hand back from a hot surface.”
As these dark speech patterns are refined, they could become progressively more subtle and persuasive to our “innately hardwired” brains. For instance, Simone Natale, visiting fellow in Communication and Media Studies at the U.K.’s Loughborough University, has shown how artificial intelligence research has become tightly bound up with creating the illusion of intelligence, inspiring the title of his book Deceitful Media: Artificial Intelligence and Social Life after the Turing Test (Oxford University Press).
“Close ethical examination shows that currently voice user interfaces often distort users’ perceptions, questioning the value of using ‘humanness’ as a metaphor for VUIs,” according to Smit Desai, a postdoctoral researcher at Northeastern University (Boston). Together with Michael Twidale of the University of Illinois at Urbana-Champaign, Desai explained the phenomenon in “Metaphors in Voice User Interfaces: A Slippery Fish,” published in ACM Transactions on Computer-Human Interaction. Thus, using “humanness” as a metaphor in designing conversational agents could itself be manipulative.
“Many metaphors involving humanness have problematic implications, but human metaphors are not going away,” said Twidale. “One problem is that VUIs can dupe people, especially those created by designers who had worked on commercial sites. On commercial sites people expect to be ‘sold to,’ but when it’s not a shopping app, people may not have their guard up—and there is virtually no education about the ethics of VUIs.”
Conversational agents intentionally use vocal cues and humanness metaphors to shape a user’s mental model, which can result in a miscalibration of the user’s trust, based, for instance, on an emotional connection with the agent rather than actual confidence in the agent’s capabilities, according to Desai.
“Let’s say the designer is trying to create a doctor VUI and ascribe intelligence to the agent—essentially the VUI is masquerading as intelligent but might give wrong information that takes advantage of the user’s trust in the AI,” said Desai. “Other VUIs could mislead users by mimicking the vocal cues of a celebrity posing as an authority, making people more inclined to give permission to use their data.”
VUIs have a strong social presence, particularly for older adults, that directly affects VUI usage and adoption, since they can manipulate a user’s feelings and perceptions. Designers often intentionally use vocal cues to establish deeper emotional bonds between users and the conversational agent, according to Desai.
“Besides influencing VUI designers to be more ethical, a lot of technical outreach is needed to alert users to inspect more closely that which they are hearing and to which they are reacting. We need to educate people not just about outright scams, but how voice-user interfaces could be misleading them on purpose,” said Twidale. “There is a reason it’s called ‘artificial’ intelligence—because it really is artificial and so is its intelligence.”
Chatbots, for instance, present the illusion of intelligence when all their algorithms are really doing is predicting what the next word in a sentence is likely to be by sampling from a large language model. Combined with persuasive vocal cues, they are already manipulating users by being perceived as compelling and credible, according to Stuart Russell, author of the book Human Compatible: Artificial Intelligence and the Problem of Control. According to Russell, the threat is no longer purely theoretical.
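The next-word-prediction mechanism Russell refers to is easy to see directly. Below is a minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 model; it illustrates only that a chatbot’s output is a probability distribution over next words, and is not a method from any of the researchers cited here.

```python
# A minimal sketch of next-token prediction with an off-the-shelf causal
# language model (GPT-2 via Hugging Face transformers). The prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The safest choice for you is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the token that would follow the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>12}  {prob.item():.3f}")
```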
To combat black-hat conversational agents, Feldman urges white-hat programmers and voice user-interface designers to acquire a deep understanding of the malevolent vocal cues black hats employ, just as cybersecurity experts strive to thoroughly understand the vulnerabilities targeted by cybersecurity threat vectors.
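As a purely hypothetical illustration of that defensive mindset, and not a technique from the Luxembourg study or from Feldman, a white-hat tool might compare simple prosodic features of an incoming synthetic-voice prompt against a trusted baseline recording of the same voice and flag large deviations. The sketch below assumes the librosa library; all file names and thresholds are assumptions.

```python
# A hypothetical detection heuristic: flag a clip whose median pitch or
# speaking rate deviates sharply from a trusted baseline of the same voice.
import numpy as np
import librosa

def prosodic_features(path):
    y, sr = librosa.load(path, sr=None)
    # Median fundamental frequency (Hz) over the clip.
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    median_pitch = float(np.nanmedian(f0))
    # Onsets per second as a crude proxy for speaking rate.
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    rate = len(onsets) / (len(y) / sr)
    return median_pitch, rate

def looks_manipulated(baseline_path, suspect_path, pitch_tol=0.15, rate_tol=0.15):
    base_pitch, base_rate = prosodic_features(baseline_path)
    susp_pitch, susp_rate = prosodic_features(suspect_path)
    pitch_shifted = abs(susp_pitch - base_pitch) / base_pitch > pitch_tol
    rate_shifted = abs(susp_rate - base_rate) / base_rate > rate_tol
    return pitch_shifted or rate_shifted

# File paths and tolerance thresholds are illustrative assumptions.
print(looks_manipulated("known_good_voice.wav", "incoming_prompt.wav"))
```

Such a heuristic would be easy to evade; the point is only that the same cues used for manipulation can also be measured.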
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.