This article is about the kind of trust normally demonstrated in face-to-face human interaction, along with approaches to and benefits from having our computer interfaces employ these same manifestations of trustworthiness. As a designer of technology, you may find that your morals really are your only guide. So, assuming you are a good person and want to build technology that does what it promises, or that represents people who do what they promise, please read on. We argue that interaction rituals among humans, including greetings, small talk, and conventional leave-takings, along with their manifestations in speech and in embodied conversational behaviors, can lead users to judge the technology as more reliable, competent, and knowledgeable—that is, to trust the technology more.
Trust is essential for all kinds of interpersonal interaction; it is the loom on which we weave our social fabric. Trust among humans depends on credibility, believing one another, confidence in one another’s judgment, and believing that other people’s actions fit our own schemata of how to act. We use the interaction rituals of conversation, in part, to demonstrate our trustworthiness and more generally to establish and maintain social relationships in which trust is important. Building rapport and common ground through small talk, intimacy through self-disclosure, credibility through technical jargon, social networks through gossip, and “face” through politeness are all examples of this phenomenon. These social uses of language are not important only in purely social settings; they are crucial in establishing and maintaining any collaborative relationship and in accomplishing any task.
Not only can conversation demonstrate trustworthiness; trustworthiness also affects how we act and converse with one another. When we do not trust, we neither believe what others say to us nor can we learn from them; we neither engage in financial or emotional transactions nor allow ourselves to disclose personal information and become more intimate. In fact, even at the local level, interactions between two people who do not trust one another are difficult to sustain; they involve less verbal fluency and are filled with pregnant pauses, incoherent sounds, and dropped words as each participant estimates what it’s safe to reveal.
Many of the metaphors we use in American culture to signify trust derive from such ritual conversational behaviors, particularly those concerning the role of the human body in face-to-face interaction, as reflected in such metaphorical expressions as “He looked me right in the eye”; “He looked at her with trusting eyes”; “We stood toe-to-toe”; “We worked shoulder-to-shoulder”; “It was service with a smile”; and, conversely, “He went behind my back.”
These expressions reflect our instinctive belief that it is easiest to gauge the trustworthiness of other people when we engage in interaction rituals firsthand. What else would explain the fact that initial business meetings are still routinely held in person, even though the information content could be handled easily via teleconference or videoconference? Looking people in the eye, shaking their hands, and watching them make presentations seem to be universal prerequisites to establishing working trust in the business world.
Here, we don’t address the issue of why technologies should be trustworthy or how to make them so. Instead we emphasize how to inspire a cognitive state of trust in the user of a technology through trust-inspiring procedures: specifically the external manifestations of trust and signals of trustworthiness. In fact, as each technology that mediates communication across space and time (the telephone, email, fax) has developed over the past 120 years or so, users of the technology have addressed similar issues.
Conversations mediated by various technologies have to overcome the lack of face-to-face data that normally allows participants to gauge credibility. Participants try to find ways to overcome that lack or to exploit it. In the days of face-to-face and door-to-door sales, the prototypical Fuller Brush salesman knew that eye contact and a winning smile would guarantee a foot in the door. Telemarketers soon learned to address victims by their names and engage in small talk (“And how are you today, Mrs. Brown?”). Email marketing today tries to personalize notices with knowledge of the targeted customer’s past behavior (“Since last month when you bought a copper pot from us, lots of new and exciting housewares have come in that we’re sure would be of interest to you”).
Recent research on the potential of interaction rituals for increasing trust will no doubt also find its way into the schemes of its misusers. However, our belief is that interaction rituals should acknowledge the social attributions users make, building on them to make technology easy to understand, quick at imparting knowledge, and congruent in how it advertises itself. But we don’t feel these signals of trustworthiness can be used to trick users out of their credit card numbers with any more success than a stranger on the street would have. Rather, they signal that the technology is a competent and cooperative interactant with respect to the user’s sociolinguistic background.
We want users to believe the information the technology provides and reciprocate by providing information the technology requires. How do we allay mistrust in order to allow the process of an interaction to succeed? In the domain of interpersonal trust, a useful distinction can be made between a cognitive state of trust and trusting behaviors. The former involves three competencies:
- Perceiving other people as knowing the nature of the interaction about to transpire;
- Representing their own self-interests in the interaction; and
- Being responsible enough to try to ensure the interaction does not result in negative consequences for the trusting person.
Trusting behaviors involve making oneself vulnerable to other people in any one of a number of ways. The goal of technology designers is to inspire a cognitive state of trust in users such that they will engage in trusting behaviors, thus allowing the human-computer interaction to proceed smoothly.
Embodied Conversational Agents
For the past several years, we have been developing embodied conversational agents (ECAs) at the Gesture and Narrative Language Group at the MIT Media Lab. These graphical computer characters are able to engage in face-to-face dialogues with users through nonverbal modalities, including gesture, gaze, intonation, and posture, as well as speech. We have designed and built several such related multimodal systems capable of sensing a user’s speech, gesture, body posture, and intonation; they respond by animating a computer character with behavior based on studies of human conversation.
We have learned to employ certain important affordances of human bodies to signal conversational process, indicating where in an interaction one is and whether the computer has understood the user’s input, and we have explored how embodied interfaces facilitate certain kinds of human-computer interaction. Recently, we began modeling some of the social cues people use to signal trust in face-to-face encounters and conducting experiments on the ability of these interfaces to engage users’ trust. Embodied interfaces that display these social cues and engage in the interaction rituals that display them may elicit that trust. The result is that the interaction can manifest all the smoothness, lack of hesitancy, and increased self-disclosure accompanying trustful encounters among humans.
One way to view the cognitive state of trust is that it is a composite of benevolence (belief in the intentional good will of another person) and credibility (disbelief in the unintentional ill will of another person). ECAs demonstrate benevolence by engaging in such interaction rituals as greetings and small talk, relating past experiences of benevolent behavior, and referring to third-party affiliations. Such interaction rituals also fit into the uncertainty-reduction model of trust, whereby individuals incrementally reinforce their assumptions about their partners’ dependability with actual evidence from their partners’ behavior. The natural progression of a conversation between strangers—from greetings, through small talk, and into more substantive topics—can be viewed as a process in which they iteratively “test the water” to determine whether or not they want to continue deepening the relationship. Thus, an ECA can provide a natural transition into a trust relationship, especially for Web sites or software products users have not seen before.
ECAs can also project credibility through verbal and nonverbal modalities, presenting themselves as competent, fluent speakers and, through their appearance, projecting expertise, professional affiliation, or attractiveness.
We are currently investigating the use of interaction rituals to build trust in ECAs specially designed to carry out real estate sales encounters. With an ECA named Rea (Real Estate Agent), interaction rituals help the agent achieve its goals by “greasing the wheels” of task talk. Interaction rituals serve a transitional function, providing a ritualized way for prospective home buyers to move into conversation in what may otherwise be an awkward or confusing situation [2, 11].
Small talk, in particular, can also serve an exploratory function by providing a conventional mechanism for users to establish the abilities and credentials of the agent (and vice versa) (see Figure 1). Small talk builds solidarity with users when agents engage in a ritual of showing agreement with and appreciation for the user’s utterances. Finally, an ECA can use a kind of small talk called “conversational storytelling” to establish its expertise, relate stories of past problem-solving behavior, and obtain information about the user that can be used indirectly to help achieve task goals; for example, finding out that a user drives a minivan increases the probability this person has children.
Rea plans and carries out small talk to achieve four main goals:
Build solidarity. Before taking on face-threatening (sensitive) task topics, Rea continually assesses her solidarity with the user (modeled as a numeric quantity between 0 and 1). Each conversational topic has a predefined, prerequisite solidarity that must be achieved before Rea can introduce the topic. Given this assessment, the system can plan to perform small talk in order to prepare for task talk, especially about sensitive topics, like the user’s income and other financial information.
Transition. When moving from one phase of the interaction to the next and toward more important topics, Rea keeps track of the current and past conversational topics. Conversational bids that stay within a topic (maintain topic coherence) are given preference over those that do not. In addition, Rea can plan to execute a sequence of conversational turns that gradually transition the topic from its current state to one she wants to talk about; for example, talk about the weather can move on to talk about Boston weather and to talk about Boston real estate.
Establish her expertise and her limits. Rea begins with some self-disclosing statements indicating some limiting features of her technology. During the interaction, she mentions other clients who resemble the user in some way and states that she has helped them accomplish their goals.
Acquire information about the user. Rea incorporates a list of priorities to find out about the user’s housing needs in the initial interview. Conversational turns that work directly toward satisfying these goals (such as asking interview questions) are preferred.
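The four goals above can be pictured as a planner that scores candidate conversational moves. The sketch below is a minimal illustration of that idea, not Rea's actual implementation; the topic graph, solidarity thresholds, and scoring weights are invented for the example.

```python
# Minimal sketch of a discourse planner in the spirit of Rea's four small-talk
# goals. The topic graph, thresholds, and weights here are invented for
# illustration; they are not Rea's actual model.

TOPICS = {
    # topic: (prerequisite solidarity, neighbors for coherent transitions)
    "weather":            (0.0, ["boston_weather"]),
    "boston_weather":     (0.1, ["boston_real_estate"]),
    "boston_real_estate": (0.3, ["housing_needs"]),
    "housing_needs":      (0.4, ["income"]),
    "income":             (0.8, []),  # face-threatening: needs high solidarity
}

INTERVIEW_GOALS = {"housing_needs", "income"}  # information the agent must acquire

def score_move(current_topic, candidate, solidarity):
    """Score a candidate next topic; higher is better, None if disallowed."""
    prereq, _ = TOPICS[candidate]
    if solidarity < prereq:
        return None  # too face-threatening at the current solidarity level
    _, neighbors = TOPICS[current_topic]
    score = 0.0
    if candidate in neighbors:
        score += 1.0  # prefer topic coherence (the Transition goal)
    if candidate in INTERVIEW_GOALS:
        score += 0.5  # prefer moves toward task goals (Acquire information)
    return score

def next_topic(current_topic, solidarity):
    """Pick the best admissible next topic, or stay put if none qualifies."""
    scored = [(s, t) for t in TOPICS if t != current_topic
              and (s := score_move(current_topic, t, solidarity)) is not None]
    return max(scored)[1] if scored else current_topic
```

With low solidarity the planner stays with safe small talk near the current topic; as solidarity grows (each successful exchange might add a small increment), it eventually licenses a move to sensitive topics like income.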
We are currently conducting an experiment to assess the efficacy of this kind of small talk in real estate interviews, as well as its effect on the user’s perception of and trust in the agent. Subjects are interviewed by the agent about their housing needs, shown two “virtual” apartments, then asked to fill out a questionnaire covering how much rent they would be willing to pay for one of the apartments, attitudinal measures of perceived likability, competence, and intelligence of the agent, and a standard measure of trust. Results indicate that many users find the agent more competent, reliable, and knowledgeable when it uses small talk than when it engages only in task talk.
An important aspect of building trust is the ECA’s role in encouraging “self-disclosure,” or talking about the self. Along these lines, we are investigating their use to prompt for and listen to the user’s personal narratives. In the Sam the Castlemate project, an ECA encourages young children to engage in storytelling. In the GrandChair project, an ECA who appears to be a young child in a rocking chair listens to grandparents’ family stories (see Figure 2); the stories are videotaped so they can be watched by future generations.
In a preliminary experiment, we found that grandparents talk significantly longer in the presence of an ECA than when using only a video camera and cue cards. We hypothesize that this difference is at least partly due to the ECA’s self-disclosure prompts (shown by other researchers to induce self-disclosure to a computer) and partly due to the benevolent child’s image and voice, which increase the user’s trust in the system. Trust is crucial, since users are being asked to disclose intimate information about themselves to a computer.
Mediating Human-Interpersonal Interactions
In instances where the embodied interface is representing the knowledge and expertise of a computational system, the interface should make the system look trustworthy. But what about when the technology is mediating an interaction between two users, as in a videoconference or in online chat?
Here, users attempt to ascertain whether they can trust the representation of the other human. In this domain, we have developed semiautonomous ECA avatars to represent users in interactions with other users in graphical chat systems. Users control the content of what their avatars say and some aspects of their avatars’ movements (such as walking), while much of the nonverbal conversational behavior displayed by the avatars is generated automatically based on the conversational context. For example, if a user indicates that he or she wants to talk to another user in the chat system, his or her avatar automatically produces the appropriate eye gaze, facial expression, and gestural behaviors required to signal that it wants to engage the other user in a conversation.
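This mapping from conversational context to automatically generated nonverbal behavior can be sketched as a simple rule table. The event names and behavior labels below are invented for illustration; they are not the chat system's actual vocabulary.

```python
# Sketch of how a semiautonomous avatar might combine automatically generated
# interaction-ritual behavior with user-controlled moves. Event and behavior
# names are invented for illustration.

BEHAVIOR_RULES = {
    # conversational event -> nonverbal behaviors generated automatically
    "initiate_conversation": ["gaze_at_partner", "raise_eyebrows", "wave"],
    "take_turn":             ["gaze_away_briefly", "begin_gesture"],
    "yield_turn":            ["gaze_at_partner", "relax_hands"],
    "leave_taking":          ["gaze_at_partner", "wave", "step_back"],
}

def avatar_behaviors(event, user_moves=None):
    """Merge automatic ritual behavior with user-controlled movement.

    The user controls content and gross movement (e.g. walking); the system
    fills in the ritual nonverbal cues so all modalities stay consistent.
    """
    behaviors = list(BEHAVIOR_RULES.get(event, []))
    for move in (user_moves or []):
        if move not in behaviors:
            behaviors.append(move)
    return behaviors
```

Because the ritual cues come from the rule table rather than from direct manipulation, the avatar cannot, say, smile warmly while its owner types an insult: the generated behaviors track the conversational event, keeping the modalities consistent.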
This kind of system gives users a higher bandwidth (more modalities) to use social cues to signal their intent to trust and be trusted while still allowing them to maintain anonymity if they so desire (see Figure 3). Moreover, semiautonomous ECAs can actually decrease deception by ensuring all communicative modalities deliver a consistent message—an important consideration in designing a system to be trusted. In contrast, current graphical avatars can be controlled so their verbal and nonverbal behavior is completely independent (for example, someone’s avatar can be smiling while flaming at you). Evaluation of our system has shown that users feel better understood and better understand other users. Both they and their conversational partners are more expressive when their avatars autonomously generate interaction ritual behavior (greetings, turn-taking, leave-takings) than when they directly manipulate the behaviors of their avatars.
Some degree of trust is required to engage in cooperative behavior. Conversation, in particular, requires cooperation and mutual trust to function smoothly. Participants want to trust that their partners are being truthful and are not withholding important information or conveying only superfluous or redundant information. They also want to trust that their partners will not blatantly insult or infringe on their freedom. In turn, trust may be established using the same myriad social cues people use in face-to-face conversation, including interaction rituals, such as small talk, to incrementally build evidence of the ECA’s good will and credibility.
We have found that ECAs able to engage in “phatic,” or relationship-oriented, behaviors challenge our notions of technology as tool and push us further than we expected into the metaphor of computer as conversational partner. They also represent deep technical challenges as we attempt to model the goals small talk achieves in order to plan interactions with users.
When beginning our work several years ago, we were hesitant about claiming that our ECAs actually engaged in small talk and other kinds of social interaction with users. Weren’t we pulling some kind of trick on users by having the system act so much like a human, down to the interaction cues signaling a relationship bond? In fact, watching children and adults interact with the system, we became convinced that nobody was fooled; nobody would leave thinking this was a new living species or a new kind of human. On the other hand, in a myriad of subtle ways, users felt heard and as if the technology was adapting to them, rather than the other way around. More recently, we’ve discovered that, unlike limited talk-like behaviors in other systems, our approach to small talk is actually responsive to user state and to the overarching goals of the interaction. We’ve also found it to be a prime way for the system to monitor users’ progress in their chosen tasks while inducing them to speak easily without “disfluencies,” or embarrassed pauses. All good natural human reasons for users to be more trusting.