Interacting with Groups of Computers

AUIs recognize human attention in order to respect and react to how users distribute their attention in technology-laden environments.

By Jeffrey S. Shell, Ted Selker, and Roel Vertegaal

Posted Mar 1 2003

As we evolve new relationships with the computing systems that surround us, there is a continuous need to adopt new strategies for user interface design. Many of the features of the graphical user interface (GUI) were designed under the assumption that computers would be used as isolated tools with a one-to-one relationship with users. But today, each user has many computers, causing existing channels of interaction to break down. The reason for this is that computers have no knowledge of the devices or tasks a user is attending to. As a consequence, users are bombarded with interruptions from their PDAs, email programs, instant messaging applications, and cell phones. The nature of these interruptions is often acute, demanding full and immediate attention.

To design less intrusive and more sociable interfaces, we suggest augmenting computing devices with attention sensors that allow the devices to prioritize their demands for user attention. Thus, users and devices may enter a turn-taking process similar to what naturally occurs in a human group conversation. This process is key to a new paradigm for computer interfaces—Attentive User Interfaces (AUIs). Here, we present some of the prototype AUIs designed at Queen’s University and MIT. We describe scenarios demonstrating how to design systems that engage users in a manner complementary and appropriate to their attentive context, in order to improve interactions among people and ubiquitous computers.

People communicate attention to each other all the time. Gestures, looks, laughs, and other nonverbal utterances often serve to stimulate the listener, making conversations more interesting and engaging. However, nonverbal cues communicate more than just attention. While eye contact is a powerful communicator of attention between people, too much of it can make us uncomfortable and too little leaves us feeling ignored. As this example shows, nonverbal communication of attention is always interpreted in context. By viewing attention in a social context, we can design systems able to engage in richer, more meaningful interactions with people. AUIs allow user attention to drive the human-computer interface scenario in physical and virtual environments. By recognizing attentive cues from users, and by communicating attention to users, these interfaces encourage a more natural process of turn taking.

All interfaces use some method to negotiate control between computer and user. When computers do not follow reasonable conventions for flow of control, they generate interruptions that are intrusive and annoying. Consider the example of the email tool in Figure 1, which brings up a modal dialogue box to inform the user that a message has been received. Without any regard for the user’s current activity, the dialogue box pops up in the center of the screen. The user can continue his or her activities only by clicking the “OK” button. This example points to a serious underlying flaw in current user interfaces—their lack of knowledge of a user’s current activities. This problem is intensified because users are now surrounded by many computer systems, each competing for the user’s attention. This scenario is analogous to human group communication, in which many people might simultaneously have an interest in speaking.

Clearly, human attention is a limited resource in conversations. A person can only listen to, and fully absorb, the message of one individual at a time. When there are many speakers, the Cocktail Party Effect allows us to focus on the one person we are interested in by attenuating speech from other individuals. However, a more effective method to regulate group communication is to have speakers take turns. According to Short et al. [10], as many as eight cues can be used to negotiate conversational turn taking. Of these, only eye gaze allows people to continuously perceive who is paying attention to whom. We found that visual attention conveyed by eye contact is a reliable indicator of whom one speaks to or listens to during group conversations. It is also a social cue that conveys when it is time for a speaker to relinquish the floor, and who is expected to speak next [1]. Eye contact functions as a nonverbal visual signal that peripherally conveys attention without interrupting the verbal auditory channel. With it, humans achieve a remarkably efficient process of conversational turn taking. Without it, turn taking breaks down [11].

To facilitate turn taking between devices and users in a nonintrusive manner, AUIs monitor nonverbal attentional channels, such as eye gaze, to determine when, whether, and how to communicate with a user. Devices that negotiate requests for attention over peripheral channels make human-device communication more efficient, reliable, and sociable.

Goals of AUIs

AUIs aim to recognize a user’s attention space in order to optimize the information-processing resources of user and devices. This is accomplished by measuring and modeling the users’ past, present, and future attention for tasks, devices, or people. Key features of AUIs include:

Sensing attention. By monitoring users’ physical proximity, body orientation, and eye fixations, AUIs can determine what device, person, or task the user is attending to.
Reasoning about attention. By modeling user attention, AUIs can estimate task prioritization and predict attentive focus.
Graceful negotiation of turns. Before taking the foreground, AUIs determine whether the user is available for interruption given the priority of the request, signal the user via a nonintrusive peripheral channel, and sense user acknowledgment of the request.
Communicating attention. To encourage efficient turn taking, AUIs communicate their attention to users, and communicate the attentive focus of the user to other AUIs and remote people that request the user’s attention.
Augmenting attentive resources. Analogous to the Cocktail Party Effect, AUIs may optimize the use of the user’s attentive resources by magnifying information in the estimated focus of user activity, while attenuating peripheral detail.

Previous Work

Rick Bolt’s Gaze-Orchestrated Dynamic Windows [2] was one of the first true AUIs. It simulated a composite of 40 simultaneously playing television episodes on one large display. All stereo soundtracks from the episodes were active, creating “a kind of Cocktail Party Effect mélange of voices and sounds.” Via a pair of eye-tracking glasses, the system sensed when the user looked at a particular image, turning off the soundtracks of all other episodes. If users looked at one episode for a few seconds, the system would zoom in to fill the screen with that image. Because eye movements are not always voluntary, they are best interpreted as an indicator of interest, rather than as a means for control. Similarly, Nielsen’s Noncommand Interfaces [8] observed user activity and reacted to implicit input based on simple, predefined heuristics, instead of responding to explicit, user-issued commands (for example, mouse clicks).

Vertegaal’s GAZE [12] was one of the first AUIs to apply the Noncommand principle to communicate user attention during remote, collaborative interactions. Using eye trackers, GAZE observes whom and what participants look at during mediated group conversations (Figure 2). By automatically rotating 2D video images of individuals toward the person they look at, participants in a 3D meeting room can see who is talking to whom. According to Maglio et al., not only do users look at other people when speaking to them, they also look at the devices that execute spoken commands [6]. This means a person’s eye gaze can be used to open and close communication channels with devices. We applied this principle in the design of several AUIs described later. However, it is important to note that user attention can be observed through many means besides eye tracking. With Priorities [3], Horvitz designed the first AUI to forward a user’s email messages to digital appliances on the basis of their perceived urgency. Messages are prioritized using simple measures of user attention to a sender: the mean time and frequency with which the user responded to email messages from that sender. Messages with a high priority rating are forwarded to a user’s pager, while messages with low priority can be checked at the user’s convenience.
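The sender-based prioritization described for Priorities can be illustrated with a small sketch. This is not the model used in Priorities [3]; the `SenderHistory` record, the scoring formula, and the 0.4 threshold are assumptions chosen only to show how reply frequency and mean reply time might combine into a routing decision.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SenderHistory:
    """Hypothetical record of how the user has treated one sender's mail."""
    received: int                   # messages received from this sender
    reply_delays_min: list[float]   # minutes the user took for each reply sent


def sender_priority(history: SenderHistory) -> float:
    """Score in [0, 1]: higher when the user replies often and quickly.

    The formula is an illustrative assumption, not the Priorities model.
    """
    if history.received == 0 or not history.reply_delays_min:
        return 0.0
    reply_rate = min(1.0, len(history.reply_delays_min) / history.received)
    # Map mean delay to (0, 1]: instant reply -> 1, a 60-minute mean -> 0.5.
    speed = 1.0 / (1.0 + mean(history.reply_delays_min) / 60.0)
    return reply_rate * speed


def route(history: SenderHistory, threshold: float = 0.4) -> str:
    """Forward high-priority mail to the pager; leave the rest in the inbox."""
    return "pager" if sender_priority(history) >= threshold else "inbox"
```

Under these assumptions, a sender the user answers nine times out of ten within five minutes would be routed to the pager, while a mailing list that is almost never answered would stay in the inbox.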

Similar in nature to AUIs, Context-Aware Systems [5, 9] employ the user’s physical situation, goals, and experience, as well as the system’s capabilities, to inform action. These systems can recognize and handle repetitive, work-intensive subtasks to allow users to do less to accomplish their goals. Unlike in AUIs, however, user attention is not the primary criterion for determining user context. For example, the Universal Plug (Figure 3) is a tool capable of functioning in several contexts, without any knowledge of the user’s activities. When the plug is pressed against a power outlet anywhere in the world, it automatically selects the correct power and voltage. The correct prongs enter the outlet, while the others retract without any user intervention. Being a tool, the plug does not vie for user attention, so the attentive status of the user is not required to use it. The difference between AUIs and Context-Aware Interfaces is that context is always dominated by user attention in an AUI framework.

Prototypes that Sense Attention

Here, we introduce some of the prototypes recently developed at Queen’s University and MIT. We begin our discussion by presenting novel attention sensors. To enable a seamless turn-taking process between humans and groups of computers, devices must also communicate attention for the user. Using scenarios, we will illustrate the application of attention sensors in appliances that reason about attentive input and, in turn, convey their own attention.

The first attention sensor is Eye aRe (Figure 4), a simple eye movement detection system. Eye aRe glasses report whether the user is looking in the direction of another device or user augmented with Eye aRe capabilities. Eye aRe detects both pauses in the user’s eye movements and light emitted from other Eye aRe devices. Software determines when the user blinks in order to detect aspects of the user’s cognitive load, such as stress and fatigue levels.

Our second attention sensor, eyeCONTACT (Figure 5a), is based on the IBM PupilCam [7]. It consists of a camera that uses computer vision to find pupils in its field of view and detect when users look at the sensor. Unlike most commercially available eye trackers, eyeCONTACT is inexpensive, unobtrusive, tolerant to user head movement, and requires no calibration.

By embedding eyeCONTACT sensors in household appliances and other digital devices we designed eyePLIANCES, which explore gradual turn taking between humans and attentive appliances. By looking at an eyePLIANCE a user conveys attention for the device, which is used to regulate communications. A user interacts with the device with speech commands, or by using remote or manual controls. Figure 5b shows the simplest form of an eyePLIANCE, an attentive light fixture. A user can switch the light on or off by simply saying “on” or “off” while looking at the fixture. By having only one device listen at a time, speech recognition is simplified as generic terms such as “on” and “off” can be reused for different devices. Our experiences indicate that eyeCONTACT sensors, as pointing devices for the real world, make it easier to communicate the target of remote interactions.
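The idea that only the appliance being looked at listens for speech can be sketched as follows. The `Appliance` class, its `has_eye_contact` flag, and the `dispatch` function are hypothetical names introduced for illustration; they are not the eyePLIANCE implementation, and the rule of ignoring speech when zero or several devices report eye contact is an assumption.

```python
from typing import Optional


class Appliance:
    """Hypothetical eyePLIANCE with a name and a simple on/off state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.is_on = False
        self.has_eye_contact = False  # would be set by an eyeCONTACT-like sensor

    def handle(self, word: str) -> bool:
        """Apply a generic command; return False if the word is unknown."""
        if word == "on":
            self.is_on = True
        elif word == "off":
            self.is_on = False
        else:
            return False
        return True


def dispatch(word: str, appliances: list[Appliance]) -> Optional[str]:
    """Send a generic spoken word to the single appliance being looked at.

    Returns the name of the appliance that handled the word, or None when
    no appliance (or more than one) currently reports eye contact.
    """
    watched = [a for a in appliances if a.has_eye_contact]
    if len(watched) != 1:
        return None  # assumed policy: ambiguous or absent gaze means no action
    target = watched[0]
    return target.name if target.handle(word) else None
```

Because gaze selects the target, the same two-word vocabulary serves every device, which is the simplification of speech recognition the text describes.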

Negotiating User Attention

In environments with many attention-sensing appliances, AUIs need a dynamic model of the user’s attentive context to establish a turn-taking process. This context includes which task, device, or person the user is paying attention to, the importance of that task, and the preferred communication channel to contact the user. eyeREASON is a personalized communications server that negotiates all remote interactions between a user and attentive devices by keeping track of the user’s attentive context. Appliances report to the server when they sense a user is paying attention to them. eyeREASON uses this information to determine when and how to relay messages from appliances to the user. This is accomplished using knowledge of what communication channels are occupied, and the priority of the message relative to the tasks the user is engaged in [3]. All speech communication between user and appliances is processed through a wireless headset by a speech recognition and production system on the server. As the user works with various devices, eyeREASON switches its vocabulary to the lexicon of the focus device, sending commands through that device’s I/O channels.
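A minimal sketch of this bookkeeping is given below, assuming a server that stores one attentive context per user and relays a message only when it outranks the current task and a channel is free. `AttentionServer`, `AttentiveContext`, the integer priorities, and the fixed channel order are all illustrative assumptions, not the eyeREASON design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AttentiveContext:
    """Hypothetical snapshot of what one user is attending to."""
    focus_device: Optional[str] = None   # device that last reported attention
    task_priority: int = 0               # importance of the current task
    busy_channels: set = field(default_factory=set)  # e.g. {"audio"}


class AttentionServer:
    """Toy stand-in for an eyeREASON-like server (not the real system)."""

    def __init__(self) -> None:
        self.context = AttentiveContext()

    def report_attention(self, device: str, task_priority: int, busy: set) -> None:
        """Called by an appliance when it senses the user attending to it."""
        self.context = AttentiveContext(device, task_priority, set(busy))

    def relay(self, message_priority: int) -> Optional[tuple]:
        """Return (device, channel) to notify on, or None to defer."""
        ctx = self.context
        if ctx.focus_device is None or message_priority <= ctx.task_priority:
            return None  # do not interrupt a more important task
        for channel in ("visual", "audio"):  # assumed order: least disruptive first
            if channel not in ctx.busy_channels:
                return (ctx.focus_device, channel)
        return None  # every channel is occupied
```

Keeping the decision in one place means individual appliances only report what they sense; they never need to know about each other.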

The following scenario illustrates interactions of a user with various eyePLIANCES through eyeREASON. It shows how an awareness of the user’s attentive context may facilitate graceful turn taking between users and remote ubiquitous devices.

Alex enters his living room, which reports his presence to his eyeREASON server. He turns on his television, which has live-pausing capability (Figure 5c). The television is augmented with an eyeCONTACT sensor, which notifies the server that it is being watched. The eyeREASON server updates the visual and auditory interruption levels of all people present in the living room. Alex goes to the kitchen to get himself a cold drink from his attentive fridge, which is augmented with a radio tag reader. As he enters the kitchen, his interruption levels are adjusted appropriate to his interactions with devices in the kitchen. In the living room, the TV pauses because its eyeCONTACT sensor reports that no one is watching. Alex queries his attentive fridge and finds there are no cold drinks within. He gets a bottle of soda from a cupboard in the kitchen and puts it in the freezer compartment of the fridge. Informed by the radio tag on the bottle, the fridge estimates the amount of time it will take for the bottle to freeze and break. It records Alex’s tag and posts a notification with a timed priority level to his eyeREASON server.

Alex returns to the living room and looks at the TV, which promptly resumes the program. When the notification times out, Alex’s eyeREASON server determines the TV is an appropriate device to use for notifying Alex. It chooses the visual communication channel, because it is being watched and is less disruptive than audio. A box with a message from the fridge appears in the corner of the TV. As time progresses, the priority of the notification increases, and the box grows in size on the screen, demonstrating with increased urgency that Alex’s drink is freezing. Alex gets up, the TV pauses, and he sits down at his computer to check his email. His eyeREASON server determines that the priority of the fridge notification is greater than that of his current email, and moves the alert to his computer. Alex acknowledges this alert, and retrieves his drink, causing the fridge to withdraw the notification. Had Alex not acknowledged this alert, the eyeREASON server would have forwarded the message to Alex’s email, instead of continually notifying him directly.
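The escalation in this scenario can be sketched with three small functions. The linear growth of priority, the box-size range, and the `max_attempts` limit before falling back to email are assumptions made for illustration; the article does not specify how eyeREASON computes them.

```python
def timed_priority(elapsed_s: float, deadline_s: float) -> float:
    """Priority rising linearly from 0 to 1 as the deadline nears (assumed shape)."""
    if deadline_s <= 0:
        return 1.0
    return max(0.0, min(1.0, elapsed_s / deadline_s))


def box_scale(priority: float, min_scale: float = 0.1, max_scale: float = 0.5) -> float:
    """Fraction of the screen width used by the on-screen notification box."""
    return min_scale + (max_scale - min_scale) * priority


def next_action(priority: float, task_priority: float,
                acknowledged: bool, attempts: int, max_attempts: int = 3) -> str:
    """Decide what the server does with a pending notification."""
    if acknowledged:
        return "withdraw"            # e.g. Alex retrieved his drink
    if attempts >= max_attempts:
        return "forward_to_email"    # stop notifying the user directly
    if priority > task_priority:
        return "notify"              # outranks what the user is doing now
    return "wait"
```

Here a growing `priority` both enlarges the box on the watched device and eventually exceeds the priority of a competing task such as reading email, at which point the alert follows the user to that device.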

Communicating Device Attention

To enable efficient and sociable interactions between users and devices, attentive systems must, conversely, convey their attention to a user. Figure 5d shows how eyePLIANCES may communicate their own attention using an eyePROXY. An eyePROXY consists of an eyeCONTACT sensor mounted on a pair of actuated, moveable eyeballs. It can be connected to any eyePLIANCE to provide nonverbal feedback to the user, demonstrating this appliance is now listening, or requesting a turn. An eyePROXY may also serve as a surrogate that indicates the attention of a remote individual [4]. We augmented a speakerphone with an eyePROXY to experiment with gradual negotiation of communications using nonverbal channels. The following scenario illustrates the process.

Arnie wishes to place a call to Barbara. He looks at Barbara’s speakerphone proxy on his desk, which detects eye contact and begins setting up a voice connection with Barbara. On the other side of the line, Arnie’s proxy on Barbara’s desk starts moving its motorized eyeballs, using its eyeCONTACT sensor to find Barbara’s pupils. Barbara observes the activity of Arnie’s proxy in her peripheral vision, and looks at the eyeballs. Only now does the speakerphone establish a voice connection. If Barbara does not wish to take the call, she simply looks away from the proxy. Barbara’s proxy would then convey her unavailability to Arnie by shaking its eyes, and breaking eye contact. To avoid the need for multiple eyePROXYs per location, eyePROXYs can be augmented with a display showing a picture of the current caller.
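The negotiation between Arnie and Barbara reads naturally as a small state machine driven by gaze events. The sketch below is one possible reading; the state names, event strings, and the `step` function are hypothetical, and events the scenario does not cover simply leave the state unchanged.

```python
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    REQUESTING = auto()   # caller looked at the callee's proxy
    CONNECTED = auto()    # callee returned eye contact; voice link is open
    DECLINED = auto()     # callee looked away; proxy shakes its eyes


# Transitions taken from the scenario; anything else keeps the current state.
TRANSITIONS = {
    (CallState.IDLE, "caller_looks"): CallState.REQUESTING,
    (CallState.REQUESTING, "callee_looks"): CallState.CONNECTED,
    (CallState.REQUESTING, "callee_looks_away"): CallState.DECLINED,
}


def step(state: CallState, event: str) -> CallState:
    """Advance the call by one gaze event."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str]) -> CallState:
    """Replay a sequence of gaze events starting from IDLE."""
    state = CallState.IDLE
    for event in events:
        state = step(state, event)
    return state
```

Note that the voice connection opens only in `CONNECTED`, so a call cannot start until both parties have made eye contact with the proxies.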

Discussion and Outlook

The popularity of ubiquitous, wireless computing devices has fundamentally changed the way we interact with technology. We feel it is necessary to augment devices with attention-sensing capabilities to help users manage the many conflicting requests for their attention. Sensing technology has improved in cost and functionality to the extent that we can now reliably monitor users to determine what they are paying attention to. AUIs may measure attention in many ways. In social settings, the physical distance between people, the way they turn their heads, and the way they direct their eye gaze at each other all indicate attention. Obtaining nonverbal attentional cues and using them in context allows us to build systems that respectfully and efficiently manage a user’s attention space. This permits more natural, sociable, and most importantly, meaningful interaction between people and groups of computers. We have presented a series of systems and scenarios that describe how we approach this problem. As designers, however, we must keep in mind socio-technological issues that may arise from the usage of attentive systems. For instance, will people trust a technological system to serve as the gatekeeper to their interactions? How can we foster such trust, and safeguard the privacy of the people using systems that sense, store, and relay information about the user’s identity, location, activities, and communications with other people?

Conclusion

We have presented here an overview of our work on AUIs: interfaces that recognize, refine, and respect a user’s attention space. By augmenting devices and appliances with attention sensors that permit the devices to recognize and prioritize demands on the user’s attention, users and devices may enter a turn-taking process analogous to that found in human group conversation. By explicitly designing for the virtual windows of attention between devices and users, interactions with groups of computers may become more sociable as well as more efficient.

Figures

Figure 1. Email application with modal notification alert.

Figure 2. GAZE-2 attentive videoconferencing.

Figure 3. Context aware, not attentive.

Figure 4. Eye aRe glasses.

Figure 5. (a) eyeCONTACT sensor. (b) Light fixture with eyeCONTACT sensor. (c) Attentive TV. (d) eyePROXY.

Sidebar: Students “Sense” AUI Solutions

LAFCam

The LAFCam makes use of the involuntary attentive cues people utter. We trained an AI model to recognize MIT Media Lab student Andrea Lockerd’s laugh and voice. We recorded Andrea walking around Harvard Square making a videotape using the system. LAFCam was then able to find and mark the three most engaging moments in the video based on the nonverbal utterances Andrea inadvertently made while filming, highlighting points of interest from her perspective to the audience.

AuraMirror

AuraMirror is a media art project by Human Media Lab student Alexander Skaburskis. The video mirror renders the virtual windows of attention, or attentive auras, that encompass groups of people during interactions. A visual representation of conversational attention is obtained by superimposing bubbles over each participant’s head. These ‘auras’ grow toward interlocutors to form tunnels during sustained interactions. This permits users to see how they distribute their attention in group interactions, and the effect of interruption on this process. When interlocutors look at the mirror to see their merged aura, it will invariably break, because the target of their visual attention has changed. This serves as a metaphor for interruption.

Social Floor

The floor in the Context Aware Computing Lab at MIT senses people’s position and uses this information to comment on social relationships. The reasoning is based on the common-sense notion that proximity is related to interest. People are probably attending to other people and objects that are close by. This simple AUI notices social distance and paints butterflies around the feet of people standing next to each other. If one group of people is standing separate from another, the floor projects footsteps between them. This is done to encourage participants to notice the social distance and possibly attend to each other. If one person is standing apart from a group in a designated location, it projects a podium and activates a spotlight on the speaker. If a lone visitor stands near a demo, a cartoon head appears on the floor, describing the project using localized speakers. This scenario shows that even a crude metric of position can be used to deduce aspects of what a person is attending to.
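The proximity reasoning behind the Social Floor can be approximated by clustering floor positions. The sketch below assumes positions in meters and a 1-meter radius, and it chains neighbors together so that anyone within the radius of a group member joins the group; the function name and these values are illustrative and not taken from the MIT installation.

```python
from math import dist


def proximity_groups(positions: dict, radius: float = 1.0) -> list:
    """Cluster people so that anyone within `radius` of a member joins the group.

    `positions` maps a person's name to an (x, y) floor coordinate.
    Returns a list of groups, each a sorted list of names.
    """
    names = sorted(positions)
    unvisited = set(names)
    groups = []
    for name in names:
        if name not in unvisited:
            continue  # already absorbed into an earlier group
        unvisited.discard(name)
        group, frontier = [name], [name]
        while frontier:
            current = frontier.pop()
            near = [n for n in sorted(unvisited)
                    if dist(positions[current], positions[n]) <= radius]
            for n in near:
                unvisited.discard(n)
                group.append(n)
                frontier.append(n)
        groups.append(sorted(group))
    return groups
```

A group of one corresponds to the lone visitor or the separated speaker in the description above, while two or more groups are the case in which the floor would project footsteps between them.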

Attentive Cell Phone

To avoid the problem of phones interrupting face-to-face conversations in public places, Human Media Lab student Connor Dickie augmented a cell phone with a wearable eyeCONTACT sensor. The eyeCONTACT sensor reports when someone looks at Connor. The attentive cell phone uses this information to assess whether Connor is engaged in a conversation. The cell phone communicates his attentive status to people in his customized contact list. If Connor is not available, a picture of the back of his head is displayed. If a message is urgent, callers can override Connor’s preferred method of notification, which is currently set to vibrate. By adding a nonintrusive channel to convey attention, the attentive cell phone encourages social behavior among users, thus reducing the number of interruptive phone calls one receives.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI

10.1145/636772.636796

March 2003 Issue

Published: March 1, 2003

Vol. 46 No. 3

Pages: 40-46

Communications of the ACM (CACM) is now a fully Open Access publication.
