Multimodal Conversational Systems For Automobiles

By Roberto Pieraccini, Krishna Dayanidhi, Jonathan Bloom, Jean-Gui Dahan, Michael Phillips, Bryan R. Goodman, and K. Venkatesh Prasad

Posted Jan 1 2004

Currently available in-vehicle speech recognition systems are designed around a single-utterance-command paradigm [2], with as many as 200 commands¹ that must be learned or looked up in a manual, an impractical option while driving. The combination of a flexible dialogue-based speech system with a visual and haptic touch screen, while still an area of active research [1, 3], provides the opportunity for an intuitive and effective multimodal interface for vehicles.

SpeechWorks² and Ford designed and realized a prototype targeted at relaxing the limitations of the current systems by adopting a conversational speech interface coupled with a touch-screen display. The system, which controls vehicle functions such as climate, telephone, navigation, MP3 player, and personalization, was installed in the Ford Model U concept vehicle and first shown at the 2003 North American International Auto Show in Detroit. Figure 1 shows an image of the actual car interior where the touch-screen interface is visible on the dashboard. Figure 2 shows a depiction of the GUI in one of its various configurations.

The multimodality of the system allows users to adapt to their environment, for example, by using the GUI when the car is stopped at a light and speech when the car is moving. In addition, we designed the two modalities to complement each other: the controls of the graphical interface provide hints of the corresponding voice commands. New users interact by speaking the commands shown on the display, while the system engages in a directed dialogue and prompts for missing information. Experienced users can then adopt more efficient commands and decide, at each turn, whether to interact by using speech or touch controls. The following is an example of the interaction:

User speaks: Climate Control.
System speaks: Climate Control. Warmer or cooler? (The display changes to the climate control screen, showing a list of options including seat temperature, fan speed/direction, and so on.)
User: Seat temperature.
System: Seat temperature. Please say driver, passenger, or both.
User: Driver.
System: Warmer or cooler?
User touches the “Warmer” button.
System: Increasing the driver seat temperature by two degrees.

This long interaction helps the driver learn the options and the words for completing the task. Experienced users learn to achieve the same goals with single sentences, such as “climate control driver’s seat temperature down,” or “turn the driver’s seat temperature to 60°.” Drivers are allowed to issue any command at any point, via speech or touch screen, for instance placing a telephone call while engaged in a navigation dialogue.
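The contrast between the directed dialogue and the single-sentence command can be pictured as slot filling: the system keeps asking until every piece of the task is known, and it does not care how many pieces arrive in one turn or through which modality. The following is a minimal sketch under that assumption, not the prototype's implementation. The slot names, the `ClimateFrame` class, and the first prompt are hypothetical; the other two prompts are taken from the sample dialogue above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical slot names, asked for in this order when missing.
SLOT_ORDER = ["function", "zone", "direction"]
PROMPTS = {
    "function": "Climate control. Which setting?",  # hypothetical wording
    "zone": "Please say driver, passenger, or both.",
    "direction": "Warmer or cooler?",
}


@dataclass
class ClimateFrame:
    slots: dict = field(default_factory=dict)

    def update(self, values: dict) -> None:
        """Merge slot values from any modality (a spoken phrase or a touched button)."""
        self.slots.update(values)

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first missing slot, or None when the task is complete."""
        for name in SLOT_ORDER:
            if name not in self.slots:
                return PROMPTS[name]
        return None


# New user: one slot per turn, the last one supplied by touch.
novice = ClimateFrame()
novice.update({"function": "seat temperature"})
print(novice.next_prompt())
novice.update({"zone": "driver"})
print(novice.next_prompt())
novice.update({"direction": "warmer"})

# Experienced user: one sentence fills every slot, so nothing is left to ask.
expert = ClimateFrame()
expert.update({"function": "seat temperature", "zone": "driver", "direction": "cooler"})
```

In this sketch both users end with a complete frame; the experienced user simply gets there in one turn.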

The speech recognition engine (SpeechWorks’ Speech2Go) makes use of dynamic semantic models that keep track of the current and past contextual information and dynamically modify the language model in order to increase the accuracy of the speech recognizer. A conditional confirmation strategy, also based on contextual information, is used for improving the dialogue flow and reducing the time to completion.
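The article does not say which criteria drive the conditional confirmation strategy. As a rough illustration only, the sketch below assumes the recognizer returns a confidence score between 0 and 1 and that the system knows whether a command fits the current dialogue context. The function names and both thresholds are assumptions, not details of Speech2Go.

```python
def needs_confirmation(confidence: float, fits_context: bool) -> bool:
    """Decide whether to ask the user to confirm before acting.

    confidence: recognizer score in [0, 1] (assumed scale).
    fits_context: True when the command matches the active dialogue topic.
    """
    # Assumed thresholds: demand more evidence for out-of-context commands.
    threshold = 0.5 if fits_context else 0.8
    return confidence < threshold


def respond(command: str, confidence: float, fits_context: bool) -> str:
    """Either act on the command or ask a confirmation question first."""
    if needs_confirmation(confidence, fits_context):
        return f"Did you say {command}?"
    return f"OK, {command}."
```

With these assumed thresholds, a command that fits the current context is executed directly at a confidence of 0.6, while the same score on an unexpected command triggers a confirmation question. Skipping confirmation in the first case is what shortens the dialogue.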

The interaction is controlled by a dialogue manager that responds to input signals with output actions. In automobile applications, the input signals may come from the user as well as from the vehicle, and the actions may be directed to the user or the vehicle. All input signals can cause a change in the course of the interaction. For example, a low-fuel condition signal can cause the navigation system to engage in a dialogue for rerouting the driver to the nearest gas station. We developed a general multimodal dialogue-manager architecture that allows for a complete separation between the interaction logic and the input signals. The interaction logic is independent from the source of the signals (speech, GUI, vehicle signals) and is represented by a recursive transition network as described in [4, 5].
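One way to picture this separation is a dialogue manager that consumes abstract events and never inspects where they came from. The sketch below is a simplified illustration, not the architecture of [4, 5]: the `Event` and `DialogueManager` classes, the network and state names, and the prompts are all hypothetical. A stack of active networks stands in for the recursion of a recursive transition network, so a sub-dialogue can interrupt the current one and then return to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    name: str    # e.g. "select_navigation", "low_fuel", "yes"
    source: str  # "speech", "touch", or "vehicle"; never read by the logic below


# Hypothetical networks: state -> {event name -> (action, target)}.
# A target of "call:<network>" pushes a sub-network; "return" pops back to the caller.
LOW_FUEL_ARC = ("Fuel is low. Reroute to the nearest gas station?", "call:refuel")
NETWORKS = {
    "main": {
        "idle": {
            "select_navigation": ("Navigation. Where would you like to go?", "navigating"),
            "low_fuel": LOW_FUEL_ARC,
        },
        "navigating": {"low_fuel": LOW_FUEL_ARC},
    },
    "refuel": {
        "start": {
            "yes": ("Rerouting to the nearest gas station.", "return"),
            "no": ("Keeping the current route.", "return"),
        },
    },
}


class DialogueManager:
    def __init__(self) -> None:
        # Stack of (network, state) pairs; the top entry is the active one.
        self.stack = [("main", "idle")]

    def handle(self, event: Event) -> Optional[str]:
        """Follow the arc for this event, if any, and return the output action."""
        network, state = self.stack[-1]
        arc = NETWORKS[network][state].get(event.name)
        if arc is None:
            return None  # event has no meaning in this state; nothing changes
        action, target = arc
        if target.startswith("call:"):
            self.stack.append((target[len("call:"):], "start"))
        elif target == "return":
            self.stack.pop()
        else:
            self.stack[-1] = (network, target)
        return action


dm = DialogueManager()
dm.handle(Event("select_navigation", "touch"))  # user touches a button
dm.handle(Event("low_fuel", "vehicle"))         # the car raises a signal
dm.handle(Event("yes", "speech"))               # user answers by voice
```

The three calls at the end use three different sources, yet `handle` treats them identically. After the refueling sub-dialogue returns, the manager is back in the navigation state it was interrupted in.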

The goal of our multimodal interface is to provide an intuitive and flexible means for controlling vehicle systems while providing a user with the option to operate the system with speech, touch, or any combination of the two. Providing this flexibility, while maintaining UI capabilities, required careful design at the UI and architectural levels. The prototype described here is one of the first attempts to move this challenge from the research community to commercialization.

Figures

Figure 1. The interior and haptic interface on the dashboard of the Ford Model U concept car.

Figure 2. Example of GUI controls used in the Ford concept car multimodal application.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI: 10.1145/962081.962104

Published: January 1, 2004

January 2004 Issue, Vol. 47 No. 1, Pages 47-49
