Communications of the ACM

BLOG@CACM

Hello, Computer


IBM Almaden Researcher Tessa Lau

Four years ago, when I bought my first in-car GPS unit, it felt like a taste of the future. The unit knew where I was, and regardless of how many wrong turns I made, it could tell me how to get where I wanted to go. It was the ultimate adaptive interface: no matter where I started, it created a customized route that would lead me to my destination.

Alas, my first GPS met an untimely end in a theft involving a dark night, an empty street, and a smashed window.

My new GPS, a Garmin nüvi 850, comes with a cool new feature: speech-activated controls.

Speech recognition brings a new dimension to the in-car human-computer interface. When you're driving, you're effectively partially blind and have no hands. Being able to instruct the computer using nothing but your voice is amazingly empowering, and it makes me excited about the voice-based interfaces to come.

The nüvi's interface is simple and well-designed. There's a wireless, button-activated mic that you mount to your steering wheel. When you activate the microphone, a little icon appears on the GPS screen to indicate that it's listening, and it plays a short "I'm listening" tone. You can then speak the name of any button that appears on the screen, or one of the always-active global commands (e.g., "main menu" or "music player" or "go home"). Musical tones indicate whether it has successfully interpreted your utterance. If it recognizes your command, it takes you to the next screen and verbally prompts you for the next piece of information (e.g., the street address of your destination). Most of the common GPS functionality can be activated (via spoken confirmations) without even looking at the screen.

Lists (e.g., of restaurant names) are annotated with numbers so you only have to speak the number of the item you want to choose from the list. However, it also seems to correctly recognize the spoken version of anything in the list, even if it's not displayed on the current screen (e.g., speaking the name of an artist in the music player).
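To make the matching behavior described above concrete, here is a minimal sketch of how recognized text might be resolved against global commands, on-screen buttons, and numbered list items. This is not Garmin's implementation: the `Screen` class, the `resolve` function, and the lookup order are all hypothetical, and the sketch assumes the recognizer has already turned speech into text, with spoken numbers delivered as digits.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The three always-active commands named in the post; a real device has more.
GLOBAL_COMMANDS = {"main menu", "music player", "go home"}


@dataclass
class Screen:
    """Hypothetical screen model: button labels plus an optional list."""
    buttons: set[str] = field(default_factory=set)
    # The full list, not only the entries on the currently visible page.
    items: list[str] = field(default_factory=list)


def resolve(utterance: str, screen: Screen) -> tuple[str, str] | None:
    """Map recognized text to (kind, target), or None if nothing matches."""
    text = utterance.strip().lower()
    # Global commands are checked first (an assumed priority order).
    if text in GLOBAL_COMMANDS:
        return ("global", text)
    if text in {b.lower() for b in screen.buttons}:
        return ("button", text)
    # A spoken number selects the list item carrying that annotation (1-based).
    if text.isdecimal() and 1 <= int(text) <= len(screen.items):
        return ("item", screen.items[int(text) - 1])
    # A spoken name matches any item in the list, even one not displayed.
    for item in screen.items:
        if item.lower() == text:
            return ("item", item)
    return None


if __name__ == "__main__":
    screen = Screen(
        buttons={"Back", "Spell Name"},
        items=["Thai Garden", "Pizza Place", "Sushi Bar"],
    )
    for spoken in ["Go Home", "2", "sushi bar", "hello"]:
        print(repr(spoken), "->", resolve(spoken, screen))
```

A `None` result would correspond to the "not understood" tone, and any other result to the "success" tone followed by the next prompt.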

In my tests so far it's been surprisingly accurate at interpreting my speech, despite the generally noisy environment on the road.

What has surprised me the most about this interface is that the voice-based control is so enjoyable and fast that I don't use the touchscreen at all anymore. Speech recognition, which has been in the realm of AI for decades, has finally matured to the point where it's reliable enough for use in consumer devices.

Part of the power of the speech-activated UI comes from the ability to jump around in the interface by spoken word. Instead of having to navigate through several different screens by clicking buttons, you can jump straight to the desired screen by speaking its name. It's reminiscent of the difference between GUIs and command lines; GUIs are easier to learn, but command lines offer more efficiency and power once you master them. As is the case with command lines, it takes some experimentation to discover what commands are available when; I'm still learning about my GPS and how to control it more effectively.

Kudos, Garmin, you've done a great job with the nüvi 850. I can't wait to see what the future will bring! (Voice-based access to email on the road? It seems almost within reach.)


Disclaimer: The views expressed here do not necessarily reflect the views of my employer, ACM, or any other entity besides myself.


Comments


Debra Goudy

Information I've read lately on the topic of speech recognition indicates that a device's ability to correctly recognize commands depends in large measure on the quietness of the environment. I have often found that voice systems on my cell phone don't work well unless I find a quiet place to access them. So it is good to hear that Garmin has found an effective way to interpret commands while driving - an environment that you note can be noisy.

As you speak of future enhancements, it brings up the issue of what drivers should be able to do while on the road. Multi-tasking is great, but I'm not sure email while driving is such a good idea...


Jesse Iswaraputra

As long as the driver and passengers are not in the middle of a conversation and are not playing loud music, the environment inside the car should be quieter than the typical environment of a cell phone user.

Email while driving is okay, as long as it's only short messages, not a long conversation that a user needs to focus on.

What I would like to see is an actively maintained open source project that will allow more people to use speech recognition technology. I think most of the best speech recognition technology is still proprietary.

But http://freespeech.sourceforge.net/ or http://www.voxforge.org/ is a good start.
