BLOG@CACM

Speech-Activated User Interfaces and Climbing Mt. Exascale

The Communications Web site, cacm.acm.org, features 13 bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish excerpts from selected posts, plus readers' comments. Tessa Lau discusses why she no longer uses the touch screen on her in-car GPS unit, and Daniel Reed considers the future of exascale computing.

From Tessa Lau’s "Hello, Computer"

Four years ago when I bought my first in-car Global Positioning System (GPS) unit, it felt like a taste of the future. The unit knew where I was, and regardless of how many wrong turns I made, it could tell me how to get where I wanted to go. It was the ultimate adaptive interface: No matter where I started, it created a customized route that would lead me to my destination.

Alas, my first GPS unit met an untimely end in a theft involving a dark night, an empty street, and a smashed window.

My new GPS, a Garmin nüvi 850, comes with a cool new feature: speech-activated controls.

Speech recognition brings a new dimension to the in-car human-computer interface. When you’re driving, you’re effectively partially blind and have no hands. Being able to talk to the computer and instruct it using nothing but your voice is amazingly empowering, and makes me excited about the future of voice-based interfaces.

The nüvi’s interface is simple and well designed. There’s a wireless, button-activated microphone that you mount to your steering wheel. When you activate the mic, a little icon appears on the GPS screen to indicate that it’s listening, and the GPS plays a short "I’m listening" tone. You can speak the names of any buttons that appear on the screen or one of the always-active global commands (e.g., "main menu," "music player," or "go home"). Musical tones indicate whether the GPS has successfully interpreted your utterance. If it recognized your command, it takes you to the next screen and verbally prompts you for the next piece of information (e.g., the street address of your destination). Most of the common GPS functionality can be activated via spoken confirmations without even looking at the screen.
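For the curious, the dispatch pattern this suggests is simple to sketch. The Python below is my own illustration, not Garmin's implementation; the screen names, prompts, and helper functions are invented for the example. The idea is that the active vocabulary at any moment is the current screen's button labels plus the always-active global commands, with tones as feedback.

    GLOBAL_COMMANDS = {"main menu", "music player", "go home"}

    class Screen:
        def __init__(self, name, buttons=None, prompt=None):
            self.name = name
            self.buttons = buttons or {}   # button label -> destination Screen
            self.prompt = prompt           # spoken on arrival, if any

    def play_tone(kind):                   # stand-in for the GPS feedback tones
        print(f"[tone: {kind}]")

    def speak(text):                       # stand-in for text-to-speech output
        print(f"[voice: {text}]")

    def handle_utterance(utterance, screen, screens_by_name):
        phrase = utterance.strip().lower()
        if phrase in GLOBAL_COMMANDS:      # global commands work on any screen
            play_tone("recognized")
            return screens_by_name[phrase]
        if phrase in screen.buttons:       # any button visible on this screen
            play_tone("recognized")
            nxt = screen.buttons[phrase]
            if nxt.prompt:
                speak(nxt.prompt)          # e.g., prompt for a street address
            return nxt
        play_tone("not recognized")        # stay put and let the user retry
        return screen

    # Example: saying "where to?" from the main menu.
    where_to = Screen("where to?", prompt="Say the street address")
    main = Screen("main menu", buttons={"where to?": where_to})
    screens = {"main menu": main,
               "music player": Screen("music player"),
               "go home": Screen("go home")}
    current = handle_utterance("Where To?", main, screens)

In this toy version, speaking "where to?" from the main menu plays the recognition tone, switches screens, and triggers the spoken prompt for an address, much like the flow described above.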

Lists (e.g., of restaurant names) are annotated with numbers so you only have to speak the number of the item you want from the list. However, it also seems to correctly recognize the spoken version of anything in the list, even if it’s not displayed on the current screen (e.g., the name of an artist in the music player).

In my tests it’s been surprisingly accurate at interpreting my speech, despite the generally noisy environment on the road.

What has surprised me the most about this interface is that the voice-based control is so enjoyable and fast that I don’t use the touch screen anymore. Speech recognition, which had been in the realm of artificial intelligence for decades, has finally matured to the point where it’s now reliable enough for use in consumer devices.

Part of the power of the speech-activated user interface comes from the ability to jump around in the interface by spoken word. Instead of having to navigate through several different screens by clicking buttons, you can jump straight to the desired screen by speaking its name. It’s reminiscent of the difference between graphical user interfaces (GUIs) and command lines; GUIs are easier to learn, but command lines, once mastered, offer more efficiency and power. As is the case with command lines, it takes some experimentation to discover what commands are available when; I’m still learning about my GPS and how to control it more effectively.

Kudos, Garmin, you’ve done a great job with the nüvi 850. I can’t wait to see what the future will bring! (Voice-based access to email on the road? It seems almost within reach.)

Disclaimer: The views expressed here do not necessarily reflect the views of my employer, ACM, or any other entity besides myself.

Reader’s comment:

Information I’ve read lately on the topic of speech recognition indicates that a device’s ability to correctly recognize commands depends in large measure on the quietness of the environment. I have often found that voice systems on my cell phone don’t work well unless I find a quiet place to access them. So it is good to hear that Garmin has found an effective way to interpret commands while driving—an environment that you note can be noisy.

As you speak of future enhancements, it brings up the issue of what drivers should be able to do while on the road. Multitasking is great, but I’m not sure email while driving is such a good idea…
            —Debra Gouchy

From Daniel Reed’s "When Petascale Is Just Too Slow"

It seems as if it were just yesterday when I was at the National Center for Supercomputing Applications and we deployed a one-teraflop Linux cluster as a national resource. We were as excited as proud parents by the configuration: 512 dual-processor nodes (1 GHz Intel Pentium III processors), a Myrinet interconnect, and (gasp) a stunning 5 terabytes of RAID storage. It achieved a then-astonishing 594 gigaflops on the High-Performance LINPACK benchmark and was ranked 41st on the Top500 list.
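A quick back-of-the-envelope check (my own arithmetic, assuming the conventional one double-precision floating-point operation per cycle for the Pentium III) shows why we called it a one-teraflop machine, and what fraction of peak the LINPACK run achieved:

    512 nodes × 2 processors × 1 GHz × 1 flop/cycle ≈ 1.024 teraflops (peak)
    594 gigaflops ÷ 1.024 teraflops ≈ 58% of peak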

The world has changed since then. We hit the microprocessor power (and clock rate) wall, birthing the multicore era; vector processing returned incognito, renamed as graphics processing units (GPUs); terabyte disks are available for a pittance at your favorite consumer electronics store; and the top-ranked system on the Top500 list broke the petaflop barrier last year, built from a combination of multicore processors and gaming engines. The last is interesting for several reasons, both sociological and technological.

Petascale Retrospective

On the sociological front, I remember participating in the first petascale workshop at Caltech in the 1990s. Seymour Cray, Burton Smith, and others were debating future petascale hardware and architectures, a second group was debating device technologies, a third was discussing application futures, and a final group of us was down the hall debating future software architectures. All this was prelude to an extended series of architecture, system software, programming models, algorithms, and applications workshops that spanned several years and multiple retreats.

At the time, most of us were convinced that achieving petascale performance within a decade would require new architectural approaches and custom designs, along with radically new system software and programming tools. We were wrong, or at least so it superficially seems. We broke the petascale barrier in 2008, using commodity x86 microprocessors and GPUs, InfiniBand interconnects, minimally modified Linux, and the same message-based programming model we have been using for the past 20 years.

However, as peak system performance has risen, the number of users has declined. Programming massively parallel systems is not easy, and even terascale computing is not routine. Horst Simon explained this with an interesting analogy, which I have taken the liberty of elaborating slightly. The ascent of Mt. Everest by Edmund Hillary and Tenzing Norgay in 1953 was heroic. Today, amateurs still die each year attempting to replicate the feat. We may have scaled Mt. Petascale, but we are far from making it pleasant or even a routine weekend hike.

This raises the real question: Were we wrong in believing different hardware and software approaches would be needed to make petascale computing a reality? I think we were absolutely right that new approaches were needed. However, our recommendations for a new research and development agenda were not realized. At least in part, I believe this is because we have been loath to mount the integrated research and development needed to change our current hardware/software ecosystem and procurement models.

Exascale Futures

Evolution or revolution, it’s the persistent question. Can we build reliable exascale systems from extrapolations of current technology or will new approaches be required? There is no definitive answer as almost any approach might be made to work at some level with enough heroic effort. The bigger question is: What design would enable the most breakthrough scientific research in a reliable and cost-effective way?

My personal opinion is that we need to rethink some of our dearly held beliefs and take a different approach. The degree of parallelism required at exascale, even with future many-core designs, will challenge even our most heroic application developers, and the number of components will raise new reliability and resilience challenges. Then there are interesting questions about many-core memory bandwidth, achievable system bisection bandwidth, and I/O capability and capacity. There are just a few programmability issues as well!

I believe it is time for us to move from our deus ex machina model of explicitly managed resources to a fully distributed, asynchronous model that embraces component failure as a standard occurrence. To draw a biological analogy, we must reason about systemic organism health and behavior rather than cellular signaling and death, and not allow cell death (component failure) to trigger organism death (system failure). Such a shift in world view has profound implications for how we structure the future of international high-performance computing research, academic/government/industrial collaborations, and system procurements.
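To make the biological analogy concrete, here is a toy sketch in Python, my own illustration rather than a design for an actual exascale runtime. A scheduler in this style treats a failed component as a routine event: the affected task is simply rescheduled elsewhere, and the "organism" keeps computing.

    import random

    def run_resilient(tasks, worker, max_attempts=3):
        """Run tasks to completion, treating worker failure as routine."""
        results = {}
        attempts = {t: 0 for t in tasks}
        pending = list(tasks)
        while pending:
            task = pending.pop(0)
            attempts[task] += 1
            try:
                results[task] = worker(task)       # a "cell" that may die
            except RuntimeError:
                if attempts[task] < max_attempts:
                    pending.append(task)           # reschedule; don't abort the run
                else:
                    results[task] = None           # give up on this task only
        return results

    def flaky_worker(task):
        if random.random() < 0.3:                  # simulate a component failure
            raise RuntimeError("node failure")
        return task * task

    print(run_resilient(range(8), flaky_worker))

The point of the sketch is the control flow: failure handling is local and asynchronous with respect to the rest of the computation, rather than a global abort-and-restart from a checkpoint.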
