Smarter Voice Assistants Recognize Your Favorite Brands—and Health

Smart speakers and assistants are about to start listening for sounds linked to specific brands, and users can't always tell when the devices are listening.

"Alexa. What brand of coffee am I drinking?" I ask my Amazon Echo.

"Hmm. I don't know that," she replies.

Well, she might know it soon.

At January's Consumer Electronics Show in Las Vegas, a boost to the artificial intelligence (AI) that allows smart speakers like the Echo, Google Home, and Apple HomePod to reliably recognize everyday sounds, and to act on them, is set to lend the devices powerful new capabilities, including the ability to recognize your favorite brands from the noises they make.

Based on sound recognition technology from a British AI startup called Audio Analytic, these capabilities include allowing voice assistants to recognize the sounds of the brands you use day to day, to boost your home's security by listening out for out-of-the-ordinary "anomalous" sounds around the house, and, for the first time, to collect health data by recognizing coughs, sneezes, sniffles, yawns, and snores, in order to recommend medicines, or pharmacies.

However, this smart speaker intelligence boost is coming just as security interaction engineers at the University of Michigan in Ann Arbor are warning that people are blithely surrendering their data privacy to the continuously listening microphones of their voice assistants, without understanding the potential implications of doing so.

University of Michigan engineers Florian Schaub and Josephine Lau told the 21st ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2018) in November that smart speaker makers need to design effective, usable privacy controls, because the risk to our privacy is increasing as voice assistants migrate rapidly beyond tabletop speakers to our cars, smartwatches, fitness trackers, wearables, wireless headsets, TV streaming boxes, security cameras, and smart heating/lighting controllers.

All these platforms are able to exploit the patent-pending "brand sonification" technology that Audio Analytic will be plugging at CES 2019, the Consumer Technology Association's annual event in Las Vegas in January.

The basic idea behind brand sonification, according to Audio Analytic CEO Chris Mitchell, "is to have voice assistant devices respond to the sounds that brands make when they are used." Imagine popping open a tube of Pringles snack chips; a smart speaker would recognize the talismanic sound of the snack's foil being torn off, and play an advertisement for a dip or salsa to complement the chips. Alternatively, think about cracking open a cold can of your favorite soft drink, and your smart speaker recognizes the sound of the ringpull and the can's depressurizing "pfttt!," which triggers a spoken ad for local discounts on that very drink, or it could even give you loyalty points.

Are such activities an invasion of your "soundspace"? Schaub thinks so. He predicts, "People will find the detection of what kind of soft drink they are having, based on the sound of opening the can or bottle, very creepy."

Audio Analytic's Mitchell, however, says any use of the technology would be on an "opt-in" basis, requiring the "express consent" of consumers. "We've designed the brand sonification technology to enable brands to interact with consumers at the point they use the product; for example, opening a snack during half-time during the Super Bowl and winning a prize."

Mitchell says Audio Analytic is pursuing a number of avenues for its technology, such as designing drink cans so that when opened, they make different, distinctive kinds of sounds that precisely identify the drink "and so drive some kind of interaction." However, the drink does not have to be identified; simply knowing you're drinking from a can could be valuable, says Mitchell, and might spark a verbal request from the smart speaker to recycle the can when you're finished.

Another idea, he says, is to have the startup sound of, say, an Xbox or Apple Mac kick off an audio interaction for gamers.

Beyond brand recognition, Audio Analytic aims to make voice assistants smarter and more valuable to their users by giving their makers the ability to recognize 15 new everyday sounds with high reliability. The assistants could then alert users to many more conditions they may want to know about, via a spoken or app alert or even by activating devices such as smart lighting.

Until now, voice assistants like Amazon's Echo have been able to recognize just four major domestic sounds: glass breaking, a smoke alarm, a baby crying, and a dog barking. On hearing any of those sounds, a voice assistant will alert users via phone app that something is up. Thanks to machine-learned sound profiles, this is done intelligently; the devices respond only to the sound of glass breaking in windows and doors, not to a wine glass being dropped. The smoke alarm detector can discriminate between the real thing and pings from phones and microwave ovens, and even parrots that mimic smoke alarms. A baby's cry sends app alerts to parents, while Alexa activates playback of a lullaby and switches on a smart nightlight for the infant.
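The mapping from a recognized sound to a response, as described above, can be pictured as a simple lookup. The following is a minimal Python sketch under stated assumptions: the label strings, action names, `Detection` class, `actions_for` function, and the 0.8 confidence threshold are all hypothetical and do not reflect Amazon's or Audio Analytic's actual software.

```python
from dataclasses import dataclass

# Hypothetical sound labels and action names, chosen to mirror the four
# sounds named in the article. This is not a real vendor API.
ACTIONS = {
    "glass_break": ["app_alert"],
    "smoke_alarm": ["app_alert"],
    "dog_bark": ["app_alert"],
    "baby_cry": ["app_alert", "play_lullaby", "nightlight_on"],
}


@dataclass
class Detection:
    label: str         # label emitted by a (hypothetical) sound classifier
    confidence: float  # classifier score in the range [0, 1]


def actions_for(detection, threshold=0.8):
    """Return the actions to run for a detection.

    Returns an empty list when the score is below the threshold or the
    label is not one of the known sound profiles.
    """
    if detection.confidence < threshold:
        return []
    return list(ACTIONS.get(detection.label, []))
```

In this sketch, a confident baby's cry yields an app alert, a lullaby, and a nightlight, while an unknown sound such as a dropped wine glass yields nothing, because no profile exists for it.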

Audio Analytic's 15 new sound recognition profiles will identify anomalous sounds in the home such as people knocking at the door, doorbells, bicycle bells, emergency vehicle sirens, reversing vehicles, and car horns. Considering the sounds made by humans, the new profiles will recognize sneezing, coughing, yawning, snoring, laughing, and shouting.

To demonstrate a health application, Mitchell played a sequence of throaty coughs from a handheld sound synthesizer into an Intel smart speaker running Amazon's software. Alexa responded: "You've been coughing a lot today. Do you want me to recommend a medicine or some alternative remedies?"

"So it's not so much about detecting one cough," Mitchell says, "but how much you have been coughing. What might that mean? It's a fascinating area for sound analysis; it can start surmising stuff."

Yet the new sound detection capabilities also offer the potential for controversy, as the speakers now collect low-level health data. Snoring and yawning a lot, for instance, could be signs of obstructive sleep apnea, so leaked data might impact somebody's health insurance, or even car insurance rates. A lot of coughing and sneezing might impact employability, too, if somebody seems too sickly too often.

To avoid the kind of opprobrium that hit Google after its DeepMind division acquired health records without patient permission, Mitchell says it is important for smart speaker makers using Audio Analytic's cough/sneeze recognition technology to keep such processing local to the device, rather than storing the data in the cloud, where it might leak from unsecured storage "buckets."
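Two ideas from the passages above lend themselves to a small illustration: counting how often a sound such as a cough occurs rather than reacting to a single one, and keeping that processing on the device. The Python sketch below is only an assumption about how such a counter might look. The class name `CoughCounter`, the 24-hour window, and the threshold of 20 are hypothetical and are not details of Audio Analytic's technology. The counter keeps only timestamps in memory and sends nothing to the cloud.

```python
from collections import deque


class CoughCounter:
    """Counts coughs inside a sliding time window, entirely in memory.

    Assumes timestamps (in seconds) arrive in non-decreasing order.
    The window length and threshold are illustrative values only.
    """

    def __init__(self, window_seconds=24 * 3600, threshold=20):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._times = deque()  # timestamps of coughs still inside the window

    def record(self, timestamp):
        """Record one cough; return True if the windowed count has
        reached the threshold."""
        self._times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop coughs that have aged out of the window.
        while self._times and self._times[0] <= cutoff:
            self._times.popleft()
        return len(self._times) >= self.threshold
```

A device using something like this would prompt the user only when `record` returns True, so an isolated cough would trigger nothing.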

If that sounds unlikely, even the judiciary has picked up on the fact that smart speaker accounts can store important audio data. For instance, a judge has ordered Amazon to turn over any recordings that an Echo device made around the time of a January 2017 killing in a double-murder case in Farmington, NH.

That people are unaware their smart speakers could record activity after the device's wake word is uttered is no surprise to Schaub and his fellow researchers. Their survey of smart speaker users found that consumers' rationalizations for installing the devices reflected "an incomplete understanding of privacy risks" and that users had a misplaced "trust relationship" with the smart speaker companies. Most users, the researchers said, seemed resigned to losing their privacy and accepted it as a cost of using the technology.

The University of Michigan team offered two key findings related to smart speaker makers failing to provide effective privacy controls. First, users thought it was effective to verbally instruct a smart speaker to stop listening, even though on most of them, a physical button must be pressed to mute the microphone. Second, audio logs in companion apps like the one that works with the Echo were not touted as a privacy feature, when they can in fact be quite a strong one because they allow users to delete any logged audio they choose from their cloud account.

Schaub agrees with Mitchell that processing audio data locally on the device is safest, but he suspects speaker makers will be tempted to aggregate the data. "Behavioral tracking approaches are primarily used to create advertising profiles about individuals to better target ads to them, or worse, sell this information to data brokers," he says.

Still, voice assistant makers should harbor no illusions that audio data is any less worthy of protection than other forms of data, says a spokesman for the Information Commissioner's Office (ICO) in London, which drew up many of the measures in the European Union's General Data Protection Regulation (GDPR). Companies the world over now have to comply with GDPR if they want to sell into Britain or Europe.

"It's important to understand that where personal data is processed, data protection law applies," says the ICO spokesman.

"Being clear with individuals about the use of their data, and providing options to control that data, are important matters for organizations to get right. This applies just as much to virtual personal assistants as it does to any other device, product or service."

Paul Marks is a technology journalist, writer, and editor based in London, U.K.
