Hidden Messages Fool AI

Deep neural networks (DNNs) have advanced to the point where they underpin online services from image search to speech recognition, and are now moving into the systems that control robots. Yet numerous experiments have demonstrated that it is relatively easy to force these systems into mistakes that seem ridiculous but could have catastrophic results. Recent tests have shown autonomous vehicles could be made to ignore stop signs, and smart speakers could turn seemingly benign phrases into malware.

Five years ago, as DNNs were beginning to be deployed on a large scale by Web companies, Google researcher Christian Szegedy and colleagues showed making tiny changes to many of the pixels in an image could cause DNNs to change their decisions radically; a bright yellow school bus became, to the automated classifier, an ostrich.

But the changes made were imperceptible to humans.

At the time, researchers questioned whether such adversarial examples would translate into the physical domain because cameras would smooth out the high-frequency noise mixed into the digitized images that Szegedy and others were presenting directly to their DNNs. Within several years, examples of real-world attacks appeared. In one case, stickers attached to a stop sign made a DNN interpret it as a 45 m.p.h. (miles per hour) sign even though the word ‘stop’ remained clearly visible.

Although most of the research into subverting DNNs using adversarial examples has been within the realm of image recognition and classification, similar vulnerabilities have been found in networks trained for other applications, from malware classification to robot control. Audio systems such as smart speakers seem just as susceptible to attack using the same concepts. Similar to the effects of camera processing on images, the low-pass filtering of microphones and speakers makes some attacks more feasible than others in the real world.
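As a rough illustration of why such filtering matters, the Python sketch below passes a low-frequency tone and a faint high-frequency perturbation through the same smoothing filter and compares how much of each survives. Everything in it is an assumption made for illustration: the 16 kHz sample rate, the 200 Hz and 5 kHz tones, and the eight-sample moving average, which is only a crude stand-in for the response of a real speaker and microphone.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate in Hz (not taken from the article)

def moving_average(signal, width):
    """Crude low-pass filter: average each sample with its neighbors."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

def rms(signal):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(signal ** 2))

t = np.arange(1600) / SAMPLE_RATE               # 0.1 seconds of audio
carrier = np.sin(2 * np.pi * 200 * t)           # low-frequency tone standing in for speech
glitch = 0.05 * np.sin(2 * np.pi * 5000 * t)    # faint high-frequency perturbation

# Fraction of each signal's amplitude that survives the filter.
kept_carrier = rms(moving_average(carrier, 8)) / rms(carrier)
kept_glitch = rms(moving_average(glitch, 8)) / rms(glitch)

print(f"carrier amplitude kept: {kept_carrier:.2f}")
print(f"glitch amplitude kept:  {kept_glitch:.2f}")
```

In this toy setting the low-frequency tone passes almost unchanged while most of the high-frequency perturbation is smoothed away, which is the kind of loss an over-the-air attack would have to survive.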

As a Ph.D. student working with David Wagner at the University of California at Berkeley, Nicholas Carlini started looking at fooling speech engines in 2015 as part of a project to examine the vulnerabilities of wearable devices. The UC Berkeley researchers thought practical wearable devices would rely on speech recognition for their user interfaces.

Their focus switched to in-home systems when products such as Amazon’s Echo started to become popular.

“We were able to construct audio that to humans sounded like white noise, that could get the device to perform tasks such as open up Web pages,” says Carlini, now a research scientist at Google Brain. “It was effective, but it was very clear to anyone who heard it that something was going on: you could hear that there was noise.”

In 2017, a team from Facebook AI Research and Bar-Ilan University in Israel showed it was possible to hide messages in normal speech, though a limitation of their so-called Houdini method was that it needed to use replacement phrases, the spoken versions of which were phonetically similar to those being targeted. In November of that year, Carlini found it was possible to push attacks on speech-based systems much further.

“I don’t like writing, and for two or three weeks I had been working on a paper and managed to submit it with 15 minutes to go on the deadline. I woke up the next morning and said, ‘let’s do something fun,’” Carlini explains.

The target was the DeepSpeech engine published as open-source code by Mozilla. “Fifteen hours of work later, I had broken it,” Carlini claims.

Rather than using noise to confuse the system, he had found the engine was susceptible to slightly modified recordings of normal speech or music. The system could be forced to recognize a phrase as something completely different to what a human would hear. The attacks buried subtle glitches and clicks in the speech or music at a level that makes it hard for a human hearing the playback to detect. Some glitches buried in normal phrases convinced the network it was hearing silence.

“I was incredibly surprised it worked so easily. You don’t expect things to break so easily. However, much of it was because I had spent a year and a half on developing attacks to break neural networks in general,” Carlini explains.

However, as a practical attack, the method did not work on audio played through a speaker and into a microphone. Distortions caused by amplifiers and microphones altered the glitches enough to cause the attacks to fail. In Carlini’s version, the adversarial examples needed to be presented to the DNN in the form of ready-made digitized audio files. This was in contrast to his earlier attack, in which the added noise survived the filtering of physical speakers and microphones. As with other parts of the adversarial-examples space, the attacks have evolved quickly.

Early in the summer of 2018, a system called CommanderSong developed by a team led by researchers at the Chinese Academy of Sciences demonstrated it was possible to hide voice commands to speech-recognition systems in popular tunes played over the air. The victim system recognizes the altered signals as speech commands.

General concern over the susceptibility of DNNs to adversarial examples grew quickly after Szegedy’s work. The attacks seem to work across many different implementations, suggesting there are common factors that make DNNs vulnerable. Numerous low-level countermeasures have been proposed, but almost all have been beaten within months of publication. The problem seems fundamental to systems that can learn.

Humans are susceptible to similar kinds of manipulation. In experiments intended to find connections between biological perception and AI, Gamaleldin Elsayed and colleagues at Google Brain and Stanford University made subtle changes to images that could fool both humans and DNNs. Neuroscientists believe that exposure to images for less than a tenth of a second cuts out the brain’s ability to use its complex array of feedback networks for recognition. The behavior then becomes more consistent with feedforward networks similar to those used in DNNs.

“I don’t think humans are perfect and we don’t want these systems to be perfect, but we also do not want them to be obviously flawed in ways that humans are not,” Carlini says.

Researchers attribute DNNs’ susceptibility to attack in part to the enormous number of parameters their layers are called upon to process and to how those parameters are set during training. One of the reasons it is so easy to force a misclassification is the way that DNNs perform weighted sums of many individual inputs. Small changes to each pixel in an image can shift the overall result to a different state. Carlini saw a similar effect in his work with speech, but cautions against drawing direct parallels.

“With five seconds of audio, you have as many as 70,000 samples. Messing with only one sample gives you only a small gain. But we get to do it to a lot of samples. The more interesting question is why it is possible that for any target phrase there’s a way to get to it without making too much of a change to the audio. I don’t have an answer for that, and it is very hard to find a solution to a problem when you don’t know why it happens,” Carlini says.
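The arithmetic behind that remark can be sketched with a toy linear score, the kind of weighted sum described earlier. The Python example below is a minimal sketch, not Carlini’s attack and not a model of DeepSpeech: the random weights, the per-sample budget `eps`, and the function name `score_shifts` are all hypothetical. It compares how far the score moves when one sample is nudged with how far it moves when every sample is nudged by the same small amount.

```python
import numpy as np

def score_shifts(n_samples, eps, seed=0):
    """Shift in a toy linear score w.x from nudging one sample vs. all samples."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=n_samples)   # hypothetical weights, not a trained model

    # For a linear score, the shift caused by a perturbation delta is w . delta.
    delta_one = np.zeros(n_samples)
    delta_one[0] = eps * np.sign(weights[0])    # nudge a single sample
    delta_all = eps * np.sign(weights)          # nudge every sample by the same budget

    return weights @ delta_one, weights @ delta_all

# 70,000 samples echoes the figure quoted for five seconds of audio.
shift_one, shift_all = score_shifts(n_samples=70_000, eps=0.01)
print(f"shift from one sample:  {shift_one:.4f}")
print(f"shift from all samples: {shift_all:.1f}")
```

Each individual nudge contributes only `eps` times the size of one weight, but the contributions add up across all 70,000 inputs, so the total shift grows with the dimension of the input even though no single sample changes much.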

The huge number of samples or pixels in the inputs means the DNN has to work on data with a huge number of dimensions. Misunderstanding the mathematics of high-dimensional spaces may have led users to place false confidence in the ability of DNNs to make good decisions. Carlini notes: “Lots of intuition turns out to be completely false in these higher dimensions. It makes things a lot harder to analyze.”

In high-dimensional spaces, classifications do not have the clear boundaries we think they do. Relatively small distortions of a large number of pixels or samples in the input image or audio can push a sample from one classification into one of many near neighbors.

At the University of Virginia, work by Ph.D. student Mainuddin Jonas with supervisor David Evans has shown how adversarial examples tend to guide the network away from the correct classification progressively as an image is analyzed by each successive layer of neurons. Reducing the freedom of adversarial examples to push the classification process off course may yield a way to reduce their impact.

In parallel, several groups have looked at ways to harden classification boundaries. They are using the mathematics of these higher-dimensional spaces to indicate how best to keep classifications more clearly defined in the trained network. This could lead to methods that detect and discard ambiguous training data and so harden the classification boundaries. But this work remains at an early stage.

Evans has proposed a technique he calls feature-squeezing, which applies methods such as reducing the bit-resolution of the data processed by neurons. “My goal is to try to reduce the adversary’s search space. The perspective that we have on it is to take the high-dimensional search space that the adversaries can currently exploit and try to shrink that,” he says. But he notes the problem of tackling adversarial examples and similar attacks will take a lot more effort. “It is definitely an area where there is a lot of exciting work going on. We are at the very early stages of what may be a very long arms race.”
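The bit-resolution idea can be illustrated with a short Python sketch. It covers only the quantization step, applied to a toy input rather than inside a network, and should not be read as Evans’ implementation: the function name `squeeze_bit_depth`, the 8x8 random image, the 3-bit target, and the size of the perturbation are all assumptions chosen for illustration.

```python
import numpy as np

def squeeze_bit_depth(x, bits):
    """Quantize values in [0, 1] down to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8)) / 255.0              # toy 8-bit image
perturbation = rng.uniform(-2, 2, size=image.shape) / 255.0    # small hypothetical noise
adversarial = np.clip(image + perturbation, 0.0, 1.0)

squeezed_clean = squeeze_bit_depth(image, bits=3)
squeezed_adv = squeeze_bit_depth(adversarial, bits=3)

# Fraction of pixels where the perturbation no longer makes any difference.
fraction_same = np.mean(squeezed_clean == squeezed_adv)
print(f"pixels identical after squeezing: {fraction_same:.0%}")
```

With only eight levels per pixel, most small nudges round back to the same value as the clean image, which is one way of shrinking the space an adversary can search. The cost is lost detail in legitimate inputs, so the choice of bit depth is a trade-off.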

Carlini believes it will be essential to explore the core mechanisms that drive DNNs to understand how adversarial examples succeed. “I don’t think you can construct sound defenses without knowing why they work. We need to step back and figure out what’s going on.”

Further Reading

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R.
Intriguing properties of neural networks. International Conference on Learning Representations 2014. ArXiv:1312.6199 (12/2013)

Carlini, N. and Wagner, D.
Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. 1st IEEE Deep Learning and Security Workshop (2018). ArXiv:1801.01944 (3/2018)

Jonas, M.A. and Evans, D.
Enhancing Adversarial Example Defenses Using Internal Layers. IEEE Symposium on Security and Privacy 2018. [https://www.ieee-security.org/TC/SP2018/poster-abstracts/oakland2018-paper29-poster-abstract.pdf]

Papernot, N. and McDaniel, P.
Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning. ArXiv:1803.04765 (3/2018)

January 2019 Issue

Communications of the ACM (CACM) is now a fully Open Access publication.
