All deep learning systems are vulnerable to adversarial attacks, researchers warn. Tiny alterations to the input can cause neural networks to completely misclassify images or other data. While cause for concern, this also sparks research that may lead to better, more accountable artificial intelligence.
Artificial intelligence (AI) based on neural networks has made spectacular progress in recent years. Neural networks have become almost as good as doctors at interpreting medical scans, and in 2016 Google's AlphaGo beat the human world champion at the game of Go, 10 years earlier than expected.
However, the discovery of various kinds of adversarial attacks has exposed the Achilles heel of neural networks.
In February, Samuel Finlayson and other researchers from Harvard University and the Massachusetts Institute of Technology published an article on arXiv demonstrating adversarial attacks that made neural network-based AI systems completely misclassify medical images of the retina, skin, and chest X-rays. A scan of a cancerous lesion could be altered so the network classified it as benign, or vice versa.
In April, in a Policy Forum in Science magazine, the researchers urged the medical community to pay attention to this problem. U.S. healthcare is a trillion-dollar business, with potentially huge incentives for meddling with AI systems.
Tailor-made changes
Adversarial attacks are based on slightly altering the original data in such a way that the neural network radically revises its assessment. It is not hacking, because the attacker does not need access to the neural network.
For instance, a PGD (projected gradient descent) attack adds tiny changes to all individual pixels in the original image. These changes are carefully tailored to lead the network astray while it is processing the data. These tailor-made pixel changes can be so tiny that they are invisible to the human eye; separate from the picture, they look like random noise.
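For readers curious about the mechanics, the following is a minimal sketch of a PGD attack, written here in PyTorch. The names model, image, and label are hypothetical placeholders for a trained classifier and a correctly labeled input; they are not taken from the study described above, and the step sizes are illustrative only.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, image, label, eps=8/255, alpha=2/255, steps=10):
    """Return a copy of image, perturbed within an L-infinity ball of radius eps."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), label)
        grad = torch.autograd.grad(loss, adv)[0]
        # Nudge every pixel in the direction that most increases the loss,
        # then project back into the small ball around the original image
        # and the valid pixel range, so the change stays nearly invisible.
        adv = adv.detach() + alpha * grad.sign()
        adv = torch.min(torch.max(adv, image - eps), image + eps).clamp(0, 1)
    return adv
```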
Another type of adversarial attack puts a 'patch'—like a stamp or sticker—on an image. For the neural network, this patch overrules all other information in the image. Though the patch is clearly visible to the human eye, the neural network notices nothing peculiar, because it was not trained to recognize such patches.
The attacker needs his or her own neural network to construct such an adversarial image or patch. If the attack depended on the detailed architecture of that particular network, and on the set of images it was trained on, the threat would be minimal. However, researchers have been astonished to find that a single adversarial image can fool a broad class of networks trained on different sets of images.
So, are medical deep learning systems in deep trouble? Bram van Ginneken, professor of Medical Image Analysis at Radboud University in the Netherlands, has a different view: "This would only be a serious problem if this attack took place in the scanner itself, and that is unlikely to happen. After that, all medical images are sent to a hospital database, where their integrity is secured, for instance by hash-coding the data. Medical data are already better protected than most other data."
As a developer of deep learning software, Van Ginneken takes a keen interest in this phenomenon, because "it helps us to better understand the fundamental properties of deep learning networks."
Not fooled by a patch
The existence of adversarial attacks that are baffling to humans drives home the message that neural networks are not as similar to real brains as they are often believed to be. Says Van Ginneken, "Some parts of our brains do something similar to deep learning networks, but brains do lots of other things, too, like reasoning." That is why a human is not fooled by a sticker-like patch on a medical scan; at some higher level of interpretation, such a patch does not make sense, so its data are ignored.
Neural networks can be trained to do almost anything, so they can also be trained to resist adversarial attacks. One method is to proactively generate adversarial attacks and incorporate these images into the training database. The problem with this method is that the space of possible adversarial attacks is so vast that no reasonable amount of adversarial training can cover it completely. On the Cleverhans blog, AI security researchers Ian Goodfellow and Nicolas Papernot compare this type of defense to playing Whac-A-Mole; they argue that it is inherently more difficult to defend against adversarial attacks than to launch them.
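To make the idea concrete, here is a minimal sketch of that kind of adversarial training, reusing the pgd_attack function sketched earlier; model, loader, and optimizer are assumed to be an ordinary PyTorch classifier, data loader, and optimizer, and are not part of any system described in the article.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer):
    """One training epoch in which the network only sees adversarial images."""
    model.train()
    for images, labels in loader:
        # Craft adversarial versions of the current batch on the fly...
        adv_images = pgd_attack(model, images, labels)
        # ...and update the weights on those perturbed images, so the network
        # learns to classify them correctly despite the perturbation.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(adv_images), labels)
        loss.backward()
        optimizer.step()
```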
That is also Van Ginneken's experience: "Soon after a defense method is published, someone will break it. There is an arms race going on, but this is a blessing in disguise, for this has become a very fertile way of doing research, and many researchers now like to work on this."
In its infancy
Part of the problem is that neural networks are not currently accountable for their decisions. The myriad interactions between the 'neurons' that process the data and deliver a classification are a black box, even to their designers. There is no theoretical reason, however, why neural networks should never be able to explain—to a human or to a supervising neural network—how they arrive at a verdict.
For instance, if a neural network could report which pixels in an image contributed the most to its verdict, an adversarial patch or a PGD attack would be detectable. Future deep learning networks could actually be talking with a human specialist—though the scope of the conversation would be strictly limited.
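One simple way a network could report which pixels mattered most is a gradient-based saliency map. The sketch below assumes the same hypothetical PyTorch model and image as before, and is only one of many possible attribution methods, not the approach of any group mentioned here.

```python
import torch

def saliency_map(model, image):
    """Per-pixel influence of the input on the network's top-scoring class."""
    image = image.clone().detach().requires_grad_(True)
    scores = model(image)
    scores.max(dim=1).values.sum().backward()
    # Pixels with large gradient magnitudes had an outsized effect on the
    # verdict; a concentrated hot spot could flag an adversarial patch.
    return image.grad.abs().max(dim=1).values  # collapse the color channels
```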
Concludes Van Ginneken, "It shows that this technology is still in its infancy. As neural networks become better, adversarial attacks will become more difficult. In 20 years, AI experts will worry about entirely different problems."
Arnout Jaspers is a freelance science writer based in Leiden, the Netherlands.