AI vs. AI: The Race to Generate, Share, and Detect Deepfakes

A video of former U.S. president Donald Trump that was determined to be a deepfake.

Artificial intelligence, which can generate astonishingly realistic false images and videos, is increasingly being used to detect them.

Distinguishing between fact and fakery has become an everyday part of our online lives. During the U.S. election campaign, a manipulated video appearing to show Joe Biden forget which state he was in went viral, receiving more than a million views before it was debunked.

The doctoring of visual material for political mischief-making is nothing new. Josef Stalin notoriously erased undesirable companions from photographs during the Great Purge in 1930s Russia. Nowadays, digital methods mean everyone can fake it.

Artificial intelligence (AI) can generate astonishingly realistic false images and videos. These so-called deepfakes (a name that combines "deep learning" with "fake") are the rising stars of the post-fact world.

Deepfakes can be made cheaply and quickly by anyone: amateurs, jokers, scammers, political groups, extremists. Many are harmless memes and comic spoofs, but others are designed to confuse and destabilize. They are deployed as hoaxes, scams, and political propaganda, and in misinformation and disinformation campaigns.

Deepfakes are commonly created using generative adversarial networks (GANs). Two neural networks, a generator and a discriminator, are set to work against each other: the generator produces images from random noise, and the discriminator classifies them as real or fake. A loop of learning and revision ensues until the discriminator is fooled into identifying the fakes as real.
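The adversarial loop can be sketched at toy scale. The Python example below is an illustration only, not how any deepfake tool is built: it assumes one-dimensional numbers in place of images, a two-parameter generator, and a two-parameter discriminator, and every name in it is hypothetical. It shows the alternating updates, in which the discriminator learns to separate real samples from generated ones and the generator then adjusts itself to raise the discriminator's score for its fakes.

```python
import numpy as np

def sigmoid(x):
    # Clipping keeps np.exp from overflowing on extreme inputs.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

def train_toy_gan(steps=500, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D data."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0  # generator: G(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.normal(4.0, 0.5, batch)  # stand-in for real images
        z = rng.normal(0.0, 1.0, batch)     # random noise fed to the generator
        fake = a * z + b

        # Discriminator step: push D(real) up and D(fake) down.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w -= lr * np.mean(-(1.0 - d_real) * real + d_fake * fake)
        c -= lr * np.mean(-(1.0 - d_real) + d_fake)

        # Generator step: change a and b so that D(fake) rises.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1.0 - d_fake) * w  # gradient of -log D(fake) w.r.t. fake
        a -= lr * np.mean(grad_fake * z)
        b -= lr * np.mean(grad_fake)
    return {"a": a, "b": b, "w": w, "c": c}

params = train_toy_gan()
print(params)
```

Simple setups like this one can oscillate rather than settle, so the sketch makes no claim about where the parameters end up. Image-generating GANs replace both players with deep neural networks, but the alternation between the two updates is the same idea.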

AI can not only generate deepfakes; it is increasingly being used to detect them.

Researchers from Intel Corporation and the Graphics and Image Computing (GAIC) laboratory at Binghamton University produced a tool called FakeCatcher that identifies and classifies fake portrait videos via biological signals. The team reported 99.39% accuracy for pairwise separation (telling apart the two videos in a pair in which one is real and one is fake) and 96% accuracy for deepfake detection.

The GAIC group has been studying facial expressions and emotion using three-dimensional (3D) images and videos, along with biological data, such as heart rate, blood pressure, and other vital signs.

"When people experience different emotions, they experience different physiological signals, those signals are captured in our lab," said GAIC director Lijun Yin.

The resulting datasets are being applied to research in fields as varied as computer vision, security, biomedicine, and psychology.

Ilke Demir, a senior research scientist at Intel, realized the potential of the resource for deepfake detection and collaborated with one of Yin's Ph.D. students, Umur A. Ciftci, who served as the expert in photoplethysmography (PPG) signals. As Yin explained, "People can synthesize a facial video, an animated facial expression, but it's very hard to synthesize their physiological signals."

When subtle blood flow and heart rate changes are triggered by emotions, they manifest on portrait videos as color changes in pixels. These are difficult to reproduce accurately in a deepfake.

The changes are invisible to the human eye, but can be detected using photoplethysmography (PPG) signals. Under Yin's supervision, Ciftci and Demir developed an algorithm that identifies PPG signals and converts them into a PPG map that can then be used to automatically detect deepfakes.

In a real portrait video, such signals should be consistent spatially, said Yin. "If this is a real person, a real video, then the PPG signal would be consistent from every part of their face, in the forehead, in the chin, in the nose." They should also be consistent temporally, that is, over time across the frames of a video. As Yin explained, "A person's heart rate cannot change dramatically across such a short period of time."
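The two consistency checks Yin describes can be illustrated with a short Python sketch. This is not FakeCatcher's algorithm, which works from PPG maps. It is a simplified stand-in built on assumptions: the per-region signals are synthetic, the "real-like" case is a 75-beats-per-minute sine wave plus noise, the "fake-like" case is unrelated noise in each region, and the function names are hypothetical. Spatial consistency is scored as the average correlation between facial regions, and temporal consistency as the spread of the dominant pulse rate across successive time windows.

```python
import numpy as np

def spatial_consistency(signals):
    """Mean pairwise Pearson correlation; signals has shape (regions, frames)."""
    corr = np.corrcoef(signals)
    pairs = corr[np.triu_indices_from(corr, k=1)]
    return float(pairs.mean())

def heart_rate_spread(signal, fps, window):
    """Standard deviation (beats per minute) of the dominant rate per window."""
    freqs = np.fft.rfftfreq(window, d=1.0 / fps)
    rates = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        spectrum = np.abs(np.fft.rfft(chunk - chunk.mean()))
        rates.append(60.0 * freqs[np.argmax(spectrum)])
    return float(np.std(rates))

rng = np.random.default_rng(0)
fps, seconds = 30, 12
t = np.arange(fps * seconds) / fps
pulse = np.sin(2 * np.pi * 1.25 * t)  # assumed pulse: 1.25 Hz = 75 bpm

# Three hypothetical regions (forehead, chin, nose) sharing one pulse.
real_like = np.stack([pulse + 0.2 * rng.normal(size=t.size) for _ in range(3)])
# Three regions with no shared pulse at all.
fake_like = rng.normal(size=(3, t.size))

for name, video in (("real-like", real_like), ("fake-like", fake_like)):
    print(name,
          "spatial:", round(spatial_consistency(video), 3),
          "rate spread:", round(heart_rate_spread(video[0], fps, 120), 1))
```

In this synthetic setup the shared pulse yields a high correlation between regions and a steady rate from window to window, while the unrelated signals yield neither. Real video would first require locating the face and extracting the color signal from each region, which the sketch leaves out.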

The technique works because deepfake producers have not considered these hidden giveaways in their work. "They never keep the consistency in the spatial and in the temporal dimension for the PPG signal," said Yin. However, he added, the advent of this new detection method could change all that.

In Germany, a team at the Horst Görtz Institute for IT Security at Ruhr-Universität Bochum has developed a technique for detecting deepfakes using frequency analysis. The work was presented at the International Conference on Machine Learning (ICML) in July.

The team based their approach on an existing mathematical transformation called the discrete cosine transform (DCT), which is commonly used in signal processing. DCT "takes image data and transforms it into what's called the frequency space," explained research assistant Joel Frank.

When an image file is compressed, DCT-based compression discards information that is not relevant or visible to the human eye. Prominent image data tends to be found in the low-frequency areas, which compression algorithms try to preserve. As Frank explained, "We can drop the high frequencies without actually losing a lot of the content of the image."

This is often done to save bandwidth or space when transmitting images. "WhatsApp would remove high-frequency information from a picture because you can't really see it as a human; it is just there," said Frank.
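Frank's point about dropping high frequencies can be shown with a minimal Python sketch. It rests on assumptions: the "image" is a synthetic smooth blob rather than a photograph, the helper names are hypothetical, and high frequencies are simply set to zero. Codecs such as JPEG instead apply the DCT to small blocks and quantize the coefficients, so this shows the principle rather than any real compressor.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct2(image):
    """Two-dimensional DCT: image space to frequency space."""
    return dct_matrix(image.shape[0]) @ image @ dct_matrix(image.shape[1]).T

def idct2(coeffs):
    """Inverse of dct2 (the basis matrices are orthonormal)."""
    return dct_matrix(coeffs.shape[0]).T @ coeffs @ dct_matrix(coeffs.shape[1])

def keep_low_frequencies(coeffs, keep):
    """Zero every coefficient outside the top-left keep x keep corner."""
    out = np.zeros_like(coeffs)
    out[:keep, :keep] = coeffs[:keep, :keep]
    return out

# Synthetic smooth 32 x 32 "image": a soft bright blob on a dark background.
n = 32
x = (np.arange(n) + 0.5) / n
image = np.exp(-((x[:, None] - 0.5) ** 2 + (x[None, :] - 0.5) ** 2) / (2 * 0.2 ** 2))

coeffs = dct2(image)
restored = idct2(keep_low_frequencies(coeffs, keep=8))

kept_fraction = (8 * 8) / image.size
relative_error = np.linalg.norm(restored - image) / np.linalg.norm(image)
print(f"kept {kept_fraction:.1%} of coefficients, relative error {relative_error:.4f}")
```

For this smooth test image, keeping only the lowest-frequency sixteenth of the coefficients reconstructs it with a small error. Photographs with fine texture lose more, which is why real codecs reduce high frequencies gradually instead of cutting them off.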

The researchers wrote a program that automates DCT analysis on images sourced from the Which Face is Real? website, and a dataset of bedroom images collated specifically for the project. The team showed that unlike their real counterparts, GAN-generated deepfakes display information traces in high-frequency areas.

These traces, or artefacts, can now be easily identified and used to flag an image as a fake. "In the high frequency, it's very prominent because real pictures don't have much information there and they [deepfakes] actually have quite a bit," Frank said.
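A rough sketch of this kind of check follows, in Python. It is not the Bochum team's code. It assumes a synthetic smooth image as the "real" example, imitates a generated image by adding a faint checkerboard pattern as a hypothetical stand-in for high-frequency artefacts, and uses an arbitrary illustrative threshold. The function measures how much of an image's DCT energy, ignoring overall brightness, sits in the upper half of the frequency range.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def high_frequency_ratio(image):
    """Share of non-DC spectral energy in the upper half of either axis."""
    rows, cols = image.shape
    coeffs = dct_matrix(rows) @ image @ dct_matrix(cols).T
    energy = coeffs ** 2
    energy[0, 0] = 0.0  # ignore mean brightness
    high = (np.arange(rows)[:, None] >= rows // 2) | (np.arange(cols)[None, :] >= cols // 2)
    return float(energy[high].sum() / energy.sum())

def looks_generated(image, threshold=0.01):
    """Flag an image whose high-frequency share exceeds a hypothetical threshold."""
    return high_frequency_ratio(image) > threshold

# Synthetic smooth 64 x 64 image standing in for a real photograph.
n = 64
x = (np.arange(n) + 0.5) / n
real_like = np.exp(-((x[:, None] - 0.5) ** 2 + (x[None, :] - 0.5) ** 2) / (2 * 0.2 ** 2))

# The same image with a faint pixel-level checkerboard added (assumed artefact).
idx = np.arange(n)
checkerboard = np.where((idx[:, None] + idx[None, :]) % 2 == 0, 1.0, -1.0)
fake_like = real_like + 0.1 * checkerboard

real_ratio = high_frequency_ratio(real_like)
fake_ratio = high_frequency_ratio(fake_like)
print(f"real-like: {real_ratio:.5f}, fake-like: {fake_ratio:.5f}")
```

A single threshold works here only because both images are synthetic. Genuine photographs vary widely in texture and compression history, so a practical detector would need to learn from many labeled examples what a normal spectrum looks like.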

The code is freely available on GitHub for use by researchers and programmers. The team now aims to build an interface that would allow any user to upload an image for analysis.

These new detection techniques from Binghamton and Bochum take advantage of subtle qualities in image data that the makers of deepfakes have neglected to consider, so far. It is possible they will react and adapt. Using AI to both generate and detect deepfakes is a fast-moving game of cat and mouse, with no obvious end in sight.

Karen Emslie is a location-independent freelance journalist and essayist.