News
Artificial Intelligence and Machine Learning

Poisoning Data to Protect It

Researchers are developing new ways to keep data from being used to train artificial intelligence without permission.

[Article illustration: colored liquid in glass bottles next to a candle.]

In 2020, after they released a tool designed to foil facial recognition systems, computer scientist Ben Zhao and his colleagues at the University of Chicago received a puzzling email. Their tool, Fawkes, subtly alters the pixels in digital portraits, rendering them unrecognizable to automated facial recognition. So when an artist emailed Zhao to ask whether Fawkes might be used to protect her work, he did not see the connection.

Then news of revolutionary generative artificial intelligence (AI) tools like Midjourney and DALL-E began to spread. Digital illustrations, photographs, and other visual works had been scraped from the Internet to train various generative models without the consent of their creators. The artist who had emailed, it turned out, was on to something. “We had a bit of a lightbulb moment,” Zhao recalls, and he and his colleagues soon joined a Zoom call with more than 500 concerned artists. “They were all talking about how their lives were turned upside down by the arrival of generative AI. Their work was getting fed into this machine, and people could generate work in their style without their permission. This was not just a loss of income, but a loss of identity.”

Today, Zhao and his group are part of a larger effort to develop technological solutions to protect content creators and their work. He focuses on data poisoning, a technique that can be applied beyond visual media to sound and text. Other research groups are developing ways to edit the output of models to protect the rights of artists.

These efforts to disrupt generative AI models are unusual for the machine learning field, according to computer scientist David Bau of Northeastern University. “For decades we have been struggling to get these models to do what we want them to do, and now they’re doing that, but they’re also doing things we don’t want them to do, like imitate living artists or corporate logos and trademarks,” says Bau. “We’ve never addressed the question of how do you make an AI less capable, but that’s sort of what we’re facing now.”

After the packed Zoom call, Zhao connected with a successful concept illustrator named Karla Ortiz (http://www.karlaortizart.com/). Her work had been scraped from the Web and used to train generative AI models, making it possible for anyone to request an illustration in her style and get decent results. “I asked her, if I gave you a tool that could disrupt this process, would you use it?” Zhao recalls. “The answer was a resounding yes.”

The obvious way to prevent a model from replicating a content creator’s work would be to retrain it on a dataset that excludes that individual’s output. The same holds true for trademarked content: if a model tends to copy a corporate logo, the group that built it could remove all instances of that logo from the data and train a new model on the edited dataset. The problem is that retraining a model is an expensive, time-consuming process that also consumes tremendous amounts of compute and electricity.

Another strategy is to compile and maintain opt-out lists, then convince generative AI companies to respect the wishes of content creators and agree not to train on this data. To the illustrator Eva Toorenent (https://www.evaboneva.com/), however, putting the burden on artists to ensure their work is excluded from each new update is unreasonable. “It’s like they’re saying, ‘if you ask nicely, we won’t break into your house’,” she says. “How about you just don’t break in at all?”

Zhao has adopted a different strategy to protect the work of artists, based in part on the idea that humans and AI models see differently. Consider Ortiz, the concept artist. Whereas humans might look at one of her works and instantly recognize the Marvel character Loki, an AI discerns subtle and intricate patterns in the many pixels that make up the image. The initial tool Zhao developed, Glaze, uses the popular Stable Diffusion engine to alter the majority of pixels in an image. Since each of a pixel’s red, green, and blue channels takes one of just 256 values, a minuscule digital tweak can dramatically change how the AI sees the art without altering what humans see. “Glaze is able to take a characteristic portrait of someone in a realistic style and convince the AI model that it’s looking at a Jackson Pollock,” Zhao says. To the human eye, however, the two works are indistinguishable.
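
Under the hood, this kind of cloak is an adversarial perturbation: a tiny, tightly bounded change to the pixels that drags the image’s machine-readable features toward a very different style. The following is a minimal sketch of that idea in PyTorch, not Glaze’s actual code; the feature extractor is a toy stand-in for a real pretrained image encoder, and every name and parameter (the perturbation budget eps, the number of optimization steps, and so on) is hypothetical.

    # Minimal sketch only: a bounded adversarial perturbation that drags an
    # image's machine-readable features toward a different style. The feature
    # extractor is a toy stand-in, NOT the encoder Glaze uses.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Stand-in for a pretrained image encoder.
    feature_extractor = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )
    feature_extractor.eval()

    artwork = torch.rand(1, 3, 64, 64)        # the artist's original image
    target_style = torch.rand(1, 3, 64, 64)   # an image in a very different style

    eps = 0.03                                # perceptibility budget (L-infinity)
    delta = torch.zeros_like(artwork, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=0.01)

    with torch.no_grad():
        target_features = feature_extractor(target_style)

    for step in range(200):
        optimizer.zero_grad()
        cloaked = (artwork + delta).clamp(0, 1)
        # Pull the cloaked image's features toward the target style, so the
        # model "sees" a different style than the human viewer does.
        loss = nn.functional.mse_loss(feature_extractor(cloaked), target_features)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)           # keep the change imperceptible

    print(f"feature distance after cloaking: {loss.item():.4f}")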

The Glaze tool functions as a kind of protective shield for artists who want to prevent people from fine-tuning a model on their style, but it does not stop companies from broadly scraping the Internet for images to train their models. Zhao’s follow-up solution, Nightshade, discourages such behavior by acting as a poison pill. Nightshade can subtly alter an image of a cat so that it appears unchanged to humans yet takes on the features of a dog in the eyes of an AI model. “If a company comes along and scrapes a bunch of these images for training despite opt-out lists and other measures, then the next time someone comes along and asks this model for a picture of a cat, they will get an image of a dog,” Zhao notes.
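
At the dataset level, the poison works through mismatched pairs: the caption still says one thing while the imperceptibly perturbed pixels encode another, as in the hypothetical sketch below. The function names are invented, and perturb_toward() stands in for an optimization like the cloaking loop sketched earlier; none of this comes from Nightshade itself.

    def poison_pair(cat_image, dog_image, perturb_toward):
        """Return a pair whose caption says 'cat' but whose pixels, after an
        imperceptible perturbation, read as a dog to a model's encoder."""
        poisoned_image = perturb_toward(source=cat_image, target=dog_image)
        return poisoned_image, "a photo of a cat"

    def build_poisoned_set(cat_images, dog_images, perturb_toward):
        """A scraper that trains on these pairs links the word 'cat' to dog
        features, which is what later corrupts the model's output."""
        return [poison_pair(c, d, perturb_toward)
                for c, d in zip(cat_images, dog_images)]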

Zhao sees Nightshade as a more aggressive form of copyright protection, one that will make it too risky to train a model on unlicensed content. The startup Spawning AI, which has partnered with companies like Shutterstock to compile massive do-not-train lists, is developing a data-poisoning solution with a similar goal called Kudurru. According to Spawning CEO Jordan Meyer, when a tool attempts to download an image from a Kudurru-protected site, the solution can send back gigabyte-sized files to stall its download or alter the data to confuse the model.
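
Spawning has not published Kudurru’s internals, so the following is only a rough sketch of the general idea: a web server that serves normal content to ordinary visitors but streams an enormous, useless payload to requests that look like scrapers. The user-agent signatures and payload size below are invented, and a real deployment would need a far more robust way of identifying scrapers than this simple user-agent check.

    # Hypothetical sketch of the general idea only; Kudurru's actual behavior
    # and scraper-detection logic are not public.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    SCRAPER_SIGNATURES = ("img2dataset", "python-requests")  # placeholders

    class DefendingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            agent = self.headers.get("User-Agent", "").lower()
            if any(sig in agent for sig in SCRAPER_SIGNATURES):
                # Stall the scraper by streaming a huge, useless payload.
                self.send_response(200)
                self.send_header("Content-Type", "application/octet-stream")
                self.end_headers()
                for _ in range(1024):                   # ~1 GB, 1 MB at a time
                    self.wfile.write(b"\x00" * (1024 * 1024))
            else:
                # Ordinary visitors get the real content.
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"normal page content")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), DefendingHandler).serve_forever()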

Data poisoning is not only a protective tool; it has long been used in cyberattacks. Recently, researchers have detailed ways in which the output of large language models (LLMs) can be poisoned during fine-tuning so that specific text inputs trigger undesirable or offensive results. Over time, determined adversaries will likely find antidotes that neutralize the pro-artist poisoning techniques as well. For now, though, Zhao’s work has proven popular with content creators (Glaze was downloaded 1.6 million times in its first few months) and has inspired scientists to develop protective tools for other forms of expression, including voice.

After learning about Zhao’s tools, computer scientist Ning Zhang of Washington University in St. Louis (https://cybersecurity.seas.wustl.edu/ning/index.html) developed a solution called AntiFake that keeps recordings of an individual’s voice from being used to train AI voice-cloning systems. Instead of altering pixels, AntiFake makes small changes to the sound waves that carry a person’s particular voice. “We perturb the signal in places where it’s less obvious for humans,” says Zhang. “If you imagine an audio wave, and slice it into many, many little chunks, then for each chunk, there are specific time points where our tool will make the wave shoot up or down.” These perturbations are designed to maximize the impact on the AI model while barely changing how the audio sounds to the human ear.
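
Conceptually, this is the audio analogue of the image cloaks: a sparse, tightly bounded perturbation that pushes a speaker embedding away from the original voice. The sketch below is illustrative only; it assumes a generic speaker encoder (a toy stand-in here), and the sampling rate, sparsity, and amplitude budget are made-up parameters rather than anything taken from AntiFake.

    # Illustrative sketch only, assuming a generic speaker-embedding model;
    # the encoder is a toy stand-in and none of this is AntiFake's code.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    speaker_encoder = nn.Sequential(
        nn.Conv1d(1, 16, 9, stride=4, padding=4), nn.ReLU(),
        nn.Conv1d(16, 32, 9, stride=4, padding=4), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    )
    speaker_encoder.eval()

    waveform = 0.1 * torch.randn(1, 1, 16000)   # one second of 16-kHz audio
    eps = 0.005                                 # keep the change hard to hear

    # Only a sparse set of time points may be nudged up or down.
    mask = (torch.rand_like(waveform) < 0.05).float()

    delta = torch.zeros_like(waveform, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=1e-3)

    with torch.no_grad():
        original_embedding = speaker_encoder(waveform)

    for step in range(200):
        optimizer.zero_grad()
        perturbed = waveform + mask * delta
        # Push the speaker embedding away from the original so a cloning
        # model trained on this audio captures the wrong voice identity.
        loss = -nn.functional.mse_loss(speaker_encoder(perturbed), original_embedding)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)

    print(f"embedding shift achieved: {-loss.item():.6f}")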

An alternative to poisoning data is changing or editing what a model can generate. Bau and his group at Northeastern University explored whether it would be possible to teach a model to erase a specific concept, such as a copyrighted logo or the style of Vincent van Gogh. Instead of retraining a model on new data, or building a new one that functions as if the data associated with a given concept never existed, Bau and his team create a fine-tuned version of the model that modifies its output under specific conditions to avoid generating the prohibited concept. If the original model is the teacher, Bau explains, and the fine-tuned version is the student, then the student will follow the teacher’s example closely under most conditions. But when the prompt involves something off-limits, such as an illustration in the style of a particular artist like Van Gogh, the fine-tuned model will do the opposite of what the original would have done. “In the end, we can get a student model that is the same as the teacher in almost all ways, except that it refuses to imitate Van Gogh,” Bau says. “That concept is erased.”
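
The training target behind that reversal can be written compactly. The sketch below follows the negated-guidance objective described in the Gandikota et al. paper listed under Further Reading: the student is trained so that its noise prediction for the concept prompt matches the teacher’s unconditioned prediction pushed away from the concept. The tensors here are random placeholders rather than outputs of a real diffusion model.

    # Placeholder tensors, not a real diffusion model; the target follows the
    # negated-guidance objective in Gandikota et al. (see Further Reading).
    import torch

    def erasure_target(eps_uncond, eps_concept, guidance=1.0):
        """Training target that steers the student AWAY from the concept.

        eps_uncond:  teacher's noise prediction for an empty prompt
        eps_concept: teacher's noise prediction conditioned on the concept
                     (e.g., "in the style of Van Gogh")
        """
        return eps_uncond - guidance * (eps_concept - eps_uncond)

    # One (placeholder) training step: the student's prediction for the
    # concept prompt is pulled toward the negated-guidance target.
    eps_uncond = torch.randn(1, 4, 64, 64)
    eps_concept = torch.randn(1, 4, 64, 64)
    eps_student = torch.randn(1, 4, 64, 64, requires_grad=True)

    loss = torch.nn.functional.mse_loss(
        eps_student, erasure_target(eps_uncond, eps_concept))
    loss.backward()
    print(f"erasure loss on placeholder tensors: {loss.item():.3f}")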

Another technique from Bau’s group takes a different approach: it changes the connections between the part of the model that represents the text “Van Gogh” and the part that generates images from such prompts. The researchers developed ways to alter those mappings so that if someone asks for a work in the style of Van Gogh, the model is steered in another direction and associates the prompt with a more generic artistic style. Bau says this is a more surgical way of exerting editorial control over the output.
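
As a rough, hypothetical illustration of such a remapping (not the group’s actual formulation), one can solve a small regularized least-squares problem: rewire a projection matrix so that the embedding of the erased prompt produces what a generic prompt would have produced, while a set of preserved prompts keeps its original behavior. All dimensions and embeddings below are invented.

    # Hypothetical illustration only, not the group's actual formulation.
    import torch

    torch.manual_seed(0)
    d_text, d_out = 8, 16
    W_old = torch.randn(d_out, d_text)       # original projection weights

    c_vangogh = torch.randn(d_text)          # embedding of the erased prompt
    c_generic = torch.randn(d_text)          # embedding of a generic art style
    preserved = torch.randn(5, d_text)       # prompts that must not change

    # Columns: the edited concept plus everything to leave untouched.
    C = torch.cat([c_vangogh.unsqueeze(1), preserved.t()], dim=1)
    V = torch.cat([(W_old @ c_generic).unsqueeze(1), W_old @ preserved.t()], dim=1)

    # Ridge-regularized least squares, regularized toward the old weights so
    # behavior elsewhere changes as little as possible: W_new C ~= V.
    lam = 1e-3
    W_new = (V @ C.t() + lam * W_old) @ torch.linalg.inv(
        C @ C.t() + lam * torch.eye(d_text))

    print(f"residual for the erased prompt: "
          f"{(W_new @ c_vangogh - W_old @ c_generic).norm().item():.4f}")
    print(f"drift on preserved prompts: "
          f"{(W_new @ preserved.t() - W_old @ preserved.t()).norm().item():.4f}")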

Ultimately, the goal of many of these researchers is to return some control to creators and data owners. “This is probably too blue sky, but I hope that by working on proactive defense in collaboration with other tools, we are able to build a more friendly and responsible data community,” says Zhang, the developer of the AntiFake tool.

Toorenent, the illustrator, welcomes this assistance from the research community. She protects every new work with Glaze and Nightshade. “Even if these tools might be broken in the future, they give me a sense of security today,” she says. “The way we can actually battle this is through laws and regulations, but for now these tools are amazing.”

Further Reading

  • Gandikota, R., Materzynska, J., Fiotto-Kaufman, J., and Bau, D. Erasing Concepts from Diffusion Models, arXiv preprint arXiv:2303.07345 (2023).
  • Jiang, S., Kadhe, S.R., Zhou, Y., Cai, L., and Baracaldo, N. Forcing Generative Models to Degenerate Ones: The Power of Data Poisoning Attacks, arXiv preprint arXiv:2312.04748 (2023).
  • Shan, S., Ding, W., Passananti, J., Zheng, H., and Zhao, B.Y. Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models, arXiv preprint arXiv:2310.13828 (2023).
  • Shan, S., Cryan, J., Wenger, E., Zheng, H., Hanocka, R., and Zhao, B.Y. Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models, arXiv preprint arXiv:2302.04222 (2023).
  • Yu, Z., Zhai, S., and Zhang, N. AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis, in Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pp. 460–474 (2023).
