Artificial Intelligence and Machine Learning News

The Rise of the Chatbots

How do we keep track of the truth when bots are becoming increasingly skilled liars?


During the 2016 U.S. presidential race, a Russian “troll-farm” calling itself the Internet Research Agency sought to harm Hillary Clinton’s election chances and help Donald Trump reach the White House by using Twitter to spread false news stories and other disinformation, according to a 2020 report from the Senate Intelligence Committee. Most of that content apparently was produced by human beings, a supposition supported by the fact that activity dropped off on Russian holidays.

Soon, though, if not already, such propaganda will be produced automatically by artificial intelligence (AI) systems such as ChatGPT, a chatbot capable of creating human-sounding text.

“Imagine a scenario where you have ChatGPT generating these tweets. The number of fake accounts you could manage for the same price would be much larger,” says V.S. Subrahmanian, a professor of computer science at Northwestern University, whose research focuses on the intersection of AI and security problems. “It’ll potentially scale up the generation of fakes.”

Subrahmanian co-authored a Brookings Institution report released in January that warned the spread of deepfakes—computer-generated content that purports to come from humans—could increase the risk of international conflict, and that the technology is on the brink of being used much more widely. That report focuses on fake video, audio, and images, but text could be a problem as well, he says.

Text generation may not have caused problems so far. “I have not seen any evidence yet that malicious actors have used it in any substantive way,” Subrahmanian says. “But every time a new technology emerges, it is only a matter of time, so we should be prepared for it sooner rather than later.”

There is evidence that cybercriminals are exploring the potential of text generators. A January blog post from security software maker Check Point said that in December, shortly after ChatGPT was released, unsophisticated programmers were using it to generate code for ransomware and other malware. “Although the tools that we present in this report are pretty basic, it’s only a matter of time until more sophisticated threat actors enhance the way they use AI-based tools for bad,” the company wrote.

Meanwhile, WithSecure, a Finnish provider of cybersecurity tools, warned of the threat of so-called “prompt engineering,” in which users coax software like ChatGPT into creating phishing attacks, harassment, and fake news.

ChatGPT, a chatbot based on a large language model (LLM) developed by AI company OpenAI, has generated much excitement, as well as fear about the advances of AI in general, and a backlash from technologists across many disciplines. There have been calls to pause AI development; at press time, a one-sentence open letter signed by hundreds of the world’s leading AI scientists, researchers, and others (including OpenAI CEO Sam Altman) warned that “[m]itigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Microsoft, which has invested heavily in OpenAI, soon incorporated the chatbot into its Bing search engine, leading to reports of inaccurate and sometimes creepy conversations. Google released Bard, its own chatbot based on its LaMDA LLM, which had previously made news when a Google engineer claimed the model was sentient (he was subsequently fired).

Despite some early misfires, the text generated by these LLMs can sound remarkably like it was written by humans. “The ability to generate wonderful prose is a big and impressive scientific accomplishment from the ChatGPT team,” Subrahmanian says.


Fake Detectors

Given that, researchers agree it would be useful to have a way to distinguish human-written text from text generated by a computer. Several groups have developed detectors to identify synthetic text. At the end of January, OpenAI released a classifier designed to distinguish between human and machine authors, hoping both to identify possible misinformation campaigns and to cut down on the risk of students using a text generator to cheat on their schoolwork. The company warns its classifier is not fully reliable; in tests, it labeled 9% of human-written texts as AI-written, was unreliable on texts shorter than 1,000 characters, and did not work well in languages other than English.

Bimal Viswanath, a professor of computer science at Virginia Tech, says some detectors demonstrated very high accuracy when their developers tested them on synthetic text that those developers had generated, but did less well with fake text found in the real world, where the distribution of data may be different from what was created in the laboratory and where malicious actors try to adapt to defenses.

AI-written text is thought to be detectable because of the way it is created. LLMs are trained on human-written text and learn statistics about how often particular words appear in proximity to other words. They then estimate how likely each candidate word is to appear next in a sentence and, generally speaking, pick the one with the highest probability. Humans show more diversity in their word choices, and that difference can be measured.
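
To make the statistics concrete, here is a minimal, purely illustrative Python sketch. It stands a tiny hand-written bigram table in for a real LLM; a greedy generator always takes the most probable continuation, and its output scores as far more predictable (a higher average log-probability) than a less conventional human phrasing. Every word and probability here is invented for the example.

```python
# Toy stand-in for an LLM: a bigram table of next-word probabilities.
# Greedy generation always takes the highest-probability continuation,
# which makes machine text statistically more predictable than typical
# human writing; that is the property detectors try to exploit.
import math

bigram_probs = {
    "the": {"cat": 0.4, "dog": 0.3, "weather": 0.2, "ombudsman": 0.1},
    "cat": {"sat": 0.5, "slept": 0.3, "vanished": 0.2},
    "dog": {"barked": 0.6, "slept": 0.4},
}

def generate_greedy(start, steps):
    """Pick the most probable next word at every step."""
    words = [start]
    for _ in range(steps):
        choices = bigram_probs.get(words[-1])
        if not choices:
            break
        words.append(max(choices, key=choices.get))
    return words

def avg_log_prob(words):
    """Average log-probability of each word-to-word transition under the toy model."""
    scores = []
    for prev, cur in zip(words, words[1:]):
        p = bigram_probs.get(prev, {}).get(cur, 1e-6)  # tiny floor for unseen pairs
        scores.append(math.log(p))
    return sum(scores) / len(scores)

machine_text = generate_greedy("the", 2)       # ['the', 'cat', 'sat']
human_text = ["the", "ombudsman", "vanished"]  # a more surprising human phrasing
print(avg_log_prob(machine_text))  # close to zero: highly predictable
print(avg_log_prob(human_text))    # far more negative: unusual word choices
```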

Viswanath underscores how difficult it is to say for sure why detectors peg a particular text as real or fake. They use neural networks and deep learning to identify hidden patterns in sequences of text, but, as with so much of deep learning, scientists cannot always identify those patterns. Attackers also can evade the detectors by modifying their language generator; having it select slightly fewer high-probability words, for instance, can introduce enough randomness in the word choice to make the text seem human-generated to a neural network.
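
One common way to make that modification in practice is temperature sampling, which flattens the distribution over candidate words so lower-probability choices get picked more often. Whether any given attacker uses exactly this knob is an assumption, but the short sketch below, with made-up scores, shows the general effect.

```python
# Temperature sampling: rescaling model scores before converting them to
# probabilities spreads weight away from the single top choice. Higher
# temperatures yield more diverse, more "human-looking" word choices.
import math
import random

def sample_with_temperature(word_scores, temperature=1.0, rng=random):
    """Sample one word from raw scores rescaled by a temperature."""
    words = list(word_scores)
    scaled = [word_scores[w] / temperature for w in words]
    max_s = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - max_s) for s in scaled]
    return rng.choices(words, weights=weights, k=1)[0]

# Hypothetical raw scores a generator might assign to candidate next words.
scores = {"sat": 4.0, "slept": 3.0, "vanished": 1.0}

random.seed(0)
cold = sum(sample_with_temperature(scores, 0.5) == "sat" for _ in range(1000))
hot = sum(sample_with_temperature(scores, 2.0) == "sat" for _ in range(1000))
print(f"temperature 0.5: top word chosen {cold} times out of 1,000")
print(f"temperature 2.0: top word chosen {hot} times out of 1,000")
```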

That strategy has its limitations. If a malicious actor is trying to get out a particular message, they cannot change the text so much that that message is lost. “You have a certain thing that you want to communicate. You don’t want to change that underlying semantic content,” Viswanath says. That points to a method that might be better at detecting fake text. Because the LLM does not really know what it is talking about, it can inadvertently select words with different meanings. For instance, it might start talking about named places or people, but within a few sentences it could drift into a different set of names. “And then the article may not sound coherent anymore,” he says. Using semantic knowledge to detect synthetic text, though, is an area that still requires a lot of research, he adds.
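
As a purely illustrative toy, not any published detector, the sketch below approximates "named entities" as capitalized mid-sentence words and checks whether the names in the first half of a passage reappear in the second half; a text that drifts to a different cast of names scores near zero. Real semantic-consistency detection would need genuine named-entity recognition and much richer signals.

```python
# Toy coherence heuristic: do the proper nouns mentioned early in a text
# (approximated here as capitalized words that do not start a sentence)
# reappear later, or does the text drift to a different set of names?
import re

def mid_sentence_names(text):
    names = set()
    for sentence in re.split(r"[.!?]\s+", text):
        for word in sentence.split()[1:]:  # skip the sentence-initial word
            if word[:1].isupper():
                names.add(word.strip(".,;:"))
    return names

def name_overlap(text):
    """Jaccard overlap between names in the first and second halves of the text."""
    half = len(text) // 2
    first, second = mid_sentence_names(text[:half]), mid_sentence_names(text[half:])
    if not first or not second:
        return 0.0
    return len(first & second) / len(first | second)

consistent = ("Officials in Lagos said the Lagos port will expand. "
              "The expansion, officials in Lagos added, begins in March.")
drifting = ("Officials in Lagos said the port will expand. "
            "Residents of Oslo protested the new Toronto tax on Tuesday.")
print(name_overlap(consistent))  # names recur: overlap well above zero
print(name_overlap(drifting))    # names drift: overlap of zero
```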


Watermarking

Another approach to identifying synthetic text is building in a hidden pattern when the text is created, a process known as watermarking. Tom Goldstein, a professor of computer science at the University of Maryland, has developed a scheme to embed such a pattern in AI-generated text. His system uses a pseudo-random number generator to assign each token in a text—a character or sequence of characters, often a single word—to either a red list or a green list. Humans, not knowing which list a word is on, should choose a roughly equal proportion of red- and green-list words within a mathematically predictable variation.

The text generator, meanwhile, assigns extra weight to green-list words, making them more likely to be chosen. A detector that knows the algorithm used to generate the list, or even just the list itself, then examines the text. If it’s near half-red and half-green, it decides a human wrote it; if the green words greatly outweigh the red, the machine gets the credit.
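
The sketch below illustrates the red/green idea in a deliberately simplified form: a keyed hash splits the vocabulary into the two lists, the generator upweights green words, and the detector measures how far the green fraction sits above the 50% expected of ordinary writing. It departs from the published scheme in that Goldstein and colleagues reseed the split from the preceding token, whereas this toy uses one fixed key, and all vocabulary, weights, and counts here are invented.

```python
# Simplified red/green-list watermark: a keyed split of the vocabulary,
# a generator that favors green words, and a detector that checks whether
# green words are statistically over-represented.
import hashlib
import math
import random

SECRET_KEY = "42"  # shared between the generator and the detector

def is_green(token, key=SECRET_KEY):
    """Deterministically place a token on the green list (about half of all tokens)."""
    digest = hashlib.sha256(f"{key}:{token}".encode()).digest()
    return digest[0] < 128

def pick_token(candidates, green_boost=4.0, rng=random):
    """Generator side: upweight green-list candidates before sampling."""
    weights = [green_boost if is_green(c) else 1.0 for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def watermark_z_score(tokens):
    """Detector side: standard deviations by which the green fraction
    exceeds the 0.5 expected of text written without the key."""
    n = len(tokens)
    green_fraction = sum(is_green(t) for t in tokens) / n
    return (green_fraction - 0.5) / math.sqrt(0.25 / n)

random.seed(1)
vocab = [f"word{i}" for i in range(1000)]
watermarked = [pick_token(random.sample(vocab, 10)) for _ in range(40)]
unwatermarked = [random.choice(vocab) for _ in range(40)]
print("watermarked z-score:  ", round(watermark_z_score(watermarked), 1))
print("unwatermarked z-score:", round(watermark_z_score(unwatermarked), 1))
```

Even with the 40 tokens used here, roughly the tweet-length threshold Goldstein describes, the green-heavy sample lands several standard deviations above what ordinary, unwatermarked text would produce, which hovers near zero.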

It only takes 36 tokens—approximately 25 words—to produce a very strong watermark, Goldstein says, so even individual tweets can be labeled. On the other hand, it is possible to weaken or remove a watermark by having a human or another LLM rewrite the text to include more red-list words. “The question is, how much of a sacrifice in quality do you need to suffer to remove the watermark?” Goldstein says.

In fact, Viswanath says, every defense can be defeated, but at a cost. “If you’ve raised the cost of the attack so significantly that the attack is no longer worth it, then you’ve actually won as a defender,” he says.

Aside from deliberate misuse, text generators can also produce toxic content unintentionally. Soroush Vosoughi, a professor of computer science in the Institute for Security, Technology, and Society at Dartmouth College, is working on methods for countering the antisocial possibilities of text generation by looking for ways to make chatbots prosocial. “We develop models that can sit on top of these language models and guide their generation,” he says.

For instance, Vosoughi has developed a classifier, based on ratings from groups such as the Pew Research Center that classify news outlets as leaning left or right politically. The classifier learns to identify certain words as more indicative of a political bias and steers the chatbot to give greater weight to neutral terms. It might, for instance, push the generator away from following the word “illegal” with “aliens” and instead encourage it to write “immigrants.” Another version waits until the whole sentence is generated, and then can go back and change the phrase to “undocumented immigrants.” The same sort of approach can be used with, say, medical information, to make it less likely the generator will produce misleading advice.
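
A minimal sketch of what such a steering layer could look like appears below. The substitution table, penalty factor, and probabilities are hypothetical stand-ins; Vosoughi's models learn these associations from data rather than reading them from a hand-written list.

```python
# Illustrative steering layer: reweight a generator's candidate next words,
# shifting probability from a loaded term to a preferred neutral alternative.

# Hypothetical substitution table; real systems learn such associations.
NEUTRAL_SUBSTITUTES = {"aliens": "immigrants"}
PENALTY = 0.1  # keep only this fraction of a flagged word's probability

def steer(candidate_probs):
    """Downweight flagged words, move the rest of their probability to the substitute."""
    adjusted = dict(candidate_probs)
    for word, substitute in NEUTRAL_SUBSTITUTES.items():
        if word in adjusted:
            moved = adjusted[word] * (1 - PENALTY)
            adjusted[word] *= PENALTY
            adjusted[substitute] = adjusted.get(substitute, 0.0) + moved
    total = sum(adjusted.values())
    return {w: p / total for w, p in adjusted.items()}

# The generator has just produced "illegal" and proposes these continuations.
candidates = {"aliens": 0.6, "immigrants": 0.3, "activity": 0.1}
print(steer(candidates))  # most of the probability now sits on "immigrants"
```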

Of course, this approach requires humans to define the values they want the LLM to uphold, Vosoughi says, but at least it can avoid the problem of models inadvertently generating hate speech or misinformation.

None of these solutions is permanent, researchers warn. Every success at labeling or detecting machine-written text is likely to be met by more sophisticated methods of evading such detection. That does not mean sitting out such an arms race is an option, Vosoughi says. “We need to be just one step ahead of the other side,” he says. “That’s the best we can do in these situations.”

Further Reading

Pu, J., Sarwar, Z., Abdullah, S.M., Rehman, A., Kim, Y., Bhattacharya, P., Javed, M., and Viswanath, B.
Deepfake Text Detection: Limitations and Opportunities, 2023, IEEE Symposium on Security and Privacy, https://doi.org/10.48550/arXiv.2210.09421

Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., and Goldstein, T.
A Watermark for Large Language Models, 2023, arXiv, https://doi.org/10.48550/arXiv.2301.10226

Liu, R., Jia, C., Wei, J., Xu, G., Wang, L., and Vosoughi, S.
Mitigating Political Bias in Language Models Through Reinforced Calibration, 2021, Proceedings of the AAAI Conference on Artificial Intelligence, https://doi.org/10.48550/arXiv.2104.14795

Byman, D.L., Gao, C., Meserole, C., and Subrahmanian, V.S.
Deepfakes and International Conflict, 2023, Foreign Policy at Brookings, https://www.brookings.edu/research/deepfakes-and-international-conflict/

What is ChatGPT? OpenAI’s ChatGPT Explained https://www.youtube.com/watch?v=o5MutYFWsM8

 
