Communications of the ACM

ACM Careers

Researchers Poke Holes in Chatbot Safety Controls

Researchers discovered vulnerabilities in the controls set up around AI chatbots.

Credit: Getty Images

The safety measures of leading chatbots can be circumvented to generate nearly unlimited amounts of harmful information, according to a report on adversarial attacks by researchers at Carnegie Mellon University and the Center for AI Safety. The report underscores increasing concern that chatbots could flood the Internet with false and dangerous information.

The researchers found that they could break through the guardrails of open-source systems by appending a long suffix of characters to each English-language prompt fed into the system. The methods they developed could also bypass the guardrails of closed systems, including OpenAI's ChatGPT, Google Bard, and Anthropic's Claude chatbot.

The researchers say there is no known way of preventing all attacks of this kind. "There is no obvious solution," says Zico Kolter, a professor at Carnegie Mellon and an author of the report. "You can create as many of these attacks as you want in a short amount of time."

From The New York Times