A team including scholars at Yale, the University of Oxford, and the University of Cambridge has come up with what it calls a lie detector, which can identify falsehoods in the output of large language models simply by asking a series of unrelated yes-or-no questions after each round of dialogue, without any access to the guts of the program.
Their lie detector, described in "How to Catch an AI Liar," can work with large language models for which it was not initially developed, with novel prompts it had never encountered, and with topic databases it had never faced, such as mathematics questions.
"Despite its simplicity, this lie detector is highly accurate and surprisingly general," the authors write.
The detector relies on a notion put forward in 2021 by Owain Evans and researchers at the Future of Humanity Institute at Oxford, who described AI lies as "falsehoods that are actively selected for."
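The black-box procedure described above (ask unrelated yes-or-no questions after a dialogue, then judge from the answers alone) might be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not give the actual questions, the answer encoding, or the classifier, so the names `FOLLOW_UP_QUESTIONS`, `probe`, `train`, and `lie_probability` are hypothetical, and logistic regression is an assumed stand-in for whatever the authors use.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical follow-up questions; the article does not list the real ones.
FOLLOW_UP_QUESTIONS = [
    "Is the sky green?",
    "Can fish ride bicycles?",
    "Is seven a prime number?",
]


def encode_answer(answer: str) -> float:
    """Map a reply to 1.0 if it starts with 'yes', otherwise 0.0."""
    return 1.0 if answer.strip().lower().startswith("yes") else 0.0


def probe(ask: Callable[[str, str], str], transcript: str,
          questions: Sequence[str] = FOLLOW_UP_QUESTIONS) -> List[float]:
    """Ask each follow-up question after the dialogue; return a feature vector.

    `ask(transcript, question)` is an assumed black-box wrapper around the
    model: it sees only text in and text out, never the model's internals.
    """
    return [encode_answer(ask(transcript, q)) for q in questions]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lie_probability(weights: Sequence[float], features: Sequence[float]) -> float:
    """Score one feature vector; the last weight is the bias term."""
    z = sum(w * x for w, x in zip(weights, features)) + weights[-1]
    return sigmoid(z)


def train(features: Sequence[Sequence[float]], labels: Sequence[int],
          epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Fit logistic-regression weights with stochastic gradient descent."""
    n = len(features[0])
    weights = [0.0] * (n + 1)  # n feature weights plus one bias
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = lie_probability(weights, x) - y
            for i in range(n):
                weights[i] -= lr * err * x[i]
            weights[-1] -= lr * err
    return weights
```

In use, one would collect answer vectors from dialogues already known to be truthful or deceptive, pass them to `train`, and then call `lie_probability` on the vector that `probe` returns for a new dialogue. A score above 0.5 would flag a likely lie.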