In 1950, the British computer pioneer Alan Mathison Turing published a paper in the journal Mind outlining a number of ways in which emerging machine intelligences could be assessed, all of them centered on the broad notion that if an AI can convincingly imitate a human intellect, then it can be regarded as intelligent.
Ever since, the so-called Turing Test embodied in that paper, Computing Machinery and Intelligence, has served as a yardstick by which the smarts of everything from the earliest hard-coded medical expert systems to today’s hallucinating large language models (LLMs) have been measured.
In London in October, at a one-day meeting marking the 75th anniversary of the publication of Turing’s landmark paper, a gathering of computer scientists, cognitive psychologists, mathematicians, philosophers, historians, and even tech-savvy musicians agreed that the Turing Test has had its day, and that it is time to retire it as an unhelpful distraction. Why? People are too easily duped into thinking an AI system is intelligent, rendering the test meaningless.
“To have people talk about your paper 75 years after you wrote it is pretty damned cool, and to have that meme running for so long means there must be something in it,” said computer scientist Dame Wendy Hall, director of the Web Science Institute at the University of Southampton, U.K., as she kicked off the meeting at London’s Royal Society. “But one of the things I think Turing got wrong was that he overestimated the intelligence of human beings, because it’s incredibly easy to fool people.”
Data scientist Yannis Ioannidis, president of ACM, concurred, telling delegates that in his experience researchers “don’t worry so much about very advanced artificial intelligence, but about the very low human intelligence” of some users—who, whatever the evidence, simply want to believe the output of AI systems is more truthful than erroneous.
Delusional thinking on the part of AI users is nothing new, said computer science pioneer Alan Kay, who conceptualized the Dynabook, a precursor to today’s GUI-based personal computers, laptops, and tablets. In a keynote, Kay related the story of his late friend, MIT researcher Joe Weizenbaum, who between 1964 and 1967 ran experiments with an early psychotherapist chatbot called ELIZA, which was coded to present canned natural language responses to key words in a patient’s typed inputs. (A public version is available online.)
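To show just how simple ELIZA’s machinery was, here is a minimal sketch of that keyword-matching approach in Python. The patterns and canned replies below are invented for illustration; they are not Weizenbaum’s original DOCTOR script.

```python
import random
import re

# Illustrative keyword -> canned-response rules in the spirit of ELIZA's
# DOCTOR script; these patterns and replies are invented for this example.
RULES = [
    (r"\bI need (.+)", ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (r"\bI am (.+)", ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (r"\bmy (mother|father)\b", ["Tell me more about your {0}."]),
    (r"\b(yes|no)\b", ["I see. Please go on."]),
]
DEFAULT_REPLIES = ["Please tell me more.", "How does that make you feel?"]

def reply(user_input: str) -> str:
    """Return a canned response keyed off the first matching keyword pattern."""
    for pattern, responses in RULES:
        match = re.search(pattern, user_input, re.IGNORECASE)
        if match:
            return random.choice(responses).format(*match.groups())
    return random.choice(DEFAULT_REPLIES)

print(reply("I am feeling anxious about work"))
# e.g. "How long have you been feeling anxious about work?"
```

Everything such a program “understands” is baked into a handful of patterns, yet, as Weizenbaum discovered, responses like these were enough to convince some users they were being genuinely analyzed.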
Its predictable responses meant ELIZA failed the Turing Test, but some people believed it was intelligently psychoanalyzing them, and asked to spend time alone with the machine for private mental health consultations, according to Douglas Hofstadter in Gödel, Escher, Bach: An Eternal Golden Braid.
“I knew Joe Weizenbaum, and one of the things he said was that he had not realized that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people. He was shocked,” Kay said.
Peter Millican, a professor of philosophy at the U.K.’s University of Oxford, agreed that the user experience with ELIZA was a game changer. “It showed that it has turned out to be much easier than people thought it would be to deceive people. The Turing Test is somewhat undermined by that,” he said.
That “ELIZA effect” has not gone away. Today’s chatbots, like Anthropic’s Claude, OpenAI’s ChatGPT, and Google’s Gemini, are very much its heirs, and many are treated as reliable sources, even friends.
“Now we have something else that is also preying on fooling people: the LLM. The chatbot mania we’re experiencing right now represents a profoundly dangerous echo of the ELIZA effect,” said cognitive scientist Gary Marcus, an entrepreneur and critic of companies like OpenAI, whose CEO Sam Altman has claimed that spending trillions of dollars to scale up deep learning foundation models will lead to the emergence of an artificial general intelligence (AGI).
“We as a society are placing truly massive bets around the premise that AGI is close, in no small part, because LLMs, pretty arguably, do pass the Turing Test,” Marcus said. “LLMs can fool people into thinking they’re people. People will talk to those machines, tell them their most private details and so forth, because they have a kind of relationship with those machines. But in reality, LLMs are deeply flawed imitators that are preying on the ELIZA effect.”
Sarah Dillon, a professor at Cambridge University, agreed. “LLMs show that the Turing Test is irrelevant, because they’re just sequence prediction machines processing vast amounts of language and telling you the most obvious thing that’s going to come next.”
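Dillon’s “most obvious thing that’s going to come next” can be illustrated with a toy sequence predictor. The sketch below is a hypothetical bigram model over a tiny made-up corpus; real LLMs use deep neural networks trained on vast text collections, but the underlying task of predicting a likely next token is the same in spirit.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the vast amounts of language an LLM is trained on.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count, for each word, which words follow it (a simple bigram model).
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return the most frequent continuation seen in training, i.e. the
    most obvious thing that is going to come next."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else "<unknown>"

# Greedily continue a prompt, one word at a time.
word, generated = "the", ["the"]
for _ in range(4):
    word = predict_next(word)
    generated.append(word)
print(" ".join(generated))  # e.g. "the cat sat on the"
```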
The problem, said Dillon, is that Alan Turing never expected his Mind paper to be taken anywhere near so seriously. As evidence, she cited the views of Turing’s Ph.D. student Robin Gandy, who said in an essay (recounted in turn by Millican) that the Mind paper was written quickly and with relish by Turing, who proudly read the punchier excerpts aloud as he wrote it, and who considered it a piece of propaganda designed to get the emerging computing sector taken more seriously, rather than a learned test for use in perpetuity.
Not only does the Mind paper not reference a Turing Test, Dillon pointed out, but it in fact includes seven different imitation games in which an interrogator has to guess, variously, who is a woman, who is a man, and who is a universal computing machine, with each possibly attempting to deceive the interrogator, or not. All that passing the test proves, Dillon said, is that a machine can imitate some of the intellectual operations of a human.
“That’s it; it doesn’t prove anything else,” she said.
That the Turing Test leads people to believe an AI may be intelligent presents societal dangers and safety threats across the board, the meeting was told: law firms drawing up legal briefs with hallucinating LLMs; teens being encouraged to commit suicide, or to trust deep learning models to write their text messages; and, in a striking statistic relayed by Kaitlyn Regehr, Associate Professor of Digital Humanities at University College London and author of Smartphone Nation, 81% of children over the age of three being exposed to algorithmic YouTube feeds by digitally illiterate parents.
Marcus related the tale of an autonomous car striking a jet plane on an airport apron because its training data did not cover avoiding non-road vehicles.
In the face of such threats, one idea for a global AI safety regime came from musician and human rights campaigner Peter Gabriel, a regular visitor to Xerox PARC back in the day, said Kay, who noted that the UN’s International Civil Aviation Organization (ICAO) successfully established a global safety regime for the airline industry. “ICAO actually managed to get agreement from 190 countries. Maybe the same thing could be achieved for AI,” he said.
How will governments “create some international regulations to provide safety structures when so many countries are afraid of missing out on the spoils of AI?” Gabriel asked.
Marcus addressed this point when asked how politicians can learn about the risks of allowing companies to reap the spoils of runaway machine intelligence. “It’s hard to educate people when their wallet is in the way,” Marcus said, especially at a time when “tech CEOs are absolutely convinced that they are gods.”
What is the value of the Turing Test after 75 years? Kay had an interesting response.
“How about a half-life for papers? Many of us have written papers in the past that have survived long past their usefulness, and yet they still are around. This Mind paper is one that really needed a fairly short half life.”
Paul Marks is a technology, aviation, and spaceflight journalist, writer, and editor based in London, U.K.


