
Artificial Intelligence Still Can’t Form Concepts


Machine translation, automatic speech recognition, and automatic text generation demonstrate the enormous progress artificial intelligence (AI) has made in processing human language. On the other hand, AI has made astonishingly little progress in forming concepts and abstractions. That is the research area of Melanie Mitchell, professor of complexity at the Santa Fe Institute and author of the book Artificial Intelligence: A Guide for Thinking Humans.

Mitchell argues forming concepts is absolutely crucial to unlock the full potential of AI. "A concept is a fundamental unit of understanding," Mitchell said during an interview at the 2023 American Association for the Advancement of Science (AAAS) Annual Meeting in Washington, D.C. "Neural networks can look at a picture and tell whether it contains a dog, a cat, or a car, but they do not have a rich understanding of any of those categories.

"Take the concept of a bridge. Humans can extend the notion of a bridge to abstract levels. We can talk about a bridge between people or bridging the gender gap. We can instantly understand what these expressions mean because we have a rich mental model of what a bridge can be."

Mitchell first started working on concepts and abstraction in 1984, as a Ph.D. student of Douglas Hofstadter. Inspired by Hofstadter's famous book Gödel, Escher, Bach: An Eternal Golden Braid, Mitchell decided to contact him, and that was the start of their collaboration. Together they created an AI system called Copycat, which can solve simple letter-string analogy problems. For example, if the string ABC changes to AABBCC, what is the analogous change to PQR? Copycat could find the answer, PPQQRR, using a mental model that included symbolic, sub-symbolic, and probabilistic elements.
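Copycat's own mechanism combined symbolic, sub-symbolic, and probabilistic processing rather than hand-coded rules, but a minimal sketch of the task itself can make the problem concrete; the "double each letter" rule below is a hypothetical, hard-wired stand-in for the structure Copycat had to discover on its own.

```python
# Minimal sketch of a letter-string analogy of the kind Copycat solved.
# The hard-coded "double each letter" rule is a stand-in for the flexible
# perception of structure that Copycat itself had to perform.

def double_each_letter(s: str) -> str:
    """The transformation illustrated by the example ABC -> AABBCC."""
    return "".join(ch * 2 for ch in s)

def solve_analogy(source: str, transformed: str, target: str) -> str:
    """Apply the candidate rule to the target if it explains the example."""
    if double_each_letter(source) == transformed:
        return double_each_letter(target)
    raise ValueError("the candidate rule does not explain the example")

print(solve_analogy("ABC", "AABBCC", "PQR"))  # prints PPQQRR
```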

Copycat had huge limitations: its architecture was ad hoc, it was unclear how general that architecture was, and it was unclear how it could form new concepts beyond those in its prior conceptual repertoire. In the roughly three decades since Copycat was released, there have been various efforts to create AI systems that form abstractions and concepts, but the problem remains fundamentally unsolved.

In recent years, some scientists have shown that deep learning systems can perform better than the average human (see, for example, https://arxiv.org/abs/2012.01944) on Raven's Progressive Matrices, a widely used non-verbal test of general human intelligence and abstract reasoning in which the subject is shown a matrix of visual geometric designs with one entry missing and must identify the design that completes the pattern. However, Mitchell found that deep learning systems did not accomplish this by learning humanlike concepts, but by finding shortcuts. Furthermore, they needed a large corpus of training examples.

"If the goal is to create an AI system that has humanlike abstraction abilities, then it does not make sense to have to train it on tens of thousands of examples," Mitchell said. "The essence of abstraction and analogy is few-shot learning."

What about large language models, like GPT? Don't they have the capability to form humanlike concepts and abstractions? "Interestingly, they can make analogies to some extent," said Mitchell. "I have tried some letter-string problems in GPT-3, and in some cases it could solve them. It learned, for example, the concept of successorship. Not perfect, not robust, but I found it still surprising that it can do this. Therefore, I don't agree that these systems are only 'stochastic parrots', as some scientists have called them. I have seen evidence of GPT building simple internal models of situations."
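Mitchell did not spell out her prompts in the interview; as a hypothetical illustration, a few-shot letter-string prompt for a text-completion model might look like the following, where the example strings and phrasing are assumptions rather than her actual experiments.

```python
# Hypothetical few-shot prompt for probing a language model on
# letter-string analogies; the examples and wording are illustrative,
# not the prompts Mitchell actually used with GPT-3.

prompt = (
    "If abc changes to abd, then pqr changes to pqs.\n"
    "If ijk changes to ijl, then mno changes to mnp.\n"
    "If efg changes to efh, then rst changes to"
)

# A model that has abstracted the successor rule should continue the
# text with " rsu"; a model relying only on surface statistics may not.
print(prompt)
```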

Recently, Mitchell became very interested in the Abstraction and Reasoning Corpus (ARC), a benchmark created in 2019 by Google researcher François Chollet to measure intelligence in AI systems. The ARC consists of a set of visual reasoning tasks in the form of grid-based puzzles that are more difficult than Raven's Progressive Matrices. In the benchmark, only a few examples are provided for each visual analogy task, so traditional machine learning techniques, which require lots of data, don't work.
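Each ARC task is published as a small JSON file with a handful of "train" input/output grid pairs and one or more "test" inputs, where grids are arrays of integer color codes; the toy task below follows that structure but is invented for illustration.

```python
# Sketch of the structure of an ARC task. Grids are lists of lists of
# integers 0-9, each integer denoting a color. This particular task is
# invented; real tasks live in François Chollet's public ARC repository.

toy_task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
    ],
    "test": [
        {"input": [[3, 0], [0, 3]]},  # the solver must predict the output
    ],
}

# With only two demonstrations, a solver has to infer the underlying
# transformation (here, mirroring each grid left-to-right) and apply it
# to the test input, rather than fit a model to thousands of examples.
```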

"In the most recent ARC competition, the best computer program only got 20% right," said Mitchell, "and that program basically used brute-force search. When I talk about these problems, people often respond: 'Well, GPT-type models will be able to do this very soon.' I am pretty convinced that they will not. In some sense I agree with scientists who say that large neural networks can do everything, because our brains are also large neural networks, but I believe that we have to create neural networks that somehow deal with symbol-like entities, and we don't know how to do that yet."
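The winning entry's actual program space is not described here; as a rough sketch of what brute-force search over candidate transformations means, one can imagine trying a small, hypothetical vocabulary of grid operations against the training pairs of a task like the toy one above.

```python
# Illustrative brute-force search over a tiny, hypothetical vocabulary of
# grid transformations; real ARC entries search far larger, hand-designed
# program spaces, often composing many primitive operations.

def flip_h(g):    return [row[::-1] for row in g]     # mirror left-right
def flip_v(g):    return g[::-1]                      # mirror top-bottom
def transpose(g): return [list(r) for r in zip(*g)]   # swap rows and columns

CANDIDATES = [flip_h, flip_v, transpose]

def search(train_pairs):
    """Return the first candidate that reproduces every training pair."""
    for op in CANDIDATES:
        if all(op(pair["input"]) == pair["output"] for pair in train_pairs):
            return op
    return None

train = [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
    {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
]
op = search(train)
print(op([[3, 0], [0, 3]]) if op else "no candidate fits")  # [[0, 3], [3, 0]]
```

Even with a much richer operation set, such a search can fit the demonstrations without forming anything like a humanlike concept of the transformation, which is Mitchell's point about the limits of current approaches.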

Mitchell thinks we need more insights from neuroscience and cognitive science on how the brain deals with symbols. "Babies and children can do so many things that our AI systems cannot do. We have a description of the brain in terms of neurons, we have a description in terms of concepts, but we do not have an intermediate description. The same is true for large language models. One of my future research directions is to figure out how these models do what they do and to develop tools for probing them."

Although AI systems are still bad at forming concepts, and although Mitchell is convinced of the importance of building models founded on core concepts about objects, space and geometry, numbers and numerosity, and agents and actions, she is baffled by the power of current statistical models.

"Before Deep Blue beat Kasparov, people honestly believed that playing chess requires general intelligence; now we know that it does not. Large language models demonstrate that language understanding is not needed to generate humanlike text. These examples show that we don't understand our own intelligence very well. The good thing is that building AI systems refines our understanding of what intelligence is."

 

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.

