The ongoing revolution in artificial intelligence (AI)—in image recognition, natural language processing and translation, and much more—has been driven by neural networks, specifically many-layer versions known as deep learning. These systems have well-known weaknesses, but their capability continues to grow, even as they demand ever more data and energy. At the same time, other critical applications need much more than just powerful pattern recognition, and deep learning does not provide the sorts of performance guarantees that are customary in computer science.
To address these issues, some researchers favor combining neural networks with older tools for artificial intelligence. In particular, neurosymbolic AI incorporates the long-studied symbolic representation of objects and their relationships. A combination could be assembled in many different ways, but so far, no single vision is dominant.
The complementary capabilities of such systems are frequently likened to the two modes of human thought described by psychologist Daniel Kahneman: "System 1," which, like neural networks, makes rapid, heuristic decisions, and the more rigorous and methodical "System 2." "The field is growing really quickly, and there's a lot of excitement," said Swarat Chaudhuri of the University of Texas at Austin. Even though "Neural networks are going to become ubiquitous, even more than they are today," he said, "not all of computer science is going to become replaced by deep learning."
In the early years of artificial intelligence, researchers had high hopes for symbolic rules, such as simple if-then rules and higher-order logical statements. Although some experts, such as Doug Lenat at Cycorp, still hold hopes for this strategy to impart common sense to AI, the collection of rules needed is widely regarded as impractically large. "If you try to encode all human knowledge manually, we know that's not possible. That has been tried and failed," said Asim Munawar, a program director of neurosymbolic AI at IBM.
Neural networks also fell short of their aspirations in the 1980s and '90s, and artificial intelligence entered a long "winter" of reduced interest and funding. This situation changed a decade ago, however, largely due to the availability of enormous datasets for training, and massive computer power. Recent architectural innovations, notably attention and transformers, have driven further advances, such as the uncannily plausible text generation by OpenAI's large language model, GPT-3.
Deep learning does surprisingly well at generalizing, for reasons that are only partly understood. Despite impressive successes on average, however, these systems still make some odd errors when presented with novel examples that do not fit patterns they infer from the training data. Errors also can be created using maliciously altered data, sometimes in ways essentially imperceptible to people.
In addition, racial, gender, and other biases in the training data can be unintentionally enshrined by neural networks. Thus, for ethical and safety reasons, users often expect an explanation of how the networks came to a conclusion in medical, financial, legal, and military applications.
In spite of widespread concerns, these problems are not actually "limitations of deep learning systems," said Yann LeCun, chief AI scientist and a vice president at Meta; rather, he attributes them to the widely used supervised learning paradigm. LeCun, who shared the 2018 ACM A.M. Turing Award with fellow deep learning pioneers Geoffrey Hinton and Yoshua Bengio, believes that if users adopt "self-supervised learning, things that are not trained for a given task but are trained generically, a lot of those problems will essentially disappear." (LeCun regards explainability as a "non-issue.")
Still, LeCun said the training algorithm for neural networks "basically requires all the modules in the system to be differentiable" (in the calculus sense) so that the output errors can be back-propagated to update earlier parameters. "Do we have to have a specific mechanism for symbol manipulation in those networks? Absolutely not! I don't believe in this at all," he said. Humans "don't have discrete symbols. We have patterns of activities in neurons."
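LeCun's point about differentiability can be illustrated with a small sketch. The example below is not from the article: `hard_rule`, `soft_rule`, and `numeric_grad` are hypothetical names, the `sharpness` parameter is an assumption, and a finite-difference estimate stands in for back-propagation. It shows that a discrete threshold passes no gradient signal away from its jump, while a smooth sigmoid relaxation of the same rule does.

```python
import math

def hard_rule(x: float) -> float:
    """Discrete, symbolic-style test: fires when x exceeds 0."""
    return 1.0 if x > 0.0 else 0.0

def soft_rule(x: float, sharpness: float = 4.0) -> float:
    """Smooth relaxation of the same test (a sigmoid); sharpness is an assumed knob."""
    return 1.0 / (1.0 + math.exp(-sharpness * x))

def numeric_grad(f, x: float, eps: float = 1e-4) -> float:
    """Central finite difference, standing in for a back-propagated gradient."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

x = 0.5
grad_hard = numeric_grad(hard_rule, x)  # output is flat around x, so there is no learning signal
grad_soft = numeric_grad(soft_rule, x)  # positive slope, so earlier parameters can be updated
print(grad_hard, grad_soft)
```

The hard rule is still a perfectly good program; it simply cannot be trained by gradient descent as written, which is the constraint LeCun describes.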
Nonetheless, "I do believe that we should find architectures to get deep learning systems to reason," LeCun said, probably as a form of constraint satisfaction. "You can call this neurosymbolic if you want. I don't think of that in those terms, but if you think it is, fine."
Chaudhuri said that although some people think that deep learning can solve everything, "The everything that they imagine is still a pretty narrow set," and will probably not include complex problems with long-range interdependencies. For example, large language models have recently been applied with apparent success to program synthesis. However, "Neural nets have this tendency to make mistakes that superficially look very small," Chaudhuri said, "but for code, it's a huge deal." Similarly, he said, "I don't think even the most optimistic of deep learning optimists think that operating systems would be designed by a language model any time soon—or ever."
Gary Marcus, an entrepreneur and emeritus professor at New York University, is a vocal (and sometimes criticized) critic of deep learning. For example, he co-authored the book Rebooting AI with his former colleague Ernest Davis, has contributed articles to science magazine Nautilus (https://nautil.us/), and had a well-known debate with Bengio in 2019 over the best way forward for AI. Beyond pattern recognition tasks such as image recognition, Marcus has a long list of other goals, such as story comprehension, with which neither neural networks nor other approaches have had much success. "People work on the things that are under the streetlight that they built, but there are lots of things that aren't really under those streetlights," said Marcus. These neglected tasks include "any kind of long-term comprehension, any long-term scientific understanding," he said, adding he does not expect deep learning to solve them on its own.
As interest in neurosymbolic AI grows, a wide variety of strategies are being explored, as described in a recent survey of neurosymbolic programming by Chaudhuri and co-authors. In one strategy, a top layer comprising familiar symbols may help to present results as more comprehensible explanations. This design also promotes "compositionality," the systematic creation of modules from smaller components, with which pure neural networks often struggle.
Alternatively, a neural network also can be built on top of a traditional control circuit that provides its inputs. The circuit acts like a regularizer that limits the search problem for the neural network, although the network still provides a more flexible response. Chaudhuri said his group had achieved "more reliable learning and more sample efficiency" than the neural network alone.
Munawar said hybrid systems, with communications between distinct neural and symbolic systems, are "not the right approach." IBM, for example, has focused on an architecture it calls logical neural networks, in which, "There is no differentiation between the neural part and the symbolic part," Munawar said, likening this feature to "wave-particle duality." Rather than using preset rules, he said, "You can train it and it can learn new rules that it did not know before."
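To make the idea of a single unit that is both neural and logical more concrete, here is a minimal sketch of a weighted, real-valued conjunction in the Lukasiewicz style. It is an illustration under stated assumptions, not IBM's implementation: `weighted_and`, `clamp01`, and the chosen weights and bias are hypothetical. With unit weights the unit behaves like a classical AND on 0/1 inputs, yet its weights are ordinary numeric parameters that training could adjust.

```python
from typing import Sequence

def clamp01(value: float) -> float:
    """Keep a truth value inside the interval [0, 1]."""
    return max(0.0, min(1.0, value))

def weighted_and(truths: Sequence[float], weights: Sequence[float], bias: float = 1.0) -> float:
    """Real-valued AND: each input's falseness (1 - t) is penalized by its weight."""
    penalty = sum(w * (1.0 - t) for w, t in zip(weights, truths))
    return clamp01(bias - penalty)

# Unit weights on classical inputs reproduce the Boolean truth table.
classical = weighted_and([1.0, 0.0], [1.0, 1.0])
# Graded inputs yield a graded conclusion.
graded = weighted_and([0.9, 0.8], [1.0, 1.0])
# If training drove the second weight to zero, the rule would ignore that input.
relearned = weighted_and([1.0, 0.0], [1.0, 0.0])
print(classical, graded, relearned)
```

Because the same numbers serve as both rule and parameters, changing a weight changes the rule, which is one way to read Munawar's remark that such a network "can learn new rules."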
Symbolic representations also can encode hard constraints on any possible solutions a neural network explores, for example that money removed from one account always appears in another. "I still want my bank transaction to be satisfying that invariant that I historically wanted, even if it has neural nets in it," said Chaudhuri. "Building systems with guarantees, that's a place where we need to explore the neurosymbolic combination."
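The bank-transfer invariant can be sketched as a symbolic check that guards whatever a learned component proposes. In the example below, `propose_transfer` is a hypothetical stand-in for a neural model, and `conserves_money` and `apply_if_valid` are assumed helper names. The point is only that the conservation rule is enforced outside the network, so it holds regardless of what the network outputs.

```python
from typing import Dict, Tuple

Balances = Dict[str, int]  # account name -> balance in cents

def propose_transfer(balances: Balances) -> Balances:
    """Hypothetical stand-in for a neural model's proposed new balances.

    It deliberately loses one cent to mimic a small, plausible-looking error.
    """
    proposed = dict(balances)
    proposed["alice"] -= 500
    proposed["bob"] += 499
    return proposed

def conserves_money(before: Balances, after: Balances) -> bool:
    """Symbolic invariant: same accounts and the same total amount of money."""
    return before.keys() == after.keys() and sum(before.values()) == sum(after.values())

def apply_if_valid(before: Balances, after: Balances) -> Tuple[Balances, bool]:
    """Accept the proposal only when the invariant holds; otherwise keep the old state."""
    if conserves_money(before, after):
        return after, True
    return before, False

start = {"alice": 1000, "bob": 200}
state, accepted = apply_if_valid(start, propose_transfer(start))
print(accepted, state)
```

Rejecting and keeping the old state is the simplest policy; a real system might instead repair the proposal or ask the model to try again.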
Adds Luís Lamb of the Federal University of Rio Grande do Sul in Brazil, "We need to develop for machine learning or for deep learning the same rigor that we developed for programming-language semantics." Computer scientists long ago established formal semantics "to understand what actually programs meant in a mathematical way, in a very precise way."
Lamb, who co-authored the book Neural-Symbolic Cognitive Reasoning in 2009 before the deep-learning revolution, emphasized the importance of representing higher-order logic that, for example, includes degrees of evidence for assertions. Similarly, "Temporal logic and temporal reasoning is crucial in computing and has a number of applications in artificial intelligence, a number of applications in software engineering, a number of applications in hardware design, hardware verification, and model checking," Lamb said. "Computers are being applied to every field of human knowledge, and there is a lot of responsibility in using this tool."
New techniques also could help rein in the demands of deep learning for more data and more energy. "They're already consuming all the data that is available to them. There cannot be more data now," said Munawar. "If you look at how exponentially the model size and compute requirements are increasing, it's easy to predict that it's not sustainable."
Symbolic representations are often less demanding, and they could make powerful tools more widely available. "Not all of us can even run large language models, let alone train them," said Chaudhuri. "The goal of doing more with less compute and less data should be something that is front and center for AI."
Integration of symbolic and neural systems "can contribute a lot to development of computer science and, as a consequence, to develop technologies and innovations that can contribute to the betterment of our lives," Lamb said, although it is often hindered by disagreements about what symbols are. "Sometimes the dispute is too much focused on the terminology, too much focused on defending one's approach, and not defending the advancement of science."
Chaudhuri, S., Ellis, K., Polozov, O., Singh, R., Solar-Lezama, A., and Yue, Y.
Neurosymbolic Programming, Foundations and Trends in Programming Languages, 7: pp 158–243 (2021). http://dx.doi.org/10.1561/2500000049
Garcez, A., and Lamb, L.C.
Neurosymbolic AI: The 3rd Wave, December 2020 https://arxiv.org/abs/2012.05876
IBM Neuro-Symbolic AI Workshop 2022 https://researcher.watson.ibm.com/researcher/view_group.php?id=10897
Marcus, G.
Deep Learning is Hitting a Wall, Nautilus, March 2022 https://nautil.us/deep-learning-is-hitting-a-wall-14467/
AI DEBATE : Yoshua Bengio | Gary Marcus, Montreal.AI, 2019, https://bit.ly/3kBihmY
©2022 ACM 0001-0782/22/10
Nice article. Below are my thoughts on what it would take to get to robust intelligence.
Likening neural networks to System 1 thinking isn't valid: System 1 is not just about the 'fast' aspect. We arrive at System 1 by starting with System 2 and then internalizing the explicit steps, through repetition, until we can 'skip' over them (e.g., playing a musical instrument, commuting to work, ordering off DoorDash, and a thousand other things we do by rote). What lets us do this is the physical experience that the reps provide. NNs don't have this; rapidly labeling something isn't the same as taking cognitive shortcuts.
Comprehension of the world (including scientific understanding) is unlikely to result from pure data, or even from combining it with symbolic reasoning, -if- the symbols exist only because -we- set them up (e.g., via knowledge graphs or common-sense rules). That's because the system would still lack genuine understanding: knowledge graphs would certainly extend mere labeling, but the combination still has limits (e.g., the frame problem of transcending the ANN+KG combination remains).
For a system to become genuinely intelligent, it would need to negotiate the environment directly, and be able to represent it directly, as well.