Artificial Intelligence Poised to Ride a New Wave

Chinese professional Go player Ke Jie preparing to make a move during the second game of a match against Google’s AlphaGo in May 2017.

Artificial intelligence (AI), once described as a technology with permanent potential, has come of age in the past decade. Propelled by massively parallel computer systems, huge datasets, and better algorithms, AI has brought a number of important applications, such as image and speech recognition and autonomous vehicle navigation, to near-human levels of performance.

Now, AI experts say, a wave of even newer technology may enable systems to understand and react to the world in ways that traditionally have been seen as the sole province of human beings. These technologies include algorithms that model human intuition and make predictions in the face of incomplete knowledge, systems that learn without being pre-trained with labeled data, systems that transfer knowledge gained in one domain to another, hybrid systems that combine two or more approaches, and more powerful and energy-efficient hardware specialized for AI.

The term “artificial intelligence” was coined by John McCarthy, a math professor at Dartmouth, in 1955 when he—along with Marvin Minsky of the Massachusetts Institute of Technology (MIT), Claude Shannon of Bell Laboratories, and Nathaniel Rochester of IBM—said they would study “the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.” McCarthy wanted to “find out how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.” That is not a bad description of the goals of AI today.

The path of AI over the ensuing 62 years has been anything but smooth. There were early successes in areas such as mathematical problem solving, natural language, and robotics. Some of the ideas that are central to modern AI, such as those behind neural networks, advanced conceptually early on. Yet funding for AI research, mostly from the U.S. government, ebbed and flowed, and when it ebbed the private sector did not take up the slack. Enthusiasm for AI waned when researchers’ grandiose promises went unmet.

AI Comes of Age

A turning point for AI, and for the public’s perception of the field, occurred in 1997 when IBM’s Deep Blue supercomputer beat world champion Garry Kasparov at chess. Deep Blue could evaluate 200 million chess positions per second, an astonishing display of computer power at the time. Indeed, advances in AI over the past 10 years owe more to Moore’s Law than to any other factor. Artificial neural networks, which are patterned after the arrangement of neurons in the brain and the connections between them, are at the heart of much of modern AI, and to do a good job on hard problems they require teraflops of processing power and terabytes of training data.

Michael Witbrock, a manager of Cognitive Systems at IBM Research, says about two-thirds of the advances in AI over the past 10 years have come from increases in computer processing power, much of that from the use of graphics processing units (GPUs). About 20% of the gain came from bigger datasets, and 10% from better algorithms, he estimates. That is changing, he says: “Advances in the fundamental algorithms for learning are now the main driver of progress.”

Witbrock points, for example, to a technique called reinforcement learning, “systems which can build models that hypothesize about what the world might look like.” In reinforcement learning, systems are not trained in advance with huge amounts of labeled data—which is called “supervised learning”—but are simply rewarded when they get the right answer. In a sense, they are self-trained. This psychology-based approach more nearly represents the way humans learn.

Reinforcement learning is useful in games such as chess or poker, which have clearly defined objectives for which system developers don’t know in advance how to achieve the desired outcome. The machine plays millions of games against itself, and gradually adapts its behavior to strategies that tend to win.
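The self-play loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any system named in this article: the game (a small version of Nim, in which players alternately take one to three stones and whoever takes the last stone wins), the function names, and the parameter values are all chosen here for the example. The program receives no labeled examples; it only sees a reward of +1 or -1 at the end of each game.

```python
import random
from collections import defaultdict


def legal_moves(pile):
    """A player may take 1, 2, or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= pile]


def train_self_play(episodes=50000, max_pile=12, epsilon=0.2, seed=0):
    """Learn move values for Nim purely from win/loss rewards in self-play."""
    rng = random.Random(seed)
    value = defaultdict(float)  # (pile, move) -> average outcome for the mover
    count = defaultdict(int)    # (pile, move) -> number of times tried
    for _ in range(episodes):
        pile = rng.randint(1, max_pile)  # random start so all piles are seen
        history = []                     # moves made by the alternating players
        while pile > 0:
            moves = legal_moves(pile)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # explore: try a random move
            else:
                # exploit: pick the move with the best average outcome so far
                move = max(moves, key=lambda m: value[(pile, m)])
            history.append((pile, move))
            pile -= move
        # Whoever made the last move took the last stone and wins (+1).
        # Walking backward, the sign flips because the players alternate.
        reward = 1.0
        for state_action in reversed(history):
            count[state_action] += 1
            value[state_action] += (reward - value[state_action]) / count[state_action]
            reward = -reward
    return value


def best_move(value, pile):
    """Return the move with the highest learned value for this pile size."""
    return max(legal_moves(pile), key=lambda m: value[(pile, m)])


if __name__ == "__main__":
    learned = train_self_play()
    for pile in range(1, 8):
        print(pile, "stones -> take", best_move(learned, pile))
```

After enough games the program tends to settle on the known winning strategy for this toy game, which is to leave the opponent a multiple of four stones, even though nobody told it that rule.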

Animals and Humans Do It

According to Yann LeCun, director of AI research at Facebook, supervised learning is the dominant AI method today, while reinforcement learning occupies a niche mostly in games.

A third type of learning, called predictive learning, is emerging. It is unsupervised, meaning it does not need to be trained with millions of human-labeled examples. “This is the main form of learning that animals and humans use,” LeCun says. “It allows us to learn by observation.” Babies easily learn that when one object moves in front of another, the hidden object is still there, and that when an object is not supported, it falls. No one teaches the baby those things by giving it rules or labeled examples.

“We don’t yet quite know how to reproduce this in machines, and that’s a shame,” LeCun says. “Until we learn how to do this, we will not go to the next level in AI.”

Meanwhile, predictive learning has become one of the hottest topics in AI research today.

Predictive learning is alluring because it represents a step toward human ways of thinking, but also because it reduces or eliminates the expensive curation of big databases of labeled data. “You just show the machine thousands of hours of video, and it will eventually figure out how the world works and how it’s organized,” LeCun says.

Facebook and others are developing ways to predict the occurrence of a word in a block of text based on the known words around it. A popular technique called Word2Vec, proposed by Tomas Mikolov (then at Google and now at Facebook AI Research), learns to represent words and text in the form of vectors, or lists of numbers, that contain the characteristics of a word, such as syntactic role and meaning. Such unsupervised word-embedding methods are widely used in natural language understanding and language translation applications.
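To make the idea of representing a word as a list of numbers concrete, here is a minimal Python sketch. It is not Word2Vec, which trains a small neural network to predict words from their neighbors; it swaps in a much simpler count-based stand-in that records which words appear in the same sentence. The tiny corpus and the function names are invented for this example. The point it shares with Word2Vec is that the vectors come from unlabeled text alone, and that words used in similar contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict

# A toy, made-up corpus; real systems learn from billions of words.
CORPUS = [
    "the cat eats fish",
    "the dog eats meat",
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
    "the truck needs fuel",
]


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words sharing its sentences."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(CORPUS)

if __name__ == "__main__":
    print("cat vs dog:", cosine(vectors["cat"], vectors["dog"]))
    print("cat vs car:", cosine(vectors["cat"], vectors["car"]))
```

In this toy corpus "cat" and "dog" score as more similar than "cat" and "car," because the two animals share more of their surrounding words.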

Another promising new approach to unsupervised predictive learning lies in something called generative adversarial networks (GANs), in which two neural nets train themselves by competing in a zero-sum game to produce photorealistic images. Originally proposed by Ph.D. candidate Ian Goodfellow (now at OpenAI), GANs are able to learn reusable image feature representations from large sets of unlabeled data. One experimental application can predict the next frame in a video, given some of the frames that precede it. LeCun calls GANs “the most interesting idea in machine learning in the last 10 years.”
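For readers who want the zero-sum game in symbols, the objective from the original GAN paper by Goodfellow and colleagues (2014) can be written as below. The notation is not used elsewhere in this article: G is the generator net, D is the discriminator net, x is a real example drawn from the data, and z is random noise fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator tries to make this quantity large by scoring real examples near 1 and generated ones near 0, while the generator tries to make it small by producing examples the discriminator mistakes for real ones.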

Unsupervised predictive learning systems will be able to show some of the “common sense” that comes so easily to humans, says AI pioneer Geoffrey Hinton, an engineering fellow at Google. For example, in the sentence, “The trophy would not fit in the suitcase because it was too small,” humans know immediately that “it” refers to the suitcase, not the trophy. It’s important to know that when translating the sentence into French, because the two nouns have different genders in French. “Neural nets can’t handle that very well at present,” Hinton says, “but they will one day when they are big enough to hold a lot of real-world knowledge.”

Last year AlphaGo, developed by Google DeepMind, surprised the world when it became the first AI system to beat a Go professional in a five-game match. Chess programs can evaluate possible moves far ahead in the game, but Go programs are unable to do that because the number of possible moves quickly becomes astronomical. “AlphaGo depends on neural nets having the ability to model intuitive reasoning based on patterns, as opposed to logical reasoning,” Hinton says. “That’s what Go masters are good at.”

AlphaGo is a hybrid of supervised learning and reinforcement learning. It was initially trained on a database of 30 million moves from historical games; then it was improved, via reinforcement learning, by playing many games against itself. AlphaGo uses one neural net to decide which moves are worth considering in any given position, and another neural net to evaluate the position it would arrive at after a particular sequence of moves. Hinton estimates that AlphaGo uses 10,000 times as much compute power as Deep Blue did 20 years ago.

AI as Pal

Manuela Veloso, head of the Machine Learning Department at Carnegie Mellon University, predicts that for many years to come AI systems will collaborate with humans, rather than replacing them. “There will be this symbiosis between humans and machines, in the same sense that humans need other humans,” she says.

Asked where AI systems are weak today, Veloso says they should be more transparent. “They need to explain themselves: why did they do this, why did they do that, why did they detect this, why did they recommend that? Accountability is absolutely necessary.”

IBM’s Witbrock echoes the call for humanism in AI. He has three definitions of AI, two of them technical; the third is, “It’s an embodiment of a human dream of having a patient, helpful, collaborative kind of companion.”

Just such a companion is in the dreams of Facebook engineers. They foresee a day when every Facebook user has his or her own personalized virtual assistant. “The amount of computation required would be enormous,” LeCun cautions. “We are counting on progress in hardware to be able to deploy these things.” AI developers say progress will come from greater use of specialized processors like GPUs, as well as from entire systems designed specifically for running neural networks. Such systems will be much more powerful and more energy-efficient than those in use today.

Meanwhile, some approaches to AI are so new that even their developers don’t know exactly how they work. Google Translate learned to translate among 103 languages by being trained separately in individual language pairs—English to French, German to Japanese, and so on. But all of these unique pairwise combinations entail large development and processing costs, so Google developed a “transfer-learning” system that can drastically cut costs by applying the knowledge learned in one application to another. Remarkably, it has shown it can translate between pairs of languages it has never encountered before.

Called “zero-shot” translation, the Google system extracts the inherent “meaning” of the input language (which does not have to be specified) and uses that to translate into another language without reference to stored pairs of translated words and phrases. “It suggests there’s an ‘interlingua’ the neural net is learning; a high-level representation of the meaning of sentences, beyond any particular language,” Hinton says.

Yet work remains to be done to validate this hypothesis, to understand the system, and to make it work better. “It’s still not quite as good as a really good human translator,” Hinton says.

Further Reading

Mathieu, M., Couprie, C., and LeCun, Y.
Deep Multi-Scale Video Prediction Beyond Mean Square Error, ICLR 2016 conference paper, Version 6, Feb. 26, 2016, https://arxiv.org/abs/1511.05440

Radford, A., Metz, L., and Chintala, S.
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Version 2, Jan. 7, 2016, https://arxiv.org/abs/1511.06434

LeCun, Y.
Unsupervised Learning: The Next Frontier in AI, Nov. 28, 2016, https://www.aices.rwth-aachen.de/charlemagne-distinguished-lecture-series

LeCun, Y., Bengio, Y., and Hinton, G.
Deep Learning, Nature, v.521, p.436–444, May 28, 2015, http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html

Johnson, M., et al.
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, Nov. 14, 2016, https://arxiv.org/abs/1611.04558