Q&A: A Lifelong Learner

ACM's 2010 A.M. Turing Award recipient Leslie Valiant

Leslie Valiant, the T. Jefferson Coolidge Professor of Computer Science and Applied Mathematics in Harvard University’s School of Engineering and Applied Sciences, is the recipient of the 2010 ACM A.M. Turing Award. Valiant is best known for the "probably approximately correct" model of machine learning, which provided an essential framework for studying the learning process and led to major advances in areas from artificial intelligence to computer vision. (A profile of Valiant, "Beauty and Elegance," appears on p. 14.)

Let’s talk about your background. What drew you to computer science?

Growing up I was intrigued by both physics and mathematics, and perhaps most of all by the connection between the two. At Cambridge I studied mathematics. I had various tutors, both graduate students and faculty, and when I could I tried to find out what they were working on. Computer science was a way of trying a different diretion. My Ph.D. thesis at Warwick was on questions of computability suggested by my advisor, Mike Paterson. In retrospect, this was a good area to cut one’s teeth on as it was so quintessentially about computation. Just how rich the field is became clear to me only much later. I consider it my very good fortune to have found my way into it.

How did you first get interested in machine learning?

After my Ph.D., I worked in algorithms and complexity theory. I observed work in artificial intelligence only from a distance. I had accepted the argument that humans provided an existence proof that intelligent behavior is possible by computational mechanisms. However, I recall being puzzled that so many people could be working with such confidence in an area where the concepts seemed so ill-defined. Several topics were studied at that time within AI, such as reasoning, expert systems, planning, natural language, robotics, vision, as well as learning. But which of these was the most basic? Which one had some fundamental definition? I convinced myself somehow that the answer must be learning, but I am not sure how I got there.

Describe how you came to develop the "probably approximately correct," or PAC, model of machine learning.

The main problem, as I saw it, was to specify what needs to be achieved if a learning algorithm is to be declared effective. It was clear that this definition would need to be quantitative if it was to mean anything, since with infinite data and computation there is no real problem. Existing statistical notions, such as Bayesian inference, did not seem enough because they did not distinguish quantitatively between what is learnable in practice and what is not. The final formulation emerged quite rapidly. The state of complexity theory at the time made this possible for someone asking the right questions. The main point of PAC learning is that while it has a statistical component, because uncertainty needs to be allowed for, it is the computational aspect that gives it substance, since without that essentially everything would be classified as learnable.

Can you tell me about some of the computational models you developed in conjunction with PAC?

I’ll describe the one model that led to consequences far greater than I could have originally imagined. In the course of work that sought to explain why certain easy-looking learning problems, such as learning regular languages, had proved so intractable, Michael Kearns and I formulated the notion of weak learning. This was essentially a gambler’s version of PAC learning, where it was sufficient to make predictions with accuracy just better than random guessing, rather than with near certainty. The question of whether this was as powerful as PAC learning became a natural open problem. Successive contributions by Kearns, Schapire, and Freund not only resolved this question in the very unexpected positive direction, but yielded the boosting technique, perhaps the most powerful general learning method now known. This technique, somewhat magically, has been used to improve the performance of almost any basic learning algorithm on real datasets. This is an excellent example of how a theoretical question about abstract models can get at the heart of an important phenomenon and yield rich practical dividends.

You made a number of important contributions to parallel computing in the 1980s. Can you tell me about your more recent work in that arena?

The root problem with parallel machines has not been that any one is inherently bad, but more that it is difficult to make sense of them as a totality. The most striking fact about sequential machines is that they are all the same—they emulate the von Neumann abstraction. This sameness has been vital to the growth of computing since it has enabled us to write portable software for this abstraction and make machines to emulate it. Thus the von Neumann model has served as the vital "bridging model" between the software and hardware.

I have been long interested in the question of whether a similarly effective bridging model exists for parallel machines and what that might be. Parallelism has now arrived with multiple cores in our laptops. My view is that the irresistible driving forces that have brought this about have been the imperatives of physics. It follows, I think, that we have to look to the driving physical limitations to see what is most fundamental about parallel computing. Computation, communication, synchronization, and memory all need various resources, which in turn are limited by the physical arrangements of the devices. The sought-after bridging model has to account for these costs. We have been spoiled over the last 60-odd years by having an abstraction as simple as the von Neumann model suffice. Life will become a little more complicated now. The absence of an accepted bridging model is evident even at the theoretical level of algorithms.

My most recent work addresses these issues via a particular proposed bridging model, called the Multi-BSP, which tries to capture the physical limitations of machines as simply as possible. It uses barrier synchronization in a hierarchical manner and idealizes a machine as a set of numerical parameters that specify a point in the space of possible machines. Mention of the many architectural details of current machines that are not inevitably suggested by physics is deliberately omitted.

Let’s talk about your most recent work in computational neuroscience. Can you explain the "neuroidal model" you developed in your book Circuits of the Mind?

The neuroidal model is designed to describe how basic computational tasks are carried out in the brain at the level of neurons. We do not know what changes the brain undergoes when it memorizes a new word or makes a new association. We need some language for describing the alternative algorithms that a network of neurons may be implementing. Memorizing a new word is easy enough for a computer, but it is still mysterious how the brain does it. Devising algorithms that do not obviously contradict biology is not easy, because biology appears very constrained. Each neuron is connected to only a small fraction of the others, and its influence on each of those is believed to be weak on the average.

You also developed a formal system called robust logics that seeks to reconcile what you’ve described as "the apparent logical nature of reasoning and the statistical nature of learning."

Robust logic is aimed at tasks which we do not understand how to do on any computational device, let alone the brain. The tasks in question are those of reasoning using information that is learned from experience and therefore uncertain. Humans manage to make effective decisions even in an uncertain world and even with only partial knowledge. This skill is close to the heart of intelligence, I believe, and understanding it better raises some fundamental scientific questions.

How does computation in the brain differ from computation in a computer?

At the level of neurons the organization is clearly very different—there appears to be no global addressing mechanism for a start. However, there is a good chance, I believe, that at the next level up from neurons, some basic primitives are realized which we will be able to describe in more familiar computational terms. This is the enterprise that the neuroidal model is designed for.

The brain has some general computational character, which we ought to be able to decipher in the foreseeable future. It also contains more specific information acquired through evolution and learning that may be more difficult to uncover. Evolution and learning are subtle adaptive processes, and to emulate their results, we have to understand both the algorithms they use, as well as the environments in which the evolution and learning took place.

Q&A: A Lifelong Learner

DOI

June 2011 Issue

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.

Q&A: A Lifelong Learner

DOI

June 2011 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.