
Q&A: A Sure Thing

Artificial intelligence pioneer Judea Pearl discusses probability, causation, the calculus of intervention, and counterfactuals.
2011 ACM A.M. Turing Award winner Judea Pearl
"Nothing came out of my Ph.D. dissertation except one piece of analysis that physicists later named 'Pearl vortex,' surpassing my wildest dream for immortality."

ACM A.M. Turing Award winner Judea Pearl took a somewhat unusual path to the field that now celebrates his pioneering achievements in artificial intelligence. Born in Israel in 1936, he grew up in Bnei Brak, a biblical town his grandfather refounded in 1924. After serving in the Israeli army and joining a kibbutz, he decided to study electrical engineering at the Technion-Israel Institute of Technology. (A profile of Pearl, "Game Changer," appears on p. 22.)

He came to the U.S. in 1960, where he did graduate studies in physics and electrical engineering at Rutgers University and the Polytechnic Institute of Brooklyn, and worked at RCA Research Laboratories. In 1970, Pearl joined the computer science department at the University of California, Los Angeles (UCLA) and began the research for which he has become famous, creating a representational and computational foundation for reasoning with uncertain information.

In 2002, he founded the Daniel Pearl Foundation in honor of his son, a Wall Street Journal reporter who was kidnapped and murdered by anti-American militants in Pakistan.

You began your career in the 1960s at RCA Research Laboratories. What were you doing there?

I was doing something totally different than what I’m doing now: memory systems. At that time, computers used magnetic-core memories, which were really clumsy. You had to string magnetic donuts with wires, and hundreds of girls strung them one by one. People realized that this was a bottleneck of computers, so everyone was searching for new mechanisms to store information. Some worked on new magnetic configurations, others worked on photochromic memory, and I worked on superconductors, where one could store information in the form of localized circulating vortices.

How did you move to academia?

Eventually, everything was wiped out by semiconductors, and people in this arena had to find new jobs. Nothing came out of my Ph.D. dissertation except one piece of analysis that physicists later named "Pearl vortex," surpassing my wildest dream for immortality. So, looking for another endeavor, I came to the UCLA computer science department and offered my expertise with memory systems. But nothing was done in academia to utilize my experience on the hardware side, so I looked for another challenging arena. And who doesn’t want to emulate him- or herself?

You have said that computer scientists are driven to the field out of a desire to emulate and understand themselves.

It’s also why psychologists go into psychology. But we are luckier because we have a mechanism that can really do it. We have a machine that can emulate, quite powerfully, everything we associate with thought processes.

So you turned your attention to artificial intelligence, and to using probability for the representation and acquisition of knowledge.

On the one hand, I felt very strongly that we think probabilistically. On the other hand, we couldn’t put probability into the computer without exceeding memory capacity. And you ask yourself, If we can’t put it into a computer—if it takes exponential storage—how do we do it in our minds? The answer is that we utilize knowledge about irrelevance. We decompose problems into chunks that are only loosely connected.
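
To make the storage argument concrete, here is the factorization behind that intuition; the chain example and the parameter counts are illustrative, not taken from the interview itself.

```latex
% A full joint distribution over n binary variables needs 2^n - 1
% parameters. If each variable depends only on a small parent set
% pa(x_i) -- the "loosely connected chunks" -- the joint factorizes:
\[
  P(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{pa}(x_i)\bigr)
\]
% e.g., a chain x_1 -> x_2 -> ... -> x_n of binary variables needs only
% 2n - 1 parameters instead of 2^n - 1.
```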

This is what gets represented in the Bayesian networks you then developed, and exploited by your model of belief propagation.

If you have a network of loosely connected components, you can reason probabilistically without encountering exponential complexity. You can represent it parsimoniously and you can update it swiftly, and, moreover, you can update it in a distributed fashion. And that’s very important, because we don’t have a supervisor in our brain telling each neuron when to fire.

Your work on the topic was influenced by the writings of psychologist David Rumelhart, who proposed that, while we read, our brain’s neural modules each perform simple, repetitive tasks, and that these modules use both top-down and bottom-up modes of inference to collaborate with one another.

If you pose these features as an architecture for doing things probabilistically, you ask yourself, When can we do it distributedly and still get the probabilistically correct answer to every question? And that led to a tree architecture and a proof that it converges eventually to the answers that orthodox probability theory dictates. And then came polytrees and the ultimate question of how we can do it when we have a general loopy network. Here I conjectured that the mind simply ignores the loops and allows every processor to act as if it were embedded in a polytree—and this worked miraculously well.
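
As a rough illustration of such distributed updating, here is a minimal sketch of message passing on the simplest tree, a chain; the function name, the restriction to binary variables, and evidence placed only at the last node are simplifying assumptions of this sketch, not Pearl's original formulation.

```python
import numpy as np

def chain_marginals(priors, transitions, last_evidence=None):
    """Posterior marginals on a chain x1 -> x2 -> ... -> xn of binary
    variables, computed by purely local forward/backward messages.

    priors:        P(x1), shape (2,)
    transitions:   list of tables P(x_{i+1} | x_i), each shape (2, 2)
    last_evidence: optional likelihood vector for the last node
    """
    n = len(transitions) + 1
    # Forward messages: fwd[i] is proportional to P(x_i)
    fwd = [np.asarray(priors, dtype=float)]
    for T in transitions:
        fwd.append(fwd[-1] @ T)
    # Backward messages: bwd[i][a] = P(evidence | x_i = a)
    bwd = [np.ones(2) for _ in range(n)]
    if last_evidence is not None:
        bwd[-1] = np.asarray(last_evidence, dtype=float)
    for i in range(n - 2, -1, -1):
        bwd[i] = transitions[i] @ bwd[i + 1]
    # Each node combines its two incoming messages locally -- no supervisor
    marginals = [f * b for f, b in zip(fwd, bwd)]
    return [m / m.sum() for m in marginals]

p1 = np.array([0.7, 0.3])
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
# Observe the last node in state 1 and read off every posterior marginal.
for m in chain_marginals(p1, [T, T], last_evidence=[0.0, 1.0]):
    print(m)
```

On a tree (and a chain is one), these local messages provably yield the exact marginals; Pearl's conjecture was that running the same local rule on a loopy network, with each node acting as if it sat in a polytree, still works remarkably well.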

At this point, the practitioners took over, and they were able to do it much better than I. Even the theoreticians did better than I—they proved convergence under various conditions, and essentially I left this area to more talented and motivated researchers.


"I called it do-calculus because it allows you to reduce questions about the effect of interventions to symbolic manipulations."


You left probability to work on causation.

Yes, primarily because it became clear that people encode world knowledge through causal, not probabilistic, relationships, and all those fancy notions of relevance and irrelevance come from causal, not probabilistic, considerations.

Among your best-known accomplishments is the creation of a calculus of intervention that enables us to compute the consequences of various actions.

The idea was to treat actions and observations as distinct symbols situated within the same formal sentence. This allows you to infer the consequences of actions from a combination of data and qualitative knowledge encoded in the form of a causal diagram.

In other words, it is where correlation and causation meet.

Yes, they meet in the calculus. I called it do-calculus because it allows you to reduce questions about the effect of interventions to symbolic manipulations. You want to predict what will happen if you do something based on what you observe. So you express this question in symbolic algebra and you can ask the question "What if I do x?" or "What if I see y?" as well as any other combination of doing and seeing. Then you submit the query to the inference engine and let it grind through until it gets you the right results.
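
To illustrate the "doing" versus "seeing" distinction in the calculus, here is how the two queries differ when a set of covariates Z satisfies Pearl's back-door criterion; these are standard do-calculus identities, shown only as a sketch.

```latex
% Seeing: condition on X = x.   Doing: sever the mechanism that sets X.
\[
  P(y \mid x) \;=\; \sum_{z} P(y \mid x, z)\, P(z \mid x),
  \qquad
  P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
\]
% The two coincide when Z is independent of X, i.e., when there is
% no confounding carried through Z.
```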

Simulating intervention, by the way, was an idea economists came up with in 1943. Trygve Haavelmo had this idea that economics models are a guide to policy-making, and that you can predict what will happen when the government intervenes and raises taxes or imposes duties by modifying the equations in the model. And that was taken on by other economists, but it didn’t catch on, because they had very lousy models of the economy, so they couldn’t demonstrate success. And because they couldn’t demonstrate success, the whole field of economics regressed and became a hotbed for statistical predictions. Economists have betrayed causality. I never expressed it this way before, but in all honesty this is what it boils down to. In computer science, we remain faithful to logic and try to improve our models, while economists compromised on logic to cover up for bad models.
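
A toy rendering of Haavelmo's idea, with invented coefficients and a hypothetical two-equation "economy": an intervention is simulated by replacing the equation that normally sets the policy variable, not by conditioning on observed data.

```python
import random

def economy(tax_rate=None):
    """One draw from a toy two-equation model; all numbers are made up."""
    noise = random.gauss(0, 1)
    output = 10 + noise
    # Normally taxes follow output; do(tax = t) replaces this equation.
    tax = 0.3 * output if tax_rate is None else tax_rate
    consumption = 0.8 * output - 0.5 * tax
    return consumption

random.seed(0)
n = 10_000
observed   = sum(economy() for _ in range(n)) / n
intervened = sum(economy(tax_rate=5.0) for _ in range(n)) / n
print(f"E[consumption]             = {observed:.2f}")
print(f"E[consumption | do(tax=5)] = {intervened:.2f}")
```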

Your work on causality culminated in counterfactuals.

There are three levels of causal relationships. The first level, which is the level of associations, not causation, deals with the question "What is?" The second level is "What if?" And the third level is "Why?" That’s the counterfactual level. Initially, I thought of counterfactuals as something for philosophers to deal with. Now I see them as just the opposite. They are the building blocks of scientific understanding.
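
In the notation Pearl uses elsewhere for these three levels, the queries can be written as follows; the subscript in the third line denotes the counterfactual "the value Y would have taken had X been x."

```latex
\begin{align*}
  \text{1. Association ("What is?"):}   \quad & P(y \mid x) \\
  \text{2. Intervention ("What if?"):}  \quad & P\bigl(y \mid \mathrm{do}(x)\bigr) \\
  \text{3. Counterfactual ("Why?"):}    \quad & P\bigl(Y_x = y \mid X = x',\, Y = y'\bigr)
\end{align*}
```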

Does your research inform your work at the Daniel Pearl Foundation, especially in conducting interfaith dialogues?

I have an advantage over my dialogue partners in that I’m an atheist, and I understand religious myths are just metaphors, or poetry, for genuine ideas we find difficult to express otherwise. So, yes, you could say I use computer science in my religious dialogues, because I view religion as a communication language. True, it seems futile for people to argue about whether a person enters heaven through the East Gate or the West Gate. But, as a computer scientist, you forgive the futility of such debates, because you appreciate the computational role of the gate metaphor.
