Guiding Computers, Robots to See and Think

Though Stanford University professor Fei-Fei Li began her career during the most recent artificial intelligence (AI) winter, she’s responsible for one of the insights that helped precipitate its thaw. By creating ImageNet, a hierarchically organized image database with more than 15 million images, she demonstrated the importance of rich datasets in developing algorithms—and launched the competition that eventually brought widespread attention to Geoffrey Hinton, Ilya Sutskever, and Alex Krizhevsky’s work on deep convolutional neural networks. Today Li, who was recently named an ACM Fellow, directs the Stanford Artificial Intelligence Lab and the Stanford Vision and Learning Lab, where she works to build smart algorithms that enable computers and robots to see and think. Here, she talks about computer vision, neuroscience, and bringing more diversity to the field.

Your bachelor’s degree is in physics and your Ph.D. is in electrical engineering. What drew you to computer vision and artificial intelligence (AI)?

When I was an undergrad at Princeton, I had a lot of academic freedom. By sophomore year, I was already fascinated by the writings of physicists from the early 20th century—people like Schrödinger and Einstein who, in the later part of their careers, all had a lot of curiosity about life and intelligence. Then I did a couple of research projects related to neuroscience and modeling; I was hooked. I decided to pursue a Ph.D. in a combination of cognitive neuroscience and computer vision—we didn’t call it AI at that point.

This was during one of the so-called AI winters, when interest and investment cooled as people realized technologies had failed to live up to their hype.

It was a very interesting time to be studying for a Ph.D. Machine learning became a very important tool in computer vision, so I was in the generation of students who got a lot of exposure to and training in that subject.

That training helped crystallize an insight that proved pivotal to the field of AI, namely that creating better datasets would help computers make better decisions. This prompted you to build ImageNet, a hierarchically organized image database in which each node of the hierarchy is depicted by hundreds to thousands of images.

In the field of AI, there are a few important problems that everyone works on; we call them ‘holy grail’ problems. One of them is understanding objects, which is a building block of visual intelligence. Humans are superbly good at recognizing tens of thousands and even millions of objects, and we do it effortlessly on a daily basis. So I was working on this problem when I was a Ph.D. student and in my early years as an assistant professor, along with many other people in the field. During that era, there was a huge effort to design machine learning models that could recognize objects. We also had to find sensible ways to benchmark their performance. There were some very good datasets, but in general they were relatively small, with only one or two dozen different objects.

When datasets are small, it limits the type of models that can be built, because there’s no way to train algorithms to recognize the variability of even a single object, like "cat."

People were making progress in that era, but the field was a little bit stuck, because the algorithms were unsatisfying. So around 2006, my students and I started to think about a different way of approaching the object recognition problem. We were thinking that instead of designing models that overfit a small dataset, we would think about very large-scale data, like millions and millions of objects, and that would drive machine learning models in a whole different direction.

So you started working on ImageNet, which seemed crazy at the time.

Our goal was to map out all the nouns in the English language, then collect hundreds of thousands of pictures to depict the variability of each object, like an apple or a German Shepherd. We ended up downloading and sifting through at least a billion pictures, and we eventually put together ImageNet through crowdsourcing. That dataset contained 15 million images in 22,000 object categories.

In your research at Stanford’s Vision and Learning Lab, you work closely not just with technologists, but also with neuroscientists. Can you tell me a bit about how that collaboration works?

Fundamentally, AI is a technical field. Its ultimate goal is to enable machine intelligence. But because human intelligence is so closely related to this field, it helps to have a background and collaborators in neuroscience and cognitive science. Take today’s deep learning revolution. The algorithms we use today in neural networks were inspired by classic neuroscience studies from the 1950s and 1960s, when scientists found that neurons are organized in layers and pass information along hierarchically. Meanwhile, cognitive science has always played an essential part in guiding AI’s quest to take on different kinds of tasks. Many computer scientists were inspired to work on object recognition, for example, because of the work cognitive scientists had done.

One of your current interdisciplinary collaborations is a neural network that implements curiosity-driven learning.

Human babies learn by exploring the world. We are trying to create algorithms that have those kinds of features, in which computers explore out of curiosity rather than being trained on traditional tasks with labeled images.

You have spoken before about the need to think about AI from a humanistic and not just a technical perspective, and you just helped launch Stanford’s Human-Centered AI Initiative (HAI). Can you talk about your goals?

We want to create an institute that works on technologies to enhance human capabilities. In the case of robotics, machines can do things humans cannot. Machines can go to dangerous places. They can dive deeper underwater and dismantle explosive devices. Machines also have the kind of precision and strength humans do not. But humans have a lot more stability and understanding, and we have an easier time collaborating with one another. There are a lot of potential scenarios we can imagine in the future where machines assist and augment humans’ work, rather than replacing it.

You’ve also been vocal about the need to include a more diverse set of voices in computer science and AI research.

If we believe machine values represent human values, we need to make sure we have fully represented humanity as we develop and deploy our technology. So it’s important to encourage students of diverse backgrounds to participate in the field. It’s also important, at this moment, to recognize that the social impact of technology is rising. The stakes are higher than ever, and we also need to invite future business leaders, policymakers, humanists, and social scientists of diverse backgrounds to be technologically literate, to interact with the tech world, and to bring that diverse thinking into the process.

Can you tell me about Stanford’s new AI4All program for high school students, which grew out of the earlier Stanford Artificial Intelligence Laboratory’s Outreach Summer Program (SAILORS)?

AI4All aims to increase diversity in the field of artificial intelligence by targeting students from a range of financial and cultural backgrounds. It’s a community we are very proud to support. One of our earliest SAILORS alumnae, a high school student named Amy Jin, continued working in my lab on videos for surgical training. Then, while still in high school, she authored a research paper with my team that was selected for the Machine Learning for Health Workshop at the 2017 Neural Information Processing Systems (NIPS) conference, one of the most respected events in the field. What’s more, out of 150 papers, she won the award for best paper. We also have students who started robotics labs at their schools and hold girl-centered hackathons. Many of them are focusing on applications that put AI to good social use, from optimizing ambulance deployment to cancer research and cyberbullying.

March 2019 Issue

Communications of the ACM (CACM) is now a fully Open Access publication.