Artificial Intelligence and Machine Learning News

Teaching Computers with Illusions

Exploring the ways human vision can be fooled is helping developers of machine vision.
Figure: Google's winning entry in the 2014 ImageNet competition helps computers distinguish between individual objects.

Earlier this year, debate over the color of a dress set the Internet ablaze with discussion of why people were viewing the very same image, yet seeing it differently. Now throw computers into the mix; unlike humans, whose perceptions of the same image can differ, machines register and recognize visual images on another level altogether. What humans see is determined by biology, vision experts say, while computers derive vision from physical measurements.

While the two fields can inform one another, researchers say more work needs to be done to teach computers how to improve their image recognition.

Those efforts are important because we want machines such as robots to see the world the way we see it. “It’s practically beneficial,” says Jeff Clune, assistant professor of computer science and director of the Evolving Artificial Intelligence Lab at the University of Wyoming. “We want robots to help us. We want to be able to tell it to ‘go into the kitchen and grab my scissors and bring them back to me,’” so a robot has to be taught what a kitchen looks like, what scissors are, and how to get there, he says. “It has to be able to see the world and the objects in it. There are enormous benefits once computers are really good at this.”

Yet no matter how good machines might get at recognizing images, experts say there are two things they are lacking that could trip them up: experience and evolution.

Computers have already gotten pretty good at facial recognition, for example, but they “will never understand the nuances we grasp right away when we see a face and access all the information related to that face,” says Dale Purves, a neurobiology professor at Duke University. People, on the other hand, “have a ton of information based on what that face means to us … and we immediately understand the behavioral implications of a frown or a smile.” Getting to all that, he says, will be a long struggle for machine vision “because machines so far don’t know what’s important for behavioral success in the world and what’s not.”

In contrast, humans have grasped those nuances based on millions of years of evolution, as well as individual experience, Purves notes. “Many people have said in many elegant ways that nothing in biology makes sense, except in the light of evolution. I think that’s exactly right. Machine vision fails to recognize that dictum.”

People are trying to get artificial systems to see the world as it is, “whereas for our brain, the way our nervous system evolved through the ages is not necessarily to see the world as it is—it’s to see the world in a way that has made our survival and our reproduction more likely,” adds Susana Martinez-Conde, a professor and director of the Laboratory of Integrative Neuroscience at the State University of New York (SUNY) Downstate Medical Center.

The human brain makes “a lot of guesstimates,” explains Martinez-Conde, whose work focuses on visual perception, illusions, and the neurological basis for them. “We take limited information from the reality out there and fill in the rest and take shortcuts and arrive at a picture that may not be the perfect match with what’s out there, but it’s good enough.”

One well-known example of an illusion humans and machines register differently is that of rotating snakes (http://bit.ly/1IRuVDb). Martinez-Conde says the image is actually stationary, but appears to move when viewed on paper, “because [of] the way our motion sensitivity circuits in the brain are put together or work in such a way that when you have a certain sequence [it] is interpreted as motion, even though there’s no actual motion in the image.”

The human brain has vision neurons that specialize in detecting motion, and that is what the majority of people will see when they view the image, she says. However, age plays a role in what people see as well.

Because the snake illusion is relatively new, it is still not well understood why people who are about 40 years old or younger are more likely to see motion, while those 50 and older tend not to see it, Martinez-Conde notes. No one knows yet why this experience of motion changes as people age, she says. “The interesting thing is, the motion visual system deteriorates with age, and [yet] you tend to see more reality than illusion. Seeing motion in the [snake] illusion is a sign your visual system is healthy.”

Machine vision, on the other hand, is based on algorithms that measure the physical environment, measurements put to use in driverless cars and elsewhere, says Purves. Humans do not have access to the same information that machine algorithms depend upon for vision.

“We human beings … have a very deep problem, being that we can’t get at the physical world because we don’t have ways of measuring it with apparatus like laser scanners or radar or spectrophotometers, or other ways to make measurements of what’s physically out there,” he says. Yet, “everyone admits we do better in face recognition and making decisions than machines do, using millions of years of evolutionary information” on a trial-and-error basis.

That does not stop people from trying to get humans and machines closer to seeing the same illusions. Kokichi Sugihara, a mathematician at Meiji University in Tokyo, has been working on a program that will enable computers to perceive depth in 2D drawings. His interest is to “allow a computer, by processing information input, to understand a 3D shape based on a projection drawn with lines,” he writes on the university’s website (http://bit.ly/1CAwz7F).

“A computer often fails to reconstruct a 3D shape from a projection drawing and delivers error messages, while humans can do this task very easily,” Sugihara writes. “We visualize a 3D object from a 2D drawing based on the preconceived assumption that is obtained through common sense and visual experience … however, the computer is not influenced by any assumption. The computer examines every possibility in order to reconstruct a 3D object and concludes that it is ‘able to do it.’”
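
To make the reconstruction problem concrete, here is a minimal sketch (in Python, and emphatically not Sugihara's program) of the linear-algebra formulation his book develops: planarity constraints turn reconstruction into a homogeneous linear system whose solution space measures how many 3D shapes project to the same drawing. The coordinates and face structure below are illustrative.

```python
# Planarity constraints for line-drawing interpretation: every face j of
# the drawn object must lie on a plane z = a_j*x + b_j*y + c_j, so each
# vertex i on face j contributes one linear equation
#     z_i - a_j*x_i - b_j*y_i - c_j = 0.
# Unknowns: one depth z_i per vertex plus (a_j, b_j, c_j) per face.
import numpy as np

# 2D vertices of the classic hexagonal drawing of a cube corner
# (illustrative coordinates): a center junction plus six hexagon corners.
ang = np.deg2rad([30, 90, 150, 210, 270, 330])
pts = [(0.0, 0.0)] + [(np.cos(a), np.sin(a)) for a in ang]

# The three visible rhombic faces, each given by four vertex indices
# (0 is the center; 1..6 are the hexagon corners in the order above).
faces = [(0, 2, 1, 6),   # upper-right face
         (0, 2, 3, 4),   # upper-left face
         (0, 4, 5, 6)]   # bottom face

n_v, n_f = len(pts), len(faces)
rows = []
for j, face in enumerate(faces):
    for i in face:
        row = np.zeros(n_v + 3 * n_f)
        row[i] = 1.0                 # coefficient of z_i
        x, y = pts[i]
        row[n_v + 3 * j + 0] = -x    # coefficient of a_j
        row[n_v + 3 * j + 1] = -y    # coefficient of b_j
        row[n_v + 3 * j + 2] = -1.0  # coefficient of c_j
        rows.append(row)
A = np.vstack(rows)

# Nullity of A = number of independent 3D interpretations that all
# project to exactly this same 2D drawing.
nullity = A.shape[1] - np.linalg.matrix_rank(A)
print("degrees of freedom in the reconstruction:", nullity)  # prints 4
```

Of the four degrees of freedom, three are trivially flat interpretations (the entire drawing lying on a single plane) and the fourth is a genuine family of cube-like reliefs of varying depth. That surplus of mathematically valid answers is why a computer, uncommitted to human assumptions, must examine every possibility.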

There are different methods that can be used to “fool” computer algorithms so that what machines and humans see is more closely aligned. One way to enhance artificial vision is to further study what our brains see, says Martinez-Conde. “We know, after all, they work well enough and our visual system is pretty sophisticated, so having a deeper understanding of our visual system from a neuroscience perspective can be helpful to improving computer vision.” She adds, however, that our visual system “is by no means perfect, so if we got to a point where computer vision is almost as good, that wouldn’t mean the work is done.”

Humans have used natural selection to incorporate into the neural networks in our brains every conceivable visual situation in the world, says Purves. “Once computers do that and evolve, in principle they should be as good as us, but it won’t be in visual measurements; they’re coming at [vision] from a very different way. There’s going to be a limit that will never get them to the level at which human beings operate.”

Yet machines can continue to be improved. “If you want to make a really good machine, evolve it” through trial-and-error experiences and by compiling those experiences in their artificial neural circuitry, says Purves. “There’s no reason that can’t be done; you just have to feed them the information that we used to evolve a visual system.” He estimates that in 20 years’ time, machine vision could be as good as human vision, once vision scientists are able to figure out how to evolve an artificial neural network to “survive” in environments “that are as complicated as the world we live in.”

“Humans and computers see things very differently, and there is a lot more for us to do to figure out how these networks work,” agrees Clune. One troubling issue he addressed in a paper is that if a computer identifies a random static image as, say, a motorcycle with 100% certainty, it creates a security loophole, he says. “Any time I could get a computer to believe an image is one thing and it’s something else, there are opportunities to exploit that to someone’s own gain.”
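
That paper, Deep Neural Networks are Easily Fooled (see Further Reading), generated such images with evolutionary algorithms and gradient ascent. A minimal sketch of the gradient-ascent version of the idea follows; the pretrained model, class index, learning rate, and step count are illustrative assumptions rather than the paper’s actual setup.

```python
# A sketch of gradient ascent on class confidence: start from random
# noise and nudge the pixels until a pretrained classifier is confident
# it sees the chosen class. Model, class index, learning rate, and step
# count are all illustrative assumptions.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)          # only the image is optimized

target_class = 670                   # hypothetical ImageNet class index
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # pure static
optimizer = torch.optim.Adam([img], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    logits = model(img)
    # Maximize the target class's log-probability.
    loss = -torch.log_softmax(logits, dim=1)[0, target_class]
    loss.backward()
    optimizer.step()

confidence = torch.softmax(model(img), dim=1)[0, target_class].item()
print(f"network's confidence in the target class: {confidence:.1%}")
```

To a person the optimized image still looks like noise; to the network it is, with high confidence, the chosen object, which is the loophole Clune describes.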

For example, a pornography company may produce images that appear to Google’s image filters to be rabbits, but which contain advertisements with nudity; or, a terrorist group could get past artificial intelligence filters searching for text embedded in images by making those images appear to the AI as pictures of flowers, he explains. Biometric security features are also potentially vulnerable; “a terrorist could wear a mask of transparent, plastic film that has static printed on it that is not visible to humans, but could trick a facial recognition system into seeing an authorized security agent instead of recognizing a known terrorist,” Clune says.

While some believe one system could be fooled by certain images whereas another system trained to recognize them would not be, surprisingly, that is not always the case, he says. “I can produce images with one network and show them to a completely different network, and a surprising number of times the other network is fooled by the same images. So there really are some deep similarities in how these computer networks are seeing the world.”
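
That cross-network test is easy to sketch as well. Continuing the snippet above and reusing its img and target_class (the second architecture is again an arbitrary choice):

```python
# Continuing the sketch above: show the same fooling image to a completely
# different, independently trained architecture.
import torch
import torchvision.models as models

other = models.vgg11(weights=models.VGG11_Weights.DEFAULT).eval()
with torch.no_grad():
    transfer_conf = torch.softmax(other(img), dim=1)[0, target_class].item()
print(f"second network's confidence in the same class: {transfer_conf:.1%}")
# Fooling images crafted against one network often score surprisingly high
# on another, hinting at shared failure modes in how these models "see."
```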

There is no good way yet to prevent networks from being fooled by nefarious means, Clune says, but when the technology improves, “security holes will be closed and they’ll become smarter and more effective. They’ll also do a better job when they encounter substantially different situations than they were trained on.”

Today robots can be trained on one type of image from the natural world, but if they encounter images that are too different, they break down and behave in bizarre ways, Clune says. “They need to be able to see the world and know what they’re looking at.”


Further Reading

Purves, D., and Lotto, R.B.
Why We See What We Do: An Empirical Theory of Vision, Sinauer Associates, 2011, ISBN-10: 0878935967.

Macknik, S.L., and Martinez-Conde, S.
Sleights of Mind: What the Neuroscience of Magic Reveals About Our Everyday Deceptions, Henry Holt and Company, 2010, ISBN: 978-0-8050-9281-3.

Sugihara, K.
Machine Interpretation of Line Drawings, MIT Press, Cambridge, MA, 1986.

Nguyen, A., Yosinski, J., and Clune, J.
Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images, in Computer Vision and Pattern Recognition (CVPR ’15), IEEE, 2015, http://www.evolvingai.org/publications


Figures

UF1 Figure. Google’s winning entry in the 2014 ImageNet competition helps computers distinguish between individual objects.

UF2 Figure. The rotating snakes illusion, as presented by Ritsumeikan University professor Akiyoshi Kitaoka.

