Artificial Intelligence and Machine Learning News

It’s All About Image

Image recognition technology is advancing rapidly. Researchers are discovering new ways to tackle the task without enormous datasets.

Discovering the secrets of the universe is not a task for the timid and the impatient; there’s a need to peer into the deepest reaches of outer space and try to make sense of distant galaxies, stars, gas clouds, quasars, halos, and black holes. “Understanding how these objects behave and how they interact gives us answers to how the universe was formed and how it works,” says Kevin Schawinski, an astrophysicist and assistant professor in the Institute for Astronomy at ETH Zurich, the Swiss Federal Institute of Technology.

The problem is that traditional tools such as telescopes can see only so far, even with radical advances in optics and the placement of observatories in space, where they are free of the light and dust of Earth. For instance, the Hubble Telescope changed the way astrophysicists and astronomers viewed deep space by delivering far clearer images than previously possible. Of course, in this context, distance and time are inextricably linked. “But the images still do not allow us to see as far back in time as we would like,” Schawinski says. “The farther we can see, the more we can understand about the origins of the universe and how it has evolved.”

Enter computer image recognition, artificial neural networks, and data science; together, they are changing the equation. As huge volumes of data stream in, these tools help scientists find answers to previously unfathomable questions. In recent years, scientists have begun to train neural nets to analyze data from images captured by cameras in telescopes located on Earth and in space. In many cases, the resulting machine-based algorithms can sharpen blurry images and identify distant objects better than humans can.

“Data science and big data are revolutionizing many areas of astrophysics,” says François Lanusse, a post-doctoral researcher in the McWilliams Center for Cosmology at Carnegie Mellon University.

Indeed, the combination of more data, advances in data science, and new methods that allow researchers to easily and cheaply train neural networks is allowing scientists to boldly see where they have never seen before. No less important, these advances are not limited to astrophysics and astronomy; they have touched an array of other fields and have advanced autonomous vehicles, robots, drones, smartphones, and more. They are also being used to better understand everything from how linguistic patterns contribute to racism to how severe a hurricane may become as it forms.

Says Jeff Clune, an assistant professor of computer science at the University of Wyoming, “Until very recently, computers did not see and understand the world very well. The ability to train neural nets quickly and easily is transforming image recognition and enabling remarkable breakthroughs.”


Picture Perfect

Artificial neural nets are nothing new. The concept originated in the 1940s, and researchers have experimented with them for the last quarter-century. Yet only in the last few years has the technology matured to the point where computer image recognition and other artificial intelligence (AI) capabilities have become viable. Trained on anywhere from one to hundreds of graphics processing units (GPUs), these networks—which function in a way loosely analogous to neural pathways in the human brain—recognize patterns in data that other computing systems cannot. Layered nodes learn from each other—and from other networks—much like the way children learn. Remarkably, because of their overall complexity, nobody knows exactly how each trained artificial neural net produces its useful results.

Rapid advancements in neural nets and deep learning are the result of several factors, including faster and better GPUs, larger nets with deeper layers, huge labeled datasets to train on, new and different types of neural nets, and improved algorithms. Typically, for computer image recognition, researchers feed lots of pictures of things—motorcycles, chimpanzees, trees, or space objects, for example—into the system so the neural net can learn what an object looks like and how to differentiate it from others. If a researcher is training a neural net to recognize animals, the system tends to learn faster and better when knowledge from an earlier task is transferred to the new one. For instance, if the original task was to identify lions and zebras, reusing what the network learned there helps with the new job of identifying elk and bears.

The system succeeds because the two tasks now share knowledge. “Already being good at one task makes a neural network faster and better at learning the second task,” Clune explains. “The system already has a basic understanding of things that are common to both tasks, such as eyes, ears, legs, and fur.” As training proceeds and a neural net becomes smarter, it can identify photos and other images it has never seen before. For example, Clune has achieved accuracy rates as high as 96.6% with a neural net, compared against more than 40,000 human volunteers who labeled the same images. Others have found that neural nets actually outperform humans. Remarkably, “In most cases, we can train a neural net within a couple of days,” he says.
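The mechanics of this kind of transfer are straightforward to sketch. The snippet below is a minimal illustration using the PyTorch and torchvision libraries; the pretrained ResNet-18 backbone and the four-class animal task are assumptions made for the example, not the setup used in Clune's research.

```python
# Minimal sketch of transfer learning: reuse a network pretrained on one
# image task as the starting point for a new one. The ResNet-18 backbone
# and the four-class animal task are illustrative choices only.
import torch
import torch.nn as nn
from torchvision import models

# Start from a network whose convolutional layers already encode general
# visual features (edges, textures, eyes, fur) learned on a large dataset.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with one for the new task,
# e.g. distinguishing elk, bears, lions, and zebras.
num_classes = 4
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a batch of labeled images (images: N x 3 x 224 x 224).
def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the small, newly added layer is trained from scratch, far fewer labeled examples and far less compute are typically needed than when training an entire network from random weights.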

Of course, this doesn’t mean that all systems are equally effective, or that the results are consistently useful. There’s also the goal of pushing the boundaries of computer image recognition further. At present, researchers train systems using labels. This means designating the image of one animal ‘a lion’ and another ‘a zebra,’ or one galaxy ‘a spiral’ and another ‘an elliptical.’ The problem with this approach is that it’s time-consuming and sometimes expensive. What is more, “sometimes you don’t have labels, or they are noisy labels,” says Ce Zhang, an assistant professor in the Systems Group at ETH Zurich. For instance, a “cougar” label might confuse the system if it is presented with images of both the car and the animal.


Consequently, researchers are interested in an emerging area of deep learning that relies on different training methods, as well as unsupervised learning. University researchers, as well as companies such as Alphabet, which operates Google Brain and DeepMind, have begun to study this space. They are turning to convolutional systems modeled after the visual processing that takes place in humans, and to generative systems that rely on a more conventional, statistically based approach to learning the features of a dataset.

The end goal? “We want to just hand the computer the data and the algorithm and have it deliver results,” Schawinski says. “This type of capability would revolutionize astrophysics, but also science in general.”


A Sharper Focus

Advances in AI are now pushing the boundaries of neural nets and deep learning into an almost sci-fi realm, though the results produced by these systems are very real. Consider: Clune now uses generative systems to produce artificial images that look completely real to the human eye. These photo-realistic images range from birds and insects to mountains and even vehicles. He describes the technology as a “game changer.” Remarkably, over time, certain neurons in the deep learning network become better than others at recognizing and generating specific things, such as eyes, noses, bugs, or volcanoes. “The system actually figures out what it needs to recognize and know and allocates neurons to these concepts automatically,” he says.

To be sure, generative networks have value that extends beyond producing artificial images for art, video games, or augmented reality/virtual reality (AR/VR). Researchers have begun to pit generative networks against image-recognition networks to produce even more accurate results. In this scenario, the generator network creates fake images, while the image-recognition network, known as the discriminator, analyzes them and attempts to separate the real images from the fakes. The discriminator then checks how well it did and uses those results to further refine its algorithm. Over time, the discriminator becomes smarter, and its feedback tells the generator how to adapt its output to produce even more realistic images.
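That back-and-forth can be captured in a short training loop. The sketch below is a minimal illustration in PyTorch; the tiny fully connected generator and discriminator, the image size, and the hyperparameters are assumptions for the example rather than any specific published model.

```python
# Minimal sketch of adversarial training: a generator makes fake images,
# a discriminator tries to tell real from fake, and each improves against
# the other. Network sizes and hyperparameters here are illustrative only.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh())

discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

def train_step(real_images):  # real_images: batch x image_dim
    batch = real_images.size(0)
    real = torch.ones(batch, 1)   # label for real images
    fake = torch.zeros(batch, 1)  # label for generated images

    # Discriminator: learn to label real images 1 and generated images 0.
    fakes = generator(torch.randn(batch, latent_dim)).detach()
    d_loss = loss_fn(discriminator(real_images), real) + \
             loss_fn(discriminator(fakes), fake)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: adapt its output so the discriminator labels it as real.
    g_loss = loss_fn(discriminator(generator(torch.randn(batch, latent_dim))), real)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Each call to train_step first updates the discriminator on a mix of real and generated images, then updates the generator so its output better fools the freshly improved discriminator; this is the adversarial loop described above.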

The advantage of this approach, known as a generative adversarial net (GAN), is that the discriminator learns over time what matters most in the image, Zhang says. At a certain point, the system displays almost human-like intuition, he says; “results improve significantly.” Interestingly, this approach not only improves the quality of image detection, it may also trim the time required to train a network by reducing the number of images—essentially the volume of data—required to obtain useful results. Says Zhang: “An interesting question is how can we lower the requirement of a neural network in terms of how much data it needs to achieve the current level of quality?”

Another step is to make today’s artificial neural nets easier to use. The technology is still in its infancy, and researchers often struggle to use the tools effectively. In some cases, they have to work with multiple nets in an iterative fashion to find the one that works best. As a result, Zhang has developed a software program, ease.ml, that configures deep learning neural networks in a more automated and efficient way. This includes optimizing the use of hardware such as CPUs, GPUs, and FPGAs, and providing a declarative language for better managing algorithms.

“Right now, the user needs to deal with a lot of different decisions, including the type of neural net they want to use. There may be 20 different neural nets available for the same task. Choosing the right model and reducing complexity is important,” he explains.
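The core idea of automating that choice can be sketched briefly. The snippet below is not ease.ml's actual interface; it only illustrates the general approach of giving each candidate network a short trial run and keeping the one with the best validation accuracy. All names and helpers are hypothetical.

```python
# Illustration only: automate the "which of the 20 nets?" decision by
# training each candidate briefly and keeping the best validation score.
# This is not ease.ml's API; candidate names and helpers are hypothetical.
import torch

def select_model(candidates, train_loader, val_loader, epochs=2):
    """candidates: dict mapping a name to a function that builds a fresh model."""
    best_name, best_acc, best_model = None, -1.0, None
    for name, build in candidates.items():
        model = build()
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        criterion = torch.nn.CrossEntropyLoss()
        for _ in range(epochs):              # short trial training run
            for images, labels in train_loader:
                optimizer.zero_grad()
                criterion(model(images), labels).backward()
                optimizer.step()
        correct, total = 0, 0
        with torch.no_grad():                # measure validation accuracy
            for images, labels in val_loader:
                preds = model(images).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
        acc = correct / total
        if acc > best_acc:                   # keep the best candidate so far
            best_name, best_acc, best_model = name, acc, model
    return best_name, best_acc, best_model
```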

Already, the software, combined with other deep learning techniques—including an algorithm called ZipML that shrinks the representation of the data without reducing accuracy—has cut noise and sharpened images significantly for the astrophysics group at ETH Zurich. As a result, Schawinski and others can now peer more deeply into the universe.
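To give a flavor of what shrinking a data representation can mean in practice, the sketch below quantizes values to a few bits using stochastic rounding, so the compressed values remain unbiased on average. It is a generic illustration of low-precision encoding, not ZipML's implementation.

```python
# Illustration only (not the ZipML implementation): represent data with
# fewer bits via stochastic rounding, so values are compressed yet remain
# unbiased on average, which helps preserve accuracy during training.
import torch

def quantize(x, bits=4, lo=-1.0, hi=1.0):
    levels = 2 ** bits - 1
    # Map values into [0, levels] so each can be stored in `bits` bits.
    scaled = (x.clamp(lo, hi) - lo) / (hi - lo) * levels
    floor = scaled.floor()
    # Round up with probability equal to the fractional part (stochastic
    # rounding), so the expected quantized value equals the original.
    q = floor + (torch.rand_like(scaled) < (scaled - floor)).float()
    return q / levels * (hi - lo) + lo

x = torch.randn(5).clamp(-1, 1)
print(x)
print(quantize(x, bits=4))
```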

“Unlike other areas of science, we cannot run experiments in a lab and simply analyze the results,” Schawinski explains. “We are dependent on telescopes and images to look back in time. We have to piece together all these fixed snapshots—essentially huge datasets—to gain insight and knowledge.”

Adds Lanusse: “Classical methods of astronomy and astrophysics are rapidly being superseded by data science and machine learning. They not only do a job better, but they also offer new ways of looking at the data.”

The view into the future is equally compelling. Lanusse says that in the coming years neural networks will drive enormous advances in fields beyond astrophysics. These systems will not only detect, recognize, and classify objects, they will understand what is taking place in an image or in a scene in real time. This, of course, could profoundly impact everything from the way autonomous vehicles operate to how medical diagnostics work. Ultimately, they will help us unlock the mysteries of our planet and the universe. They will deliver a level of understanding that wouldn’t have been imaginable only a few years ago.

Says Lanusse, “Computer image recognition is advancing rapidly. We are finding ways to train networks faster and better. Every gain in speed and accuracy of even a few percent makes a profound difference in the real-world impact.”

Further Reading

Nguyen, A., Yosinski, J., Bengio, Y., Dosovitskiy, A., and Clune, J.
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space. Computer Vision and Pattern Recognition (CVPR ’17), 2017. http://www.evolvingai.org/ppgn

Lanusse, F., Ma, Q., Li, N., Collett, T.E., Li, C., Ravanbakhsh, S., Mandelbaum, R., and Poczos, B.
CMU DeepLens: Deep Learning for Automatic Image-based Galaxy-Galaxy Strong Lens Finding. March 2017. arXiv:1703.02642. https://arxiv.org/abs/1703.02642.

Wang, K., Guo, P., Luo, A., Xin, X., and Duan, F.
Deep neural networks with local connectivity and its application to astronomical spectral data. 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, 2016, pp. 002687–002692. doi: 10.1109/SMC.2016.7844646. http://ieeexplore.ieee.org/document/7844646/

Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y.
Generative Adversarial Networks. June 2014. eprint arXiv:1406.2661. http://adsabs.harvard.edu/cgi-bin/bib_query?arXiv:1406.2661.

