“It’s not hard to collect a bunch of insects,” said evolutionary biologist Josef Uyeda, an associate professor at Virginia Tech. “I can put a little trap in the rainforest, some insects will fall in, I put them in a jar, and maybe I have some new species in there.
“But what is the bottleneck? It’s getting expert eyes on all those specimens. That is such a huge bottleneck that the pace of discovering new species is very slow. Imageomics has the potential to change this.”
Imageomics is a new interdisciplinary research field that combines computer science and biology. The basic idea is to apply machine learning that is guided by biological knowledge to many millions of images of organisms in order to answer fundamental biological questions about ecology and evolution. Those images can be captured in the wild by professional biologists with professional cameras, or by amateurs with smartphones; they can be made by drones, they can come from citizen science projects, from digitized museum collections, and even from drawings.
A driving force behind this new research field is Tanya Berger-Wolf, director of the Translational Data Analytics Institute and a professor of Computer Science and Engineering, Electrical and Computer Engineering, as well as Evolution, Ecology, and Organismal Biology, at the Ohio State University. She leads the nationwide Imageomics program, which started in 2021 with funding for five years under the National Science Foundation program “Harnessing the Data Revolution.”
“In Imageomics, we try to extract information about traits and phenotypes from imaging the natural world, and connect them to function and genotypes,” explained Berger-Wolf. “The technology part allows us to look at many more images and features than ever before. The computational, analytical part allows us to look more carefully, expanding the human way of observing the world to spectra, scales, and resolutions where humans do not see well.”
Uyeda offered a concrete example: “I tell my students who are collecting animal data to first measure the forelimb, then measure the eye, then measure the tail, and so on. But that does not necessarily correspond with how genetics works. A gene can affect a whole suite of traits. As humans we are biased in what traits we measure and how we measure them. My hope is that with AI, and especially with AI that is guided by biological knowledge, we can get a better idea of how to actually quantify traits.”
In a project called Phenoscape, Uyeda and his collaborators worked toward building bridges between the domain knowledge of biological experts and the computational models that are commonly used to analyze big data. Previously, this domain knowledge—for example, how parts of an organism are defined, connected, and integrated—rarely entered into a computational pipeline directly. “However, if AI is to effectively replicate extracting traits from images, it becomes essential that this knowledge is available for computations,” Uyeda said.
Despite having started so recently, Imageomics has already delivered some exciting results. One of them is the multimodal foundation model BioCLIP for the Tree of Life (the graphic tool biologists use to organize evolutionary relationships among plants, animals, and all other forms of life). Said Berger-Wolf, “BioCLIP was trained on 10 million images of animals, plants, and fungi from a total of about 450,000 species, about a quarter of all the named species, and about 5% of the estimated number of species.”
The model, the data, and the code of BioCLIP are freely available and everybody can experiment with a demo. Give the demo a photo of an organism, and it will predict its species name.
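For readers curious how a photo can be turned into a species name, models in the CLIP family (which BioCLIP's name points to) typically embed the image and each candidate label as vectors, then rank the labels by how closely their vectors align with the image's. The sketch below illustrates only that ranking step. It is not BioCLIP's code or API: the three-dimensional vectors are made up for illustration, the function names and the temperature value are assumptions, and a real model would compute the embeddings with its trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=0.1):
    """Turn similarity scores into probabilities (temperature is an assumed value)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def classify(image_embedding, label_embeddings):
    """Rank candidate labels by similarity to the image embedding."""
    names = list(label_embeddings)
    sims = [cosine(image_embedding, label_embeddings[n]) for n in names]
    probs = softmax(sims)
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)

# Hypothetical embeddings: a real model would produce vectors with
# hundreds of dimensions from its text encoder.
LABELS = {
    "Passerina ciris (painted bunting)": [0.9, 0.1, 0.2],
    "Passerina cyanea (indigo bunting)": [0.7, 0.6, 0.1],
    "Danaus plexippus (monarch butterfly)": [0.1, 0.2, 0.9],
}

# Hypothetical embedding of an uploaded photo.
photo = [0.85, 0.2, 0.25]

ranking = classify(photo, LABELS)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

Because the labels are ordinary text, the candidate list can be changed without retraining, which is one reason this style of model can be asked about species that were not in a fixed list of classes.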
“BioCLIP not only works better than existing tools,” Berger-Wolf said, “it even works on image types it has never been trained on, like drawings or camera trap images. It can identify to the genus level species that it has never seen, thanks to the fact that the machine learning model is grounded in biological knowledge. And even more excitingly, we are now building a next version that will be able to give explanations for what it sees. For example, BioCLIP might say: this is a painted bunting, because it has a red-orange belly, a yellow back, and a blue cap. Or: this is a female of that species, because of such-and-such traits. We are training this next version on 200 million images of almost all the two million named species.”
For centuries, biologists have used drawings and photos of animals and plants to understand them. AI now gives them a tool for tasks that were previously impossible, such as recognizing subtle patterns that humans cannot see. That is a second exciting result that Imageomics has delivered recently.
Berger-Wolf discussed research by some of her colleagues: “They studied two different species of butterflies, one of which uses mimicry to change wing patterns to impersonate the other species; this is supposed to fool its predator. When visual mimicry is used, humans often find it hard to tell the two species apart, but machine learning can.”
AI can even be used to determine how extensive the differences in color patterns need to be to fool birds. That will be tested by printing butterflies with different wing patterns, showing them to predatory birds, and observing which “butterflies” the birds pick.
Said Berger-Wolf, “The last piece of the story is to try to discover the genes that are responsible for the subtle differences in the appearance of the two species of moths. We are connecting an ‘Image-Wide Association Study,’ IWAS, to the genome and we think we are close to finding those genes.”
Such results show the potential of Imageomics, Berger-Wolf said. “I dare to say confidently that we will have new biological discoveries thanks to a partnership between humans and AI. We will create the ability to understand the world in ways we weren’t able before.”
Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.