News
Artificial Intelligence and Machine Learning

Can AI Talk to the Animals?

Using machine learning techniques to decode the vocalizations of whales could provide new insights into the cetaceans.

Pod of narwhals (image). Credit: CoreyFord / Getty Images

Artificial intelligence (AI) has been making great strides in generating and translating human language. Large language models (LLMs) have quickly moved beyond simply dealing with human speech to recognizing other patterns that convey information, from DNA sequences to computer code. Now some scientists are cocking the computer’s metaphorical ear to animal vocalizations, hoping to discover if other creatures actually speak to each other in a way that might be recognizable to humans and, if so, what they are saying.

That is the idea behind the Cetacean Translation Initiative (Project CETI), which is using machine learning to try to decode the vocalizations of sperm whales. Other researchers are studying communication in species including elephants, monkeys, and crows. They are using the pattern recognition capabilities of AI to sort a cacophony of caws and rumbles and chatter into individual units that may carry meaning, and then trying to match those units with behavioral observations to determine what those meanings might be.

The research is still in its early days, trying to uncover fundamental building blocks of animal communication systems and attach meanings to the sounds those creatures make. Scientists have started out with relatively simple machine learning techniques, such as classifiers to identify individual units of language, and are quickly moving to more sophisticated systems, such as deep neural networks, hoping to figure out what animals might be talking about.

Project CETI, for instance, has uncovered new details about the sequences of clicks that sperm whales make. Those sequences, known as codas, range from three to 40 clicks long and vary slightly between different social groups. Essentially, the groups can be distinguished by their different coda dialects, like the difference between an American and a British accent.

Shane Gero, a whale biologist at Carleton University in Ottawa, Canada, has spent the past two decades in the waters off the coast of Dominica in the Caribbean, observing more than 30 families of whales and recording their calls. Over the years, he and his team created spectrograms of the calls—graphic representations that allow them to visualize acoustic features such as frequency and volume—and labeled them by hand, a time-consuming process. Now, as part of Gero’s collaboration with Project CETI, computer scientists at the Massachusetts Institute of Technology (MIT) have used his labeled data and supervised learning to train a model that can annotate new data more quickly and separate the calls according to which whale is making them.
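For readers curious what such a supervised annotation step might look like in code, here is a minimal sketch, not Project CETI's actual pipeline: it assumes hypothetical hand-labeled audio clips and trains a classifier on simple spectrogram statistics to attribute each clip to an individual whale.

```python
# Minimal sketch (not Project CETI's actual pipeline): given hypothetical
# hand-labeled clips of whale clicks, train a supervised classifier on
# simple spectrogram statistics to attribute new clips to individual whales.

import numpy as np
from scipy.signal import spectrogram
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def spectrogram_features(waveform, sample_rate=48_000):
    """Summarize a clip as the mean and spread of energy per frequency bin."""
    _, _, sxx = spectrogram(waveform, fs=sample_rate, nperseg=512)
    log_sxx = np.log1p(sxx)
    return np.concatenate([log_sxx.mean(axis=1), log_sxx.std(axis=1)])

def train_whale_annotator(clips, whale_ids):
    """clips: list of 1-D arrays of audio samples; whale_ids: labels."""
    X = np.stack([spectrogram_features(c) for c in clips])
    X_train, X_test, y_train, y_test = train_test_split(
        X, whale_ids, test_size=0.2, random_state=0)
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model
```

In practice the features and model would be far richer, but the workflow is the same: extract acoustic features, fit on labeled examples, then annotate new recordings automatically.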

It turned out that the whales’ vocalizations were more sophisticated than had previously been thought. Past research had identified 21 codas in Caribbean sperm whales, and a total of approximately 150 worldwide. The whales’ observed behavior, however, was too complex to be covered by a relatively simple communication system with a small, fixed set of messages.

The CETI researchers discovered that, in addition to the rhythm and tempo they knew about, the codas had two other features that could vary, which they named rubato and ornamentation. 'Rubato' involves subtle changes in the intervals between clicks; 'ornamentation' is the occasional addition of an extra click. With the four variables in different combinations, whales can produce a large set of distinct codas; the researchers found this combinatorial structure across the 8,719 codas in their dataset.
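The four variables can be made concrete with a small feature-extraction sketch. The definitions below are loose approximations invented for illustration, not the measures used in the paper; real codas would first be matched against a catalog of rhythm and tempo types.

```python
# Illustrative only: rough stand-ins for the four coda variables, computed
# from click timestamps (in seconds). These are not the paper's definitions.

import numpy as np

def coda_features(click_times, expected_clicks, reference_rhythm=None):
    """click_times: timestamps of one coda's clicks.
    expected_clicks: click count of the coda type this coda is matched to.
    reference_rhythm: normalized inter-click intervals of that coda type."""
    clicks = np.asarray(click_times, dtype=float)
    intervals = np.diff(clicks)
    tempo = clicks[-1] - clicks[0]            # overall duration of the coda
    rhythm = intervals / intervals.sum()      # relative spacing of the clicks
    # Rubato: subtle drift of this rendition's spacing from the reference.
    if reference_rhythm is not None:
        ref = np.asarray(reference_rhythm, dtype=float)
        rubato = rhythm[:len(ref)] - ref
    else:
        rubato = np.zeros_like(rhythm)
    # Ornamentation: an occasional extra click beyond the expected count.
    ornamented = len(clicks) == expected_clicks + 1
    return {"tempo": tempo, "rhythm": rhythm,
            "rubato": rubato, "ornamented": ornamented}
```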

Researchers also found what they are calling a phonetic alphabet for the whales, similar to the set of phonemes (basic units of sound) that humans use to build words and sentences. “The internal structure mimics in some ways aspects of phonology in human languages, where you have these different ways of putting together bits of the vocal apparatus to make a large set of sounds,” said Jacob Andreas, an associate professor in the Computer Science and Artificial Intelligence Laboratory at MIT, who participated in the research.

Humans are able to build up their basic units of sound into an unlimited array of meanings, said Pratyusha Sharma, a Ph.D. student at MIT and lead author of the paper reporting the phonetic alphabet. “We don’t yet know if whales can create an infinite space of meanings,” she said. “What we do know is they’re a lot more expressive than what was believed one year ago.”

GANs, Trees, and LLMs

This particular paper used fairly basic machine learning tools, such as a Gaussian mixture model, a probabilistic algorithm used to group data points into clusters. Researchers are already working, however, on applying more sophisticated AI to both the data they have and new data that they are continuing to collect. To figure out whether the whales’ vocalizations can convey information, Gero and the MIT team are working on what they call WhaleLM, a type of large language model. Like the LLMs that power chatbots or write computer code, WhaleLM learns the patterns underlying the whales’ calls and then uses what it has learned to predict what should come next.
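As a concrete illustration of the kind of probabilistic clustering a Gaussian mixture model performs, the sketch below groups coda feature vectors with scikit-learn. The data here are synthetic placeholders; a real analysis would use measured features such as tempo, rhythm, and rubato.

```python
# Sketch of Gaussian mixture clustering on coda feature vectors. The data
# are synthetic placeholders; real features would come from recordings.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 6))   # 500 codas, 6 acoustic features each

gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
cluster_ids = gmm.fit_predict(features)       # hard cluster assignments
membership = gmm.predict_proba(features)      # soft membership probabilities

print("codas per cluster:", np.bincount(cluster_ids, minlength=8))
```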

In a preprint that has not yet been peer-reviewed, the researchers trained the model on codas and measured how accurately it predicted the next coda in a sequence. They then examined whether sequences of codas could be used to predict the whales' current behavior and future actions, such as diving, and found that they could, with an accuracy of 72% for behavior and 86% for future actions. To test which aspects of the sounds might convey meaning, they trained new models with shorter sequences of codas, different orders of codas, or codas in which they changed the rhythm, tempo, rubato, or ornamentation one at a time. Any of these changes made the predictions less accurate. The authors said the study provides the first evidence that the whales' vocalizations do indeed contain information that the creatures can act on.
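The logic of that test can be sketched with a much simpler stand-in for WhaleLM: fit a next-coda predictor, then measure how much its accuracy drops when the sequences are perturbed. The bigram model and synthetic data below are toy placeholders; the point is only the train-then-ablate workflow.

```python
# Toy stand-in for the ablation idea: a bigram model (not an LLM) predicts
# the next coda type; shuffling coda order should hurt its accuracy.

import random
from collections import Counter, defaultdict

random.seed(0)
NUM_TYPES = 10

def synthetic_sequence(length=20):
    """Placeholder data with a built-in pattern: each coda type tends to
    be followed by the next type 80% of the time."""
    seq = [random.randrange(NUM_TYPES)]
    for _ in range(length - 1):
        if random.random() < 0.8:
            seq.append((seq[-1] + 1) % NUM_TYPES)
        else:
            seq.append(random.randrange(NUM_TYPES))
    return seq

def fit_bigram(sequences):
    """Record which coda type most often follows each coda type."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: c.most_common(1)[0][0] for prev, c in counts.items()}

def next_coda_accuracy(model, sequences):
    hits = total = 0
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            total += 1
            hits += model.get(prev) == nxt
    return hits / total

train_seqs = [synthetic_sequence() for _ in range(200)]
test_seqs = [synthetic_sequence() for _ in range(50)]
model = fit_bigram(train_seqs)

intact = next_coda_accuracy(model, test_seqs)
shuffled = next_coda_accuracy(model,
                              [random.sample(s, len(s)) for s in test_seqs])
print(f"intact order: {intact:.2f}   shuffled order: {shuffled:.2f}")
```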

Other researchers have used older AI tools to learn about animal calls. Mickey Pardo, a post-doctoral researcher in behavioral ecology at Cornell University, used a random forest algorithm to predict which individual animal in a group of African elephants a particular call was addressing. Once it had learned the acoustic features of those calls, the algorithm was able to examine a fresh set of calls and predict the elephant for which they were intended. In essence, the evidence shows that the elephants address each other by name, Pardo said.
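A heavily hedged sketch of that kind of analysis: assuming acoustic feature vectors for each rumble and labels for the intended receiver already exist (both are placeholders here, not Pardo's actual data or code), a random forest whose cross-validated accuracy sits well above chance would suggest the calls are receiver-specific.

```python
# Sketch of the random-forest test for name-like calls: can the intended
# receiver be predicted from a call's acoustic features alone?

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def receiver_predictability(call_features, receiver_ids):
    """call_features: array of shape (n_calls, n_acoustic_features).
    receiver_ids: label for the elephant each call was addressed to."""
    model = RandomForestClassifier(n_estimators=500, random_state=0)
    # Cross-validated accuracy well above chance would suggest the calls
    # carry receiver-specific, name-like information.
    scores = cross_val_score(model, call_features, receiver_ids, cv=5)
    return scores.mean()
```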

Gasper Begus, an associate professor at the University of California, Berkeley, whose work covers linguistics, AI, and cognitive science, uses generative adversarial networks (GANs) to discover the building blocks of communication in unknown languages. His neural networks try to learn speech the way a human baby does, by listening and imitating. One part of the GAN, the generator, tries to produce new sounds that seem like they were generated by humans, and another part, the discriminator, decides whether the speech is real or fake. A separate neural network makes sure the sounds produced are not just random words but actually carry information.
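The adversarial setup itself can be sketched compactly. The PyTorch skeleton below shows only a generic generator/discriminator loop on waveform-like vectors; Begus's actual architectures, the third information-enforcing network, and real audio handling are all omitted.

```python
# Bare-bones GAN skeleton: a generator proposes waveform-like vectors and a
# discriminator judges real versus generated. Illustrative only.

import torch
import torch.nn as nn

LATENT, AUDIO_LEN = 64, 1024

generator = nn.Sequential(
    nn.Linear(LATENT, 256), nn.ReLU(),
    nn.Linear(256, AUDIO_LEN), nn.Tanh())

discriminator = nn.Sequential(
    nn.Linear(AUDIO_LEN, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1))  # real/fake score (logit)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(real_batch):
    batch = real_batch.size(0)
    # 1) Update the discriminator: real clips -> 1, generated clips -> 0.
    fake = generator(torch.randn(batch, LATENT)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(batch, 1)) + \
             bce(discriminator(fake), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Update the generator: try to make the discriminator call fakes real.
    fake = generator(torch.randn(batch, LATENT))
    g_loss = bce(discriminator(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```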

Begus and his colleagues have developed techniques to look inside the neural layers of the GANs to determine which features the network identified as important to creating speech. “The model learns that there are sounds in languages, it learns words, it learns all sorts of meaningful things without any supervision,” he said.

As the linguistics lead at Project CETI, Begus applied his AI model to sperm whale vocalizations. The model identified aspects of the calls that had previously been identified as meaningful, such as the number of clicks in a coda. It also identified other patterns that had not been seen as carrying meaning. Those patterns, reported in a preprint, seem to be equivalent to the vowels in human speech.

Though the model can identify acoustic properties that animals may use to convey meaning, it cannot say what that meaning is. Figuring that out requires observing how animals react to the sounds. In cases such as the elephant name study, that can mean playing back the sound the computer has identified as containing a name and seeing if the animal responds as expected.

David Gruber, the founder and president of Project CETI, and Shafi Goldwasser, a computer scientist at MIT and recipient of the 2012 ACM A.M. Turing Award, think unsupervised machine translation may be able to help scientists figure out what whales or other animals are saying, if indeed they have something resembling a language. They developed two stylized models, fed them synthetic data, and used the results to establish bounds on what sample data would have to look like for translation to work.

Gruber, a professor of biology and environmental sciences at Baruch College, City University of New York, said they showed that such translation has a better chance of working when whales and humans have similar concepts, such as family, food, or swimming. Where their experiences do not overlap with humans—these are, after all, animals the size of a couple of school buses that spend hours at ocean depths where light does not penetrate—translation may be more of a challenge. They also showed that the more complex the language the computer was dealing with, the lower its error rate would be.

None of the researchers claim they will be able to actually talk to animals. It is difficult to even ask if animals have language because there is no consensus on what constitutes a language, Begus said. What is known, he said, is “If you look closely, there are many properties of language that other animals have as well.”

Gruber hopes that as researchers collect more data and as their AI models grow more sophisticated, they may eventually be able to grasp the meaning of animal calls. “We see ourselves as baby whales now,” he said, “and we’re just kind of understanding the basic fundamentals of the communication system.”

Further Reading

  • Sharma, P., Gero, S., Payne, R. et al.
    Contextual and combinatorial structure in sperm whale vocalisations. Nat Commun, 2024, 10.1038/s41467-024-47221-8
  • Pardo, M.A., Fristrup, K., Lolchuragi, D.S. et al.
    African elephants address one another with individually specific name-like calls. Nat Ecol Evol, 2024, 10.1038/s41559-024-02420-w
  • Sharma, P., Gero, S., Rus, D. et al.
    WhaleLM: Finding Structure and Information in Sperm Whale Vocalizations and Behavior with Machine Learning, bioRxiv preprint, 2024, 10.1101/2024.10.31.621071
  • Goldwasser, S., Gruber, D., Kalai, A.T., and Paradise, O.
    A Theory of Unsupervised Translation Motivated by Understanding Animal Communication, 37th Conference on Neural Information Processing Systems (NeurIPS), 2023, https://proceedings.neurips.cc/paper_files/paper/2023/file/7571c9d44179c7988178593c5b62a9b6-Paper-Conference.pdf
  • Modeling speech recognition and synthesis simultaneously https://www.youtube.com/watch?v=BTg6upDjUyw
  • A whale of a tale: How scientists are decoding the language of sperm whales https://www.youtube.com/watch?v=_8DnreYuddE
