Scientific Applications of Generative AI

Generative artificial intelligence (AI) systems such as ChatGPT and Midjourney have brought AI to center stage in the past year.

Whereas predictive AI, like automatic image recognition, draws conclusions from patterns in data, generative AI creates new content in the form of text, images, audio, video, and even computer code. During the 2024 American Association for the Advancement of Science (AAAS) Annual Meeting in Denver this February, the session ‘Generative AI in Science’ zoomed in on the opportunities and risks of generative AI in the sciences, from the perspective of computer science, materials science, and climate science.

Seeking, and Utilizing, Scientific Data

“Can generative AI accelerate the scientific discovery process itself?” was the key question Rebecca Willett of the University of Chicago posed in the session. “Not just the analysis of data, but also the design of new experiments or the generation of new hypotheses. Can we generate new sustainable materials, new distributions of matter in the universe, new microbiomes, or even new laws of nature?”

Willett, a professor of statistics and computer science, pointed out that science differs in many ways from current commercial applications of generative AI. Scientific data is very different, said Willett: “In general, the amount of data in the sciences is much lower in quality. Furthermore, our goals in science are different from producing things that are plausible. We want to understand how things work outside of the range of what we have already observed.”

In addition, scientists do not scrape data from the Internet, as ChatGPT does; they perform experiments and simulations, both of which are expensive and time-consuming. “We then are confronted with the new challenge of how we decide which experiments or simulations to run,” said Willett. “Another challenge is that the generative AI must be able to capture rare events, because rare events are often what we really care about in science, like hurricanes in climate scenarios.”

Finally, scientific data can span an enormous range of scales, from interactions of individual molecules to emergent features of large-scale materials used in an airplane, for example. That makes scientific data very different from the text or image data on which current commercial generative AI is trained. As a result, Willett concluded, “Generative AI in the sciences offers exciting opportunities, but off-the-shelf tools are insufficient.”

Designing Bio-Inspired Materials

Massachusetts Institute of Technology (MIT) professor of engineering Markus Buehler is developing generative AI-based tools for the design of new bio-inspired materials. The conventional way of designing materials, used since the 1950s, relies on solving preconceived physical equations; Buehler said this computationally intensive and slow method is very much limited by human imagination. It is also difficult to start out with a list of desired material properties, and then try to calculate back to the molecular structure that can provide those properties.

Generative AI can change all this, Buehler said. “Generative AI lets us go beyond human imagination. It allows us to learn new concepts, to read all the books in the world and all the data ever measured. In addition, it innately is able to solve inverse problems: how to go from the desired material properties to the underlying molecules? That’s a big game-changer.”

Key to the prediction of new materials with generative AI are knowledge graph structures, Buehler explained. “Instead of trying to describe a material by a detailed atom-by-atom calculation, these graphs intrinsically capture what really matters in a molecular structure and what matters less.” He illustrated how an amyloid protein can be represented by a graph that captures a subset of knowledge about the protein at multiple scales.

Buehler showed the possibilities of combining generative AI with physics-based modeling into a new form of physics-grounded artificial intelligence. As part of such a combined model, he and his colleagues have developed a generative AI system consisting of different interacting generative AI models to make new scientific discoveries: one model writes computer code, interprets data, and reasons, while another runs the computer code and generates new data, and yet another has access to a large collection of scientific literature.

He said, “We can ask such a multi-agent model to design a particular type of material, ask it even to tell why it comes up with a certain solution, and predict how it behaves in great detail in practice. This is what we have demonstrated in the lab. It sounds futuristic, but it’s already happening today.”
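
The division of labor Buehler describes (one model that writes code, one that runs it, one that consults the literature) can be illustrated with a minimal Python sketch. This is not Buehler's system: every function name, the in-memory "literature" entry, and the fixed code template are hypothetical placeholders standing in for generative models, chosen only to show how such agents might hand work to one another.

```python
# Toy sketch of three cooperating "agents". All names and data here are
# hypothetical placeholders; a real system would call generative models.

LITERATURE = {
    "silk": "placeholder abstract about silk-like proteins",
}


def literature_agent(topic):
    """Look up a topic in a tiny in-memory corpus (stand-in for literature access)."""
    return LITERATURE.get(topic, "no entry found")


def coding_agent(task, context):
    """Return a code string for the task (stand-in for a code-writing model).

    The task and context are ignored here; a fixed template keeps the
    sketch deterministic.
    """
    return "result = sum(values) / len(values)"


def execution_agent(code, inputs):
    """Run the generated code in a restricted namespace and return 'result'."""
    namespace = dict(inputs)
    # Only expose the two built-ins the template needs.
    exec(code, {"__builtins__": {"sum": sum, "len": len}}, namespace)
    return namespace["result"]


def run_pipeline(topic, measurements):
    """Chain the agents: gather context, write code, run it, report."""
    context = literature_agent(topic)
    code = coding_agent("average the measurements", context)
    result = execution_agent(code, {"values": measurements})
    return {"context": context, "code": code, "result": result}


report = run_pipeline("silk", [1.0, 2.0, 3.0])
print(report["result"])
```

In this sketch the agents are chained in a fixed order; the system described in the session presumably lets the models interact more freely, which the sketch does not attempt to reproduce.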

Improving Climate Science

Atmospheric physicist Duncan Watson-Parris of the University of California, San Diego, concluded with a story about generative AI models for climate science. Watson-Parris started by discussing one of last year’s scientific breakthroughs, according to the journal Science: an AI model trained on 40 years of historical weather data, running on a laptop, was able to forecast weather several days out much more rapidly than, and just as accurately as, the world’s top forecasting agencies using a supercomputer.

“This is offering a lot of opportunity to open up the stage for companies and organizations to make forecasts more available and accurate,” Watson-Parris said. “We are using these AI forecasts, among others, to better quantify the risks of extreme weather events.”

Climate, by definition, is something like average weather, so if AI can play a role in weather prediction, it can also play a role in climate prediction, Watson-Parris said. He offered the example of predicting what climate will be in the year 2100 under four plausible socioeconomic scenarios of fossil-fuel emissions, explaining, “Using climate model simulations as training data, we can build emulators of climate models that predict such scenarios on a laptop, rather than on a supercomputer.”

One such study, he said, demonstrated that the emulator performs at least as effectively as a full climate model simulation. “So, I think that there is a real potential for these models to enable policymakers and other people who don’t have the capacity to run detailed climate models to still explore various climate scenarios.”
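
The emulator idea can be made concrete with a small Python sketch: run an expensive simulator a few times, fit a cheap statistical model to its outputs, then use that model to evaluate new scenarios. Everything here is an assumption made for illustration. The `toy_simulator` function is a made-up linear stand-in rather than a climate model, the numbers are synthetic, and the emulators Watson-Parris described are far more sophisticated than a straight-line fit.

```python
# Toy emulator sketch. The simulator, its coefficients, and all inputs are
# synthetic placeholders and carry no physical meaning.

def toy_simulator(emissions):
    """Stand-in for an expensive simulation run (hypothetical, not real physics)."""
    return 0.5 * emissions + 0.1


def fit_linear_emulator(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def emulate(params, x):
    """Cheap prediction from the fitted emulator."""
    slope, intercept = params
    return slope * x + intercept


# A handful of "expensive" training runs.
train_inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
train_outputs = [toy_simulator(x) for x in train_inputs]
params = fit_linear_emulator(train_inputs, train_outputs)

# Four hypothetical scenarios, evaluated without rerunning the simulator.
scenarios = {"A": 1.5, "B": 2.5, "C": 3.5, "D": 5.0}
predictions = {name: emulate(params, x) for name, x in scenarios.items()}
for name, value in predictions.items():
    print(name, round(value, 3))
```

Because the toy simulator is exactly linear, the fitted emulator reproduces it, including for scenario D, which lies outside the training range. A real emulator would need to be validated against held-out simulation runs before being trusted in that way.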

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.