
How Computers Are Changing Biology

Sophisticated computer models and simulations are replacing test tubes and beakers. This revolution in biology research is redefining medicine, agriculture, and more.
A Symmetric Dimer of Dimers puzzle, in which the goal is to find the best way of folding the protein (in color) so it binds best with its three grey copies.

It is no secret that computer modeling is changing science. The ability to extract meaningful information from huge data sets and build sophisticated models is altering everything from astronomy to quantum physics. Yet, perhaps no discipline is witnessing more tangible benefits from computer modeling than biology.

“There is an incredible amount of valuable information embedded in biological systems,” observes Michael Levitt, a Stanford University professor of structural biology. “Precise molecules and powerful computers are in many ways made for each other.”

Levitt, part of a trio that captured the 2013 Nobel Prize in Chemistry, sees a biological world transformed by computers. Over the last few decades, he says, biologists have moved from hands-on experiments to increasingly complex computer models and simulations. They have unlocked the human genome and identified previously unknown side effects in pharmaceutical drugs. Now, researchers are applying this knowledge to design artificial organs and revolutionize everything from medicine to food science. Suffice it to say, this new frontier of computational biology and bioinformatics is changing our world.

As biologists swap test tubes for hard drives, the possibilities grow. Says David Shaw, chief scientist at D.E. Shaw Research and a senior research fellow at the Center for Computational Biology and Bioinformatics at Columbia University, “Computation is beginning to take its place alongside experimentation as a full partner in the scientific enterprise, serving not only as a source of hypotheses, but as an independent source of evidence. Each is capable of elucidating biological and biochemical phenomena that are inaccessible to the other, enabling important scientific advances that could not be attained using either approach in isolation.”


Cellular Callings

The origins of bioinformatics extend back to the early 1970s. At that time, Dutch theoretical biologists Paulien Hogeweg and Ben Hesper recognized that complex mathematical patterns exist in the biological world and that algorithms could be developed to understand them more comprehensively. Over the ensuing decades, radical improvements in computer processing power, storage, software, and mathematical algorithms have driven enormous advances in the field. Today, computational biology and bioinformatics are widely used to tackle challenges that would have been unthinkable in times past. Hogeweg describes the process as “a sensible way to model the complexities of biological organisms.”

Pavel Pevzner, professor of computer science at the University of California, San Diego, says computational technology has turned biology upside down. “Biology has been transformed into a digital science. It is nearly impossible to work in the field without using computational tools and computational expertise across multiple disciplines.” He says computers have not only opened doors by cutting modeling time from months or years to hours; they have also yielded qualitatively better data and helped researchers spot complex and hidden relationships in that data.

Not surprisingly, much of the attention so far has focused on widely publicized biology-centric endeavors, such as the Human Genome Project and Foldit, the latter a multiplayer game that has led to breakthroughs in AIDS research.


Computational biology and informatics have made it possible to dynamically map and analyze DNA and protein sequences, design and validate pharmaceutical drugs, and build models for artificial organs. Scientists are also combining research areas to explore new territory. For example, a consortium of European scientists has developed a new informatics platform called GENOBOX, which could help industry predict how food bacteria and probiotics affect a person’s specific genome.
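
To give a flavor of what analyzing a DNA sequence means at the smallest scale, here is a minimal Python sketch. It is not drawn from any project mentioned in this article; the function names and the example sequence are made up for illustration, and real pipelines rely on far more sophisticated tools.

```python
from collections import Counter

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    example = "ATGCGCGATTA"  # hypothetical sequence, for illustration only
    print(gc_content(example))
    print(reverse_complement(example))
    print(kmer_counts(example, 2).most_common(3))
```

Simple statistics such as these are typically only a first step; the analyses described in this article build on them at the scale of whole genomes.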

Ron Shamir, professor of computer science and bioinformatics at Tel Aviv University and an ACM Fellow, says a big part of the biotechnology revolution during the past decade has encompassed so-called next-generation sequencing, which facilitates ultra-fast, ultra-cheap sequencing of DNA. Since they are so efficient, these technologies take a step beyond merely sequencing genomes, and are used as a measuring tool for various biological entities.

Shamir observes, “As the cost of sequencing has dropped over the last 10 years, capabilities that were once inconceivable are now accessible and achievable.” He notes that sequencing a human genome, which initially cost approximately $3 billion, will soon be done for about $1,000; the cost is likely to drop to a few hundred dollars within a few years.
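
The scale of that drop is easy to understate. The short Python calculation below uses only the two figures Shamir cites (roughly $3 billion and roughly $1,000); the function names are illustrative, and the "halvings" figure is simply another way of expressing the same ratio, not a claim about how prices actually fell year by year.

```python
import math


def fold_reduction(initial_cost: float, new_cost: float) -> float:
    """How many times cheaper new_cost is than initial_cost."""
    return initial_cost / new_cost


def halvings_needed(initial_cost: float, new_cost: float) -> float:
    """Number of successive price halvings that would span the same drop."""
    return math.log2(fold_reduction(initial_cost, new_cost))


if __name__ == "__main__":
    initial = 3_000_000_000  # approximate initial cost cited in the article, USD
    target = 1_000           # approximate near-term cost cited in the article, USD
    print(f"{fold_reduction(initial, target):,.0f}-fold cheaper")
    print(f"about {halvings_needed(initial, target):.1f} halvings")
```

In other words, the two cited figures imply a three-million-fold reduction, equivalent to the price halving more than 21 times.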

On the other hand, these tools create formidable computational challenges, particularly related to data storage, transfer, and analysis.

Nevertheless, the enormous boost in computational power, along with the ability to conduct research at a dramatically lower cost, is radically redrawing the biology landscape. For example, Levitt’s work has centered on theoretical, computer-aided analysis of protein, DNA, and RNA molecules responsible for life at the most fundamental level. Understanding the precise molecular structures of biological molecules is an essential first step in understanding how they work, and in designing drugs to alter their function.

Meanwhile, Shamir is constructing and refining algorithms that allow scientists to better understand the relationship between chromosomes and cancer, and to decipher biological systems regulation. “The goal is to gain insight into how genes are regulated by other genes and proteins,” he says. “We have in each cell a huge dynamical system that responds to the environment and changes over time. Understanding how genes and proteins are regulated, and how they change, is critical for medicine, agriculture, and basic biology.”

To be sure, researchers such as Shaw are tackling the riddles of biology in new and innovative ways—particularly at the intersection of the human genome and medicine. He and his research group have built a special-purpose supercomputer to simulate changes in a protein’s three-dimensional structure that occur on a millisecond time scale. This machine and the resulting data have helped the researchers unravel the molecular mechanisms underlying a number of biological processes and diseases. Many in the field believe that such advanced computer modeling and simulations could radically change the way pharmaceutical companies develop future drugs, a process that has in recent years become increasingly difficult, expensive, and time consuming. It could also help reduce dependence on animals for testing.

Computer modeling allows researchers to cycle through a mind-bending array of scenarios, while simulating how the body will react to differing types and levels of medication. Over time, as more and better data is plugged into the model and the computer keys on relationships and correlations, the model presumably becomes more accurate. This modeling approach complements traditional biological research methods and has the potential to reduce costs, speed development, and improve the efficacy of medications.

Such modeling also opens up new possibilities. For instance, researchers from eight major institutions are now collaborating on the Artificial Pancreas Project, attempting to develop and test sophisticated software that will automatically control glucose levels for people with type 1 diabetes.


Beyond Medicine

Nanotechnology, gaming, crowdsourcing, and connected devices are also emerging as important cogs in the giant bioinformatics machine. For example, Dizeez, a game created by The Scripps Research Institute, aims to resolve questions about genetic medicine. It has resulted in the identification of several novel gene-disease annotations. Another game, Cut it Out, created by New England Biolabs, revolves around players creating and manipulating DNA sequences.

As researchers turn to these tools, the possibilities multiply. Using sensors and data input from mobile phones, biologists can capture a more complete snapshot of the surrounding environment (perhaps a moving picture is the more relevant analogy) and pore over vastly more data across a wide swath of disciplines and fields. “Biological systems provide remarkable insights into many things occurring in the world around us,” Levitt observes.


The possibilities are nearly limitless. In addition to vastly improved medical therapies and drugs, researchers could use biological data to better understand air and water pollution patterns and how they affect health; how hazardous substances disperse and interact with their surroundings; and how soil organisms and different chemicals react to different conditions. All of this could lead to new types of pollution controls, better HAZMAT monitoring and protective clothing, and vastly improved food science and farming methods. Bioinformatics research could also produce new and better types of fuels, and revolutionize everything from batteries to industrial manufacturing.

Developing better algorithms and computer models requires fresh thinking and interdisciplinary input. Shaw believes that in most areas of computational biology, the most significant contributions tend to emerge from interdisciplinary research that “brings computer scientists together with biologists, chemists, and other application experts. Collaborations of this sort often lead to creative new approaches to problems that would be difficult to solve using the paradigms traditionally employed within any one of those disciplines.”

According to Shamir, the key to mining relevant biomedical data is refining algorithms to appropriately estimate the value of relevant experiments and data. In addition, it is critical to further improve storage capacity and compression—and to tap into cloud computing more effectively—in order to make data more accessible. “We need new and more sophisticated bioinformatics algorithms that can integrate heterogeneous data better,” Shamir says. Increasingly, “The challenge isn’t obtaining data, it is figuring out exactly how to decipher it. Right now, analysis is the bottleneck.”

Levitt believes bioinformatics will fundamentally alter science in the years ahead. “At a certain level, there is a structure to all data that exists in the physical world,” he explains. “Today, many of the methods used to analyze data are generic; researchers look for correlations, dependency, and causality.” However, as researchers learn to drill down to a more granular layer and gain a much deeper understanding of objects, context, and relationships, remarkable advances will follow.

On a molecular level, “A bridge and an eating utensil are both made of steel; it is the shape of the object and how it works that determines its place in the scheme of things.” As we learn to better recognize and differentiate complex patterns in biology, Levitt says, we will begin to see the shapes and structures of biological things beyond the basic structures. “It is possible to gain a level of knowledge that will revolutionize many aspects of our world.”


Further Reading

Karplus, M., Levitt, M., Warshel, A.
Development of Multiscale Models for Complex Chemical Systems, The Royal Swedish Academy of Science,

Orenstein, Y., Linhart, C., Shamir, R.
Assessment of Algorithms for Inferring Positional Weight Matrix Motifs of Transcription Factor Binding Sites using Protein Binding Microarray Data, PLoS ONE, 7 (9) e46145, 2012.

Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., Eastwood, M.P., Bank, J.A., Jumper, J.M., Salmon, J.K., Shan, Y., Wriggers, W.
Atomic-Level Characterization of the Structural Dynamics of Proteins, Science, Vol. 330, October 15, 2010.

Shaw, D.E., Dror, R.O., Salmon, J.K., Grossman, J.P., Mackenzie, K.M., Bank, J.A., Young, C., Deneroff, M.M., Batson, B., Bowers, K.J., Chow, E., Eastwood, M.P., Ierardi, D.J., Klepeis, J.L., Kuskin, J.S., Larson, R.H., Lindorff-Larsen, K., Maragakis, P., Moraes, M.A., Piana, S., Shan, Y., Towles, B.
Molecular Dynamics Simulations on Anton, SC ’09 Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, Article No. 39.
