Biologists are awash in data; the genomes of 18,840 organisms had been sequenced by mid-October 2012, according to the U.S. Department of Energy's Joint Genome Institute. And scientists around the world are trying to determine how to wring value from all that data, with projects studying how those genes interact with each other and the environment, how embryos develop, how toxins affect tissues, and why some cells become cancerous, among many other questions. With so much data, and with questions that hinge on understanding the complex interplay of multiple factors, computer scientists are working to develop software that can simulate the behavior of biological systems, from cells to organs to entire organisms.
"It's trivial to generate terabytes of data in a day or two," says Mark Isalan, a systems biologist at the Centre for Genomic Regulation in Barcelona, Spain. The bigger challenge is how to use all that data to address important questions.
Yet that volume of data also makes problems in biology more tractable for computer scientists seeking to model biological behavior. "Because we can measure the cells much more precisely, we actually have numbers we can give to models," Isalan says. And once they have built accurate models, researchers can then change some of the model's parameters and see what happens, which can help explain the mechanisms of disease, or suggest new targets for drugs to treat diseases.
Researchers from Stanford University and the Craig Venter Institute in Rockville, MD, have taken an important step toward getting value out of the huge datasets by creating what they say is the first comprehensive computational model of a living organism. They built a model of the bacterium Mycoplasma genitalium, a parasite that infects the human urethra. The scientists chose that microbe because it has only 525 genes, the fewest of any independently living organism; humans, by contrast, have around 25,000. "We really wanted to build something that was complete," says Jonathan Karr, a doctoral student in biophysics at Stanford and lead author of a paper about the work. "Anything larger we felt would just be impossible at this point to build a complete model." Even this model, the researchers say, is essentially a "first draft."
To build their model, the team broke down the cell into various individual functions. They came up with 28 sub-models, each simulating a different process. One, for instance, modeled the replication of DNA, while another described the process of transcribing RNA, and another simulated how proteins produced by the cell folded into particular shapes. One advantage of this approach, says Karr, is that it makes sense from a software engineering perspective. "That's the way programs are built," he says. Another plus is that, instead of using one type of mathematical representation to describe the entire organism, it allowed the researchers to use the mathematical approach most appropriate to the activity of a particular module and to the amount of data available about that cellular function. One module might rely on ordinary differential equations, for instance, while another uses a Boolean model. "Different aspects of cell biology are not all characterized to the same level of detail," Karr explains. "There are just certain aspects of cell life we know more about and others where we know less." The team fed the model data gleaned from research literature about the bacterium and similar organisms, and added information they generated in laboratory experiments.
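The idea of mixing mathematical representations behind one common interface can be illustrated with a minimal sketch. It is not the Stanford team's code: the class names, the state variables (`mrna`, `nutrient`, `repressor`, `gene_on`), and the rate constants are all hypothetical, chosen only to show one module stepping an ordinary differential equation while another evaluates a Boolean rule.

```python
from typing import Dict, List, Protocol

State = Dict[str, float]


class Module(Protocol):
    """Common interface (an assumption): read the state, return updated values."""

    def step(self, state: State, dt: float) -> State: ...


class TranscriptODE:
    """ODE-style module: d(mrna)/dt = k_syn - k_deg * mrna, one forward-Euler step."""

    def __init__(self, k_syn: float, k_deg: float) -> None:
        self.k_syn = k_syn  # hypothetical synthesis rate
        self.k_deg = k_deg  # hypothetical degradation rate

    def step(self, state: State, dt: float) -> State:
        m = state["mrna"]
        return {"mrna": m + dt * (self.k_syn - self.k_deg * m)}


class BooleanSwitch:
    """Boolean module: the gene is on if nutrient is present AND no repressor."""

    def step(self, state: State, dt: float) -> State:
        on = state["nutrient"] > 0.5 and not state["repressor"] > 0.5
        return {"gene_on": 1.0 if on else 0.0}


def advance(modules: List[Module], state: State, dt: float) -> State:
    """Run every module against the same input state and merge the results."""
    new_state = dict(state)
    for module in modules:
        new_state.update(module.step(state, dt))
    return new_state
```

Because both classes expose the same `step` method, the caller does not need to know which formalism a module uses, which is one way to read the software-engineering advantage Karr describes.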
To make all the sub-models work together, the researchers built a piece of software to plug them all into. The sub-models run independently for a short time, less than a second. But to emulate how biology relies on feedback loops, the sub-models are linked by 16 variable metabolic states that, taken together, represent everything going on in the whole cell. At each time step of one second, the sub-models take the states of those variables and use them to run their simulation, then make the revised values available to the other sub-models. Also at each time step, the computer estimates the amount of metabolic resources a given biological process would require, then allocates the cell's total resources proportionally among the different processes. These steps (measure, calculate, share, repeat) run thousands of times until the simulated cell reaches the point where a real-life cell would divide into two, at which point the simulation is done. When the team ran their simulation, the computer produced results that matched those that had already been determined in lab experiments. But beyond that, says Karr, the simulation can also highlight inconsistencies in the data and suggest the existence of cellular functions that are not yet recognized, pointing the way to new lines of research. "The model helps us reason about the things we as a field collectively don't know about cell biology," he says.
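The measure, calculate, share, repeat cycle might look roughly like the sketch below. It is a toy under stated assumptions, not the published model: the `SubModel` type, the `energy` and `mass` variables, the rule that demands are scaled down only when they exceed the available pool, and the mass threshold used as the division criterion are all hypothetical stand-ins for the far richer state the article describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


@dataclass
class SubModel:
    name: str
    demand: Callable[[State], float]        # resources requested this step
    step: Callable[[State, float], State]   # changes to state, given a grant


def allocate(demands: Dict[str, float], available: float) -> Dict[str, float]:
    """Share a limited pool in proportion to each process's estimated demand."""
    total = sum(demands.values())
    if total <= available:
        return dict(demands)  # enough for everyone
    return {name: available * d / total for name, d in demands.items()}


def simulate(sub_models: List[SubModel], state: State,
             division_mass: float, max_steps: int = 100_000) -> Tuple[State, int]:
    state = dict(state)
    for t in range(max_steps):
        # Stop when the (assumed) division criterion is met.
        if state["mass"] >= division_mass:
            return state, t
        # Measure: every sub-model estimates its needs from the shared state.
        demands = {m.name: m.demand(state) for m in sub_models}
        grants = allocate(demands, state["energy"])
        # Calculate: each sub-model runs independently on the same snapshot.
        changes = [m.step(state, grants[m.name]) for m in sub_models]
        # Share: merge the changes so the next step sees the revised values.
        for change in changes:
            for key, delta in change.items():
                state[key] += delta
    return state, max_steps
```

A usage example with two made-up processes: `simulate([SubModel("metabolism", lambda s: 0.0, lambda s, g: {"energy": 10.0}), SubModel("growth", lambda s: 8.0, lambda s, g: {"energy": -g, "mass": 0.01 * g})], {"energy": 0.0, "mass": 1.0}, division_mass=2.0)` runs until the toy cell's mass has doubled.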
While the Stanford group focuses on simulating a whole organism, other researchers are concentrating on simulating an organ. Since 2005, the Blue Brain Project at École Polytechnique Fédérale de Lausanne, in Switzerland, has been developing a computer model of a brain. So far, they have built and run a representation of part of a rat's cortex, consisting of 10,000 neurons. The researchers have asked the European Union to fund a 10-year, one billion euro project to create a functioning model of an entire human brain, with hundreds of millions of neurons that could be used to simulate neurological diseases or the effects of various drugs on the brain. At press time, an answer was imminent.
In Germany, the Virtual Liver Network consists of 70 research groups working to build a model that, while not fully duplicating the liver, represents the physiology of the organ, simulating biological functions at different levels, from activity within individual cells to the liver as a whole. Meanwhile, the U.S. Environmental Protection Agency (EPA) is working on a similar project, with the aim of being able to simulate the effects of drugs and environmental toxins on the liver. The agency also has a virtual embryo project to study how certain chemicals might cause birth defects.
The EPA needs such data to set regulations about what levels of exposure to chemicals should be considered safe for humans. To date, such levels are set based on data from animal models, but an animal study can take up to two years and cost millions of dollars, says Imran Shah, a computational systems biologist at the EPA's National Center for Computational Toxicology in Research Triangle Park, NC, who works on the virtual liver. Further, he says, there is some question as to how closely the effects of chemicals in animals match what happens in humans; there are proteins whose increased production causes liver cancer in rats but not in people, for instance. And results in animal studies may come from the high concentration of toxins used in tests, which may not replicate a real-life situation. "In most cases what the EPA cares about is long-term and very low-level exposure," Shah says.
The approach Shah's team takes is agent-based multi-scale modeling. They make models at various levels of organization: the molecular pathways within a cell, the cell as a whole, groups of cells, and sections of liver. Like the Stanford work, the whole model is built as a series of modules, with each module acting as an autonomous agent. One module might be responsible for metabolizing a substance, another might affect blood flow through capillaries. The simulation focuses on a lobule, a functional unit of the liver containing roughly one million cells of various types, with a defined three-dimensional structure. Blood flows through the lobule, nutrients are exchanged, bile is excreted. The team simulates the activity of a single lobule in detail, then groups 20 or 30 of them together to build a larger model of liver function.
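To make the agent-based, multi-scale idea concrete, here is a deliberately simplified sketch. It is not the EPA model: it assumes a lobule can be reduced to a one-dimensional chain of cells that blood crosses in a single step, and the `Hepatocyte` class with its `uptake_fraction` and `clearance_rate` parameters is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hepatocyte:
    """One cell agent with its own internal dose (all parameters hypothetical)."""
    uptake_fraction: float  # share of the passing chemical the cell absorbs
    clearance_rate: float   # share of the internal dose metabolized per step
    dose: float = 0.0

    def act(self, blood_conc: float) -> float:
        """Absorb from the blood, metabolize, and return what flows onward."""
        absorbed = self.uptake_fraction * blood_conc
        self.dose += absorbed
        self.dose -= self.clearance_rate * self.dose
        return blood_conc - absorbed


def perfuse(lobule: List[Hepatocyte], inlet_conc: float, steps: int) -> float:
    """Cell-to-lobule scale: pass blood along the chain once per time step."""
    outlet = inlet_conc
    for _ in range(steps):
        conc = inlet_conc
        for cell in lobule:  # upstream cells see higher concentrations
            conc = cell.act(conc)
        outlet = conc
    return outlet


def liver_outlet(lobules: List[List[Hepatocyte]], inlet_conc: float, steps: int) -> float:
    """Lobule-to-organ scale: average the outlet concentration over lobules."""
    outlets = [perfuse(lobule, inlet_conc, steps) for lobule in lobules]
    return sum(outlets) / len(outlets)
```

Even in this toy, cells near the inlet accumulate a larger dose than cells downstream, the kind of location-dependent exposure a mechanistic model can expose and a purely statistical association cannot.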
This kind of modeling is a different approach to toxicology than statistical modeling, which looks for associations between, say, a potential toxin and a negative result. "We try to think about it more in terms of a mechanistic level," Shah says. Mechanistic modeling may not just reveal that a chemical has an ill effect, but lead to a greater understanding of why.
The project is far from complete. The EPA team presented its first proof of concept this year, running simulations on 10 virtual individuals with 10 virtual livers, but Shah says the biological information that goes into the models needs much more refining. One challenge that remains is verifying that what the model shows is a valid representation of the real world; how do you test the computer's prediction against actual lab results when you cannot do these experiments on humans?
Karr would like to move from his bacterium model to more complex organisms, starting perhaps with yeast and then moving to simple multi-celled organisms like worms, and on up the scale of complexity from there. Such models could allow synthetic biologists to design and engineer organisms, such as microbes that efficiently convert biomass into fuel or pharmaceuticals. And they could play a role in personalized medicine, allowing doctors to prescribe the best treatment based on an individual's own genome and history. That will demand a lot of work, both gathering the biological information and figuring out the best computational approaches. "We eventually need to be able to understand how you get from a person's DNA to the behavior of a human being," Karr says. "And we're going to need very detailed models to be able to do that."
Figure. The huge volume of data generated from genome sequencing technologies, like those used as part of the DOE's Joint Genome Institute, has inspired computer scientists worldwide to create software that can take that data and build computational models simulating the behavior of biological systems.
©2013 ACM 0001-0782/13/02
The Digital Library is published by the Association for Computing Machinery. Copyright © 2013 ACM, Inc.