The power of computers to juggle vast quantities of data has proved invaluable to science. In the early days, machines performed calculations that would have taken humans far too long to perform by hand. More recently, data mining has found relationships that were not known to exist, noticing, for instance, a correlation between use of a painkiller and incidence of heart attacks. Now some computer scientists are working on the next logical step: teaching machines to run experiments, make inferences from the data, and use the results to perform new experiments. In essence, they wish to automate the scientific process.
One team of researchers, from Cornell and Vanderbilt universities and CFD Research Corporation, took a significant step in that direction when they reported last year that a program of theirs had been able to solve a complex biological problem. They focused on glycolysis, the metabolic process by which cells (yeast, in this case) break down sugars to produce energy. The team fed their algorithm with experimental data about yeast metabolism, along with theoretical models in the form of sets of equations that could fit the data.
The team seeded the program with approximately 1,000 equations, all of which had the correct mathematical syntax but were otherwise random. The computer changed and recombined the equations and ranked the results according to which produced answers that fit the data, an evolutionary technique that has been used since the early 1990s. The key step, explains Hod Lipson, associate professor of computing and information science at Cornell, was to rank how the equations fit the data not only at any given point in the dataset, but also at points where competing models disagreed.
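The general shape of such an evolutionary search can be sketched in a few lines of Python. This is a minimal illustration and not Eureqa itself: the tuple-based expression format, the two operators, the mutation and crossover rules, and every function name here are assumptions chosen for brevity, and the sketch ranks candidates only by overall fit, without the disagreement-based ranking Lipson describes.

```python
import random

# Hypothetical operator set; a real system would use many more.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def random_expr(rng, depth=2):
    """Build a random but syntactically valid expression tree."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.5 else ("const", rng.uniform(-2, 2))
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Evaluate an expression tree at the input value x."""
    if expr[0] == "x":
        return x
    if expr[0] == "const":
        return expr[1]
    return OPS[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

def error(expr, data):
    """Mean squared error of an expression against (x, y) observations."""
    total = 0.0
    for x, y in data:
        diff = evaluate(expr, x) - y
        total += diff * diff
    return total / len(data)

def mutate(expr, rng):
    """Change an equation by replacing one subtree with a fresh random one."""
    if expr[0] in ("x", "const") or rng.random() < 0.3:
        return random_expr(rng)
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, mutate(left, rng), right)
    return (op, left, mutate(right, rng))

def crossover(a, b, rng):
    """Recombine two parent equations by swapping one branch."""
    if a[0] in ("x", "const") or b[0] in ("x", "const"):
        return a if rng.random() < 0.5 else b
    return (a[0], a[1], b[2]) if rng.random() < 0.5 else (a[0], b[1], a[2])

def evolve(data, population_size=1000, generations=30, seed=0):
    """Return the best expression found and the best error per generation."""
    rng = random.Random(seed)
    population = [random_expr(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda e: error(e, data))
        history.append(error(population[0], data))
        survivors = population[: population_size // 4]  # keep the best quarter
        children = []
        while len(survivors) + len(children) < population_size:
            child = crossover(rng.choice(survivors), rng.choice(survivors), rng)
            children.append(mutate(child, rng) if rng.random() < 0.5 else child)
        population = survivors + children
    population.sort(key=lambda e: error(e, data))
    history.append(error(population[0], data))
    return population[0], history
```

Calling `evolve` on a list of `(x, y)` pairs returns the best-fitting tree together with the best error seen in each generation. Because the top quarter of each generation survives unchanged, that error can only stay the same or shrink from one generation to the next.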
"If you carefully plan an experiment to be the one that causes the most disagreement between two theories, that's the most efficient experiment you can do," says Lipson. By finding the disagreements and measuring how much error the different equations produced at those points, the computer was able to refine the theoretical models to find the one set of equations that best fit the data. In some cases, Lipson says, the technique can even develop a full dynamical model starting from scratch, without any prior knowledge.
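The idea of choosing the experiment that causes the most disagreement can be illustrated with a small Python sketch. It is an assumption-laden toy, not the team's implementation: the three candidate models, the use of the spread between the largest and smallest prediction as the disagreement measure, and the function names are all hypothetical.

```python
def most_informative_input(models, candidate_inputs):
    """Return the candidate input where model predictions are most spread out."""
    def disagreement(x):
        predictions = [predict(x) for predict in models.values()]
        return max(predictions) - min(predictions)
    return max(candidate_inputs, key=disagreement)

def rank_models(models, x, observed):
    """Order model names by absolute error at the chosen experiment."""
    return sorted(models, key=lambda name: abs(models[name](x) - observed))

# Three hypothetical competing theories of the same quantity.
models = {
    "linear": lambda x: 2 * x,
    "quadratic": lambda x: x * x,
    "mixed": lambda x: 0.5 * x * x + x,
}

chosen = most_informative_input(models, [0, 1, 2, 3, 4])
observed = chosen * chosen  # pretend the experiment follows the quadratic law
print(chosen, rank_models(models, chosen, observed))
```

At inputs 0 and 2 all three toy models agree exactly, so measuring there could not separate them. The sketch instead selects the input where they differ most, and a single measurement at that point is enough to rank them.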
Lipson and then-doctoral student Michael Schmidt started developing Eureqa, the algorithm the work was based on, in 2006. Three years later, they published a paper in Science, describing how Eureqa took measurements of the complex motions of a double pendulum and used the evolutionary process to derive Hamiltonian and Lagrangian differential equations that describe the laws of motion.
Lipson's hope is that this approach can expand human capabilities in science. "In classical experimentation, you vary one parameter at a time," he says. A computer can alter all the parameters at once to find the best experiment. "If you have 18 variables, there's no way you can conceptualize that, but the machine can."
The research does not knock humans out of scientific discovery. "The expectation would be that you find the model fast, and you are able to look deeper into your data," Lipson says. "In the end, you do have to have an expert to look at these equations and ponder their meaning and decide whether they're actually useful."
Increasingly, scientists are enlisting the help of robots, such as the ones used in high-throughput drug discovery, to manipulate materials and gather data. "The concept is to try to automate the cycle of scientific research, which involves forming hypotheses and being able to test them," says Ross King, who headed the computational biology group at Aberystwyth University but recently moved to the Manchester Interdisciplinary Biocenter at the University of Manchester, where he is a professor of machine intelligence. He used a robotic system he calls Adam to run biological experiments on microtiter plates, which hold minuscule amounts of samples and allow several different experiments to run within millimeters of each other. Adam compared growth rates of natural yeast cells with a series of genetically altered cells, each of which was missing a different gene, and was able to identify the function of different genes.
King says robots have already surpassed humans' capabilities in the physical aspects of the experiment, such as the placement of different yeast strains in minuscule holes on the microtiter plates. So why not have a robot conduct the entire experiment: generate data, form a hypothesis, design new experiments to test the hypothesis, and then work with the new data?
The system is not quite that advanced yet, King admits. "I don't think Adam is capable of generating a hypothesis that is cleverer than what a human could do," he says. "I think in maybe 20 years there will be robots able to do almost any experiment in the wet biology lab."
"We're a long way from automating the entire scientific enterprise," says Patrick Langley, head of the Cognitive Systems Laboratory at Stanford University. Langley has argued for the validity of computational discovery since the early 1980s, when he and the late Herbert Simon, a pioneer in artificial intelligence, first wrote about an early scientific discovery program called BACON. The work by Lipson and King, he says, is well within that tradition.
"You can view discovery as a search through a space of laws and models," says Langley. "Historically, people had to search that space themselves." There is no reason, he says, people should not hand off some of that searching to machines, just as they handed off tasks such as focusing astronomers' telescopes. "They'll let scientists be more productive and make more rapid progress."
It is not merely the vast mountains of data the sciences are producing these days that require computational assistance, Langley explains. There may be plenty to discover in the social sciences, but these are complex systems, with numerous variables often interacting in subtle ways. "They are all really complicated," says Langley, "and we will really need computational tools to go after them."
But automating science also involves formidable challenges, such as how to scale up algorithms like Eureqa to handle more complex models. While Langley considers Lipson's work to be technically solid, it deals with only a seven-dimensional model. What happens if the computer has to deal with hundreds of equations, all interacting with one another? The computer's runtime could grow exponentially with the complexity of the model. Computer scientists need to develop methods to deal with much more complex models, Langley says.
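One way to get a feel for the scaling concern is to count how many candidate equations exist as the equations get larger. The Python sketch below rests on assumptions that are not from the research: it treats each equation as a binary expression tree, picks four binary operators, and uses seven leaf symbols as a nod to the seven-dimensional model. It counts syntactically distinct trees, many of which would be mathematically equivalent.

```python
from math import comb

def count_expression_trees(operator_nodes, binary_operators=4, leaf_symbols=7):
    """Count binary expression trees with the given number of operator nodes."""
    # The number of distinct tree shapes is the Catalan number C(n).
    tree_shapes = comb(2 * operator_nodes, operator_nodes) // (operator_nodes + 1)
    # Each internal node picks an operator; each of the n + 1 leaves picks a symbol.
    operator_choices = binary_operators ** operator_nodes
    leaf_choices = leaf_symbols ** (operator_nodes + 1)
    return tree_shapes * operator_choices * leaf_choices

for n in range(1, 7):
    print(n, count_expression_trees(n))
```

Under these assumptions, each additional operator multiplies the number of candidates by roughly two orders of magnitude, which is why exhaustive search is ruled out and why the cost of guided search is a real concern for larger models.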
It is not difficult to envision how computers could speed up scientific discovery by plowing through vast arrays of data and "spitting out hypotheses" for testing, suggests Bruce Buchanan, professor of computer science, philosophy, and medicine at the University of Pittsburgh. In the 1960s, Buchanan was involved in Dendral, an early scientific discovery computer program that used mass spectrometry to identify unknown organic molecules. Simply sorting out the interesting molecules from vast arrays of information, the way Google identifies the most popular results of a search query, would be useful. "There are so many cases where discoveries could have been made sooner if people were looking at different data, asking better questions," Buchanan says.
The larger question is whether computers could ever go beyond what philosopher Thomas Kuhn called "normal science" and achieve a real scientific revolution. Elihu Abrahams, a condensed matter physicist at Rutgers University, and Princeton University physicist Philip Anderson, expressed their doubts in a letter to Science after its publication of separate computational discovery papers by Lipson and King in 2009. "Even if machines did contribute to normal science," they wrote, "we see no mechanism by which they could create a Kuhnian revolution and thereby establish new physical law." Abrahams says he has seen nothing since then to change his mind.
King admits the field has not yet reached that point. Adam "knows far more facts than any human, but it doesn't have a deep understanding of biology," he says. "I think you need a deep understanding to actually manipulate concepts in novel ways."
Computational discovery can certainly make science more efficient and cost-effective, and free up scientists to do more deep thinking, he argues. But for machines, King says, "the things which the great scientists do are still a long way off."
©2012 ACM 0001-0782/12/0500 $10.00