Science has been growing new legs of late. The traditional "legs" (or "pillars") of the scientific method were theory and experimentation. That was then. In 2005, for example, the U.S. Presidential Information Technology Advisory Committee issued a report, "Computational Science: Ensuring America's Competitiveness," stating: "Together with theory and experimentation, computational science now constitutes the 'third pillar' of scientific inquiry, enabling researchers to build and test models of complex phenomena." The report offered examples such as multi-century climate shifts, multidimensional flight stresses on aircraft, and stellar explosions.
This "third leg" of science has become common coinage (run a Web search on the phrase!). Recently, however, it has been joined by yet another leg, a "fourth paradigm," referring to the use of advanced computing capabilities to manipulate and explore massive datasets. For example, the decoding of the human genome in 2001 was a triumph of large-scale data analysis. Now science allegedly has four legs, and two of them are computational!
I find myself uncomfortable with science sprouting a new leg every few years. In fact, I believe that science still has only two legs: theory and experimentation. The "four legs" viewpoint seems to imply the scientific method has changed in a fundamental way. I contend it is not the scientific method that has changed, but rather how it is being carried out. Does it matter how many legs science has? I believe it does! It is as important as ever to explain science to the lay public, and it becomes more difficult to explain when it grows a new leg every few years.
Let us consider the first leg: theory. A scientific theory is an explanatory framework for a body of natural phenomena. A theory can be thought of as a model of reality at a certain level of abstraction. For a theory to be useful, it should explain existing observations as well as generate predictions, that is, suggest new observations. In the physical sciences, theories are typically mathematical in nature, for example, the classical theory of electromagnetism in the form of Maxwell's Equations. What is often ignored is the fact that any application of a mathematical theory requires computation. To make use of Maxwell's Equations, for example, we need to solve them in some concrete setting, and that requires computation, symbolic or numeric. Thus, computation has always been an integral part of theory in science.
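The point can be made concrete with a toy sketch (my own illustration, not part of the editorial): even the one-dimensional wave equation, the simplest plane-wave consequence of Maxwell's Equations, has no useful closed form for arbitrary settings, so "applying the theory" means running a numerical scheme. All names and parameter values below are illustrative.

```python
# Toy illustration: applying a mathematical theory requires computation.
# We approximate the 1-D wave equation u_tt = c^2 * u_xx with a
# leapfrog finite-difference scheme on a grid of n points.
import math

def solve_wave(n=100, steps=200, c=1.0, dx=0.01, dt=0.005):
    """Evolve a plucked string with fixed (u = 0) endpoints."""
    r2 = (c * dt / dx) ** 2          # squared Courant number
    assert r2 <= 1.0, "CFL stability condition violated"
    # Initial shape: one arch of a sine; initial velocity is zero.
    u_prev = [math.sin(math.pi * i / (n - 1)) for i in range(n)]
    u = list(u_prev)
    for _ in range(steps):
        u_next = [0.0] * n            # endpoints stay fixed at 0
        for i in range(1, n - 1):
            u_next[i] = (2 * u[i] - u_prev[i]
                         + r2 * (u[i + 1] - 2 * u[i] + u[i - 1]))
        u_prev, u = u, u_next
    return u

profile = solve_wave()
```

The exact solution here is a standing wave; the numerical profile approximates it only because the discrete update above is actually computed, which is the sense in which computation has always been part of theory.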
What has changed is the scale of computation. Once carried out by hand, computation has over time required ever more advanced machinery. "Doing" theory today requires highly sophisticated computational-science techniques carried out on cutting-edge high-performance computers.
The nature of the theories has also changed. Maxwell's Equations constitute an elegantly simple model of reality. There is no analogue, however, of Maxwell's Equations in climate science. The theory in climate science is a highly complex computational model, and the only way to apply the theory is via computation. While previous scientific theories were typically framed as mathematical models, today's theories are often framed as computational models. In systems biology, for example, one often encounters computational models such as Petri Nets and Statecharts, which were developed originally in the context of computer science.
Computation has also always been an integral part of experimentation. Experimentation typically implies carrying out measurements, and the analysis of these measurements has always been computational. Again, what has changed is the scale. The Compact Muon Solenoid experiment at CERN's Large Hadron Collider generates 40 terabytes of raw data per second, a volume one cannot hope to store and process in full. Handling such a volume requires advanced computation; the first level of data filtering, for example, is carried out on fast, custom hardware using FPGAs. Analyzing the still-massive amount of data that survives the various levels of filtering requires sophisticated data-analysis techniques.
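The shape of such a pipeline can be sketched in a few lines (a toy model of my own, not CMS's actual trigger system; the threshold values and event format are invented for illustration): a cheap first-level cut discards the overwhelming majority of raw events, and only the small surviving fraction reaches more expensive analysis.

```python
# Toy illustration of multi-stage event filtering: a cheap level-1 cut
# (done in hardware at real experiments) followed by a more selective
# analysis stage that only ever sees the survivors.
import random

random.seed(0)
# Simulated raw events with exponentially distributed "energy".
events = [{"energy": random.expovariate(1.0)} for _ in range(100_000)]

# Level-1 trigger: a simple threshold cut discards most events.
level1 = [e for e in events if e["energy"] > 3.0]

# Higher-level analysis: a tighter cut on the surviving fraction.
interesting = [e for e in level1 if e["energy"] > 5.0]

rate = len(level1) / len(events)   # fraction surviving level 1 (~5%)
```

The design point is that the expensive stage never touches the full stream; real triggers make the same trade with far harsher ratios and far richer selection criteria.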
So science is still carried out as an ongoing interplay between theory and experimentation. The complexity of both, however, has increased to such a degree that they cannot be carried out without computation. There is no need, therefore, to attach new legs to science. It is doing fine with two legs. At the same time, computational thinking (a phrase coined by Jeannette Wing) thoroughly pervades both legs. Computation is the universal enabler of science, supporting both theory and experimentation. Today the two legs of science are thoroughly computational!
Moshe Y. Vardi, EDITOR-IN-CHIEF
©2010 ACM 0001-0782/10/0900 $10.00
Computation and data mining are tools, or methods, that must themselves be subject to, and supported by, theory and experimentation. How else could we be confident that we understand how they work, or that their results are consistent and reproducible?
The excited and enthusiastic claims that computation and/or data mining are *new* underpinnings of science seem to me to be founded upon a deep misunderstanding about what 'experiment' and 'theory' *are* and what they *do*. But this itself is not new.
A change of scale or scope does not, of itself, alter the fundamentals of the processes involved. Neither driving faster, nor the capacity to drive faster, alters the *nature* of driving; but it does alter the relationships between the factors that make driving risky (e.g., a driver's reaction time vs. the time available to react to oncoming vehicles).
Stephen Wolfram is excited about the potential for his product: it is an excellent product. Complex modelling is a complex process, but it is no less theory-driven, no less experimental, than any other approach. Rather than a new *kind* of science, it's a developing cluster of tools and methods for using them - a new *approach* at best (arguably it's an extension of approaches that harness the potential of computers).
It is worth taking a moment to distinguish between models and approximations of reality (and in particular *how* we generate and test those models) and reality itself, lest we fall afoul of deterministic thinking.
With all due respect, I believe that the idea that computational science is no more than a straightforward extension of traditional scientific methods made possible by computers is extraordinarily naive and inaccurate. Let's look at the issue in a bit more detail.

A simple definition of science is this: the activity concerned with the systematic acquisition of knowledge. The English word is derived from scientia, Latin for knowledge. According to the Cambridge dictionary, it is the enterprise that organizes knowledge in the form of testable explanations and predictions about the universe. The question of precisely how knowledge is acquired has been the subject of debate among philosophers of science for some 2,500 years, ranging from the skepticism of David Hume in the 18th century, who asserted that inductive logic cannot lead to knowledge, to contemporary writers on the logic of science, such as E.T. Jaynes, who said that all scientific knowledge has been obtained by induction. After millennia of debate by some of the greatest minds of human history, two avenues for acquiring scientific knowledge emerged: 1) observation: experimental measurements, information gained by the human senses, aided by instruments; and 2) theory: inductive hypotheses, often framed in the language of mathematics. Observation and theory are thus the two classical pillars of science. According to the Oxford Dictionary, the scientific method is a method of procedure that has characterized natural science since the 17th century, consisting in systematic observation, measurement, experience and experiment, and the formulation, testing, and modification of hypotheses.

In a rather short span of time, measured against around 400,000 years of human history, computational science has emerged as a dramatic new way to acquire knowledge and, therefore, to do science.
It involves the use of computational algorithms to translate mathematical models of how the physical universe behaves into computer models that attempt to predict the future and reconstruct the past. It has been successful in a broad and increasing list of scientific and technological developments in medicine, engineering, and basic science.
But does this new discipline really constitute a new paradigm in the scientific method? The answer is: of course it does. There are abundant examples of the use of computer simulations to generate new knowledge about the physical universe that falls well outside the reach of contemporary methods of observation and experiment. Moreover, simulations can also be used to generate new hypotheses that would not naturally arise from conventional inductive scientific processes. A special characteristic of this enterprise is that it is necessarily interdisciplinary, simultaneously bringing in methods and principles from mathematics, computer science, and the core science, engineering, and medicine disciplines to feed a body of knowledge and methodologies quite distinct from the traditional disciplines. I believe it is very important to recognize this interdisciplinary character, and that this important new subject cannot be well cultivated in the traditional silos provided by most contemporary educational institutions.
Is there a fourth paradigm? I personally believe the answer is no. My colleagues who work in data mining and data-intensive science and technologies all call themselves computational scientists. Current research methods in such areas as data mining are not merely extensions of the methods of observation, but are indeed new methods of acquiring knowledge from data. I put these under the general category of computational science.
In the end, does it really matter? Is it really necessary to think of computational science as a third pillar? In my opinion, the answer is very definitely yes. Recognizing it as such will influence how we organize future educational institutions, how funding agencies support research, and how the components of computational science are taught to the next generation of scientists. There is overwhelming evidence that the field of computational science is indeed a third pillar of science. To my mind, the emergence of computational science is one of the most important developments in human history, one that will revolutionize virtually every aspect of science in the future.
University of Texas at Austin
The following letter was published in the Letters to the Editor in the March 2011 CACM (http://cacm.acm.org/magazines/2011/3/105325).
It's great to see reflection on the foundations of science in Communications, as in Tony Hey's comment "Science Has Four Legs" (Dec. 2010) and Moshe Y. Vardi's Editor's Letter "Science Has Only Two Legs" (Sept. 2010), and also on how the philosophy of science sheds light on questions involving the number of legs of a natural science.
Willard Van Orman Quine's 1951 paper "Two Dogmas of Empiricism" convincingly argued that the attempt to distinguish experiment from theory fails in modern science because every observation is so theory-laden; for example, as a result of a Large Hadron Collider experiment, scientists will not perceive, say, muons or other particles, but rather some visual input originating from the computer screen displaying experimental data. The interpretation of this perception depends on the validity of many nonempirical factors, including physics theories and methods.
With computation, even more factors are needed, including the correctness of hardware design and the validity of the software packages being used, as argued by Nick Barnes in his comment "Release the Code" (Dec. 2010) concerning Dennis McCafferty's news story "Should Code Be Released?" (Oct. 2010).
For such a set of scientific assumptions, Thomas S. Kuhn coined the term "paradigm" in his 1962 book The Structure of Scientific Revolutions. Imre Lakatos later evolved the concept into the notion of "research program" in his 1970 paper "Falsification and the Methodology of Scientific Research Programs."
In this light, neither the two-leg nor the four-leg hypothesis is convincing. If we are to use the leg metaphor at all, science is perhaps more accurately viewed as a millipede.
The following letter was published in the Letters to the Editor in the December 2010 CACM (http://cacm.acm.org/magazines/2010/12/102133).
As an editor of The Fourth Paradigm (http://research.microsoft.com/en-us/collaboration/fourthparadigm/default.aspx, Microsoft Research, Redmond, WA, 2009) and someone who subscribes to Jim Gray's vision that there are now four fundamental scientific methodologies, I feel I must respond to Moshe Y. Vardi's Editor's Letter "Science Has Only Two Legs" (Sept. 2010).
First, I should explain my qualifications for defending the science-has-four-legs premise. From 1964, beginning as a physics undergraduate at Oxford, until 1984, when I moved from physics to the Electronics and Computer Science Department, I was a working natural scientist. My Ph.D. is in theoretical particle physics, and, in my research career, I worked extensively with experimentalists and spent two years at the CERN accelerator laboratory in Geneva. In computer science, my research takes in all aspects of parallel computing architectures, languages, and tools, as well as methodologies for parallelizing scientific applications and more recently the multi-core challenge. From 2001 to 2005, before I joined Microsoft, I was Director of the U.K.'s eScience Core Program, working closely with scientists of all descriptions, from astronomers and biologists to chemists and environmental scientists. Here at Microsoft Research, I still work with practicing scientists.
I therefore have some relevant experience on which to ground my argument. By contrast, though Vardi has had a distinguished career in mathematics and computer science (and has done a great job with Communications), he has not, as far as I know, had much direct involvement with the natural sciences.
It is quite clear that the two new scientific paradigms, computational and data-intensive science, do not displace experiment and theory, which remain as relevant as ever. However, it is equally clear that over the past 50 years computational science has emerged as a third methodology with which we now explore problems that are simply inaccessible to experiment. To do so, scientists need (along with their knowledge of experiment and theory) training in numerical methods, computer architecture, and parallel programming. It was for this reason that Physics Nobel laureate Ken Wilson in 1987 called computational science the "third paradigm" for scientific discovery. He was investigating quantum chromodynamics (QCD), whose fundamental equations describe the interactions between the quark and gluon fields behind the strong nuclear force. No analytic solution of these equations is possible; the only option is to approximate the theory on a space-time lattice. Wilson pioneered this technique, using supercomputers to explore the predictions of QCD in the physical limit as the lattice spacing tends to zero. Other examples of such computational exploration, including galaxy formation and climate modeling, are not testable through experiment in the usual sense of the word.
Gray felt the explosive growth of scientific data posed profound challenges for computer science, calling for new techniques for storing, managing, manipulating, mining, and visualizing large data sets (see Fran Berman's "Got Data? A Guide to Data Preservation in the Information Age," Communications, Dec. 2008). He therefore spent much of his last years working with scientists who were, as he said, "drowning in data." Working with astronomers allowed him the luxury of experimentation, since, he said, their data had "absolutely no commercial value." Similarly, the genomic revolution is upon us, and biologists need powerful tools to help disentangle the effects of multiple genes on disease, develop new vaccines, and design effective drugs. In environmental science, data from large-scale sensor networks is beginning to complement satellite data, and we are seeing the emergence of a new field of environmental informatics. Scientists require significant help not only in managing and processing the raw data but also in designing better workflow tools that automatically capture the provenance involved in producing the data sets scientists actually work with.
On the other hand, "computational thinking" attempts to demonstrate the power of computer science ideas not just in science but also in many other aspects of daily life. However, despite its importance, this goal should not be confused with the emergence of the two new methodologies scientists now need to assist them in their understanding of nature.
Hey and I are in violent agreement that science today is thoroughly computational. What I fail to see is why this requires it to sprout new legs. In fact, theory in science was mathematical long before it was computational. Does that make mathematics another leg of science?
Experimental science has always relied on statistical analysis. Does that make statistics another leg of science? Science today relies on highly complex theoretical models, requiring analysis via computation, and on experimental setups that yield massive amounts of data, also requiring analysis via computation. So science is thoroughly computational but still has only two legs: theory and experiment.
Moshe Y. Vardi