“Biology and computer science—life and computation—are related. I am confident that at their interface great discoveries await those who seek them.”
—Leonard Adleman, Scientific American, Aug. 1998
Natural computing is the field of research that investigates models and computational techniques inspired by nature and, dually, attempts to understand the world around us in terms of information processing. It is a highly interdisciplinary field that connects the natural sciences with computing science, both at the level of information technology and at the level of fundamental research.33 Indeed, natural computing areas and topics come in many flavors, ranging from pure theoretical research and algorithms and software applications to experimental laboratory research in biology, chemistry, and physics.
In this review we describe computing paradigms abstracted from natural phenomena as diverse as self-reproduction, the functioning of the brain, Darwinian evolution, group behavior, the immune system, the characteristics of life, cell membranes, and morphogenesis. These paradigms can be implemented either on traditional electronic hardware or on alternative physical media such as biomolecular (DNA, RNA) computing, or trapped-ion quantum computing devices. Dually, we describe several natural processes that can be viewed as information processing, such as gene regulatory networks, protein-protein interaction networks, biological transport networks, and gene assembly in unicellular organisms. In the same vein, we list efforts to understand biological systems by engineering semi-synthetic organisms, and to understand the universe from the point of view of information processing.
This review was written with the expectation that the reader is a computer scientist with limited knowledge of natural sciences, and it avoids dwelling on the minute details of various natural phenomena. Thus, rather than overwhelming readers with particulars, we hope they will see this article as simply a window into the profound relationship that exists between nature and computation.
There is information processing in nature, and the natural sciences are already adapting by incorporating tools and concepts from computer science at a rapid pace. Conversely, a closer look at nature from the point of view of information processing can and will change what we mean by computation. Our invitation to you, fellow computer scientists, is to take part in the uncovering of this wondrous connection.a
The vivid images peppered throughout this story offer glimpses of what can happen when nature, art, and computer science join forces. While not directly referenced in this article, these images serve to offer readers some startling perspectives of nature up close as only technology can provide.
Nature as Inspiration
Among the oldest examples of nature-inspired models of computation are the cellular automata conceived by Ulam and von Neumann in the 1940s. John von Neumann, who was trained in both mathematics and chemistry, investigated cellular automata as a framework for the understanding of the behavior of complex systems. In particular, he believed that self-reproduction was a feature essential to both biological organisms and computers.40
A cellular automaton is a dynamical system consisting of a regular grid of cells, in which space and time are discrete. Each of the cells can be in one of a finite number of states. Each cell changes its state according to a list of given transition rules that determine its future state, based on its current state and the current states of some of its neighbors. The entire grid of cells updates its configuration synchronously according to the a priori given transition rules.
Cellular automata have been applied to the study of phenomena as diverse as communication, computation, construction, growth, reproduction, competition, and evolution. One of the best known examples of cellular automata—the “game of life” invented by Conway—was shown to be computationally universal. Cellular automata have been extensively studied as an alternative explanation to the phenomenon of emergence of complexity in the natural world, and used, among others, for modeling in physics and biology.
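To make the model concrete, here is a minimal sketch (in Python) of one synchronous update step of Conway's "game of life"; the small toroidal grid and the glider used to seed it are illustrative choices, not prescribed by the text.

```python
def life_step(grid):
    # One synchronous update: every cell computes its next state from its
    # current state and the states of its eight neighbors (grid wraps around).
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            live = sum(grid[(r + dr) % rows][(c + dc) % cols]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))
            # Transition rules of the "game of life": a live cell survives
            # with 2 or 3 live neighbors; a dead cell is born with exactly 3.
            nxt[r][c] = 1 if live == 3 or (grid[r][c] == 1 and live == 2) else 0
    return nxt

# Seed a 6x6 grid with a "glider" and let it evolve for a few steps.
grid = [[0] * 6 for _ in range(6)]
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[r][c] = 1
for _ in range(4):
    grid = life_step(grid)
```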
In parallel to early comparisons39 between computing machines and the human nervous system, McCulloch and Pitts proposed the first model of artificial neurons. This research eventually gave rise to the field of neural computation, and it also had a profound influence on the foundations of automata theory. The goal of neural computation was twofold. On one hand, it was hoped that it would help unravel the structure of computation in nervous systems of living organisms (How does the brain work?). On the other hand, it was predicted that, by using the principles of how the human brain processes information, neural computation would yield significant computational advances (How can we build an intelligent computer?). The first goal has been pursued mainly within the neurosciences under the name of brain theory or computational neuroscience, while the quest for the second goal has become mainly a computer science discipline known as artificial neural networks or simply neural networks.5
An artificial neural network consists of interconnected artificial neurons.31 Modeled after the natural neurons, each artificial neuron A has n real-valued inputs, x1, x2, …, xn, and it computes its own primitive function fA as follows. Usually, the inputs have associated weights, w1, w2, …, wn. Upon receiving the n inputs, the artificial neuron A produces the output fA(w1x1 + w2x2 + … + wnxn). An artificial neural network is a network of such neurons, and thus a network of their respective primitive functions. Some neurons are selected to be the output neurons, and the network function is a vectorial function that, for n input values, associates the outputs of the m output neurons. Note that different selections of the weights produce different network functions for the same inputs. Based on given input-output pairs, the network can “learn” the weights w1, …, wn. Thus, there are three important features of any artificial neural network: the primitive function of each neuron, the topology of the network, and the learning algorithm used to find the weights of the network. One of the many examples of such learning algorithms is the “backwards propagation of errors.” Back-propagation is a supervised learning method by which the weights of the connections in the network are repeatedly adjusted so as to minimize the difference between the actual output vector of the net and the desired output vector. Artificial neural networks have proved to be a fruitful paradigm, leading to successful novel applications in both new and established application areas.
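As a rough illustration of these ingredients, the sketch below implements a single artificial neuron with a sigmoid primitive function and an error-minimizing weight update, the single-neuron special case of the gradient idea that back-propagation extends to whole networks; the learning rate, the logical-OR training pairs, and the constant bias input are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(weights, inputs):
    # The primitive function fA applied to the weighted sum w1x1 + ... + wnxn.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)))

def train_step(weights, inputs, target, rate=0.5):
    # One gradient step that reduces the squared difference between the
    # neuron's actual output and the desired output.
    out = neuron_output(weights, inputs)
    delta = (out - target) * out * (1.0 - out)
    return [w - rate * delta * x for w, x in zip(weights, inputs)]

# Learn the weights from given input-output pairs (here: logical OR).
# The leading constant input of 1 provides a bias term (a common convention).
pairs = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
weights = [0.0, 0.0, 0.0]
for _ in range(5000):
    for x, t in pairs:
        weights = train_step(weights, x, t)
print([round(neuron_output(weights, x), 2) for x, _ in pairs])  # approaches 0, 1, 1, 1
```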
While Turing and von Neumann dreamed of understanding the brain, and possibly designing an intelligent computer that works like the brain, evolutionary computation6 emerged as another computation paradigm that drew its inspiration from a completely different part of biology: Darwinian evolution. Rather than emulating features of a single biological organism, evolutionary computation draws its inspiration from the dynamics of an entire species of organisms. An artificial evolutionary system is a computational system based on the notion of simulated evolution. It features a constant- or variable-size population of individuals, a fitness criterion according to which the individuals of the population are being evaluated, and genetically inspired operators that produce the next generation from the current one. In an evolutionary system, the initial population of individuals is generated at random or heuristically. At each evolutionary step, the individuals are evaluated according to a given fitness function. To form the next generation, offspring are first generated from selected individuals by using operators such as mutation of a parent, or recombination of pairs or larger subsets of parents. The choice of parents for recombination can be guided by a fitness-based selection operator, thus reflecting the biological principle of mate selection. Secondly, individuals of the next generation are selected from the set of newly created offspring, sometimes also including the old parents, according to their fitness—a process reflecting the biological concept of environmental selection.
Evolutionary systems were first viewed as optimization processes in the 1930s. The basic idea of viewing evolution as a computational process gained momentum in the 1960s, and evolved along three main branches.13 Evolution strategies use evolutionary processes to solve parameter optimization problems, and are today used for real-valued as well as discrete and mixed types of parameters. Evolutionary programming originally aimed at achieving the goals of artificial intelligence via evolutionary techniques, namely by evolving populations of intelligent agents modeled, for example, as finite-state machines. Today, these algorithms are also often used for real-valued parameter optimization problems. Genetic algorithms originally featured a population of individuals encoded as fixed-length bit strings, wherein mutations consisted of bit-flips according to a typically small, uniform mutation rate, the recombination of two parents consisted of a cut-and-paste of a prefix of one parent with a suffix of the other, and the fitness function was problem-dependent. If the initial individuals were to encode possible solutions to a given problem, and the fitness function were designed to measure the optimality of a candidate solution, then such a system would, in time, evolve to produce a near-optimal solution to the initial problem. Today, genetic algorithms are also modified heavily for applications to real-valued parameter optimization problems as well as many types of combinatorial tasks such as, for example, permutation-based problems. As another application, if the individuals were computer programs, then the genetic algorithm technique would result in “the fittest” computer programs, as is the goal of genetic programming.22
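The following sketch illustrates the bit-string genetic algorithm scheme just described, with fitness-proportionate parent selection, one-point (cut-and-paste) recombination, and small-rate bit-flip mutation; the population size, mutation rate, and the simple count-the-ones fitness function are illustrative choices.

```python
import random

def one_max(bits):
    # Illustrative problem-dependent fitness: the number of 1-bits.
    return sum(bits)

def select(population, fitness):
    # Fitness-proportionate choice of a parent.
    return random.choices(population, weights=[fitness(p) + 1 for p in population])[0]

def crossover(a, b):
    # Cut-and-paste of a prefix of one parent with a suffix of the other.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.01):
    # Bit-flips at a small, uniform mutation rate.
    return [1 - b if random.random() < rate else b for b in bits]

def evolve(length=30, pop_size=40, generations=100):
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population = [mutate(crossover(select(population, one_max),
                                       select(population, one_max)))
                      for _ in range(pop_size)]
    return max(population, key=one_max)

print(evolve())   # typically close to the all-1s string
```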
Cellular automata, neural computation, and evolutionary computation are the most established “classical” areas of natural computing. Several other bio-inspired paradigms emerged more recently, among them swarm intelligence, artificial immune systems, artificial life, membrane computing, and amorphous computing.
A computational paradigm that at times straddles evolutionary computation and neural computation is swarm intelligence.16 A swarm is a group of mobile biological organisms (such as bacteria, ants, termites, bees, spiders, fish, birds) wherein each individual communicates with others either directly or indirectly by acting on its local environment. These interactions contribute to distributed collective problem solving. Swarm intelligence, sometimes referred to as collective intelligence, is defined as the problem-solving behavior that emerges from the interaction of such a collection of individual agents. For example, in research simulating flocking behavior, each individual was endowed with three simple possible behaviors: to act so as to avoid collision, to match velocity with neighbors, and to stay close to nearby flock mates. The simulations showed that flocking was an emergent behavior that arose from the interaction of these simple rules.
Particle swarm optimization was introduced as a new approach to optimization that had developed from simple models of social interactions, as well as of flocking behavior in birds and other organisms. A particle swarm optimization algorithm starts with a swarm of “particles,” each representing a potential solution to a problem, similar to the population of individuals in evolutionary computation.
Particles move through a multidimensional search space and their positions are updated according to their own experience and that of their neighbors, by adding “velocity” to their current positions. The velocity of a particle depends on its previous velocity (the “inertia” component), the tendency towards the past personal best position (the cognitive, “nostalgia” component), and the move toward a global or local neighborhood best (the “social” component). The cumulative effect is that each particle converges towards a point between the global best and its personal best. Particle Swarm Optimization algorithms have been used to solve various optimization problems, and have been applied to unsupervised learning, game learning, scheduling and planning applications, and design applications.
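A minimal sketch of the velocity update just described follows; the objective function being minimized and the inertia, cognitive, and social coefficients are illustrative assumptions rather than prescribed values.

```python
import random

def objective(x):
    # Illustrative function to minimize (sum of squares).
    return sum(v * v for v in x)

def pso(dim=2, particles=20, steps=200, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                  # each particle's personal best
    gbest = min(pbest, key=objective)[:]         # the swarm's global best
    for _ in range(steps):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # inertia + cognitive ("nostalgia") + social components
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
                if objective(pbest[i]) < objective(gbest):
                    gbest = pbest[i][:]
    return gbest

print(pso())   # converges toward the minimum at the origin
```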
Ant algorithms were introduced to model the foraging behavior of ant colonies. In finding the best path between their nest and a source of food, ants rely on indirect communication: they lay a pheromone trail on the way back to the nest if they have found food, and they follow the concentration of pheromones in the environment if they are looking for food. This foraging behavior has inspired a large number of ant algorithms used to solve mainly combinatorial optimization problems defined over discrete search spaces.
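The sketch below captures the flavor of such an algorithm on a small invented graph: each artificial ant walks from nest to food, choosing edges with probability biased by pheromone and edge length, and pheromone is deposited on shorter paths and evaporates elsewhere; the graph, its edge lengths, and the parameter values are made up for illustration.

```python
import random

# A small invented graph: edge lengths between the nest, two intermediate
# sites ("a", "b"), and the food source. All numbers are illustrative.
length = {("nest", "a"): 1.0, ("nest", "b"): 2.0, ("a", "b"): 1.0,
          ("a", "food"): 4.0, ("b", "food"): 1.0}
length.update({(v, u): d for (u, v), d in list(length.items())})  # both directions
neighbors = {}
for u, v in length:
    neighbors.setdefault(u, set()).add(v)
pheromone = {edge: 1.0 for edge in length}

def ant_walk(start="nest", goal="food"):
    # One ant's walk: edges are chosen with probability proportional to
    # pheromone concentration divided by edge length.
    path, node = [start], start
    while node != goal:
        options = [n for n in neighbors[node] if n not in path]
        if not options:
            return None                       # dead end: this ant gives up
        weights = [pheromone[(node, n)] / length[(node, n)] for n in options]
        node = random.choices(options, weights=weights)[0]
        path.append(node)
    return path

def lay_pheromone(path, amount=1.0, evaporation=0.1):
    # Pheromone evaporates everywhere, and more is deposited on shorter paths.
    total = sum(length[(path[i], path[i + 1])] for i in range(len(path) - 1))
    for edge in pheromone:
        pheromone[edge] *= (1.0 - evaporation)
    for i in range(len(path) - 1):
        for edge in ((path[i], path[i + 1]), (path[i + 1], path[i])):
            pheromone[edge] += amount / total

for _ in range(100):                          # release 100 ants, one after another
    found = ant_walk()
    if found:
        lay_pheromone(found)
```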
Artificial immune systems are computational systems devised starting in the late 1980s and early 1990s as computationally interesting abstractions of the natural immune system of biological organisms. Viewed as an information processing system, the immune system performs many complex computations in a highly parallel and distributed fashion.11 It uses learning, memory, associative retrieval, and other mechanisms to solve recognition and classification problems such as distinction between self and nonself cells, and neutralization of nonself pathogenic agents. Indeed, the natural immune system has sometimes been called the “second brain” because of its powerful information processing capabilities.
The natural immune system’s main function is to protect our bodies against the constant attack of external pathogens (viruses, bacteria, fungi, and parasites). The main role of the immune system is to recognize cells in the body and categorize them as self or nonself.12 There are two parts of the immune system: innate (non-specific) and adaptive (acquired). The cells of the innate immune system are immediately available to combat a wide variety of antigens, without requiring previous exposure to them. These cells possess the ability to ingest and digest several “known” pathogens. In contrast, the adaptive immune response is the antibody production in response to a specific new infectious agent. Our body maintains a large “combinatorial database” of immune cells that circulate throughout the body. When a foreign antigen invades the body, only a few of these immune cells can detect the invaders and physically bind to them. This detection triggers the primary immune response: the generation of a large population of cells that produce matching antibodies that aid in the destruction or neutralization of the antigen. The immune system also retains some of these specific-antibody-producing cells in immunological memory, so that any subsequent exposure to a similar antigen can lead to a rapid, and thus more effective, immune response (secondary response).
The computational aspects of the immune system, such as distinguishing of self from nonself, feature extraction, learning, memory, self-regulation, and fault tolerance, have been exploited in the design of artificial immune systems that have been successfully used in applications. The applications are varied and include computer virus detection, anomaly detection in a time series of data, fault diagnosis, pattern recognition, machine learning, bioinformatics, optimization, robotics, and control. Recent research in immunology departs from the self-nonself discrimination model to develop what is known as the “danger theory,” wherein it is believed that the immune system differentiates between dangerous and non-dangerous entities, regardless of whether they belong to self or to non-self. These ideas have started to be exploited in artificial immune systems in the context of computer security.
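One classic artificial-immune-system technique for such anomaly detection is negative selection: candidate detectors are generated at random and censored against a set of "self" patterns, so that whatever a surviving detector matches is flagged as nonself. The sketch below uses an r-contiguous-positions matching rule; the string length, the matching threshold, and the "self" data are illustrative assumptions.

```python
import random

def contiguous_match(a, b, r=4):
    # Matching rule: two equal-length bit strings match if they agree on at
    # least r contiguous positions (one classic choice in the AIS literature).
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

def random_string(n):
    return tuple(random.randint(0, 1) for _ in range(n))

def censor(self_set, n=12, wanted=100, r=4, max_tries=100000):
    # Negative selection: keep only random detectors that match no self string.
    detectors, tries = [], 0
    while len(detectors) < wanted and tries < max_tries:
        tries += 1
        d = random_string(n)
        if not any(contiguous_match(d, s, r) for s in self_set):
            detectors.append(d)
    return detectors

def is_nonself(x, detectors, r=4):
    return any(contiguous_match(x, d, r) for d in detectors)

# Illustrative "self" data: strings consisting mostly of 0s.
self_set = [tuple(1 if random.random() < 0.1 else 0 for _ in range(12))
            for _ in range(30)]
detectors = censor(self_set)
print(is_nonself(tuple([1] * 12), detectors))   # an anomalous, all-1s string
```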
While artificial immune systems (a.k.a. immunological computation, immunocomputing) constitute an example of a computational paradigm inspired by a very specific subsystem of a biological organism, artificial life takes the opposite approach. Artificial life [ALife] attempts to understand the very essence of what it means to be alive by building, ab initio, within computers and other “artificial” media, artificial systems that exhibit properties normally associated only with living organisms.24 Lindenmayer systems (L-systems), introduced in 1968, can be considered as an early example of artificial life.
L-systems are parallel rewriting systems that, starting with an initial word, proceed by applying rewriting rules in parallel to all the letters of the word, and thus generate new words.34 They have been most famously used to model plant growth and development,29 but also for modeling the morphology of other organisms.
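As a small example, the parallel rewriting step can be written in a few lines; the rules shown are those of Lindenmayer's classic "algae" system (A → AB, B → A).

```python
def l_system_step(word, rules):
    # Apply the rewriting rules to all letters of the word in parallel.
    return "".join(rules.get(letter, letter) for letter in word)

# Lindenmayer's classic "algae" L-system: A -> AB, B -> A.
rules = {"A": "AB", "B": "A"}
word = "A"
for _ in range(5):
    word = l_system_step(word, rules)
print(word)   # ABAABABAABAAB; the word lengths follow the Fibonacci sequence
```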
Building on the ideas of evolutionary computation, other pioneers of artificial life experimented with evolving populations of “artificial creatures” in simulated environments.9 One example was the design36 of evolving virtual block creatures that were selected for their ability to swim (or walk, or jump), and that competed for a common resource (controlling a cube) in a physically simulated world endowed with realistic features such as kinematics, dynamics, gravity, collisions, and friction. The result was that creatures evolved which would extend arms towards the cube, while others would crawl or roll to reach it, and some even developed legs that they used to walk towards the cube. These ideas were taken one step further25 by combining the computational and experimental approaches, and using rapid manufacturing technology to fabricate physical robots that were materializations of their virtually evolved computational counterparts. In spite of the simplicity of the task at hand (horizontal locomotion), surprisingly different and complex robots evolved: many of them exhibited symmetry, some moved sideways in a crab-like fashion, and others crawled on two evolved limbs. This marked the emergence of mechanical artificial life, while the nascent field of synthetic biology, discussed later, explores a biological implementation of similar ideas. At the same time, the field of Artificial Life continues to explore directions such as artificial chemistry (abstractions of natural molecular processes), as well as traditionally biological phenomena in artificial systems, ranging from computational processes such as co-evolutionary adaptation and development, to physical processes such as growth, self-replication, and self-repair.
Membrane computing investigates computing models abstracted from the structure and the functioning of living cells, as well as from the way the cells are organized in tissues or higher order structures.26 More specifically, the feature of the living cells that is abstracted by membrane computing is their compartmentalized internal structure effected by membranes. A generic membrane system is essentially a nested hierarchical structure of cell-like compartments or regions, delimited by “membranes.” The entire system is enclosed in an external membrane, called the skin membrane, and everything outside the skin membrane is considered to be the environment. Each membrane-enveloped region contains objects and transformation rules which modify these objects, as well as specify whether they will be transferred outside or stay inside the region. The transfer thus provides for communication between regions. Various formal mechanisms were developed that reflect the selective manner in which biological membranes allow molecules to pass through them.
Another biologically inspired feature of membrane systems as mathematical constructs is the fact that, instead of dealing with sets of objects, one uses multisets wherein one keeps track of the multiplicity of each object. The computational behavior of a membrane system starts with an initial input configuration and proceeds in a maximally parallel manner by the non-deterministic choice of application of the transformation rules, as well as of the objects to which they are to be applied. The output of the computation is then collected from an a priori determined output membrane. In addition to the basic features indicated previously, many variants of membrane systems have been considered, among them ones that allow membranes to be dissolved and created. Typical applications of membrane systems include biology (modeling photosynthesis and certain signaling pathways, quorum sensing in bacteria, modeling cell-mediated immunity), computer science (computer graphics, public-key cryptography, approximation and sorting algorithms, and solving computationally hard problems), and linguistics.
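The sketch below illustrates, for a single region, how multiset rewriting rules might be applied in a maximally parallel, non-deterministic step: rules are applied repeatedly to the objects still available, and only then are the products added; the particular objects and rules are invented for illustration, and no communication between regions is modeled.

```python
import random
from collections import Counter

# One membrane-enveloped region holding a multiset of objects, and two
# invented rewriting rules: "a b -> c" and "a a -> b".
rules = [({"a": 1, "b": 1}, {"c": 1}),
         ({"a": 2}, {"b": 1})]

def applicable(rule, contents):
    left, _ = rule
    return all(contents[obj] >= k for obj, k in left.items())

def step(contents):
    # One maximally parallel step: rules are applied, non-deterministically,
    # as long as any rule still applies to the objects not yet consumed;
    # only then are all the produced objects added to the region.
    produced = Counter()
    while True:
        choices = [r for r in rules if applicable(r, contents)]
        if not choices:
            break
        left, right = random.choice(choices)
        contents.subtract(left)
        produced.update(right)
    contents.update(produced)
    return contents

contents = Counter({"a": 5, "b": 2})
for _ in range(3):
    contents = step(contents)
print(contents)
```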
Amorphous computing is a paradigm that draws inspiration from the development of form (morphogenesis) in biological organisms, wherein interactions of cells guided by a genetic program give rise to well-defined shapes and functional structures. Analogously, an amorphous computing medium comprises a multitude of irregularly placed, asynchronous, locally interacting computing elements.1 These identically programmed “computational particles” communicate only with particles situated within a small given radius, and may give rise to certain shapes and patterns such as, for example, any pre-specified planar graph. The goal of amorphous computing is to engineer specified coherent computational behaviors from the interaction of large quantities of such unreliable computational particles interconnected in unknown, irregular, and time-varying ways. At the same time, the emphasis is on devising new programming abstractions that would work well for amorphous computing environments. Amorphous computing has been used both as a programming paradigm using traditional hardware, and as the basis for “cellular computing,” discussed later, under the topics synthetic biology, and computation in living cells.
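As a rough illustration of this setting, the sketch below scatters identically programmed particles at random, lets each talk only to neighbors within a small radius, and has them cooperatively compute a hop-count "gradient" from a chosen source particle, one simple building block used in amorphous computing; the number of particles, the radius, and the relaxation scheme are illustrative assumptions.

```python
import random

def amorphous_gradient(n=200, radius=0.1, source=0, rounds=30):
    # Irregularly placed particles on the unit square; each knows only the
    # neighbors that happen to lie within the given communication radius.
    points = [(random.random(), random.random()) for _ in range(n)]
    neighbors = [[j for j in range(n) if j != i and
                  (points[i][0] - points[j][0]) ** 2 +
                  (points[i][1] - points[j][1]) ** 2 <= radius ** 2]
                 for i in range(n)]
    hops = [0 if i == source else None for i in range(n)]
    for _ in range(rounds):
        # Particles repeatedly relax their estimate of the hop count to the
        # source from whatever their neighbors currently report.
        for i in random.sample(range(n), n):
            known = [hops[j] for j in neighbors[i] if hops[j] is not None]
            if known and (hops[i] is None or min(known) + 1 < hops[i]):
                hops[i] = min(known) + 1
    return hops

gradient = amorphous_gradient()
```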
Nature as Implementation Substrate
In the preceding section we saw cellular automata inspired by self-reproduction, neural computation by the functioning of the brain, evolutionary computation by the Darwinian evolution of species, swarm intelligence by the behavior of groups of organisms, artificial immune systems by the natural immune system, artificial life by properties of life in general, membrane computing by the compartmentalized organization of the cells, and amorphous computing by morphogenesis. All these are computational techniques that, while inspired by nature, have been implemented until now mostly on traditional electronic hardware. An entirely distinct category is that of computing paradigms that use a radically different type of “hardware.” This category includes molecular computing and quantum computing.b
Molecular computing (known also as biomolecular computing, biocomputing, biochemical computing, DNA computing), is based on the idea that data can be encoded as biomolecules—such as DNA strands—and molecular biology tools can be used to transform this data to perform, for example, arithmetic or logic operations. The birth of this field was the 1994 breakthrough experiment by Leonard Adleman who solved a small instance of the Hamiltonian Path Problem solely by manipulating DNA strands in test tubes.2
DNA (deoxyribonucleic acid) is a linear chain made up of four different types of nucleotides, each consisting of a base (Adenine, Cytosine, Guanine, or Thymine) and a sugar-phosphate unit. The sugar-phosphate units are linked together by covalent bonds to form the backbone of the DNA single strand. Since nucleotides may differ only by their bases, a DNA strand can be viewed as simply a word over the four-letter alphabet {A,C,G,T}. A DNA single strand has an orientation, with one end known as the 5′ end, and the other as the 3′ end, based on their chemical properties. By convention, a word over the DNA alphabet represents the corresponding DNA single strand in the 5′ to 3′ orientation, that is, the word GGTTTTT stands for the DNA single strand 5′-GGTTTTT-3′. A crucial feature of DNA single strands is their Watson-Crick complementarity: A is complementary to T, G is complementary to C, and two complementary DNA single strands with opposite orientation bind to each other by hydrogen bonds between their individual bases. In so doing, they form a stable DNA double strand resembling a helical ladder, with the backbones at the outside and the bound pairs of bases lying inside. For example, the DNA single strand 5′-AAAAACC-3′ will bind to the DNA single strand 5′-GGTTTTT-3′ to form the 7 base-pair-long (7bp) double strand:
5′-AAAAACC-3′
3′-TTTTTGG-5′
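Computationally, Watson-Crick complementarity amounts to a simple string operation: reverse the strand and replace each base by its complement. A small sketch:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def watson_crick_complement(strand):
    # The strand that binds to the given one, written 5' to 3'. Because the
    # two strands of a double helix are antiparallel, this is the reverse of
    # the letter-by-letter complement.
    return "".join(COMPLEMENT[base] for base in reversed(strand))

print(watson_crick_complement("AAAAACC"))   # GGTTTTT
```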
Another molecule that can be used for computation is ribonucleic acid, RNA. While similar to DNA, RNA differs in three main aspects: RNA is usually single-stranded while DNA is usually double-stranded, RNA nucleotides contain the sugar ribose, while DNA nucleotides contain the sugar deoxyribose, and in RNA the nucleotide Uracil, U, substitutes for Thymine, which is present in DNA.
There are many possible DNA bio-operations that one can use for computations,21 such as: cut-and-paste operations achievable by enzymes, synthesizing desired DNA strands up to a certain length, making exponentially many copies of a DNA strand, and reading out the sequence of a DNA strand. These bio-operations and the Watson-Crick complementary binding have all been used to control DNA computations and DNA robotic operations. While initial experiments solved simple instances of computational problems, more recent experiments successfully tackled sophisticated computational problems, such as a 20-variable instance of the 3-Satisfiability Problem. The efforts toward building an autonomous molecular computer include implementations of computational state transitions with biomolecules, and a DNA implementation of a finite automaton with potential applications to the design of smart drugs.
More importantly, since 1994, research in molecular computing has gained several new dimensions. One of the most significant achievements of molecular computing has been its contribution to the massive stream of research in nanosciences, by providing computational insights into a number of fundamental issues. Perhaps the most notable is its contribution to the understanding of self-assembly, which is among the key concepts in nanosciences.30 Recent experimental research into programmable molecular-scale devices has produced impressive self-assembled DNA nanostructures35 such as cubes, octahedra, Sierpinski triangles,32 DNA origami, or intricate nanostructures that achieve computation such as binary counting, or bit-wise cumulative XOR. Other experiments include the construction of DNA-based logic circuits, and ribozymes that can be used to perform logical operations and simple computations. In addition, an array of ingenious DNA nanomachines8 were built with potential uses in nanofabrication, engineering, and computation: molecular switches that can be driven between two conformations, DNA “tweezers,” DNA “walkers” that can be moved along a track, and autonomous molecular motors.
A significant amount of research in molecular computing has been dedicated to the study of theoretical models of DNA computation and their properties. The model of DNA computing introduced by Head, based on splicing (a combination of cut-and-paste operations achievable by enzymes), predates the experimental proof-of-principle of DNA computing by almost 10 years. Subsequently, studies on the computational power of such models proved that various subsets of bio-operations can achieve the computational power of a Turing machine, showing thus that molecular computers are in principle possible.27 Overall, molecular computing has created many novel theoretical questions, and has considerably enriched the theory of computation.
Quantum computing is another paradigm that uses an alternative “hardware” for performing computations.19 Already in 1980, Benioff introduced simulations of classical Turing machines on quantum mechanical systems. However, the idea of a quantum computer that would run according to the laws of quantum physics, and could simulate physics exponentially faster than a deterministic electronic computer, was first suggested by Feynman in 1982. Subsequently, Deutsch introduced a formal model of quantum computing using a Turing machine formalism, and described a universal quantum computer.
A quantum computer uses distinctively quantum mechanical phenomena, such as superposition and entanglement, to perform operations on data stored as quantum bits (qubits). A qubit can hold a 1, a 0, or a quantum superposition of these. A quantum computer operates by manipulating those qubits with quantum logic gates. The notion of information is different when studied at the quantum level. For instance, quantum information cannot be measured reliably, and any attempt at measuring it entails an unavoidable and irreversible disturbance.
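A single qubit can be sketched as a pair of amplitudes, a gate as a linear transformation of that pair, and measurement as a probabilistic, state-collapsing readout; the Hadamard gate used below is just one standard example of a quantum logic gate.

```python
import math, random

def hadamard(state):
    # A quantum logic gate acting on a single-qubit state (alpha, beta);
    # applied to a basis state it produces an equal-weight superposition.
    alpha, beta = state
    s = 1.0 / math.sqrt(2.0)
    return (s * (alpha + beta), s * (alpha - beta))

def measure(state):
    # Measurement yields 0 or 1 with probabilities |alpha|^2 and |beta|^2,
    # and irreversibly collapses the state to the observed basis state.
    alpha, beta = state
    if random.random() < abs(alpha) ** 2:
        return 0, (1.0, 0.0)
    return 1, (0.0, 1.0)

qubit = (1.0, 0.0)        # the basis state corresponding to classical 0
qubit = hadamard(qubit)   # now a superposition of 0 and 1
outcome, qubit = measure(qubit)
```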
The 1980s saw an abundance of research in quantum information processing, such as applications to quantum cryptography, which, unlike its classical counterpart, is usually based not on the complexity of computation but on the special properties of quantum information. Recently, an open-air quantum cryptography experiment (not involving optical cable) was reported over a distance of 144 km between two Canary Islands.
The theoretical results that catapulted quantum computing to the forefront of computing research were Shor’s quantum algorithms for factoring integers and extracting discrete logarithms in polynomial time, obtained in 1994—the same year that saw the first DNA computing experiment by Adleman. A problem where quantum computers were shown to have a quadratic time advantage when compared to classical computers is quantum database search that can be solved by Grover’s algorithm. Possible applications of Shor’s algorithm include breaking RSA exponentially faster than an electronic computer. This joined other exciting applications, such as quantum teleportation (a technique that transfers a quantum state, but not matter or energy, to an arbitrarily distant location), in sustaining the general interest in quantum information processing.
So far, the theory of quantum computing has been far more developed than the practice. Practical quantum computations use a variety of implementation methods such as ion-traps, superconductors, nuclear magnetic resonance techniques, to name just a few. To date, the largest quantum computing experiment uses liquid state nuclear magnetic resonance quantum information processors that can operate on up to 12 qubits.
Nature as Computation
The preceding sections describe research on the theory, applications and experimental implementations of nature-inspired computational models and techniques. A dual direction of research in natural computing is one in which the main goal becomes understanding nature by viewing processes that take place in nature as information processing.
This dual aspect can be seen in systems biology, and especially in computational systems biology, wherein the adjective “computational” has two meanings. On one hand it means the use of quantitative algorithms for computations, or simulations that complement experiments in hypothesis generation and validation. On the other hand, it means a qualitative approach that investigates processes taking place in cells through the prism of communications and interactions, and thus of computations. We shall herein address mostly the second aspect, whereby systems biology aims to understand the complex interactions in biological systems by using an integrative as opposed to a reductionist approach. The reductionist approach to biology tries to identify all the individual components of functional processes that take place in an organism, in such a way that the processes and the interactions between the components can be understood. In contrast, systems biology takes a systemic approach in focusing instead on the interaction networks themselves, and on the properties of the biological systems that arise because of these interaction networks. Hence, for example, at the cell level, scientific research on organic components has focused strongly on four different interdependent interaction networks, based on four different “biochemical toolkits:” nucleic acids (DNA and RNA), proteins, lipids, carbohydrates, and their building blocks (see Cardelli,10 whose categorization we follow here).
The genome consists of DNA sequences, some of which are genes that can be transcribed into messenger RNA (mRNA), and then translated into proteins according to the genetic code that maps 3-letter DNA segments into amino acids. A protein is a sequence over the 20-letter alphabet of amino acids. Each gene is associated with other DNA segments (promoters, enhancers, or silencers) that act as binding sites for proteins that activate or repress the gene’s transcription. Genes interact with each other indirectly, either through their gene products (mRNA, proteins), which can act as transcription factors to regulate gene transcription—either as activators or repressors—or through small RNA species that directly regulate genes.
These gene-gene interactions, together with the genes’ interactions with other substances in the cell, form the most basic interaction network of an organism, the gene regulatory network. Gene regulatory networks perform information processing tasks within the cell, including the assembly and maintenance of the other networks. Research into modeling gene regulatory networks includes qualitative models such as random and probabilistic Boolean networks, asynchronous automata, and network motifs.
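In the simplest (synchronous, deterministic) Boolean-network models, each gene is either ON or OFF and its next state is a Boolean function of the current states of its regulators. The sketch below shows such an update on a three-gene network whose genes and rules are invented for illustration.

```python
# Each gene is ON (True) or OFF (False); its next state is a Boolean function
# of the current states of its regulators. Genes and rules are invented.
rules = {
    "geneA": lambda s: not s["geneC"],              # A is repressed by C
    "geneB": lambda s: s["geneA"],                  # B is activated by A
    "geneC": lambda s: s["geneA"] and s["geneB"],   # C needs both A and B
}

def network_step(state):
    # Synchronous update: every gene reads the same current state.
    return {gene: rule(state) for gene, rule in rules.items()}

state = {"geneA": True, "geneB": False, "geneC": False}
for _ in range(6):
    state = network_step(state)
    print(state)   # the network settles into a recurring cycle of states
```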
Another point of view20 is that the entire genomic regulatory system can be thought of as a computational system, the “genomic computer.” Such a perspective has the potential to yield insights into both computation as humans historically designed it, and computation as it occurs in nature. There are both similarities and significant differences between the genomic computer and an electronic computer. Both perform computations, the genomic computer on a much larger scale. However, in a genomic computer, molecular transport and movement of ions through electrochemical gradients replace wires, causal coordination replaces imposed temporal synchrony, changeable architecture replaces rigid structure, and communication channels are formed on an as-needed basis. Both computers have a passive memory, but the genomic computer does not place it in an a priori dedicated and rigidly defined place; in addition, the genomic computer has a dynamic memory in which, for example, transcriptional subcircuits maintain given regulatory states. In a genomic computer robustness is achieved by different means, such as by rigorous selection: non (or poorly)-functional processes are rapidly degraded by various feedback mechanisms or, at the cell level, non (or poorly)-functional cells are rapidly killed by apoptosis, and, at the organism level, non (or poorly)-functional organisms are rapidly out-competed by more fit species. Finally, in the case of a genomic computer, the distinction between hardware and software breaks down: the genomic DNA provides both the hardware and the digital regulatory code (software).
Proteins and their interactions form another interaction network in a cell, that of biochemical networks, which perform all mechanical and metabolic tasks inside a cell. Proteins are folded-up strings of amino acids that take three-dimensional shapes, with possible characteristic interaction sites accessible to other molecules. If the binding of interaction sites is energetically favorable, two or more proteins may specifically bind to each other to form a dynamic protein complex by a process called complexation. A protein complex may act as a catalyst by bringing together other compounds and facilitating chemical reactions between them. Proteins may also chemically modify each other by attaching or removing modifying groups, such as phosphate groups, at specific sites. Each such modification may reveal new interaction surfaces. There are tens of thousands of proteins in a cell. At any given moment, each of them has certain available binding sites (which means that they can bind to other proteins, DNA, or membranes), and each of them has modifying groups at specific sites either present or absent. Protein-protein interaction networks are large and complex, and finding a language to describe them is a difficult task. Significant progress in this direction was made by the introduction of Kohn-maps, a graphical notation that resulted in succinct pictures depicting molecular interactions. Other approaches include the textual bio-calculus, or the recent use of existing process calculi (π-calculus), enriched with stochastic features, as the language to describe chemical interactions.
Yet another biological interaction network, and the last that we discuss here, is that of transport networks mediated by lipid membranes. Some lipids can self-assemble into membranes and contribute to the separation and transport of substances, forming transport networks. A biological membrane is more than a container: it consists of a lipid bilayer in which proteins and other molecules, such as glycolipids, are embedded. The membrane structural components, as well as the embedded proteins or glycolipids, can travel along this lipid bilayer. Proteins can interact with free-floating molecules, and some of these interactions trigger signal transduction pathways, leading to gene transcription. Basic operations of membranes include fusion of two membranes into one, and fission of a membrane into two. Other operations involve transport, for example transporting an object to an interior compartment where it can be degraded. Formalisms that depict the transport networks are few, and include membrane systems described earlier, and brane calculi.
The gene regulatory networks, the protein-protein interaction networks, and the transport networks are all interlinked and interdependent. Genes code for proteins which, in turn, can regulate the transcription of other genes; membranes are separators but also embed active proteins in their surfaces. Currently there is no single formal general framework and notation able to describe all these networks and their interactions. Process calculus has been proposed for this purpose, but a common language to describe these biological phenomena has yet to be developed and universally accepted. It is indeed believed that one of the possible contributions of computer science to biology could be the development of a suitable language to accurately and succinctly describe, and reason about, biological concepts and phenomena.18
While systems biology studies complex biological organisms as integrated wholes, synthetic biology is an effort to engineer artificial biological systems from their constituent parts. The mantra of synthetic biology is that one can understand only what one can construct. Thus, the main focus of synthetic biology is to take parts of natural biological systems and use them to build an artificial biological system for the purpose of understanding natural phenomena, or for a variety of possible applications. In this sense, one can make an analogy between synthetic biology and computer engineering.3 The history of synthetic biology can be arguably traced back to the discovery in the 1960s, by Jacob and Monod, of mathematical logic in gene regulation. Early achievements in genetic engineering using recombinant DNA technology (the insertion, deletion, or combination of different segments of DNA strands) can be viewed as the experimental precursors of today’s synthetic biology, which now extends these techniques to entire systems of genes and gene products. One goal can be constructing specific synthetic biological modules such as, for example, pulse generator circuits that display a transient response to variations in input stimulus.
Advances in DNA synthesis of longer and longer strands of DNA are paving the way for the construction of synthetic genomes with the purpose of building an entirely artificial organism. Progress includes the generation of a 5,386bp synthetic genome of a virus, by rapid (14-day) assembly of chemically synthesized short DNA strands.37 Recently an announcement was made of the near completion of the assembly of an entire “minimal genome” of a bacterium, Mycoplasma genitalium.7 Smith and others indeed found about 100 dispensable genes that can be removed individually from the original genome. They hope to assemble a minimal genome consisting of essential genes only, that would still be viable but shorter than the 528-gene, 580,000bp genome of M. genitalium. This human-made genome could then be inserted into a Mycoplasma bacterium using a technique wherein a whole genome can be transplanted from one species into another, such that the resulting progeny is the same species as the donor genome. Counterbalancing objections to assembling a semi-synthetic cell without fully understanding its functioning, the creation of a functionally and structurally understood synthetic genome was proposed,17 containing 151 genes (113,000bp) that would produce all the basic molecular machinery for protein synthesis and DNA replication. A third approach to create a human-made cell is the one pursued by Szostak and others, who would construct a single type of RNA-like molecule capable of self-replicating, possibly housed in a single lipid membrane. Such molecules can be obtained by guiding the rapid evolution of an initial population of RNA-like molecules, by selecting for desired traits.
Lastly, another effort in synthetic biology is toward engineering multi-cellular systems by designing, for example, cell-to-cell communication modules that could be used to coordinate living bacterial cell populations.
Research in synthetic biology faces many challenges, some of them of an information processing nature. There is arguably a pressing need for standardization, modularization, and abstraction, to allow focusing on design principles without reference to lower-level details.15
Besides systems biology that tries to understand biological organisms as networks of interactions, and synthetic biology that seeks to engineer and build artificial biological systems, another approach to understanding nature as computation is the research on computation in living cells. This is also sometimes called cellular computing, or in vivo computing, and one particular study in this area is that of the computational capabilities of gene assembly in unicellular organisms called ciliates.
Ciliates possess two copies of their DNA: one copy encoding functional genes, in the macronucleus, and another “encrypted” copy in the micronucleus. In the process of conjugation, after two ciliates exchange genetic information and form new micronuclei, they use the new micronuclei to assemble in real time new macronuclei necessary for their survival. This is accomplished by a process that involves re-ordering some fragments of DNA (permutations and possibly inversions), and deleting other fragments from the micronuclear copy. The process of gene assembly is fascinating from both the biological and the computational point of view. From the computational point of view, this study led to many novel and challenging research themes.14 Among others, it was proved that various models of gene assembly have full Turing machine capabilities.23 From the biological point of view, the joint effort of computer scientists and biologists led to a plausible hypothesis (supported already by some experimental data) about the “bioware” that implements the process of gene assembly, which is based on the new concept of template-guided recombination.4,28
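Purely as an illustration of the input-output relation involved, and not of the actual molecular operations or of template-guided recombination, the sketch below reassembles a gene from invented "micronuclear" fragments by reordering them and undoing inversions (via reverse complementation).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(s):
    return "".join(COMPLEMENT[b] for b in reversed(s))

def assemble(fragments):
    # Reorder the scrambled fragments by their position in the assembled gene
    # and undo inversions, then join them.
    ordered = sorted(fragments, key=lambda f: f[0])
    return "".join(reverse_complement(seq) if inverted else seq
                   for _, seq, inverted in ordered)

# Invented "micronuclear" arrangement: (position in the assembled gene,
# fragment sequence, whether the fragment appears inverted).
micronuclear = [(3, "GGAT", False), (1, "ACGT", False), (2, "AAGT", True)]
print(assemble(micronuclear))   # ACGT, then reverse complement of AAGT, then GGAT
```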
Other approaches to cellular computing include developing an in vivo programmable and autonomous finite-state automaton within E. coli, and designing and constructing in vivo cellular logic gates and genetic circuits that harness the cell’s existing biochemical processes.
At the end of this spectrum of views of nature as computation, the idea was even advanced by Zuse and Fredkin in the 1960s that information is more fundamental than matter or energy. The Zuse-Fredkin thesis stated that the entire universe is some kind of computational device, namely a huge cellular automaton continuously updating its rules. Along the same lines, it has been recently suggested that the universe is a quantum computer that computes itself and its own behavior.
Natural Sciences: Ours to Discover
Science advances in ever-widening circles of knowledge. Sometimes it meticulously crawls. Other times it leaps to a new dimension of understanding and, in the process, it reinvents itself. As the natural sciences are rapidly absorbing ideas of information processing, and the meaning of computation is changing as it embraces concepts from the natural sciences, we have the rare privilege to take part in several such metamorphoses.
At this moment we and our natural scientist fellows are awash in wave after gigantic wave of experimental, especially biological, data. Just underneath this tumultuous surface lie ingenious algorithms waiting to be designed, elegant theorems waiting to be proven, natural laws waiting to be discovered that will put order into chaos. For, as Spinoza wrote, “nothing happens in nature that does not follow from her laws.”
Conversely, as this review shows, natural phenomena offer an abundance of inspiration for computing paradigms as well as alternative physical substrates on which to implement computations, while viewing various natural processes as computations has become increasingly essential, desirable, and inevitable. All these developments are challenging our assumptions about computation, and indeed, our very definition of it.
In these times brimming with excitement, our task is nothing less than to discover a new, broader, notion of computation, and to understand the world around us in terms of information processing.
Let us step up to this challenge. Let us befriend our fellow the biologist, our fellow the chemist, our fellow the physicist, and let us together explore this new world. Let us, as computers in the future will, embrace uncertainty. Let us dare to ask afresh: “What is computation?”, “What is complexity?”, “What are the axioms that define life?”
Let us relax our hardened ways of thinking and, with deference to our scientific forebears, let us begin anew.
Literature
The upper bound placed on the number of references was a real limitation for this review, since the literature on natural computing is vast. For a more complete list of references the reader is referred to the full version of this article at www.csd.uwo.ca/~lila/Natural-Computing-Review.pdf.
Almost every one of the areas we mentioned here has an extensive scientific literature as well as a number of specialized journals and book series. There are also journals and book series aimed at the general natural computing community, among them the journal Natural Computing (Springer), the journal Theoretical Computer Science, Series C: Theory of Natural Computing (Elsevier), the Natural Computing book series (Springer), and the upcoming Handbook of Natural Computing (G. Rozenberg, T. Bäck, J. Kok, editors; Springer).
Acknowledgments
We gratefully acknowledge comments on early drafts of this paper by T. Bäck, D. Bentley, G. Brassard, D. Corne, M. Hirvensalo, J. Kari, P. Krishna, H. Lipson, R. Mercer, A. Salomaa, K. Sims, H. Spaink, J. Timmis, C. Torras, S. Watt, R. Weiss.
This work was supported by NSERC Discovery Grant and Canada Research Chair Award to L.K., and NSF grant 0622112 to G.R.
Figures
Figure. Neri Oxman, an architect and researcher currently working toward her Ph.D. in design and computation at MIT, formed an interdisciplinary research initiative called Materialecology that undertakes design research in the intersection between architecture, engineering, computation, biology and ecology. Here, she illustrates how plants often grow in a fashion that maximizes the surface area of their branching geometries while maintaining structural support. This work was done in collaboration with W. Craig Carter, a professor in MIT’s Department of Material Science and Engineering. For more images, see http://www.materialecology.com/.
Figure. From Archimorph, where work is continuing on their L-System and Evolutionary Algorithm, including new images of L-Systems growths as well as diagrams explaining the process of the overall design. For more images, see archimorph.wordpress.com/.
Figure. McGill University’s Laboratory for Natural and Simulated Cognition (LNSC) investigates human cognition through a combination of psychological and computational approaches. Using the Cascade-correlation algorithm, LNSC researchers created a program that outputs a 2D display of random output values of neural networks. The results are sometimes quite phenomenal and artistic. For more, see www.psych.mcgill.ca/labs/lnsc/.
Figure. Paul W.K. Rothemund, a senior research associate at California Institute of Technology, has developed a method of creating nanoscale shapes and patterns using DNA. The smiley faces are actually giant DNA complexes called “scaffolded DNA origami.” Rothemund notes that while the smiley face shape may appear silly, there is serious science behind it. He hopes to use this DNA origami (and other DNA nanotechnologies) to build smaller, faster computers and devices. For more on his work, visit http://www.dna.caltech.edu/~pwkr/.
Figure. Artist Jonathan McCabe’s interests include theories of biological pattern formation and evolution and their application to computer art. He writes computer programs that measure statistical properties of images for use in artificial evolution of computer art. For more, see www.jonathanmccabe.com/.
Figure. European artist Leonel Moura works with AI and robotics. The Swarm Paintings, produced in 2001, were the result of several experiments with an “Ant Algorithm” where he tried to apply virtual emergent pheromone trails to a real space pictorial expression. In this case, a computer running an ant algorithm was connected to a robotic arm that “translated” in pencil or brush strokes the trails generated by the artificial swarm of ants. For more images, see www.leonelmoura.com/.