Look around you, and you will be stunned by the work of evolution. According to Nobel Laureate Jacques Monod, a strange thing about evolution is that all educated persons think they understand it fairly well, and yet very fewif any, one may grumbleactually do. Understanding evolution is essential: "Nothing in biology makes sense except in the light of evolution," famously said the eminent 20th century biologist Theodosius Dobzhansky. And evolution is closer to home than black holes and other mysteries of scienceit feels almost like your family history.
More so than most scientific fields, the theory of evolution has a sharp beginning: the publication of Charles Darwin's The Origin of Species in 1859.8 But of course nothing is that simple: during the first half of the 19th century, several scientists were convinced the diversity of life we see around us must be the result of evolution (see the sidebar on Charles Babbage and the accompanying figure). Darwin's immense contribution lies in three things: His identification of natural selection as the engine of evolution; his articulation of the common descent hypothesis, stating that different species came from common ancestors, and further implying that all life came from a common source; and the unparalleled force of argument with which he empowered his theory. But of course The Origin was far from the last word on the subject: Darwin knew nothing about genetics, and had no clue about the role of sex in evolution, among several important gaps. On the ultimate reason for sex, for instance, he wrote, "the whole subject is as yet hidden in darkness."
Mathematics has informed the theory early on. Mendel discovered genetics by appreciating mathematical patterns in the ratios of sibling pea plants exhibiting different characteristics and model building. When his laws were rediscovered 40 years later, their discrete nature was misunderstood as being at loggerheads with Darwin's continuous perception of traits. A deep scientific crisis raged for two decades, and was eventually resolved with the help of mathematics: discrete alleles (see the glossary) can result, through cumulative contributions, in continuous phenotypic traits. During the 1920s and the next two decades, R.A. Fisher, J.B.S. Haldane, and S. Wright developed mathematical equations for predicting how the genetic composition of a population changes over the generations. This mathematical theory of population genetics is introduced briefly in the sidebar "The Equation that Reconciled Darwin and Mendel." It is key to what is called the "modern evolutionary synthesis"the 20th-century view of evolution, because it proposed one way of unifying Darwinism and Mendelism.
Not only is sex essentially universal, but it seems to be very much center stage in life, the basis of a fantastic variety of behavior and structure.
During the near-century since then, the study of evolution has flourished into a mature, comprehensive, and prestigious scientific discipline, while over the past two decades it has been inundated by a deluge of molecular data, a vast scientific gold mine that informsand often challengesits tenets. And yet, despite the towering accomplishments of modern evolutionary biology, there are several important questions that are beyond our current understanding:
What is the role of sex in evolution? Reproduction with recombination is almost ubiquitous in life (even bacteria exchange genetic material), while obligate asexual species appear to be rare evolutionary dead ends. And yet there is no agreement among the experts as to what makes sex so advantageous.
What exactly is evolution optimizing, if anything? With all the evolution-inspired optimization heuristics coded by computer scientists (as we discuss next), this questionalso very much contemplated by biologists over the decadescomes to the fore.
The paradox of variation. Genetic variation in humans, and in many other species, is much higher than the theory predicted due to selection; and assuming the variation is neutral has its problems too.
Are mutations completely random? We know they are far from uniformly distributed in the genome, but can they be the results of elaborate genetic mechanisms?
As we recount in this article, recent joint research by computer scientists and biologists, bringing ideas and concepts from computation into biology, has made quite unexpected progress in these questions. Additional background and literature in the online appendix is available in the ACM Digital Library (dl.acm.org) under Source Material.
Over the past 70 years, computer scientists, starting with von Neumann,37 have been inspired and intrigued by evolution. During the 1950s, computer scientists working in optimization developed local search heuristics: Start with a random solution and repeat the following step: If there is a "mutation" of this solution that is better than the current one, change to that, until a local optimum is discovered. By "mutation" (much more often the word "neighbor" is used), we mean a solution differing from the present one in a very small number of features; in the traveling salesman problem, for example, a mutation could change two, or three, edges of the tour to form a new tour. This process is repeated many times from random starts, a stratagem that can be seen as a sequential way of maintaining a population (see Papadimitriou and Steiglitz32 for a survey on the 1980s).
This basic idea of local search was enhanced in the 1980s by a thermodynamic metaphor,20 to help the algorithm escape local optima and barriers: Simulated annealing, as this variant of local search is called, allows the adoption of even a disadvantageous mutation, albeit with a probability decreasing with the disadvantage, and with time. A further variant called go with the winners1 is closer to evolution in that it keeps a population of solutions, teleporting the individuals that are stuck at local optima to the more promising spots. Notice that all these heuristics are inspired by asexual evolution (no recombination between solutions happens); heuristics of this genre have been used successfully in many realms of practice, and there are several practically important hard problems, such as graph partitioning, for which such heuristics are competitive with the best known.
During the 1970s, a different family of heuristics called genetic algorithms was proposed by John Holland:14 A population of solutions evolves through mutations and recombination. Recombination is much more difficult to apply to optimization problems, because it pre-supposes a genetic code, mapping the features of the problem to recombinable genes (to see the difficulty, think of the traveling salesman problem: tours can mutate, but they cannot recombine). The evolutionary fitness of each individual solution in the population (that is, the number of children it will have) is proportional to the quantity being maximized in the underlying problem. After many generations, the population will presumably include some excellent solutions. Holland's idea had instant appeal and immense following, and by now there is a vast bibliography on the subject (see, for example, Mitchell28 and Goldberg12). The terms evolutionary algorithms and evolutionary computation are often used as rough synonyms of "genetic algorithms," but often they describe more general concepts, such as the very interesting algorithmic work also categorized as research in artificial lifewhose purpose is not to find good solutions to a practical optimization problem, but instead to understand evolution in nature by exposing novel, complex evolutionary phenomena in silico, for an example, see Jong.16
Coming back to the genetic algorithms, several successes in solving actual optimization problems have been reported in the literature, but the general impression seems to prevail that genetic algorithms are far less successful in practice than local search and simulating annealing. If this is true, it is quite remarkablea great scientific mysterybecause in nature the exact opposite happens: Recombination is successful and ubiquitous, while obligate asexual reproduction is extremely rare and struggling.
The authors started collaborating on a computational understanding of evolution in 2006, precisely in order to investigate this mystery, and we recount our findings in this article. At exactly the same year, Leslie Valiant first wrote on his theory of evolvability, another attempt at understanding evolution in computational terms.36 Valiant sees evolution as an approximation of an ideal fitness function by a polynomially large population of genotypes in polynomially many generations35 through learning by mutations. Notably, there is no recombination (sex) in his theory, even though it can be added for a modest advantage.17 Several natural classes of functions are evolvable in this sense; in fact, functions susceptible to a limited weak form of learning called statistical learning.18 In the section "A Game between Genes," we discuss another interesting connection between machine learning algorithms and evolution.
Sex is nearly universal in life: it occurs in animals and plants by the coming together of sperm and egg, in fungi by the fusion of hyphae, and even in bacteria:34 Two bacterial cells can pair up, for example, and build a bridge between them through which genes are transferred. Many species engage in asexual reproduction, or in selfing, at some times, but also engage in sexual reproduction at other times, keeping their genotypes well shuffled. In contrast, species that do not exchange genes in any form or manner, called "obligate asexuals," are extremely rare, inhabiting sparse, recent twigs of the tree of life, coming from sexual ancestral species that lost their sexuality, and heading toward eventual extinction without producing daughter species.22
Not only is sex essentially universal, but it seems to be very much center stage in life, the basis of a fantastic variety of behavior and structure: from bacterial conjugation to the intense molecular machinery of meiosis (cell division producing gametes), from flower coloration to bird courtship dances, from stag fights all the way to the drama of human passion, much of life seems to revolve around sex. So why? What role might sex play in evolution?
One common answer is that sex generates vast genetic diversity, and hence it must help evolution. But, just as sex puts together genetic combinations, it also breaks them down: a highly successful genotype will be absent from the next generation, as children inherit half their genes from each parent. To say the role of sex is to create particular, highly favorable genetic combinations is like watching a man catch fish only to toss them back to sea, and concluding that he wants to bring food to his family's table. (Incidentally, the designers of genetic algorithms are well aware of this downside of sex, and often allow the most successful individuals into the next generation, a stratagem known as "elitism," which however cannot be easily imitated by nature.)
Evolutionary theorists have labored for about a century to find other explanations for the role of sex in evolution, but all 20th century explanations are valid only under specific conditions, contradicting the prevalence of sex in nature.9,a
This is not a small problem. Imagine, for example, that even though much of the terrestrial world is green, we had no clue why leaves exist. That would have been a pretty big gap in our understanding of nature. Not knowing the role of sex is an even bigger gap, because far more life forms exchange genes than photosynthesize. It is no wonder the role of sex has been called "the queen of problems" in evolutionary biology.6
Since sex breaks down genetic combinations, it has been mainly thought in evolution that effective selection acts on individual alleles,38 that is, each (non-neutral) allele is either beneficial or detrimental on its own. According to this line of thinking, two main forces drive allele frequencies: selection acting on alleles as independent actors (where alleles are often assumed to be making additive contributions to fitness), and random genetic drift (chance sampling effects on allele frequency, as discussed previously). The interaction between alleles, within and between locieven though it has been of interest in population genetics from the start10,39has played a secondary role, often being treated as a mere correction to the above, under the term "epistasis." A few years ago, while working with biologists Marcus Feldman and Jonathan Dushoff and computer scientist Nicholas Pippenger, we asked whether interactions between alleles could be crucial to the understanding of the role of sex in a yet unexplored manner.23,24,25,26 Based on the standard equations used to describe how genotype frequencies change over generations (see the Darwin and Mendel sidebar), we demonstrated an important difference between sex and asex: In asexual evolution, the best combination of alleles always prevails. In the presence of sex, however, natural selection favors "mixable" alleles, those alleles that, even though they may not participate in any truly great genetic combinations, perform adequately across a wide variety of different combinations.23-26 To put it differently, in the hypothetical three-by-three fitness landscape in the sidebar, the winner of asexual evolution will be the largest entry of the fitness matrix (in this case, 1.05). In contrast, sexual evolution will favor, roughly speaking, those alleles (rows and columns) with larger average value; where "average" takes into account the prevalence of these genotypes in the population, as we will explore.b
One of the most central and striking themes of algorithms research in the past few decades has been the surprising power of randomization.29 Paradoxically, evoking chance is often the safest and most purposeful and effective way of solving a computational problem. For one, it helps avoid the worst case, as in Quicksort. Second, sampling from a distribution D helps decide between competing hypotheses about D: Randomized algorithms for software testing, as well as for testing primality, or validity of polynomial identities, are like this.
Evolution under sex can be seen as an instance of a randomized algorithm of the latter type. Suppose we want to design a hypothetical evolutionary experiment for determining whether a new allele of a particular gene performs better than its alternative, across all genetic combinations. If the population is asexual, this could be done by inserting this mutation in the genome of one individual, and gauging the lineage thus founded to see if it thrives. This kind of sampling is very inefficient, because we sample from a small pool (the genotypes that happen to be available in the population), and must repeat the insertion many timesin many individuals. But if the population is sexual, then by inserting the mutation once, after log n generations, where n is the number of genes with which this particular allele interacts,33 we will be sampling from all possible genetic combinations that could in principle be constructed. Sex enables evolution to sample quickly from the entire space of genetic combinations, in the distribution under which they appear in the population. What is more, evolution under sex not only decides among the competing hypotheses (which allele performs better), but also implements this decision (eventually, and with high probability, it will fix the winner).
Finally, for yet another take on explaining the ubiquity of sex in life, in terms and concepts familiar to computer scientists, view The Red-Blue Tree Theorem sidebar.
The search in population genetics for a quantity that is optimized by natural selection has a long history. Fisher wanted a theory for evolution with a mathematical law as clean and central as the second law of thermodynamics,10 while Wright pointed out that the frequency of an allele in a diploid locus changes in the direction that increases the population's mean fitness.40 Later, investigators tried their hand again at looking for a Lyapunov function that will describe evolution, albeit with little success.
Our search for an analytical maximization principle involving mixability ended with a surprise: We did not answer the question "What is evolution optimizing?" but, perhaps more interestingly, we identified the quantity that each gene seems to be optimizing during evolution under sex. Together with Erick Chastain and Umesh Vazirani,7 we focused on the standard equations described in the Darwin Mendel sidebar, in a particular evolutionary regime known as weak selection.30 Weak selection is the widely held assumption that fitness differences between genotypes are small. The fitness of a genotype g in this regime is written Fg = 1 + sg, where s is small and g is the differential fitness of the genotype, ranging in [ 1, 1].
Working with the weak selection assumption, and after some algebraic manipulation, we noticed the equations of the evolution of a population under sex are mathematically equivalent to a novel process, which entails an entirely different way of looking at evolution:
The process is a game between the genes of the organism. Recall that a game is a mathematical model of the interaction between several players, each player having a set of available actions or pure strategies, and a utility function: the objective the player is trying to maximize in the game, a mapping from choices of actions, one for each player, to a real number. The game here is repeated, that is, the same precise game is played over and over for many rounds, with each round corresponding to a generation of the population.
The available actions of each gene/player correspond to the gene's alleles.
At every generation, each gene chooses and plays a mixed strategy: it randomizes over its actions, that is to say, its alleles. The probability assigned to each allele in this mixed strategy corresponds to the frequency of the allele in the population during this generation.
The intricacy of game theory is mostly due to conflicts between the objectives of the players. But in the game played by the genes there are no conflictsthis is not Dawkins's selfish gene metaphor: All players have the same utility function, which is the fitness of the genotype resulting from the players' choices. Games of identical utilities are called coordination games in game theory, and are the simplest possible kind. They are of interest only in cases in which the players are cognitively weak, or cannot communicate effectively.
In a repeated game, the players must update their mixed strategies from one round to the next, based on the outcome of the previous rounds. Here lies the biggest surprise: The update rule used by the genes in this game is identical to a venerable learning algorithm, well known to computer scientists for its prowess in working surprisingly well in a multitude of difficult problems and contexts: the multiplicative weights update algorithm (MWU),2 also known in machine learning as "no-regret learning" or "hedge" (see more in box "The Experts Problem" in the online appendix). MWU changes the frequencies xi of the i-th allele of the gene as follows
where mi is the expected differential fitness, positive or negative, of the i-th allele in the current gene pool. This quantity mi is a measure of what we have called the mixability of allele i, its ability to form fit combinations with alleles of other genes in the current genetic mix. To summarize, at each generation in sexual evolution, each gene boosts the frequency of each of its alleles by a factor that increases with the mixability of this allele in the current generation. Naturally, the quantities resulting from the equation are normalized appropriately so as to add to one.
This is a completely new way of looking at evolution. And it is a productive view, because it gets more interesting: Let us look back at the update Equation 1 and ask once again the question: This choice of the new probabilities for the alleles by the gene, is it optimizing something? For once, the answer is very clean: Yes, the choice of allele frequencies by the gene shown in Equation 1 optimizes the following function, specific to this gene, of the allele frequencies:
Here Mi denotes the cumulative relative fitness of allele i, that is, the sum of the mi in Equation 1 over all generations up to and including t 1. It is easy to notice that is a strictly concave function, and thus has one maximum, and this maximum can be checked by routine calculation to be exactly the new frequencies as updated in Equation 1! Now notice that the second term of is plainly the entropy of the distribution x, a well-known measure of a distribution's diversity.
There is much that is unexpected and evocative here, but perhaps most surprising of all is that this radically new interpretation of evolution was lurking for almost a century so close to the surface of these well-trodden equations. That an algorithm as effective as MWU is involved in evolution under sex is also significant. It was pointed out in a commentary on our paper5 that the MWU is also present in asex. Indeed, asexual evolution can be trivially thought of as MWU helping nature select genotypes. However, our point is that in sexual evolution, the picture is far more sophisticated and organic, occurring deeper in the hierarchy of life: Individual genes interact, each "managing its investments in alleles" using MWU, in a context created by the other genes and, of course, by the environment.
The function is a rare explicit optimization principle in evolution. The second term, and especially its large constant coefficient (recall that the selection strength s is small, and |mi| 1) suggests that attention to diversity is an important ingredient of this mechanism, a remark that may be relevant to the question of how genetic diversity is maintained. But there is a mystery in the non-diversity terms: the cumulative nature of the fitness coefficients M suggests that performance during any previous generation is as important as the current generation for the determination of the genetic make-up of the next generation. How can this be?
This surprising connection between evolution and algorithms, through game theory and machine learning, as well as the maximization principle , are very recent, and their full interpretation is a work in progress.
The insights about sex as we have discussed shed some light on the mystery of genetic algorithms, as explained previously. There is a mismatch between heuristics and evolution. Heuristics should strive to create populations that contain outstanding individuals. In contrast, evolution under sex seems to excel at something markedly different: at creating a "good population." So, it is small wonder that genetic algorithms are not the best heuristics around. On the other hand, these insights also suggest that genetic algorithms may be valuable when the robustness of solutions is sought, or when the true objective is unknown or uncertain or subject to change.
What is a mutation? Point mutations (changes in a single base such as an A turning into a C) are only part of the story, as many important mutations are rearrangements of small stretches as well as large swaths of DNA: duplications, deletions, insertions, inversions, among others.13 For a long time it has been believed that mutations are the results of accidents such as radiation damage or replication error. But by now we have a deluge of evidence pointing to involved biological mechanisms that bring about and affect mutations.22
We know, for example, that the chance of a mutation varies from one region of the genome to another and is affected by both local and remote DNA.22 Nearly a quarter of all point mutations in humans happen at a C base which comes before a G after that C is chemically modified (methylated);11 methylation is know to be the result of complex enzymatic processes. As to rearrangement mutations, there are powerful agents of mutagenicity (creation of mutations) in the genome, such as transposable elements: DNA sequences prone to "jump" from one place of the genome to another, carrying other DNA sequences with them.13 A key step in mammalian pregnancy (decidualization), for instance, was the result of massive evolutionary rewiring of about 1,500 genes mediated in part by transposable elements.27 Furthermore, the genetic sequences that participate in a rearrangement mutation are likely to be functionally related, since they are likely to be close to each other in 3D space and to bear sequence similarity, both of which allow interaction through recombination-based mutational mechanisms (see Livnat22 and references therein). Indeed, the same machinery that effects sexual recombination is also involved in mutations, and in fact produces different types of rearrangement mutations, depending on the genetic sequences that are present.13 Finally, different human populations undergo different kinds of mutations resulting in the same favorable effect, such as malaria resistance, suggesting that genetic differences between populations cause differences in mutation origination.22
There is a mismatch between heuristics and evolution. Heuristics should strive to create populations that contain outstanding individuals. Evolution under sex seems to excel at something markedly different: creating a "good population."
The idea that mutations may be non-accidental is still confronted with suspicion due to the legacy of Lamarck, who believed around 1800 that organisms can sense, through interaction with the environment, what is needed for an evolutionary improvement, and are able to make the correct heritable change. Since in the light of modern biology this seems impossibleto a computer scientist it sounds like reversing a one-way function, or a hash functionthe accidental mutation notion prevailed. But in science one must not assume that the only relevant alternatives are the familiar, inside-the-box ones. Mutations are randombut it may be more productive to think of them as random in the same way that the outputs of randomized algorithms are random. Indeed, mutations are biological processes, and as such they must be affected by the interactions between genes. This new conception of heredity is exciting, because it creates an image of evolution that is even more explicitly algorithmic. It also means that genes interacting in one organism can leave hereditary effects on the organism's offspring.22 It no longer matters that a lucky genetic combination created by sex is doomed to vanish from the face of the earth (that the fisherman of our earlier metaphor throws the fish away): It may have achieved a lasting effect on the population through mutagenicity. Finally, the biological mechanisms affecting mutations may themselves evolve.
If there is one idea that permeates all the various aspects of computational thinking about evolution, as explained in the past sections, it is this: Interactions between genes are crucial for understanding evolution. Gene interactions also come up in our recent work on the following question: How is variation within a species preserved?
Classic data indicates that, for a large variety of plants and animals taken together, the percentage of protein-coding loci that are polymorphic (in the sense that more than one protein variant exists that appears in more than 1% of the individuals in a population), and the percentage of such loci that are heterozygous in an individual, average around 30% and 7% respectively.31 Such a number is far greater than could be explained by traditional selection-based theories.21 Genetic variation is fed by mutations, and, according to the equations of populations genetics, it is decreased by fixation, the eventual triumph of one allele and the extinction of all other alleles of the same gene. In much of the discourse on the subject, selection is assumed to act on individual genes, and therefore fitness is additive. The equations tell us that fixation will happen after generations, where is the differ- ence in relative fitness between the most fit and the second most fit allele. Fixation looks rather speedy.
In the spring of 2014, the Simons institute brought to Berkeley 60 biologists and computer scientists to exchange ideas on evolution. It was during that time the authors, with Costis Daskalakis and Albert Wu, explored how our understanding of the speed of fixation would be affected if one takes into account gene interactions. We approached the subject through some decades-old work on the complexity of local search;15 this theory examines how difficult it is for a local search process to reach a local optimum, and the conclusion has in many cases been: "pretty hard." By applying this point of view to selection on interacting genes, we showed that there are n-gene systems, in which the fitness is the sum total of contributions of certain pairs of allelesthat is, the next step beyond selection on single allelesfor which fixation takes a number of generations proportional to 2n to happen. A stronger result can also be obtained under the well-accepted complexity assumption that local search is intractable in general (see Johnson et al.15 for details). The implication is that, if gene interactions are taken into account, fixation may take much longer than in the regime of selection on individual genes.
Does this insight explain the mystery of variation? Not yet, because our analysis so far has been disregarding two other powerful forces in evolution, besides mutation and selection, acting on variation: the finiteness of the population, and heterozygosity (a diploid organism carrying two different alleles of a gene.).
First, finiteness. Because the number of individuals carrying the alleles in question is finite, say N, the number of individuals carrying each allele at each generation evolves as in a kind of a random walk within the confines of [0, N], and, ignoring selection, this results in fixation after O(N) generations.19 Second, diploidy introduces the possibility of overdominance, in which organisms with two different alleles of a gene are more fit than organisms with two copies of one allele or two copies of the other. In overdominance, the equations of selection point to stable variation, with both alleles enjoying stably high frequency in the population.
How these three effects, of finite population, of heterozygosity, and of selection acting on combinations of alleles across loci, interact with one another is an important subject for further research.
A computer scientist marvels at the brilliant ways in which evolution has achieved so much: Systems with remarkable resource efficiency, reliability and survivability, adaptability to exogenous circumstances, let alone ingenious and pristine solutions to difficult problems such as communication, cooperation, vision, locomotion, and reasoning, among so many more. One is tempted to ask: What algorithm could create all this in just 1012 steps? The number 1012one trillioncomes up because this is believed to be the number of generations since the dawn of life 3.5 · 109 years ago (notice that most of our ancestors could not have lived for much more than a day). And it is not a huge number: cellphone processors do many more steps in an hour.
Over the past decade, computer scientists and evolutionary biologists working together have come up with new insights about central open problems surrounding evolutionincluding, rather surprisingly, a proposed answer to the "algorithm" questionby looking at evolution from a computational point of view. And, of course many more questions, inviting similar investigation, were opened up in the process.
1. Aldous, D. and Vazirani, U. 'Go with the winners' algorithms. In Proceedings of the 35th Annual IEEE Symposium on Foundations of Computer Science (1994). 492501.
2. Arora, S., Hazan, E. and Kale, S. The multiplicative weights update method: A meta-algorithm and applications. Theory of Computing 8, 1 (2012), 121164.
3. Athreya, K. and Ney, P. Branching Processes. Springer, 1972.
4. Babbage, C. The Ninth Bridgewater Treatise. 2nd edn. John Murray, London, 1838.
5. Barton, N.H., Novak, S. and Paixão, T. Diverse forms of selection in evolution and computer science. In Proceedings of the National Academy of Sciences 111, 29 (2014), 1039810399.
6. Bell, G. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. University of California Press, Berkeley, CA, 1982.
7. Chastain, E., Livnat, A., Papadimitriou, C. and Vazirani, U. Algorithms, games, and evolution. In Proceedings of the National Academy of Sciences 111, 29 (2014), 1062010623.
8. Darwin, C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. Murray, London, 1859.
9. Feldman, M.W. Otto, P. and Christiansen, F.B. Population genetic perspectives on the evolution of recombination. Annual Review of Genetics 30 (1997), 261295.
10. Fisher, R.A. The Genetical Theory of Natural Selection. The Clarendon Press, Oxford, U.K., 1930.
11. Fryxell, K.J. and Moon, W.-J. CpG mutation rates in the human genome are highly dependent on local GC content. Molecular Biology and Evolution 22 (2005), 650658.
12. Goldberg, D. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA, 1989.
13. Graur, D. and Li, W.-H. Fundamentals of Molecular Evolution. Sinauer Associates, Sunderland, MA, 2000.
14. Holland, J.H. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. U Michigan Press, 1975.
15. Johnson, D.S., Papadimitriou, C.H., and Yannakakis, M. How easy is local search? J. Computer and System Sciences 37, 1 (1988), 79100.
16. Jong, K.A.D. Evolutionary Computation: A Unified Approach. MIT Press, Cambridge MA, 2006.
17. Kanade, V. Evolution with recombination. In Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science, (2011), 837846.
18. Kearns, M. Efficient noise-tolerant learning from statistical queries. J. ACM 45, 6 (1998), 9831006.
19. Kimura, M. and Ohta, T. The average number of generations until fixation of a mutant gene in a finite population. Genetics 61, 3 (1969), 763.
20. Kirkpatrick, S., Gelatt, Jr,, C.D. and Vecchi, M.P. Optimization by simulated annealing. Science 220 (1983), 671680.
21. Lewontin, R.C. and Hubby, J.L. A molecular approach to the study of genic heterozygosity in natural populations; amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 54 (1966), 595609.
22. Livnat, A. Interaction-based evolution: How natural selection and nonrandom mutation work together. Biology Direct 8, 1 (2013), 24.
23. Livnat, A., Feldman, M.W., Papadimitriou, C. and Pippenger, N. On the advantage to sexual species in diversification rates. Unpublished manuscript.
24. Livnat, A., Papadimitriou, C., Dushoff, J. and Feldman, M.W. A mixability theory for the role of sex in evolution. In Proceedings of the National Academy of Sciences 105, 50 (2008), 1980319808.
25. Livnat, A., Papadimitriou, C. and Feldman, M.W. An analytical contrast between fitness maximization and selection for mixability. J. Theoretical Biology 273, 1 (2011), 232234.
26. Livnat, A., Papadimitriou, C., Pippenger, N. and Feldman, M.W. Sex, mixability, and modularity. In Proceedings of the National Academy of Sciences 107, 4 (2010), 14521457.
27. Lynch, V.J., Leclerc, R.D., May, G. and Wagner, G.P. Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nature Genetics 43 (2011), 11541159.
28. Mitchell, M. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, 1996.
29. Motwani, R. and Raghavan, P. Randomized Algorithms. Cambridge University Press, 1995.
30. Nagylaki, T., Hofbauer, J. and Brunovský, P. Convergence of multilocus systems under weak epistasis or weak selection. J. Mathematical Biology 38, 2 (1999), 103133.
31. Nevo, E., Beiles, A. and Ben-Shlomo, R. The evolutionary significance of genetic diversity: Ecological, demographic and life history correlates. Lecture Notes in Biomathematics 53 (1984), 13213.
32. Papadimitriou, C. and Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity. Dover, 1998.
33. Rabani, Y. Rabinovich, Y. and Sinclair, A. A computational view of population genetics. In Proceedings of the 27th Annual ACM Symposium on Theory of Computing, (1995), 8392.
34. Stearns, S.C. and Hoekstra, R.F. Evolution: An Introduction. Oxford University Press, New York, 2005.
35. Valiant, L. Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World. Basic Books, 2013.
36. Valiant, L.G. Evolvability. J. ACM 56, 1 (2009), 3.
37. Von Neumann, J. and A. W. Burks, A.W. Theory of self-reproducing automata. IEEE Transactions on Neural Networks 5, 1 (1966), 314.
38. Williams, G.C. Adaptation and Natural Selection, 8th edition. Princeton University Press, 1996.
39. Wright, S. Evolution in Mendelian populations. Genetics 16 (1931), 97159.
40. Wright, S. The distribution of gene frequencies in populations. In Proceedings of the National Academy of Sciences of the United States of America, 23, 6 (1937), 307320.
Charles Babbage, the grandfather of computer science, was undoubtedly the first to think of evolution in algorithmic terms. In 183722 years before the publication of The Origin of Species and the same year that Darwin jotted down a sketch of the tree of life in his notebook (see the figure below)Babbage wrote the Ninth Bridgewater Treatise.4 The treatise was ninth in a series of philosophical essays on the subject of the deity, in which he refers extensively to his Difference Engine and how it can be programmed to go on for days calculating unimaginably large integers, with no further intervention by the programmer. Babbage used this image to argue that a benevolent creator can design not necessarily species, but an algorithm (he did not use the word) for creating species.
Allele: Variant of a gene. Some are known to affect our susceptibility to diseases.
Diploidy: The state of having two instances of each chromosome, and thus each gene, in a cell. organisms with one instance of each chromosome and gene are called haploid.
Fitness: The ability of an individual to survive and reproduce in its natural environment, manifested in its expected number of surviving offspring.
Gene: A unit of heredity and a region of the DNA that encodes a functional product. It is thought that humans have more than 20,000 of these. However, now that coding is known to be far more complex than originally thought, it is no longer clear how to define these units and their boundaries.
Genotype: The whole or a part of the genetic make-up of an individual or that is common to a group of individuals.
Heterozygous: An individual having two different alleles of a certain gene.
Mutation: A change in the hereditary material.
Phenotype: The characteristics of an individual other than its genetic code.
Recombination: In sexual reproduction, taken broadly, this term means that the genetic material of an offspring is a bricolage of the genetic material of its parents, due to both the independent assortment of chromosomes during the halving of chromosome numbers and the crossing over of chromosomesthe shuffling of segments between two corresponding chromosomes (in humans, typically 2-3 per chromosome)that occur in the generation of gametes. More generally, DNA recombination in the sense of exchange of genetic material between two DNA molecules or between segments of the same DNA molecule is an important part of mutational mechanisms.
Sidebar: The Equation that Reconciled Darwin and Mendel
After Fisher showed how a continuous trait can result from adding up the contributions of multiple discrete Mendelian factors, he, Wright, and Haldane established the classic population genetic framework for studying how a population of genotypes evolves from one generation to the next. A central idea is fitness, a mathematical articulation of Darwin's conception of evolutionary advantage, a useful summary of an organism's phenotype. Formally, this fitness of a genotype can be thought of as the number of surviving offspring an organism with this genotype will have, in expectation. For simplicity, imagine a haploid organism (one copy of each gene) in which fitness depends only on two genes, and these genes come in three alleles each. The fitness function of this species is perhaps captured by this 3 x 3 table and denoted Wij : i = 1; 2; 3;j = 1; 2; 3:
Suppose that at some point in time, in the tth generation of this species, the nine genotypes have these frequencies, denoted Ptij:
Assume reproduction by recombination (sex); this means that a child inherits each allele from either of its parents; assume that it inherits it from either with equal probability, independently for the two genes (this holds, for example, if the two genes reside in different chromosomes). Then the genotype frequencies in the next generation, , to the fourth decimal place, will be as shown in this table:
while the frequencies in 10 and 100 generations will be as follows:
The formula for going from the to the is the following:
where is the sum of the numerators for all ijs. Of course, this equation is based on certain crucial assumptions: It is assumed that generations do not overlap (as if all members of one generation procreate simultaneously just before death), that the population is so large that it can be thought of as being infinite, that mating is completely random and without separate sexes, and that expectations can be added and divided with impunity. However, for the sake of simplicity, these assumptions have been used very widely and successfully in evolutionary theory for many decades. So far we have been assuming sexual reproduction. If this species were asexual, the next generation, the 10th and 100th hence would be like this:
and the equation is much simpler:
with playing the same role. Naturally, much of evolution involves events not entering these equations at all: mutations creating new alleles, speciation events (the creation of a new branch in the tree of life with its own equations), and changes in the environment. These equations should be thought of as a mathematical description of a piece of the puzzle, still studied after nearly a century of research.
A completely different take on the ubiquity of sex in evolution, based on unpublished work, uses concepts and images very familiar to computer scientists.
Imagine a tree with nodes of two kinds: red and blue. The tree starts from a root, which is blue, and is generated through a random process. At each step (depth of the tree), the nodes at this depth are replaced as shown here, each independently of the others according to the probabilities as shown below.
This is a very crude model for generating the tree of life (the probabilities here are indicative, and not supposed to reflect some biological reality). Red nodes are sexual species, and blue nodes are distinct asexual types (the term "species" does not perfectly apply to the latter). While there are reasons to believe that life started sexually,22 and not with a well-delineated species as exist today, we make the assumption that the tree begins with a blue node, which for our current purposes is conservative. According to the chosen parameters, sexuals and asexuals become extinct with equal probability. If extinction does not happen, sexual species branch out 50% more. With small probability, one of the new species can be of the other kind.
How would the leaves of the tree (that is, the life we see around us) look like, many steps down? After only 50 steps, the expected ratio of asexuals to sexuals is roughly 2:1000 (the probability of the top middle event).
Asexual nodes are rare! And, to a computer scientist, this stands to reason: We know that branching factors (in trees, computer virus propagation, recursive algorithms, among others) can dwarf in ultimate significance all other ingredients of a situation. Interestingly, this simple model immediately explains the main characteristics of the distribution of sexuality: It predicts that asexual forms are rare, sparsely distributed across the tree of life, and recently derived from sexual ancestors. Empirical evidence shows that all three points are true.22
The precise relevant theorem is the following23 (it is a consequence of well known properties of branching, see Athreya and Ney3):
Consider a continuous random process of generating a red-blue tree in which the rates of the events B BB, B BR, B extinction, R RR, R BR and R extinction are, respectively, A, , A, S, and s Let s = Ss = A = A, A and s > A. Suppose that > 0 and > 0 be each small in comparison to SA. Then the asymptotic ratio of asexual to sexual species at the leaves is /(SA).
This argument suggests that, conceivably, a key advantage of sex might be its "branching factor" in the tree of life. Even if sex were a burden for species and organisms (which is at least a part of the truth), the mere fact that it promotes diversification and an exploration of new evolutionary niches might be enough for this strange trait to dominate the planet.
Copyright held by authors. Publication rights licensed to ACM.
The following letters were published in the Letters to the Editor in the February 2017 CACM (http://cacm.acm.org/magazines/2017/2/212426).
In "Sex as an Algorithm: The Theory of Evolution Under the Lens of Computation" (Nov. 2016), Adi Livnat and Christos Papadimitriou argued eloquently that the extraordinary success of sexual evolution has not been adequately explained. Somewhat paradoxically, they concluded that sex is not particularly well suited to the task of generating "outstanding individuals." They also said that genetic algorithms are similarly ill suited to this task.
It should be noted that this critique of genetic algorithms widely used derivative free optimization heuristics modeled on recombinative evolution stands in counterpoint to a voluminous empirical record of practical successes. It also speaks to the long-standing absence of consensus among evolutionary computation theorists regarding the abstract workings of genetic algorithms and the general conditions under which genetic algorithms outperform local search. A consensus on these matters promises to shed light on the question the authors originally aimed to answer: Why does recombinative evolution generate populations with outstanding individuals?
Generative hypomixability elimination(1) is a recent theory that addresses this question, positing that genetic algorithms efficiently implement a decimation heuristic that generates fitter populations over time by iteratively eliminating the joint entropy of small collections of "hypomixable loci," or loci in which alleles do not mix well. Recombination, or mixing, allows such loci to go to fixation even as it safeguards the marginal entropy of non-interacting loci.
Taking a step back, one might ask how this theory and the theory proposed by Livnat and Papadimitriou might be evaluated. Proof of soundness, wherever possible, is always desirable, but end-to-end proof can be elusive when analyzing computation in biological systems like brains and evolving populations. We must instead use the scientific method(2), an approach undergirded by the following rule:
Unlike the foundations of, say, physics, the foundations of computer science are logically verifiable; hypotheses play no part. So, while computer scientists have seen engineering revolutions aplenty, they have seen nothing like the transition from a Newtonian universe to an Einsteinian universe or from the phlogiston theory of combustion to Lavoisier's oxygen-based theory or any of the other foundational shifts described in Thomas Kuhn's Structure of Scientific Revolutions. Theoretical physicists, chemists, and biologists trained informally, if not formally, in the application of the scientific method know how to evaluate and work with competing hypotheses. The same cannot be said of theoretical computer scientists today. For them, the scientific method is unfamiliar terrain, with different rules and alternate notions of rigor. For example, assumptions must be weak, and hypotheses testable.
For all computer science as a field has to contribute to the natural sciences, it also has much to learn.
(1) Burjorjee, K.M. Hypomixability elimination in evolutionary systems. In Proceedings of the 13th Foundations of Genetic Algorithms Conference (Aberystwyth, U.K., Jan. 1720). ACM Press, New York, 2015, 163175.
(2) Popper, K. The Logic of Scientific Discovery. Routledge, London, U.K., 2007.
While Adi Livnat and Christos Papadimitriou's article (Nov. 2016) provided the rationale for a provocative magazine cover, the article itself began with a false claim and ignored a much simpler explanation for the success of sexual evolution. Shortly after life appeared on Earth, approximately 3.8 billion years ago, evolution began diversifying lifeforms in a very pragmatic way, with mutations that increased the ability of individuals to survive and reproduce being passed along to future generations, whereas those that were disadvantageous were naturally dropped. This process soon discovered that sexual reproduction worked better than simply subdividing, in that it allows advantageous mutations that occur in different families to be combined, allowing evolution to proceed more rapidly, whereas subdividing does not allow it. Sexual reproduction thus became dominant.
Nevertheless, the article said, "What is the role of sex in evolution? Reproduction with recombination is almost ubiquitous in life (even bacteria exchange genetic material), while obligate asexual species appear to be rare evolutionary dead ends. Yet there is no agreement among the experts as to what makes sex so advantageous."
How can there be no agreement when the reason for sexual evolution is so obvious? In order for sexual evolution to work, each generation must die, which some people view as inconvenient, prompting them to imagine an afterlife. Subdividing, on the other hand, produces potential immortals who are naturally less diverse because they mutate less radically than the sexy species.
P.S. I do not hold any of this against Christos Papadimitriou, who I have known for 50 years.
Earnest's idea, first proposed by R.A. Fisher (1930) and H.J. Muller (1932), does not solve the problem and is referenced in our online appendix where the interested reader can begin to explore this fascinating topic. The debate among experts is ongoing, and our recent article contributed a fresh idea to it. Burjorjee did not back up with evidence his claim of empirical success of genetic algorithms, compared to, say, simulated annealing. And a propos philosophy of science, he may refer to Papadimitriou's 1995 article "Database Metatheory: Asking the Big Queries" (http://dl.acm.org/citation.cfm?id=211547) with its sections on T. Kuhn, K.R. Popper, and P. Feyerabend, and their relevance to computer science.
January 31, 2017 11:38
The following letter was published in the Letters to the Editor in the February 2017 CACM (http://cacm.acm.org/magazines/2017/2/212426).
I am writing to express my dismay and disappointment at the cover of the November 2016 issue introducing the article "Sex as an Algorithm: The Theory of Evolution Under the Lens of Computation" by Adi Livnat and Christos Papadimitriou, finding it offensive and attention-grabbing in a way that is inconsistent with ACM's public mission.
While I would guess that most readers either do not care or thought the cover "funny" or "cute," I have talked to enough of my colleagues, who describe their reaction as "shocked," "appalled," "offended," and "embarrassed," to believe it is a serious issue that warrants further reflection.
Specifically, is it really appropriate for ACM, a professional organization that purports to represent and support all its members and all members of the computing discipline, to distribute an issue that some are embarrassed to receive in our mailbox, display on our desks or conference tables, or look at on our computers if somebody might be looking over our shoulders?
First, the research in question is not about sex but about sexual reproduction and its effect on diversity in populations. There is a major difference, and conflating the two in this way comes across as juvenile. I cannot help think of "locker room talk."
Second, placing the huge, bold-faced word "Sex" on a hot pink cover creates an obvious and immediate association with women. Given the under-representation of women in the field, this kind of message is completely counterproductive and particularly reminds young women, who may be less certain about how welcome they are in the field, that they are to be associated with sex, not science.
Third, the unfortunate timing of this issue, which arrived during National Breast Cancer Awareness Month, was undoubtedly unintentional, but to those of us who have lost loved ones to breast cancer, the hot pink cover felt disrespectful and insensitive.
This may not seem like a big deal, and I am sure some readers are thinking I am overly sensitive and humorless. But quite honestly, it is tough enough being a woman in an extremely male-dominated field without feeling embarrassed and awkward about displaying my own professional organization's magazine in public.
In the end, I dropped it into the recycling bin without reading it.
The cover in question, for which I am ultimately responsible, was meant to be humorous. Since several readers were offended by it, it is clear in retrospect the humor was misguided. For that, I sincerely apologize. This has been discussed by the design team, and we hope to learn from this mistake.
Moshe Y. Vardi
March 28, 2017 02:08
The following letter was published in the Letters to the Editor in the March 2017 CACM (http://cacm.acm.org/magazines/2017/3/213824).
Adi Livnat and Christos Papadimitriou review article "Sex as an Algorithm" (Nov. 2016) was fascinating but mistitled. It discussed the benefits of conjugality. George C. Williams in Sex and Evolution distinguished the more general concept conjugality from (eu)sexuality, in which the number of conjugal strains in the species is equal to the number of individuals participating in conjugation two, in all conjugal species on this planet. This seems an important distinction, and I suggest the cover of Communications was misleading. In my own book Albatross I emphasized this and other distinctions, aiming to avoid nonsensical talk, as in that arising from "the gostak distims the doshes" in The Meaning of Meaning by C.K. Ogden and I.A. Richards.
Livnat's and Papadimitriou's reference to their non-coverage of heterozygosity was revealing. I rather suspect heterozygosity is a prerequisite for sexuality proper; certainly a lot of sexual species are haploid in the gametic generation and diploid in the others.
Some of the mathematics as to the binarity of conjugation might be interesting. What are the chances that on some other world there may have evolved life with a triple helix, ternary conjugation and so trisexuality?