Systems and Networking Review articles

Theory of Algorithmic Self-Assembly

The challenge of programming molecules to manipulate themselves.

Posted Dec 1 2012

Introduction
Key Insights
Parallelism in Molecular Computing: The Bad News
Computational Universality
Modeling Errors
Tile Complexity
Computational Complexity
Assembly Time Complexity
Intrinsic universality
Power of cooperative Binding
The Future
Acknowledgments
References
Author
Footnotes
Figures

Theory of Algorithmic Self-Assembly, illustrative photo

Self-assembly is the process by which small components automatically assemble themselves into large, complex structures. Examples in nature abound: lipids self-assemble a cell’s membrane, and bacteriophage virus proteins self-assemble a capsid that allows the virus to invade other bacteria. Even a phenomenon as simple as crystal formation is a process of self-assembly. How could such a process be described as “algorithmic?” The key word in the first sentence is automatically. Algorithms automate a series of simple computational tasks. Algorithmic self-assembly systems automate a series of simple growth tasks, in which the object being grown is simultaneously the machine controlling its own growth.

Key Insights

The idea of “molecules that can perform computation” is transforming the way we engineer self-assembling molecular systems.
A small number of simple types of molecules can grow into large, sophisticated nanoscale structures automatically.
Understanding the fundamental abilities and limitations of these systems is crucial for guiding experimental work. Many known theoretical results have drawn on the theorems, techniques, and paradigms of computer science.

Although large tracts of the theory presented in this article are applicable to non-molecular systems, much of the motivation arises from nanoscale self-assembly (as an engineering field, as opposed to the study of natural self-assembly systems). The broad goal of nanoscience is to manipulate molecules with nanoscale precision. Ambitious long-term applications include microscopic, chemical-detecting robots that move toward and metabolize pollutants, or the integration of human tissue with an implanted medical device.

Why should computer science have anything to do with nanoscience, beyond the obvious role of developing software-modeling tools? Luca Cardelli, in a panel discussion at the 2011 Conference on DNA Computing and Molecular Programming, observed that while the computing revolution was about the systematic manipulation of information, nanoscience is about the systematic manipulation of matter. Nanoscience provides a novel justification for studying computation. Many of the traditional forms of manual control are simply not possible at small scales. Automating the growth of molecular structures is not merely faster or more convenient than building such structures by hand. Our hands, and the machines they operate, are simply too large to manipulate individual molecules. We must learn to program molecules to manipulate themselves.

DNA is the molecule of choice in many labs, not for its biological properties but for its information-bearing properties. It is easy to synthesize, and its physical properties are well understood. DNA origami⁴⁵ is currently the most successful laboratory technique for self-assembling DNA. A long scaffold DNA strand is folded into a shape by mixing it with hundreds of shorter staple strands, each of which binds to one region of the scaffold with its left half and another region of the scaffold with its right half, bringing the two regions together. A more experimentally challenging technique, DNA tile assembly, is the physical basis for the theoretical work discussed in this article. A DNA tile is a DNA complex with four short single-stranded “sticky” ends protruding from it (see Figure 2a), which are intended to bind the tile to other tiles having sticky ends that are Watson-Crick complementary. Figure 1 shows images of some patterns that have been experimentally assembled with DNA tiles. Figure 2 shows how a DNA tile is modeled as a square with four sides having “specific glues.” DNA origami in its current incarnation is non-algorithmic self-assembly, whereas DNA tile assembly is potentially algorithmic. We now explain this distinction.

Turing⁵⁴ stated, “The ‘computable’ numbers may be described briefly as the real numbers whose expressions as a decimal are calculable by finite means.” Finite means is formally equated with algorithm (for example, a Turing machine, a λ-expression, a Python program, or an appropriately initialized configuration of Conway’s Game of Life). There are uncountably many real numbers but only a countable number of algorithms, so most real numbers are not computable. What if we wish to make sense of the question, which integers are computable? By Turing’s qualitative definition, they are all computable, but it is reasonable to object that some integers (for example, n = 10^10,000) are easier to compute than others (for example, m = a random sequence of 10,000 digits), in the sense that a much smaller program suffices to compute n than to compute m. To make formal sense of such intuition, the theory of Kolmogorov complexity³³ gives a rigorous quantitative measure of the “algorithmic complexity” of an integer (or any other “finite object” such as a finite string or a graph): the length in bits of the shortest algorithm that prints the integer and halts.

Both of these definitions adopt the view that an object is “algorithmically constructed” when there is an algorithm constructing the object that is much smaller than the object itself (infinitely smaller in the former case). From this perspective, algorithmic self-assembly describes the self-assembly of a structure whose total number of components (its “size”) is much greater than the number of different types of components. Since a component type is reused many times, it has no way of “knowing” where it is going to end up in the structure, so only local information is available to guide its attachment. DNA origami therefore does not constitute algorithmic self-assembly, since each staple strand “hardcodes” its position in the final structure. DNA tile assembly, however, has the potential to reuse a small number of tile types to create large structures.

Our combinatorial model of the dynamic behavior of these tiles is called the abstract Tile Assembly Model (aTAM), due to Winfree,⁵⁶ explained briefly in Figure 2. Figure 2c shows seven tile types that fill the entire second quadrant with a painting of the discrete Sierpinski triangle, showing that this pattern is “algorithmically self-assemblable.” The main computation is done by the bottom four rule tile types using cooperative binding, which refers to the fact that a tile with only strength-1 glues cannot bind to an assembly unless at least two of them match. (Sometimes this binding strength threshold is called the “temperature” τ, which is usually 2 but not always.) The rule tiles in this example always bind using their south and east glues. Thus the glue labels (bits in this case) can be imagined as inputs to a function computed by the rule tile types (analogous to a transition function in a Turing machine or cellular automaton). The output of the function in this example is the XOR of the bits, which is advertised on the north and west sides.

Parallelism in Molecular Computing: The Bad News

Algorithmic self-assembly is a subfield of molecular computing, highlighted in a news article by Kirk. L. Kroeker in the December 2011 issue of Communications. The field was initiated in 1994 when Adleman,¹ in a landmark proof-of-concept experiment, designed DNA molecules that interact to solve a 7-vertex case of the Hamiltonian path problem: executing a “DNA algorithm” whose basic operations are well-understood chemical interactions such as hybridization.

Let us state an unequivocal limitation to the power of molecular computing as a model of computation, which applies to algorithmic self-assembly systems as well. Molecular computing is not a magical potion that can be ladled over NP-complete problems to transubstantiate them into tractable problems. In its most generic form, the sorcery of DNA solutions to NP-complete problems proceeds by letting the DNA test all possible solutions in parallel in a single test tube. Assuming a modest solution length of 300 bits, squeezing 2³⁰⁰ molecules into a test tube is like installing 2³⁰⁰ processors in a parallel computer: there is not enough sand in the world to supply that much silicon.

Scientific journals continue to traffic in tales of DNA’s exponential search capability. As with Mjölnir, the mountain-crushing hammer of Thor, the time has come to place these tales into the dustbin of mythology, making room for the real—and far more exciting—scientific work that remains to be done in molecular computing. If there is a breakthrough in molecular engineering that will enable the solution of classically intractable problems, it will have to come from quantum mechanics or perhaps from some exotic physical theory yet to be discovered. It will not come about solely by replacing silicon with DNA. But enough pessimism; let’s find out what DNA can do.

Computational Universality

Winfree⁵⁶ showed that the aTAM is computationally universal, that is, able to simulate any algorithm. This one fact explains the richness of the theory presented in the rest of the article. What exactly does it mean?

The model as stated lacks one feature in common with other computational models: there is no input! One potential way to program a single tile set with different inputs that result in different behaviors is to generalize the idea of a single seed tile type to a larger seed assembly. Within the model, we allow the seed assembly to be any finite stable assembly (“stable” = all cuts of the assembly that separate it into two components must break bonds of total strength at least 2). In practice, seed assemblies are not necessarily made of tiles, but must simply have a perimeter compatible with the tiles, such as a DNA origami shape appropriately augmented with sticky ends on its side (Figure 1g-h). We use the term tile system to refer to a finite set of tile types, together with other parameters needed to determine its behavior, such as its seed assembly a and its temperature τ.

With this convention established, we can now state more formally what it means to claim that the aTAM is computationally universal. For every single-tape Turing machine M, there is a tile set T so that, for every input string x, there is a seed assembly σ of T so that T with seed σ uniquely assembles a space-time configuration transcript of M on input x.^a By “uniquely assembles,” we mean that although T with seed σ assembles many different partial assemblies, it has a unique terminal assembly α (terminal = no tile can attach to it). The t^th row of α represents the configuration of M(x) at time step t, with σ occupying the first row.

The computational universality of the aTAM implies that arbitrary algorithms may be executed by self-assembling tiles and therefore used to direct the growth of the tiles. However, the aTAM is not just another programming language. The subtle interplay of computation and geometry in self-assembly gives rise to unique characteristics such as the distinction between computable patterns and self-assembling patterns discussed later.

Modeling Errors

The aTAM is not a realistic model of how DNA tiles actually behave at the molecular level. In particular, it makes two assumptions known not to hold in practice: that tiles never detach from an assembly, and that tiles only attach when their binding strength exceeds the temperature threshold value τ. Winfree⁵⁶ introduced the kinetic Tile Assembly Model (kTAM) as a more realistic model of tile assembly that uses standard laws of chemical kinetics to relax these assumptions. Each tile type is assumed to attach to any binding site on an assembly α, producing a new assembly β, at a rate proportional to its concentration, regardless of its strength of attachment. If we assume that there is only a single seed tile, and that each other tile type is equally concentrated (and much higher count than 1, so that their rate of depletion is essentially 0), then this forward rate is equal for all tile types and approximately constant over time. Call this forward rate r_f. Each tile within β is assumed to detach at a rate proportional to e^−b, where b is the total strength with which the tile is bound. Call this reverse rate r_r,b. The constant of proportionality depends on factors such as temperature, salinity, and sticky end length. Winfree⁵⁶ showed that by setting these factors and the non-seed tile concentrations such that r_f = r_r,2 + ε then the probability that the seed grows into a correct assembly (an assembly producible under the aTAM growth rules) approaches 1 as ε → 0. However, the closer ε is to 0, the slower assembly proceeds, with ε = 0 representing an unbiased random walk with attachment equally likely as strength 2 detachment.

There is only so much that one can do in a laboratory to precisely control experimental conditions; errors are bound to happen, just as deep-space communication and large-scale data storage inevitably corrupt some of the bits they intend to send or store. What should we do about this? Why not do the same thing that NASA and Google do about it: build in redundancy to help correct the errors! Here, I discuss algorithmic error-correction in the kTAM, showing that under certain conditions, the probability of correct growth can be boosted to enable correct growth under more flexible experimental conditions.

Winfree and Bekbolatov⁵⁸ devised a scheme known as proofreading. It helps correct growth errors, shown in Figure 3a, in which an incorrect tile binds with strength 1 where a correct tile could bind with strength 2. A tile bound with only strength 1 may temporarily stick and be secured in place by a subsequent tile attachment. The basic proofreading scheme is shown in Figure 3b. Each tile type t is replaced by a k × k block of tile types, with glues internal to the block unique to t. This enforces that errors cannot happen in isolation: if there are any errors in the block, then there are ≥ k errors. Intuitively, errors happen only slowly (since insufficiently bound tiles fall off more quickly than new tiles attach), so if one error occurs, it is more likely for the erroneous tiles to detach before the block completes than for an additional k − 1 errors to occur and secure the block in place. To a very rough approximation, k × k proofreading changes a base error probability of ε to be ε^k. This scheme incurs two costs: the number of tile types increases by factor k², and the system suffers a resolution loss since k × k blocks in the new system represent individual tiles in the original system. Reif, Sahu, and Yin⁴⁴ demonstrated a compact proofreading scheme, in which there is no resolution loss, but in general the tile complexity blowup is exponential. To reduce error probability of ε to ε^k for tile system T, the new tile system may have up to |T|^k2 tile types. Soloveichik and Winfree⁵¹ showed that for all but a restricted class of tile systems, the tile complexity blowup of any compact proofreading scheme is necessarily exponential.

A facet error is the attachment of a tile with strength 1 where no tile should go because there is only one adjacent strength 1 glue. Both the proofreading constructions cited^44,58 correct growth errors but not facet errors. In practice, facet errors tend to dominate the overall behavior. Chen and Goel¹⁴ showed a proofreading scheme, also using a k × k block replacement as in Winfree-Bekbolatov,⁵⁸ which is resistant to both growth and facet errors, and which achieves error of O(ε^k) on a base error probability of ε. In fact, this scheme was experimentally demonstrated¹⁷ to reduce facet errors better than either no error-correction or the Winfree-Bekbolatov proofreading scheme. Chen, Goel, and Luhrs¹⁵ showed that two-dimensional tile systems may be proofread with no resolution loss and only polynomial increase in tile complexity by using the third dimension. It grows a larger structure than the original system, but when projected onto the plane there is no resolution loss. They also introduce combinatorial criteria to help in proving that a proofreading scheme works without needing to carry out the cumbersome Markov process analysis required to analyze kTAM systems.

Doty et al.²⁶ showed a stronger form of error correction using the hierarchical model of self-assembly, in which there is no seed and two assemblies that have assembled independently in parallel are allowed to aggregate together, as opposed to the standard seeded aTAM in which tiles attach one at a time to a growing assembly. In this model, they use the ability of geometry to enforce binding constraints (preventing two assemblies with matching glues from attaching if their shapes are not compatible to allow the glues to touch) to show how to assemble an n × n square from O(log n) tile types while guaranteeing the following constraint. Arbitrary strength 1 growth is allowed; however, any assembly that grows sufficiently to become stable at temperature 2 is guaranteed to assemble into the correct final assembly. Thus errors are prevented absolutely, rather than only with high probability.

The errors modeled so far are those of “insufficient attachment:” a tile binds with strength 1 and sticks around long enough to cause problems. We can consider other types of errors. For instance, we consider some tile types to have input and output sides, although this is not enforced from within the model. Consider the following error: after some time, a portion of the assembly is removed to make a hole. This could result in “backward” growth filling in the hole using output glues, possibly resulting in the wrong tiles attaching (this would happen in Figure 2d since XOR is not a 1-1 function). Winfree⁵⁷ introduced a 3 × 3 block transformation scheme called self-healing that enables the assembly to regrow correctly even when holes are blown out of the assembly, so long as the seed remains present. Soloveichik, Cook, and Winfree⁵⁰ showed how to combine self-healing simultaneously with proofreading. Chen et al.¹⁶ showed that a more extreme form of self-healing was possible for the particular task of assembling an n × n square from O(log n) tile types: the entire square is able to grow from any subassembly of width or height 2 log n, without requiring the seed. They also generalize the idea to arbitrary finite shapes (with some resolution loss).

Another type of error is spurious nucleation, in which tiles attach to each other without a seed, using strength 1 growth to form a stable assembly from which an incorrect assembly could grow. Schulman and Winfree showed both theoretically⁴⁸ and experimentally⁴⁹ that a certain class of tile systems can be made resistant to nucleation errors, in the sense that with the addition of O(k) extra tile types, in the absence of the seed, the probability is at most O(2^−k) that sufficiently many tiles aggregate to create a stable assembly from which further growth can occur.

The conclusion to draw from these error correction techniques is that, even though the aTAM does not accurately describe the behavior of DNA tiles, under certain assumptions, it is an implementable “programming language” for tile self-assembly.

Tile Complexity

We have identified the algorithmic aspect of self-assembly with the ability of a small number of tile types to assemble a large structure. This ability will prove crucial to programming large, complex molecular self-assembling systems, since the total number of types of molecular components dominates the cost (in money and time) of implementation.

Even though the aTAM does not accurately describe the behavior of DNA tiles, under certain assumptions, it is an implementable “programming language” for tile self-assembly.

To quantitatively formalize this idea, Rothemund and Winfree⁴⁷ defined the tile complexity of a shape S (a finite, connected subset of ²) to be the minimum number of tile types in any tile system—with a single seed tile—that uniquely assembles a single terminal assembly with shape S. The requirement of a size-1 seed avoids cheating by simply letting the seed assembly have the desired shape. They studied the tile complexity of n × n squares, which has since become a canonical benchmark problem for studying various other aspects of self-assembly, since n × n squares are in a sense the simplest shape with non-trivial tile complexity. The simplest shape is a single point, whose tile complexity is clearly 1. The next simplest shape might be a 1 × n line. An easy argument shows that its tile complexity is n. Since it is one-dimensional, any stable assembly with that shape must use entirely double-strength glues to hold it together. If fewer than n tile types are used, then one must repeat, but this tile system would then be able to grow an infinite line by repeating the segment between the repetitions. Only in two dimensions is tile complexity nontrivial.

Rothemund and Winfree showed that for most values of n (all algorithmically random n), the tile complexity of an n × n square is Ω( ). Why does this hold? A tile system of size k can be described using O(k log k) bits (O(log k) bits per tile type). If the tile system uniquely self-assembles an n × n square, then that description, together with a constant size aTAM simulator, constitute a program of length O(k log k) that outputs n. Since for most n, the shortest program outputting n is log n bits long, we must have k log k = Ω(log n), that is, k = Ω( ). They then show that the tile complexity of all n × n squares is O(log n), in which log n tile types each represent a particular bit of n that assemble to form a 1 × log n row whose north glues represent n in binary. From this assembly, a O(1) tile types attach to assemble an n × n square, first by growing a counter—essentially a binary-to-unary converter—that counts from n down to 0, extending the 1 × log n row into a n × log n rectangle, together with O(1) tile types that extend this rectangle into the full square.

The gap between these upper and lower bounds arises from the fact that encoding one bit of n per tile type is wasteful, in an information-theoretic sense. In principle, a set of size k can encode log k bits per element. Letting k = , Adleman et al.² showed how to encode ≈ log k bits of n per tile type, showing the tile complexity of any n × n square is O( ). The trick is to encode n in a larger base b. Letting b be the unique power of two satisfying k ≤ b < 2k, k tile types are required to encode n in base b. An additional O(k) tile types then convert n to base 2, from which the constant set of tile types of Rothemund and Winfree⁴⁷ self-assemble an n × n square.

Intuitively, most of the complexity of an n × n square is captured by its width; geometry does not play much of a role. What about shapes with more complicated geometry? One would expect that even if such a shape has a compact algorithmic description, self-assembly that simulates this algorithm may not be possible to execute within the shape. Soloveichik and Winfree⁵² showed that in fact, if we ignore scaling factors, then the tile complexity of every shape S is closely related to its Kolmogorov complexity K(S), the length in bits of the shortest program outputting the points in the shape. More formally, given a finite shape S ⊂ ² and positive integer c ⁺, let be S scaled up by factor c. They showed that for every finite shape S, there is a scaling factor c such that the tile complexity of S^c is . This is a tight bound, since no smaller tile system could uniquely self-assemble any scaling of S without contradicting the Kolmogorov complexity of S. The scaling factor c is proportional to the running time of the shortest program for S. A constant set of tile types T produces S^c from a 1 × row encoding the program for S. Thus, T is a single tile set that is universally programmable (by seeding with an appropriate program) for building any finite shape S, if scaling factors are ignored.

What is the lesson behind of all these bounds? The ability of tiles to execute algorithms translates directly into the ability to assemble large, complex shapes from a very small number of different types of molecular components.

The ability of tiles to execute algorithms translates directly into the ability to assemble large, complex shapes from a very small number of different types of molecular components.

Tile complexity has also been studied in many interesting variants of the standard aTAM. Each of these models allows a single tile system to be reused for assembling different structures by programming it with different environmental conditions affecting the behavior of the tiles, thus showing that O(1) tile complexity suffices for assembly of complex structures if additional laboratory steps are used.

In the step assembly model introduced by Reif,⁴⁵ tile types are added to a test tube in a series of steps, with each step assumed to run to completion before washing away remaining unbound tiles and nonterminal assemblies, before adding new tile types for the next step. Maňuch, Stacho, and Stoll³⁷ showed that for a large class of shapes, including arbitrary shapes scaled by factor 2 or any other shape with a Hamiltonian path, O(1) tile types can assemble the shape in the step assembly model. There is a natural generalization of step assembly known as staged assembly, in which the order of test tube mixing is not monotonic and hierarchical assembly (attachment of two large assemblies to each other) is allowed. A directed graph describes the order of test tube mixing. Under this model, Demaine et al.²⁰ showed that O(1) tile types can be used to assemble any shape, and that certain classes of shapes require only O(log n) parallel mixing stages (source-to-sink distance in the mixing graph). They also showed a number of trade-offs between this measure (stage complexity) and bin complexity, the total number of test tubes required. In the same model in one dimension, Demaine et al.²¹ showed that for each alphabet Σ, there is a constant set of tile types—each labeled by a symbol from Σ —so that, given any string x Σ*, the tiles can be mixed to uniquely assemble a linear assembly spelling x. Furthermore, the number of mixing stages required is within a constant of the size of the smallest context-free grammar that produces the singleton language {x}, if each intermediate stage is required to produce a unique terminal assembly (if not, then there are strings for which more efficient mixings exist).

Another model relaxes the assumption of constant temperature over time: temperature programming, in which the temperature is raised and lowered over many stages, with each stage assumed to reach a terminal state before changing the temperature for the next stage. Aggarwal et al.⁵ proved that with one temperature change, a thin rectangle—an n × k rectangle where k < —can be assembled from tile types, compared to a provable lower bound of Ω( ) in the standard aTAM. Kao and Schweller³¹ showed that O(1) tile types suffice to assemble any n × n square using O(log n) temperature changes. Summers⁵³ showed that O(1) tile types suffices to assemble any shape (scaled up) using temperature programming, using two constructions. In the first, the number of temperature changes is O(|S|), and the scaling factor is O(1). In the second, the number of temperature changes is O(K(S)) (the Kolmogorov complexity of S), and the scaling factor is proportional to the running time of the shortest program for S.

Randomized self-assembly uses the inherent nondeterminism in the aTAM in which multiple tile types compete to bind to a single binding site, and tile type concentrations determine the probability of a tile type attaching to a binding site when it competes with others sharing the same input glues. Chandran, Gopalkrishnan, and Reif¹² showed that the tile complexity of a 1 × n line is Θ(log n) in the randomized model, that is, Θ(log n) tile types are sufficient and necessary to assemble a line of expected length n (compared to n tile types that are provably necessary in the deterministic aTAM). In fact, their tile system is equimolar: all tile types have equal concentrations. Becker, Rapaport, and Rémila⁹ introduced the model of concentration programming, in which concentrations of tile types are used to program input to the system. Kao and Schweller³² showed that for each δ,ε > 0, there is a single tile system T so that, for every n ⁺, there is a setting of concentrations that cause T to assemble an n’ × n’ square, where (1 − ε) n ≤ n’ ≤ (1 + ε)n with probability at least 1 – δ. That is, the square is probably approximately assembled. Doty²³ improved this result, showing that for each δ > 0, a single tile system can be used to assemble an exactly n × n square for any n ⁺ with probability at least 1 – δ using concentration programming.

Demaine et al.²² studied the RNase model (first suggested by Rothemund and Winfree⁴⁷), in which some tile types are assumed to be made of RNA, which is compatible to bind with DNA but has a different chemical structure. An enzyme called ribonuclease—abbreviated RNase—can dissolve RNA but not DNA, so adding RNase after assembly completes allows one to controllably separate the assembly into parts that can now interact with each other. They use this model to improve on the Soloveichik/Winfree⁵² result, showing that for every shape S, the optimal O( ) tile types suffice to assemble a scaled version S^c of S, but with scaling factor c = O(log |S|), rather than c depending on the running time of S‘s shortest program.

In the flexible glue model studied by Aggarwal et al.,⁵ unequal glues are allowed to interact with non-zero strength. One could imagine this being implemented, for instance, by multiple sticky ends on a tile’s side, in which the strength of interaction depends on the number of sticky ends that match. They showed that with flexible glues, O( ) tile types are necessary and sufficient to assemble an n × n square.

Computational Complexity

Rather than manually analyzing each new shape or class of shapes to determine its tile complexity, it would be beneficial to develop an algorithm that automates the task: given a finite shape S, it outputs the smallest tile system that uniquely assembles S. Unfortunately, an efficient algorithm is unlikely to exist: Adleman et al.³ showed that the problem is NP-complete. Here, “uniquely assembles S” means, assembles one unique terminal assembly, whose shape is S. One could object that a tile system deterministically assembles S if it produces many terminal assemblies that all have the shape S. Under this definition, Bryans et al.¹⁰ showed that the problem, in a sense, is even harder: NP^NP-complete, that is, NP-complete for algorithms that have access to a constant-time subroutine for an NP-complete problem such as SAT.

In addition to optimization problems, there are important verification problems of interest in tile assembly. One easy problem is this: given a tile system T and a (possibly nonterminal) assembly α, can T produce α? One can simply add tiles to the seed that are consistent with α and have sufficient binding strength until either α is complete, or until no more tiles can be added; the answer is “yes” in the former case and “no” in the latter. A more challenging question: Is α the unique terminal assembly produced by T? A tile system could produce α through an exponential number of different pathways (orders of tile addition), so it would naïvely seem to require that we check all of them to ensure none of them create a different terminal assembly. However, Adleman et al.³ showed this problem is solvable in quadratic time. On the other hand, in the hierarchical aTAM (in which two large assemblies are allowed to attach), the problem of determining whether α is uniquely assembled is coNP-complete.¹¹ This is discouraging, since such decision problems are a formalization of the practically important task of “write a simulator for the hierarchical aTAM.”

Shape building is one goal of self-assembly; another is pattern painting. Briefly, we paint some tile types black (for example, by attaching a streptavidin marker molecule) and say the pattern assembled is the set of positions in the final assembly with a black tile. Such a definition is appropriate for modeling practical goals such as self-assembled circuit layouts, such as those studied by Cook, Winfree, and Rothemund.¹⁹ Although the proof techniques for these verification problems generalize easily to pattern assembly, it remains open to prove or disprove pattern assembly analogues of the optimization results noted here. The optimization problems are uncomputable if the tile system is allowed to grow arbitrarily far outside the pattern, using the same argument that shows Kolmogorov complexity is uncomputable. If the minimum tile system to assemble a given finite pattern could be computed, then this computation could be simulated in the second quadrant by a tile system of size O(1) + log n searching for the first pattern on the positive x-axis that requires more than n tile types to assemble, at which point the tile system could grow back to the x-axis assemble that pattern, a contradiction. So to establish (for instance) NP-completeness requires also specifying a bounding box as part of the input and requiring that the assembled shape is a subset of the bounding box. Göös and Orponen³⁰ and Lempiäinen, Czeizler, and Orponen³⁵ developed branch-and-bound algorithms for a variant of this problem, but these algorithms require exponential time.

One can also consider the assembly of infinite patterns (as in Figure 2d) as the “computability-theoretic” version of self-assembly. Lathrop et al.³⁴ showed that not every computable pattern (subset of ²) can be self-assembled, since each time step of computation with tiles irreversibly occupies a point in space, and for some patterns, it requires strictly more time to compute whether that point is part of the pattern. However, with regard to one-dimensional patterns, Patitz and Summers⁴² showed that every computable subset of can be self-assembled on the x-axis (crucially using the fact that the assembly occupies a large area outside the x-axis where the actual computation happens). Lathrop et al.³⁴ showed a weaker result for computably enumerable sets: there is a simple quadratically bounded function f: → such that, for every computably enumerable A ⊆ , f(A) can be self-assembled on the x-axis.

Assembly Time Complexity

Another resource bound to consider is the time required to build a shape (as determined by stochastic chemical kinetics). There is a linear amount of parallelism available for assembling a shape of size N in fewer than N steps. This is because the frontier (set of potential binding sites) of an assembly of size N is potentially as large as O(N), and in a given unit of time, a constant fraction of the frontier will have tiles attach if tile types are sufficiently concentrated. Adleman et al.² showed that any shape of diameter D requires time Ω(D) to assemble from a deterministic tile system in the aTAM, later shown by Chen and Doty¹³ to hold for nondeterministic systems. Adleman et al.² showed that this lower bound is tight for the assembly of n × n squares, that is, they can be assembled in time O(n), achievable using the information-theoretically optimal O(log n/log log n) tile types. Reif⁴³ also studied local parallelism within the seeded aTAM, for the purpose of computation rather than shape-building. Reif showed that for many problems such as prefix sum and bitonic merge, it is possible to design a tile system that solves the problem using the optimal parallel speedup. Reif used a generalization of the model known as step assembly, in which tile types are added to a test tube in a series of steps, with each step assumed to run to completion before washing away remaining unbound tiles, before adding new tile types.

In a variant of the model known as the hierarchical aTAM, two assemblies that have assembled independently in parallel are allowed to aggregate together, as opposed to the standard seeded aTAM in which tiles attach one at a time to a growing assembly. The distributed parallelism of the hierarchical aTAM dwarfs the linear local parallelism of the seeded aTAM, so a natural conjecture is that hierarchical assembly could be much faster than seeded assembly. Chen and Doty¹³ showed there is a hierarchical tile system with the optimal tile types that assembles an n × n square using O(log² n) parallel stages, which is close to the optimal log n stages, forming the final n × n square from four n/2 × n/2 squares, which are themselves recursively formed from n/4 × n/4 squares, and so on. However, despite this nearly maximal parallelism, the system actually requires superlinear time to assemble the square. They extended the definition of partial order tile systems studied by Adleman et al. in a natural way to hierarchical assembly and show that no hierarchical partial order tile system can build any shape with diameter N in less than time Ω(N). Intuitively, although attaching an assembly β of size k to another assembly α increases the size of α by k tiles in one step, a conservation of mass argument shows that the concentration of β is at most 1/k, so we expect to wait at least k time steps for this attachment to occur. However, Chen and Doty also showed the partial order hypothesis to be necessary, since for infinitely many n, a tile system can assemble a diameter-Θ(n) rectangle in time O(n^4/5 log n), breaking the linear-time lower bound that applies to all seeded systems and partial order hierarchical systems. It is an open question to determine precisely the extent to which hierarchical assembly imparts a speedup over single-tile assembly.

Intrinsic universality

Doty et al.²⁴ showed a different notion of universality, with respect to growth rather than computation, constructing a single tile set U that simulates any tile system T, by choosing an appropriate seed assembly to encode T using tiles from U. The simulation is intrinsic in the sense that the self-assembly process carried out by U is exactly that carried out by T, with each tile of T represented by an m × m block of U tiles. Therefore there is a single tile set that can (modulo rescaling) carry out any self-assembly process that can be achieved by any tile system. The challenge is to program all this activity without ever blocking a path that may later be needed for intra-block communication, since no block representing a tile is necessarily aware of the identities of its neighboring blocks, whether they are even present, and if not, whether they will ever arrive.

Power of cooperative Binding

Cooperative binding is used crucially in all non-trivial constructions in the aTAM. It is also the most challenging aspect of the model to enforce experimentally, so it is reasonable to question its necessity in algorithmic self-assembly. Doty, Patitz, and Summers²⁷ investigated the power of temperature 1 self-assembly in the aTAM, in which all positive glue strengths have sufficient energy to bind a tile, which effectively inhibits the use of cooperative binding. They show that a wide class of deterministic tile systems, satisfying a condition known as pumpability, are incapable of universal computation in the sense that they self-assemble only “periodic” (semilinear) sets. They conjecture that all deterministic, temperature 1 tile systems are pumpable. Maňuch, Stacho, and Stoll³⁸ obtained similar results studying temperature 1 tile systems that assemble finite assemblies and guarantee no glue mismatches, showing that Ω(n) tile types are required to assemble shapes of diameter n.

However, with slight changes to the model, universal computation becomes possible. Adleman et al.⁴ devised a nondeterministic temperature 1 tile system that simulates a Turing machine in the following sense: the system has many terminal assemblies, but the unique largest terminal assembly represents the computation. Cook, Fu, and Schweller¹⁸ showed that deterministic tile systems can achieve universal computation if allowed to grow in three dimensions. Patitz, Schweller, and Summers⁴¹ showed that if negative glue strengths are allowed then universal computation is possible, essentially using negative-strength glues to enforce cooperative binding.

But the original question stands: what is the computational power of deterministic, planar, positive-strength, temperature 1 self-assembly? Is cooperative binding truly necessary to compute by planar self-assembly?

The Future

In the early history of computability theory, the optimistic term “elementary” was applied to problems computable in time bounded by a tower of exponentials such as 2^{2^2ⁿ}. It was not until the 1960s, once faster and faster computers had been built to execute actual algorithms, that polynomial time came to be seen as a more realistic notion of feasibly implementable. Today we confront a similar situation in algorithmic self-assembly, where experimental work has not yet caught up to theory. When it does, the theory will doubtlessly take unanticipated turns. One concrete way that theory can help experiment today is to further develop error correction, such as reducing tile complexity of error-correction constructions or handling broader classes of tile systems than previous constructions.

The complexity of today’s operating systems and other software is made possible only by the development of software engineering techniques to help manage the complexity. Eventually molecular self-assembly systems will scale to the point that no one person could manually keep track of all the types of molecular components and how they are all supposed to fit together. Software simulators for the aTAM exist^40,55 but require the user to think at the level of individual tile types, which quickly begins to feel like assembly code programming with a nightmarish instruction set. There is some preliminary work^8,25 developing higher-level languages for describing aTAM tile systems, but we are far from having a “Python of self-assembly.”

Eventually molecular self-assembly systems will scale to the point that no one person could manually keep track of all the types of molecular components and how they are all supposed to fit together.

DNA tile assembly is an example of passive self-assembly, in which the tiles are mobile but otherwise stateless. Active self-assembly, using components with a changeable state, may enable sophisticated structures to be built at a lower cost in terms of time, tile complexity, or error-resistance. Majumder, LaBean, and Reif,³⁶ as well as Fujibayashi et al.,²⁹ and Padilla, Liu, and Seeman,³⁹ have proposed theoretical suggestions to augment tiles with a physical mechanism known as strand displacement to enable active signaling across tiles. It remains to see what will actually work in the laboratory, but whatever works will provide a rich theory to explore. As in any nascent field, the most exciting aspect is not knowing what the future holds.

Acknowledgments

I thank Damien Woods, Matt Patitz, and Scott Summers for helpful suggestions.

The author was supported by a Computing Innovation Fellowship under NSF grant 1019343 and NSF grants CCF-1219274 and CCF-1162589, and by the Molecular Programming Project under NSF grant 0832824.

Figures

Figure 1. Experiments with double-crossover tiles.

Figure 2. Abstract Tile Assembly Model (aTAM).

Figure 3. a) Growth error. b) 2 × 2 Proofreading. Each tile type t is replaced by a k × k block of tile types, with glues internal to the block unique to t.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

Theory of Algorithmic Self-Assembly

View in the ACM Digital Library

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI

10.1145/2380656.2380675

December 2012 Issue

Published: December 1, 2012

Vol. 55 No. 12

Pages: 78-88

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

News Oct 28 2025

The Reasons AI May Act Secretive

Jennifer Goforth Gregory

Artificial Intelligence and Machine Learning

humanoid robot looking through window blinds

News Oct 28 2025

AIs Capture Trust Cues

R. Colin Johnson

Artificial Intelligence and Machine Learning

BLOG@CACM Oct 27 2025

Remember Your First Time (in Philosophy)?

Robin K. Hill

Architecture and Hardware

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More

Key Insights

Parallelism in Molecular Computing: The Bad News

Computational Universality

Modeling Errors

Tile Complexity

Computational Complexity

Assembly Time Complexity

Intrinsic universality

Power of cooperative Binding

The Future

Acknowledgments

Figures

Theory of Algorithmic Self-Assembly

DOI

December 2012 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.