Sign In

Communications of the ACM

Review articles

Algebraic Fingerprints For Faster Algorithms


View as: Print Mobile App ACM Digital Library Full Text (PDF) In the Digital Edition Share: Send by email Share on reddit Share on StumbleUpon Share on Hacker News Share on Tweeter Share on Facebook
Algebraic Fingerprints for Faster Algorithms, illustration

Credit: Iwona Usakiewicz / Andrij Borys Associates

It was a major surprise when, in 2010, Andreas Björklund discovered what many previously thought impossible: a significantly improved algorithm for the famous Hamiltonian path problem, also known as Hamiltonicity.4 Hamiltonicity asks if a given graph contains a path that goes through each vertex exactly once, as illustrated in Figure 1.

Back to Top

Key Insights

ins01.gif

Hamiltonicity was one of the first problems shown to be NP-complete by Karp.20 The only known algorithms for NP-complete problems require time scaling exponentially with the size of the input. It is believed they cannot be solved much faster in general, although the possibility has not been ruled out.17 Given an undirected graph on n vertices, Björklund's algorithm can find a Hamiltonian path or report that no such path exists in O*(1.657n) time.a The algorithm still runs in exponential time but it is much faster than the O*(2n) running time of the previously fastest algorithm, known since the 1960s.3,19

Hamiltonicity is a prominent algorithmic problem. Many researchers before Björklund have tried their hand at it, without success. But some of the tools Björklund used did not become available until 2009 thanks to progress in the k-path problem, a related problem in the context of parameterized algorithms.

A parameterized race and a conjecture. The k-path problem is a natural parameterized analogue of Hamiltonicity. The goal now is to find in the given graph a path of length k for some specified value of k, rather than a path of length n. It could be argued that, from a practical standpoint, the k-path problem is better motivated relative to Hamiltonicity. Algorithms designed specifically for the k-path problem have been actually used to detect signaling pathways in large protein interaction networks.29

Most NP-complete problems have similar parameterizations, where the parameter k measures the solution size to the given instance. This is natural because in practice, the solution length to an NP-hard problem is often short relative to the length of the overall instance. Downey and Fellows12 observed that some parameterized problems appear to be "easier" than others. To capture that, they defined the class of fixed parameter tractable (FPT) problems. The definition of FPT is rather simple. The running time of a parameterized algorithm should clearly depend on the input size n, and the parameter k. But the dependence should be somewhat special: a problem is defined to be FPT if it can be solved by an algorithm in O*(f(k)) time.

The FPT notion has proven to be extremely insightful and has led to a more detailed understanding of problem hardness. It is believed that not all NP-hard problems are FPT; a prominent example is the k-clique problem, where the goal is to detect a k-subset of the network nodes that are all pairwise connected (see Figure 2). The best known algorithm for k-clique is not much better than exhaustive search, which takes time O(nk). However, experience shows that when a problem is shown to be FPT, algorithmic progress has just begun as it is often the case that the function f(k) in the running time can be improved.

The k-path problem is a prime example. The problem was shown to be FPT by Monien25 who described an algorithm running in O*(k!) time. This was later improved to O*((2e)k) by Alon et al.2 and to O*(4k) by Chen et al.10 Each of these steps represents the introduction of a significant new idea. Similarly, for many FPT problems there are now races for faster parameterized algorithms,27 which have enriched the field of algorithm design with new techniques.

However, with each improvement, one is faced with the question: Is further progress possible? Known algorithms for the "parent" NP-complete problem can be a valuable guide to answering this question. This is because their running time sets a clear target for the parameterized algorithm. In the case of Hamiltonicity, we know of an algorithm that runs in O*(2n) time. It is then reasonable to conjecture that there is an algorithm for k-path that runs in O*(2k) time, "matching" the Hamiltonian path algorithms for the extreme value of the parameter.

Roadmap. The development of the algorithmic techniques we are about to discuss was largely driven by the k-path conjecture. But for the sake of simplicity we will illustrate them in the context of another important NP-complete pro blem known as the 3D-matching problem, which has also played a central role. In fact, the Hamiltonicity algorithm was preceded by another record-breaking algorithm for the 3D-matching problem, also due to Björklund.5

In this article, our first step toward faster algorithms is what we call an algebraization of the combinatorial problem, that is, capturing the problem as a question about the existence of certain monomials in a polynomial. We discuss one algebraization of the 3D-matching problem later. Given the polynomial, the original combinatorial question becomes now an issue of algebraic detection of monomials; that is, extracting information from the polynomial by assigning values into its variables. Looking for assignments that fit our purpose will lead us to calculations modulo 2, over an algebra in which sums of pairs cancel out. This in turn will present us with a problem of unwanted cancellations of monomials. We will solve it with the method of fingerprints that augments the 3D-matching polynomial in order to ensure no unwanted cancellations occur. The k-path conjecture was answered in the positive via a generalization of algebraic detection with fingerprints, which also yielded many other faster parameterized algorithms. These ideas are reviewed later, as is Björklund's key twist that helped unlock the faster Hamiltonicity algorithm. While still employing in a relaxed way the method of fingerprints, Björklund treated computation modulo 2 not just as a nuisance, but also as a resource, by presenting algebraizations where many unwanted monomials cancel out, because they come in pairs. In order to derive these sharper algebraizations that exploit cancellations, Björklund tapped into the power of algebraic combinatorics that study graphs via linear algebra. Faster algorithms for several other parameterized problems followed.

Back to Top

Algebraization

Consider the following fictional airline scheduling problem: on any given day there is a set X of designated captains, a set Y of designated second officers, and a set Z of destinations. Each captain and second officer declare a number of preferred destinations. The airline would like to accommodate as many preferences as possible; but at the same time, they would like to match on the same flight a pilot and second officer who have flown together before. This creates a number of triples: if captain x has previously flown with second officer y and they would both like to fly to destination z, they define a triple {x, y, z}. Of course, any given captain, second officer or destination can appear in many triples. But to maximize utility, the airline would like to select a big matching, that is, a subset of triples that are pairwise disjoint; based on it they can then schedule the personnel. The NP-complete problem asks for a matching of maximum size, while the parameterized problem asks for a k-matching with at least k triples, which we also call a k-3D matching. An example of the problem is shown in Figure 3.

A polynomial for 3D-matching. It was probably understood by many before it was made explicit in Koutis21 that we can view the k-3D-matching problem through an algebraic lens. Let us explain how, by means of the example in Figure 3.

First, we view as a variable each element of the sets X, Y, Z. That is we have the variables X = {x1, x2, x3}, Y = {y1, y2, y3} and Z = {z1, z2, z3, z4}. Then for each triple we construct a monomial, by taking the product of the corresponding variables. In our example of Figure 3, we have the monomials.

ueq01.gif

We also define the instance polynomial P1 to be the sum of these monomials. Finally, we set &cacm5901_a;; we will call Pk the kth encoding polynomial. For instance, for the 2nd encoding polynomial, we have

ueq02.gif

To see the motivation behind these definitions, consider the expansion of P2 into a sum-product form.

eq01.gif

The monomials in the sum-product expansion fall into two basic classes. In the solution part of the expansion, monomials that correspond to solutions of the problem are linear; the product does not contain squares or other high powers of variables. As an example, the monomial x2x3y1y3z1z4 in Equation (2.1) is linear, and corresponds to selecting the second and fourth triples. In the complementary non-solution part, each nonlinear monomial corresponds to a "non-solution," that is, a choice of non-disjoint triples.

This construction can be generalized to any instance of the problem; the only difference would be that if we want to check for a k-matching, we would have to look at the sum-product expansion of the polynomial Pk. To summarize:

Observation. An instance of the 3D-matching problem contains a matching of size k if and only if the sum-product expansion of the kth encoding polynomial Pk contains a linear monomial.

Therefore, to solve the 3D matching problem, it would suffice to expand the polynomial Pk into a sum of products and check if the sum contains a linear monomial. However, an O*(2N) dynamic programming algorithm is known, for N = |X| + |Y| + |Z|. On the other hand, the number of possible monomials in N variables is much larger than O*(2N). Thus, the simple idea of expanding Pk into a sum of products does not give a good algorithm.

The dynamic programming algebra. Dynamic programming often involves inductive definitions that are tedious to state; the analogue in our algebraic setup is perhaps more intuitive. It can be naturally viewed as a "truncated" expansion of the polynomial Pk into a sum-product form. More concretely, imagine expanding Pk slowly, one multiplication at a time. We observe that monomials containing squared variables can be thrown away as soon as they are formed, because they do not affect the presence of linear monomials in the full sum-product expansion of Pk. There are 2N linear monomials in N variables. This implies the "truncated" expansion can be carried out in O*(2N) time. If there is at least one monomial left in the final truncated sum-product form of Pk, we can conclude the given instance contains a 3D matching of size k. Otherwise, it does not.

More formally, we are computing the sum-product expansion of Pk in an extended dynamic programming algebra of polynomials that has all the usual rules for multiplication and addition of polynomials, plus the additional rule that all squared variables evaluate to zero. Notice this implies that all nonlinear monomials also evaluate to zero. We can thus recast in algebraic terms our observation regarding linear monomials of Pk:

An instance of the 3D-matching problem contains a matching of size k if and only if the kth encoding polynomial Pk is not identical to zero in the dynamic programming algebra.

In general we can have N = 3k, since all elements of X Y Z could participate in a 3D matching. However, if we are merely looking for a 3D matching containing only k triples, then we might set as our target an O*(23k) time parameterized algorithm.

ins02.gif

Parameterizing via assignments. The goal here is to describe a first parameterized algorithm for the 3D-matching problem, which will demonstrate the use of assignments of the variables of the polynomial in order to extract information from it.

The problem with "parameterizing" the O*(2N) algorithm lies clearly in the number of variables N. Trying to deal with this problem Alon, Yuster and Zwick2 came up with a method known as color coding. Their solution can be understood as an assignment as follows. Going back to the earlier example, suppose we are interested in finding a matching of size 2. Then, as discussed, all linear monomials in the sum-product expansion of P2 have degree six; they are products of six distinct variables. Alon et al. proposed the introduction of a new set W containing exactly six variables (more generally 3k variables for a k-matching). So, let

ueq03.gif

We then perform a random assignment of the variables in W into the variables in X, Y, Z. We can also think of this assignment as a random coloring of the N variables with k colors. Let us consider one such assignment:

ueq04.gif

By just substituting in Equation (2.1), P2(X, Y, Z) becomes now a polynomial in the variables W:

ueq05.gif

It can be seen that monomials in Pk(X, Y, Z) get mapped to monomials in Pk(W). If a monomial in Pk(X, Y, Z) is not linear, the same holds for the corresponding monomial in Pk(W). In contrast, a linear monomial of Pk(X, Y, Z) may or may not survive as a linear monomial Pk(W). In our example all but one linear monomials are mapped to a linear monomial in P2(W). So detecting a linear monomial in Pk(W) allows us to infer its presence in Pk(X, Y, Z) as well, and consequently the existence of a k-matching in our original problem. The inverse may be not true, as it may be the case that no linear monomial survives the assignment.

We can evaluate Pk(W) in the dynamic programming algebra as we did with Pk(X, Y, Z). Because now W contains only 3k variables, the evaluation takes O*(23k) time. However, the probability that any given linear monomial survives through a random color coding assignment is very small; it can be calculated to be around e3k where e 2.72. This means one has to try O(e3k) random assignments in order to have the monomial survive with an acceptable constant probability, independent from k. So, the overall running time is O*((2e)3k). This is still a factor of e3k away from our O*(23k) target.

ins03.gif

Back to Top

Algebraic Detection

The methods mentioned previously for detecting a linear monomial are essentially combinatorial, but we presented them in algebraic guise. Can we find an even faster algorithm if we use a genuinely algebraic method?

A matrix assignment. Earlier, we considered the idea of an assignment into the variables of the encoding polynomial P and its subsequent evaluation in the dynamic programming algebra. The dynamic programming algebra and the color-coding assignment are only one possibility. There are numerous possible algebras and assignments. Matrices in particular seem to offer rich possibilities. An attractive feature of matrix algebra is that the square of a matrix can be 0, offering a genuinely algebraic way of "implementing" the dynamic programming algebra, and specifically its rule that squares of variables should be zero. As a simple example:

ueq06.gif

This leads us to the following list of requirements for a set of random matrices to be used for detecting linear monomials of degree k, that is, products of k distinct variables.

  1. To simplify reasoning about monomials, the matrices must be pairwise commutative, that is, the order in which we multiply them should not affect the value of the product.
  2. The square of each matrix must be equal to zero. Along with commutativity, this implies that nonlinear monomials will evaluate to zero as well.
  3. Linear monomials of degree k must "survive" the assignment, that is, evaluate to a non-zero matrix, with constant probability, say at least 1/4. It would be convenient if this non-zero matrix contains only ones in its diagonal.
  4. Since we are looking for speed, the matrices must be as small as possible, in order to support fast evaluation. Ideally, matrix operations should take time O*(2k) or less.

In Koutis22 an efficient randomized construction of such matrices was given, showing it is possible to satisfy all these requirements if we are willing to relax our notion of zero:

By "zero," we now mean zero modulo 2

We will refer to this as the The Mod-2 Matrix Fact.

We would not discuss the technical details of how to prove the Mod-2 Matrix Fact, since all we need here is to understand its algorithmic power and consequences.

Comparing with the color coding method, we see that evaluation of the polynomial takes the same time, O*(2k) for detecting a linear monomial with k variables. The gain is in the probability that a linear monomial survives the assignment; from the exponentially small probability ek, we went to a constant probability of 1/4. This appears to suggest we got rid of the undesired ek factor and have reached our target of detecting degree-k linear monomials in O*(2k) time and therefore a size k 3D matching in O*(23k) time.

Unwanted cancellations. However, a more careful look reveals we are not done yet due to our relaxed notion of what is zero. It is clear the polynomial P(X) evaluates to zero if it does not contain linear monomials. However, the Mod-2 Matrix Fact does not guarantee P(X) evaluates to non-zero, even if some of its linear monomials evaluate to non-zero. The most significant source of this problem is not in the evaluation, but rather in the polynomial P itself: by taking a look at the sum-product expansion in Equation (2.1) we see that the solution part of P that consists of the linear monomials is already equal to 0 mod 2, just because the monomials are multiplied by 2.

The method of fingerprints. We tackle this problem of multiple copies of linear monomials by ensuring each monomial is unique, through the use of a set of "fingerprint" variables A = {a1, a2, ...,}. We illustrate the idea on the k-3D-matching polynomial P2 which we will now define as follows:

ueq07.gif

All we are doing here is multiplying each occurrence of a triple in P2 with a "fresh" variable from the set A. Consider then what happens to the two copies of the monomial x2x3y1y3z1z4; they are now replaced by two separate linear monomials: a2a8x2x3y1y3z1z4 and a6a4x2x3y1y3z1z4.

It appears the introduction of the auxiliary variables comes at the cost of getting linear monomials of higher degree, which would require larger matrices in order to survive through an evaluation, resulting in a slower algorithm.

However, an alternative idea is to use a different assignment for the A variables. For example, we can assign random {0, 1} values in the variables of A. In essence we wish to create an odd number of copies for the linear monomials, so that the solution part of the encoding polynomial is not trivially zero modulo 2. In Koutis22 it was shown this idea is enough for linear monomial detection to go through in the 3D-matching case. That is, by randomly assigning matrices to the variables X, Y, Z, and {0, 1} values to the fingerprint variables, the polynomial will evaluate to non-zero with constant probability if its sum-product expansion contains a linear monomial and always to zero otherwise.

Target reached. We have designed an O*(23k) time algorithm for the k-3D-matching problem.

A general algebraic framework. We have come awfully close to a fast parameterized algorithm for a problem much more general than the k-3D-matching problem: that is the detection of degree-k linear monomials in the sum-product expansion of arbitrary polynomials P(X) that are given to us in some concise, "unexpanded" representation. The importance of this problem was established in Koutis22,24 where it was observed that a fast algorithm for it implies faster algorithms for several parameterized problems, including the k-path problem. So, what do we need to do in order to generalize the method?

All steps described here carry over to arbitrary polynomials, including generating unique linear monomials with an appropriate pre-processing of the polynomial and a placement of fingerprint variables in it. However, we are still stuck with the multiple copies problem because the random {0, 1} assignment to the fingerprint variables is not guaranteed to work for any polynomial. We will thus need to generalize to some other type of assignment that works in arbitrary situations, while still being compact and easy to handle computationally.

Here is the solution proposed in Williams.32 Imagine evaluating P(X, A) in two stages. First we assign random matrices &cacm5901_c; from the Mod-2 Matrix Fact into the X variables and compute P(&cacm5901_c;, A), leaving the A variables unevaluated. The result is a matrix whose diagonal is a polynomial Q(A). The Mod-2 Matrix Fact implies that Q(A) is zero modulo 2 if P(X) does not contain a linear monomial. If P(X) does contain a linear monomial then Q(A) is non-zero with probability at least 1/4. This consideration crystallizes the problem: we wish to be able to test whether Q(A) is identical to zero modulo 2, and we would like to do so by evaluating Q(A) on some "compact" assignment to its variables. This problem, however, has been studied and solved earlier. It is known as identity testing.

A solution to identity testing, known as SchwartzZippel Lemma,26 is simple and intuitive. The reason we picked the values {0, 1} in our assignment for the 3D-matching polynomial is that with them we can perform multiplication and addition that respects modulo 2 arithmetic. But there are other algebraic "fields" that allow us the same kind of operations, with the bonus fact they are larger, in the sense they contain more values. If instead of {0, 1} we pick a field consisting of O(k) values the number of possible assignments to the fingerprint variables by far outnumbers the number of possible roots of Q(A), that is, the number of assignments that make it evaluate to zero. Hence a random assignment from this larger field will result with high probability in Q(A) evaluating to a non-zero value in the field. This allows us to claim the following result.

Fast linear monomial detection. The problem of detecting a degree-k square-free monomial in an arbitrary polynomial P(X) can be solved in O*(2k) time.

ins04.gif

Back to Top

Breaking Barriers

A faster parameterized algorithm for linear monomial detection would have a tremendous impact, as it would imply faster algorithms not only for many parameterized problems, but also for the corresponding non-parameterized NP-hard problems. Thus, an intriguing and natural question is whether the problem can be solved faster by evaluating the input polynomial over a more exotic algebra supporting faster operations. Unfortunately, the question has been answered in the negative; it is impossible to find a better algebra.24

This algebraic barrier suggests new techniques are required for further progress in the linear detection problem, if progress is possible at all. However, one can be more optimistic about specific problems. Linear monomial detection is very general and completely agnostic to combinatorial properties of the underlying problem, so taking advantage of specific problem properties may sometimes get around the algebraic barrier. On the other hand, attempting to make progress on well-studied NP-complete problems, such as the Hamiltonian path problem or the 3D-matching problem, one faces a perhaps more significant psychological barrier: a lack of progress in nearly 50 years.

In brilliant work, Andreas Björklund broke the psychological barrier with an O*(2N/3) time algorithm for the "exact" X3D-matching problem5 and an astonishing O*(1.657n) time algorithm for the Hamiltonicity.4 These running times were later almost matched by parameterized algorithms.6 The reader can find in Fomin et al.14 an excellent exposition of the algorithm for the Hamiltonian path problem for bipartite graphs.

Central to Björklund's work are sharper algebraic tools that draw from the large pool of algebraic combinatorics (for example, Royle28) to produce lower degree polynomials. More crucially, Björklund proposed the idea of relaxing the method of fingerprints in order to exploit cancellations of pairs of non-solution monomials. Here, we will see how these ideas got applied in the case of the X3D-matching problem.

Counting weighted matchings Mod 2. Consider a restriction of our 3D-matching problem to a 2D-matching problem in which we are given pairs consisting of a captain and a second officer, rather than triples. We will assume the two sets X and Y are of equal size n.

The problem has a graph representation as shown in Figure 4. A solution of n edges/pairs that covers all vertices of the graph is called a perfect matching.

The perfect matching problem can be solved in polynomial time. But counting the number of perfect matchings is known to be a hard problem; it is as hard as counting the number of solutions for any NP-complete problem.30 In fact, it is also hard to compute the number of perfect matchings modulo any prime p, except when p = 2. If p = 2 the number of perfect matchings modulo 2 is equal to the determinant of the incidence matrix of the bipartite graph.28 This key fact is going to play a significant role, because the matrix determinant can be computed with a polynomial number of arithmetic operations.

To define the incidence matrix A, we arbitrarily number the nodes in X with numbers from 1 to n, and do the same with the nodes in Y, as shown in Figure 4. The entry A(i, j) is 1 if node i X is connected to node j Y, and 0 otherwise.

Given this definition, if N2 denotes the number of perfect matchings, we have what we call the Matching Lemma:

ueq08.gif

We now consider an extension of the above lemma that will be particularly useful. The perfect matching counting problem has a natural generalization to edge-labeled multi-graphs, where (i) we allow multiple "parallel" edges between any two given nodes and (ii) each edge e is labeled with a unique monomial le.

The signature of a matching is defined to be the monomial formed by taking the product of the labels of the edges in the matching . For example, the signature of the matching in Figure 4 is l1,4l2,1l3,2l4,3. More generally,

ueq09.gif

Given a labeling of the edges we can extend N2 to a solution polynomial N2(L), which is simply the sum of the signatures of all perfect matchings, such as if M is the set of perfect matchings in the graph, then

ueq10.gif

Finally, we can extend the incidence matrix A to a matrix AL over L in a natural way: if nodes i and j are connected by one or more edges, we let AL(i, j) be the sum of their labels. We have the following Generalized Matching Lemma:

eq02.gif

Monomial detection for X3D-matching. We now return to the X3D-matching problem, which is the 3D-matching problem augmented with the constraint that |X| = |Y| = |Z| = n. In this case we want to decide if there is a solution that consists of exactly n = N/3 triples, that is, a solution that covers all elements of X, Y, Z.

The algorithm involves a transformation of the X3D matching problem into a labeled matching instance. The sets X and Y define the two parts of the bipartite graph. In order to determine the labels, we use two sets of variables, Z and U. Each destination corresponds to a distinct variable in Z and each triple corresponds to a distinct variable in U.

To see now how we form the labeled bipartite graph we give an example that illustrates the edges between two given vertices of the graph, along with their labels. Suppose the X3D-matching instance contains the triples

eq03.gif

and no other triples with {W, K} as a captain and second officer. Also suppose we have associated with these triples the variables u1, u2, u3, respectively. Then, nodes W and K will be connected by three edges, labeled respectively with the monomials u1zSJU, u2zSFO, and u3zPIT.

Consider now the polynomial N2(Z, U) mod 2. Because each edge of the bipartite graph corresponds to a triple, any monomial of N2(Z, U) mod 2 corresponds to the selection of n triples, because it is the signature of a matching. By definition of N2(Z, U) mod 2 these n triples cover exactly and without overlaps the sets X and Y. In the case there is an overlap in destinations, the monomial contains a squared variable, corresponding to that destination. On the other hand if a set of triples forms a 3D matching, the monomial is linear with respect to the variables in Z because no destination appears twice. So, assigning the random matrices from the Mod-2 Matrix Fact to the variables in Z we zero-out the nonlinear monomials. In addition, we observe the coefficient of each monomial in N2(L) mod 2 is 1, because the U variables specify each set of n triples. Thus the U variables are our fingerprint variables and we will assign to them values from an appropriate algebraic field of size O(n). With these assignments, the outcome of the evaluation will be non-zero with a good probability if and only if N2(L) contains a monomial corresponding to a solution of the instance. Hence we can detect an exact 3D matching if there exists one. The running time of the algorithm is O*(2n) because the degree of N2(Z, U) in terms of the Z variables is n.

Determinants and cycles. The monomials that are canceled when the determinant is computed modulo 2 as in Equation (4.2) correspond to cycle covers. A cycle is a path plus one edge that closes the loop. A cycle cover is a set of cycles in the graph such that each vertex participates in exactly one cycle. In particular, a Hamiltonian cycle, that is, a cycle containing all the vertices, is a cycle cover. A perfect matching is also a cycle cover albeit consisting only of "degenerate" cycles, that is, edges. Cycle covers that contain at least one non-degenerate cycle come in pairs basically because each such cycle can be traversed in two possible ways: clockwise and counterclockwise. Björklund's Hamiltonicity algorithm, which actually targets Hamiltonian cycles rather than paths, breaks this symmetry by introducing direction to some edges in the graph. His algorithm still relies crucially on the remaining symmetries and cancellations. It remains open whether Hamiltonicity has a faster than O*(2n) time algorithm for general directed graphs.

ins05.gif

Back to Top

More Recent Advances

Since its appearance, the general framework of algebraic fingerprints with modulo computations has been used in the design of faster algorithms for several parameterized problems. Examples include: finding subgraphs that are more complicated than paths,16,31 finding functional motifs in biological networks,8,18,23 finding the shortest cycle through specified nodes in a given graph,7 and the repetition-free longest common subsequence problem for strings, with applications in computational biology.9 Among the many examples, we highlight remarkable progress in algorithms for NP-complete problems on graphs, parameterized by the so-called treewidth of the input graph.

Several NP-complete problems on graphs are tractable if the graph is a tree, that is, if it contains no cycles. The NP-completeness of graph problems is usually proved via the construction of very intricate graphs that are arguably artificial comparing to graphs arising in practical situations. In particular, real-world graphs often have a more tree-like structure.

The notion of treewidth offers a way to quantify how much a given graph "deviates" from a being tree. An example is given in Figure 5. The vertices of the graph are arranged in a tree structure. Each tree node lists copies of some of the graph vertices. Each edge of the graph connects two vertices that are listed together at some tree node. In addition, any given graph vertex can appear only in contiguous tree nodes. The treewidth tw of a graph is defined as the maximum number of graph vertices hosted in a tree node; in our example tw = 3. As a more general example, the natural class of planar graphs, that is, graphs that can be drawn on the plane without crossings has treewidth &cacm5901_d;, where n is the number of vertices.

Cygan et al.11 gave faster algorithms for many graph problems parameterized by treewidth. Previous algorithms had running time of the form O*(twtw), while the new algorithms have dramatically improved running times of the form O*(ctw) for small constants c; for example the Hamiltonian path cycle can be now solved in O*(4tw) time.

Back to Top

Open Problems

The algorithms in this article are all randomized; on any given run they have a small probability of reporting that an instance does not have a solution, when the opposite is true. But running them O(log(1/p)) times will make the overall probability of failure smaller than p, for any p > 0. The fastest known deterministic algorithm for the k-path problem requires O*(2.85k) time.15 Finding a deterministic algorithm that solves the problem in O*(2k) time remains an open problem. The way things stand, it would seem to require a deterministic version of the SchwartzZippel Lemma: a deterministic polynomial-time algorithm for testing whether a polynomial given as an arithmetic circuit is identically zero.

The method of algebraic fingerprints can be adapted to solve the weighted k-path problem; here the goal is to find a k-path of minimum total cost, assuming a cost is attached to every edge of the graph. The running time is O*(2kW) where W is the largest cost associated with an edge in the graph. On the other hand, dynamic programming can handle the problem in O*(2n log W) time, when k = n. It is reasonable to conjecture that k-path has an O*(2k log W) algorithm. It should be noted though that, unlike the algebraic fingerprints method that can be implemented in memory of size polynomial in n, color coding for weighted k-path requires O*(2k) memory, which can be a very limiting factor in practice.

Dynamic programming can also be used to count the number of Hamiltonian paths. Counting the number of k-paths exactly is probably not FPT,13 but color coding can be used to count k-paths approximately, in O*((2e)k) time.1 This stands as the fastest known algorithm for this problem, but again it is reasonable to conjecture there is an O*(2k) algorithm for approximate counting.

Solving any of these open questions may require fresh ideas that will start a new cycle of discovery.

Back to Top

Acknowledgments

I. Koutis is supported by NSF CAREER award CCF-1149048. R. Williams is supported by a David Morgenthaler II Faculty Fellowship, NSF CCF-1212372, a Sloan Fellowship, and a Microsoft Research Fellowship. The final phase of this work occurred at the Simons Institute for the Theory of Computing.

Back to Top

References

1. Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F., Sahinalp, S.C. Biomolecular network motif counting and discovery by color coding. In Proceedings 16th International Conference on Intelligent Systems for Molecular Biology (ISMB) (Toronto, Canada, July 1923, 2008), 241249.

2. Alon, N., Yuster, R., Zwick, U. Color coding. J. ACM 42, 4 (1995), 844856.

3. Bellman, R. Dynamic programming treatment of the travelling salesman problem. J. ACM 9 (1962) 6163.

4. Björklund, A. Determinant sums for undirected hamiltonicity. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010 (Las Vegas, Nevada, USA, October 2326, 2010). IEEE Computer Society, 173182.

5. Björklund, A. Exact covers via determinants. In 27th International Symposium on Theoretical Aspects of Computer Science, STACS 2010, J. Marion and T. Schwentick, eds. Volume 5 of LIPIcs (Nancy, France, March 46, 2010). Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 95106.

6. Björklund, A., Husfeldt, T., Kaski, P., Koivisto, M. Narrow sieves for parameterized paths and packings. CoRR, abs/1007.1161, 2010.

7. Björklund, A., Husfeld, T., Taslaman, N. Shortest cycle through specified elements. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Y. Rabani, ed. (Kyoto, Japan, January 1719, 2012). SIAM, 17471753.

8. Björklund, A., Kaski, P., Kowalik, L. Probably optimal graph motifs. In 30th International Symposium on Theoretical Aspects of Computer Science, STACS 2013, N. Portier and T. Wilke, eds. Volume 20 of LIPIcs (Kiel, Germany, February 27March 2, 2013). Schloss Dagstuhl Leibniz-Zentrum fuer Informatik, 2031.

9. Blin, G., Bonizzoni, P., Dondi, R., Sikora, F. On the parameterized complexity of the repetition free longest common subsequence problem. Inform. Process. Lett. 112, 7 (2012), 272276.

10. Chen, J., Lu, S., Sze, S.-H., Zhang, F. Improved algorithms for path, matching, and packing problems. In Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms (SODA) (2007), 298307.

11. Cygan, M., Nederlof, J., Pilipczuk, M., Pilipczuk, M., van Rooij, J.M.M., Wojtaszczyk, J.O. Solving connectivity problems parameterized by treewidth in single exponential time. In IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, R. Ostrovsky, ed. (Palm Springs, CA, USA, October 2225, 2011), 150159.

12. Downey, R.G. and Fellows, M.R. Parameterized Complexity. Texts in Computer Science. Springer, 1999.

13. Flum, J., Grohe, M. The parameterized complexity of counting problems. SIAM J. Comput. 33, 4 (2004), 892922.

14. Fomin, F.V., Kaski, P. Exact exponential algorithms. Commun. ACM 56, 3 (Mar. 2013), 8088.

15. Fomin, F.V., Lokshtanov, D., Panolan, F., Saurabh, S. Representative sets of product families. In Proceedings of the Algorithms ESA 2014 22th Annual European Symposium, Wroclaw, Poland, September 810, 2014, A.S. Schulz and D. Wagner, eds. Volume 8737 of Lecture Notes in Computer Science (2014). Springer, 443454.

16. Fomin, F.V., Lokshtanov, D., Raman, V., Saurabh, S., Rao, B.V.R. Faster algorithms for finding and counting subgraphs. J. Comput. Syst. Sci. 78, 3 (2012), 698706.

17. Fortnow, L. The status of the P versus NP problem. Commun. ACM 52, 9 (2009), 7886.

18. Guillemot, S., Sikora, F. Finding and counting vertex-colored subtrees. Algorithmica 65, 4 (2013), 828844.

19. Held, M., Karp, R. A dynamic programming approach to sequencing problems. J. Soc. Indust. Appl. Math. 10 (1962) 196210.

20. Karp, R.M. Reducibility among combinatorial problems. In Complexity of Computer Computations, R.E. Miller and J.W. Thatcher, eds. (1972). Plenum Press, 85103.

21. Koutis, I. A faster parameterized algorithm for set packing. Inform. Process. Lett. 94, 1 (2005), 47.

22. Koutis, I. Faster algebraic algorithms for path and packing problems. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming Colloquium, ICALP 2008, Part I: Tack A: Algorithms, Automata, Complexity, and Games, L. Aceto, I. Damgård, L.A. Goldberg, M.M. Halldórsson, A. Ingólfsdóttir, and I. Walukiewicz, eds. Volume 5125 of Lecture Notes in Computer Science (Reykjavik, Iceland, July 711, 2008). Springer-Verlag, 575586.

23. Koutis, I. Constrained multilinear detection for faster functional motif discovery. Inf. Process. Lett. 112, 22 (2012), 889892.

24. Koutis, I., Williams, R. Limits and applications of group algebras for parameterized problems. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming: Part I, ICALP '09 (Berlin, Heidelberg, 2009). Springer-Verlag, 653664.

25. Monien, B. How to find long paths efficiently. Ann. Dis. Math. 25 (1985), 239254.

26. Motwani, R., Raghavan, P. Randomized Algorithms. Cambridge University Press, New York, NY, USA, 1995.

27. Rosamond, F., Fellows, M. Table of FPT Races. http://fpt.wikidot.com/fpt-races/.

28. Royle, G., Godsil, C. Algebraic Graph Theory. Graduate Texts in Mathematics. Springer Verlag, 1997.

29. Scott, J., Ideker, T., Karp, R.M., Sharan, R. Efficient algorithms for detecting signaling pathways in protein interaction networks. J. Comput. Biol. 13, 2 (2006), 133144.

30. Valiant, L.G. The complexity of enumeration and reliability problems. SIAM J. Comput. 8, 3 (1979), 410421.

31. Williams, R. Finding paths of length k in O* (2k) time. Inf. Process. Lett. 109 (February 2009), 315318.

32. Williams, V.V., Wang, J.R., Williams, R.R., Yu, H. Finding four-node subgraphs in triangle time. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, P. Indyk, ed. (San Diego, CA, USA, January 46, 2015). SIAM, 16711680.

Back to Top

Authors

Ioannis Koutis (i.koutis@gmail.com) is an assistant professor at the University of Puerto Rico, Rio Piedras.

Ryan Williams (rrw@cs.stanford.edu) is an assistant professor at Stanford University, Stanford, CA.

Back to Top

Footnotes

a. O*(f(k)) means a function smaller than p(n) · f(k) for some polynomial p(n).

Back to Top

Figures

F1Figure 1. A Hamiltonian path.

F2Figure 2. A 3-path and a 3-clique. Finding a k-clique appears to be much more difficult than finding a k-path.

F3Figure 3. A set of triples and a matching. [Source: Wikipedia]

F4Figure 4. A perfect bipartite matching.

F5Figure 5. A graph and its treewidth decomposition. [Source: Wikipedia]

UF1Figure. Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/algebraic-fingerprints-for-faster-algorithms

Back to top


©2016 ACM  0001-0782/16/01

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.


 

No entries found