It is rare and rewarding to connect two vastly different areas of computer science. Fast randomized algorithms in model counting were discovered in the early 1980s, while the area of streaming algorithms did not take off in the theory community until the late 1990s. Only recently were these disparate areas connected in the accompanying paper, where it was observed that the algorithmic techniques developed in the two areas were strikingly similar. This connection has given us exciting streaming algorithms used in database design and in network monitoring, as well as a unified perspective on existing algorithms.
What exactly is model counting? Given a function ϕ, which might be specified as a formula on variables, a circuit, or a neural network, among other representations, we can think of ϕ as a mapping from inputs to TRUE or FALSE. Model counting asks for the number of inputs that ϕ maps to TRUE; each such input is called a model. From the applications side, various probabilistic inference problems, such as Bayesian net reasoning, can be translated into model counting problems. From the theoretical side, the number of solutions to a hard combinatorial problem often provides further insight into the problem. In general, model counting is what is known as #P-hard, and even checking whether there is any input that satisfies ϕ is NP-hard. One is therefore often interested in randomized approximation algorithms that approximate the number of models of a given ϕ. Obtaining such approximations can sometimes be done efficiently in practice with the aid of fast solvers for the Satisfiability problem (SAT).
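To make the definition concrete, here is a minimal, illustrative Python sketch of exact model counting by brute force; the function name and the example formula are my own, and the exponential enumeration it performs is precisely what approximate, SAT-solver-aided methods avoid in practice.

```python
from itertools import product

def count_models(phi, num_vars):
    """Count the assignments (models) on which the predicate phi is TRUE.

    phi is given as a Python predicate over a tuple of booleans.
    Illustrative only: this enumerates all 2**num_vars assignments,
    which is exactly why approximate methods are used in practice.
    """
    return sum(1 for assignment in product([False, True], repeat=num_vars)
               if phi(assignment))

# Example: phi(x1, x2, x3) = (x1 OR x2) AND (NOT x3) has 3 models.
print(count_models(lambda a: (a[0] or a[1]) and not a[2], 3))  # -> 3
```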
Seemingly unrelated to model counting is the problem of computing over data streams. Here, an algorithm is given a long stream of elements, each drawn from a universe of possible items, and the goal is to estimate statistics of the stream using low memory with a single pass over the data stream. This setting captures network traffic monitoring. For example, a network router may see a high-speed sequence of packets going through it and may want to approximate the distribution of destination IP addresses it sees; unusual behavior may indicate a denial-of-service attack. Also, if one does not store the traffic coming through a network router, then it may be lost forever, as one cannot ask the Internet to rewind itself. A statistic of particular importance is the number F0 of distinct elements. Computing F0 exactly is known to require a prohibitive amount of memory. However, there are extremely practical approximation algorithms that use a very small amount of memory.
The notion of efficiency for model counting is very different from that for estimating the number of distinct elements in a data stream. In model counting, one seeks to minimize the time complexity, or the number of calls to a SAT solver. In contrast, in the data stream model, the main goal is to minimize the space complexity, that is, the memory required.
The following paper gives a surprising connection between model counting and streaming, providing a generic transformation of data stream algorithms for F0 estimation into algorithms for model counting. The authors also show a partial converse: by framing F0 estimation as a special case of model counting, they obtain a very general algorithm for F0 estimation and its variants. The resulting algorithms can be used to select a minimum-cost query plan in database design and are also a key tool for detecting denial-of-service attacks in network monitoring.
The starting point of the paper is the observation that a hashing-based technique for model counting1,3 uses the same techniques as an F0 estimation data stream algorithm.2 The idea behind both is to reduce the counting problem to a detection problem. For model counting, one chooses random subsets of possible solutions of geometrically varying size and checks if there is any satisfying assignment to ϕ in each subset. For F0 estimation in data streams, one chooses random subsets of universe items of geometrically varying size and checks if there is an item in one’s subset that occurs in the stream. In both cases, by finding the size of the smallest set for which there is a satisfying assignment (for model counting) or an element occurring in the stream (for F0 estimation), one can scale back up by the reciprocal of that set’s size to obtain a decent approximation to the number of solutions (for model counting) or number of distinct elements (for data streams).
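The streaming half of this idea can be made concrete with a short Python sketch in the spirit of the classic Flajolet-Martin approach (not the paper's exact algorithm): the subset at level i contains an item exactly when its hash begins with i zero bits, so that subset samples items with probability 2^-i; the deepest level hit by any stream element plays the role of the smallest set containing a stream item, and scaling back up by the reciprocal of its sampling probability gives the estimate. The function and parameter names are illustrative, and a practical algorithm would average over many independent hash functions to reduce variance.

```python
import random

def estimate_f0(stream, hash_bits=32, seed=0):
    """Coarse F0 estimate via geometrically varying random subsets.

    Each item is hashed to a hash_bits-bit value; item x belongs to subset
    S_i iff its hash starts with at least i zero bits, so S_i samples the
    universe with probability 2**-i. We track the deepest level reached by
    any stream element and return 1 / (2**-deepest) = 2**deepest.
    """
    rng = random.Random(seed)
    salt = rng.getrandbits(64)          # stand-in for a random hash function
    deepest = 0
    for x in stream:
        h = hash((salt, x)) & ((1 << hash_bits) - 1)
        # Number of leading zero bits of h in a hash_bits-bit representation.
        level = hash_bits - h.bit_length() if h else hash_bits
        deepest = max(deepest, level)
    return 2 ** deepest

# Example: a stream with many repeats but few distinct elements.
print(estimate_f0([i % 1000 for i in range(1_000_000)]))  # roughly 1000
```

The model-counting analogue replaces "is any stream item in this subset?" with a SAT query asking whether ϕ has any satisfying assignment inside the subset, which is where the solver calls counted in the time complexity come from.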