Home/Magazine Archive/December 2010 (Vol. 53, No. 12)/CoSaMP: Iterative Signal Recovery From Incomplete.../Full Text

Research highlights
## CoSaMP: Iterative Signal Recovery From Incomplete and Inaccurate Samples

Compressive sampling (CoSa) is a new paradigm for developing data sampling technologies. It is based on the principle that many types of vector-space data are *compressible*, which is a term of art in mathematical signal processing. The key ideas are that randomized dimension reduction preserves the information in a compressible signal and that it is possible to develop hardware devices that implement this dimension reduction efficiently. The main computational challenge in CoSa is to reconstruct a compressible signal from the reduced representation acquired by the sampling device. This extended abstract describes a recent algorithm, called, `CoSaMP`

, that accomplishes the data recovery task. It was the first known method to offer near-optimal guarantees on resource usage.

In many applications, the ambient dimension of a data vector does not reflect the actual number of degrees of freedom in that vector. In mathematical signal processing, this property is captured by the notion of a *compressible signal*. Natural images provide a concrete example of compressible signals because we can approximate them accurately just by summarizing the solid areas (local averages) and the edges (local differences). This representation allows us to compress images dramatically without a substantial loss of visual qualityan idea implemented in the JPEG image coders that we use every day.

*Sampling* is a generic term for the process of acquiring data about an object, either in the physical or the virtual world. There are many different technologies that we use to collect samples. A ubiquitous piece of sampling hardware is the CCD array inside a digital camera, which measures the average light intensity at each small square in the focal plane of the camera. Other sampling equipment includes X-ray machines, magnetic resonance imaging (MRI), and analog-to-digital converters.

In most cases, we use sampling to acquire as much information as possible about an object of interest. Indeed, when purchasing a digital camera, we often think that more pixels are better. But as soon as the image is captured, the camera invokes compression algorithms to reduce the amount of information it needs to store. This fact suggests an awkward question: if there is a limited amount of information in the object, why are we taking so many samples? Is there some way to obtain measurements that automatically sieve out the information in the object?

One way to exploit the gap between nominal dimension and degrees of freedom is to perform *dimension reduction*. This idea has a long tradition in computer science. Indeed, to solve geometric problems involving high-dimensional data, we often map the data to a lower dimension while approximately preserving the geometric properties that are relevant to the computation. One standard method for achieving this goal is to use a random projection to the data.

The main idea in *compressive sampling* (CoSa) is to develop sampling technologies based on dimension reduction. The mathematical foundation for this approach is the fact that an appropriate random projection of a compressible signal retains most of the information in that signal. The coordinates of this randomly projected signal are interpreted as samples, and the number of samples we need is comparable with the information content of the object we are sampling. Because we have a lot of flexibility when designing projections, we have flexibility in engineering the sampling hardware.

An intriguing example of a CoSa sensing device is the Rice University single-pixel camera. This camera uses a digital micromirror device (DMD) to selectively black out a random subset of pixels from an image and to measure the total intensity of the remaining pixels. By applying many random masks in sequence, we obtain a collection of samples that captures the information in the image.

Another key fact about CoSa is that we cannot use simple linear techniques to reconstruct the data vector from the random samples. Instead, we must develop more sophisticated algorithms. One approach that has received a lot of attention is convex programming based on _{1} minimization. This abstract describes an algorithm, called `CoSaMP`

, that is based on the greedy pursuit schema.^{18, 19} `CoSaMP`

was the first computational technique that provably achieved near-optimal guarantees on resource usage. Using the techniques from our paper, researchers later demonstrated that other algorithms offer comparable performance guarantees. Another algorithm similar to `CoSaMP`

, called Subspace Pursuit,^{9} was published at the same time, but the accompanying analysis was less complete.

The remainder of the article is organized as follows. Section 2 describes the mathematical framework for CoSa as well as the theoretical results for the `CoSaMP`

algorithm. Section 3 presents a description of the algorithm and its implementation. Section 4 discusses related work and a comparison to our method. We conclude in Section 5 with the analysis of the algorithm and sketch the proof of the main result.

We continue with a description of the mathematical model for signals, sampling technologies, and signal reconstruction algorithms. The main theoretical results for the `CoSaMP`

algorithm appear at the end of the section.

**2.1. Notation**

Let us instate some notation that is used throughout the paper. A *signal* is a vector **x***C*^{N}. For *p* [1, ], we write ||·||_{p} for the usual _{p} vector norm: . We reserve ||·|| for the spectral norm. For **x***C*^{N}, we write * x_{r}* for the signal in

For *T* {1, 2, ..., *N*}, we denote the restriction of the signal to the set *T* as * x*|

**2.2. Sparse and compressible signals**

We say that a signal is *s-sparse* when it has *s* *N* nonzero entries:

This definition encapsulates the idea that the signal contains far less information than its ambient dimension suggests. Observe that it takes roughly bits to write down the locations of the nonzero entries in an *s*-sparse signal in *C*^{N}.

While sparse signals are important for theory, they rarely arise in practice, so we consider the larger class of *compressible signals*. For *p* (0, 1), we say that * x* is

Compressible signals are well approximated by sparse signals. The smaller the number *p*, the better the approximation:

Let us emphasize that we do not know in advance *which* coordinates of a compressible signal are significantonly that few coordinates matter.

We limit our attention to signals that are sparse or compressible in the standard coordinate basis, although the ideas apply equally well to signals that are sparse or compressible with respect to other orthonormal bases. The generalization is important in applications. For instance, natural images are usually not compressible in the standard basis but they are compressible in an appropriate wavelet basis.

**2.3. Modeling the sampling operation**

We can model each of the sampling technologies mentioned in the introduction as a linear map from a signal of interest to a collection of samples. We use an *m × N* matrix to describe the sampling operation. Thus, we write ** u = x** to summarize the process of collecting a vector

**Restricted Isometry Constants:** In this section, to build intuition, we focus on the class of *s*-sparse signals. It is important to understand what properties of the sampling matrix ensure that *s*-sparse signals can be distinguished based on their samples. In other words, we need to make sure that is a linear embedding (i.e., a bijective structure-preserving map) of the set of *s*-sparse signals into *C*^{m}. A necessary and sufficient condition is that each 2*s*-column submatrix of the sampling matrix forms a linearly independent set. Certain Vandermonde matrices with *m* = 2*s* rows, arising from ReedSolomon codes, have this property. Unfortunately, these matrices contain nearly singular 2*s*-column submatrices, so the sampling matrix is not stably invertible on the image of the set of *s*-sparse signals. Noise and signal perturbations become a serious problem.

To avoid this difficulty, Candès and Tao^{4} insist that each 2*s*-column submatrix of the measurement matrix should be well conditioned. More formally, they define the *r*th *restricted isometry constant* of the matrix to be the least number _{r} such that

The condition _{2s} 1 implies that the measurement matrix preserves the geometry (with respect to the Euclidean metric) of the set of all *s*-sparse signals. This condition is sufficient to ensure that sampling is stably invertible. It has become standard practice in CoSa to assume that the measurement matrix satisfies a condition of this type.

**How Many Samples?**: The next question is *how many* samples *m* do we need to ensure that the sampling map embeds the set of *s*-sparse signals into *C*^{m}. Because an *s*-sparse signal contains only about *s* log(*N/s*) bits of information, we hope that *m* = O(*s* log(*N/s*)) samples suffice for the embedding. In fact, many random sampling schemes allow us to achieve this sampling rate, or one that is just slightly worse. (It is *necessary* in some sense to take at least this many samples, so these sampling schemes are optimal.) Furthermore, there are technologies that implement these sampling schemes. Here are two examples.

A *random sign matrix* has independent, identically distributed entries that take the values ±1 with equal probability. Its restricted isometry constant^{5} satisfies

In words, we can embed the set of *s*-sparse signals into *m* dimensions, where *m* is proportional to the sparsity and *logarithmic* in the ambient dimension. A random sign matrix can be used to construct the random masks required by the Rice single-pixel camera.

The *subsampled Fourier matrix* consists of a collection of *m* rows chosen uniformly at random from the discrete Fourier transform matrix. The best theoretical bound for the restricted isometry constant^{23} is

It is conjectured that the actual scaling is log(*N*). The subsampled Fourier matrix describes measurements that can be acquired by an MRI machine. Note that a subsampled Fourier matrix can be applied to a vector using O(*N* log *N*) arithmetic operations, and it requires only O(*m* log *N*) bits of storage. These computational properties are very important in practice, as we will see.

Certain types of deterministic matrices also satisfy bounds on their restricted isometry constants. At present, all known examples require that the number *m* of samples satisfy . Randomness seems to provide a significant advantage when designing schemes for CoSa.

**2.4. Signal reconstruction via CoSaMP**

When the sampling matrix stably embeds the set of *s*-sparse signals, it is possible in principle to recover an arbitrary *s*-sparse signal from its samples. The major computational question in CoSa is to determine how to reconstruct the sparse signal *using a tractable algorithm*. For practical applications, we also need to understand how the reconstruction is affected when the signal is merely compressible and the samples are contaminated by noise. The following issues are crucial:

- Can the algorithm recover the signal under a variety of sampling schemes? In other words, is the class of measurement maps for which the algorithm can succeed large?
- Can the algorithm reconstruct the signal from a minimal number
*m*of samples? - Is signal recovery robust when samples are contaminated with noise or the signal is not exactly sparse? Is the error bound optimal for each target signal?
- Does the algorithm offer provably efficient computational cost and storage?

This abstract discusses a reconstruction algorithm called Compressive Sampling Matching Pursuit (`CoSaMP`

) that satisfies all of the foregoing criteria. Our original publications^{18, 19} were the first to prove that a CoSa signal reconstruction algorithm could achieve essentially optimal resource usage.

**Properties of the CoSaMP Algorithm:** Let us state and explain our main result. We postpone the details of the algorithm and the analysis until later.

THEOREM A (`CoSaMP`

). *Suppose that is an m × N sampling matrix with restricted isometry constant _{2s} c. Let u = x + e be a vector of samples of an arbitrary signal, contaminated with arbitrary noise*.

*The* `CoSaMP`

*algorithm takes as input a routine to multiply and * by a vector, the vector u of samples, the sparsity parameter s, and a precision parameter . The output is an s-sparse signal approximation a that satisfies*

*The running time is* O( .log(||** x**||

How does this result deliver the desired performance?

- The algorithm requires only that the matrix satisfy a bound on the restricted isometry constant. Many types of random matrices (and certain deterministic matrices) have this property. As a consequence, the algorithm can be used with many measurement technologies.
- The number
*m*of samples required to obtain the bound_{2s}*c*depends on the type of sampling matrix. For many types of random matrices,*m*= O(*s*log) for a small integer . Since this value is optimal up to the constants, the algorithm can recover signals from a minimal number of samples.^{}N - The error bound demonstrates that the algorithm succeeds for all signals, even when there is noise in the samples. As we expect, the algorithm performs best for sparse and compressible signals. Indeed, Equation 2 shows that
`CoSaMP`

produces an approximationof a**a***p*-compressible signalthat satisfies**x**

which is the best possible approximation error we could hope for in general.^{6} - The computational cost of the algorithm depends on the type of signals we are approximating. For a compressible signal, we may take the precision parameter
*= R · s*^{1/2-1/p}to see that the number of iterations required to produce a minimal approximation error is at most O(log*s*). See Section 1.2 of Needell and Tropp^{19}for details.

The running time also depends on the type of sampling matrix we use. In particular, we can apply a subsampled Fourier matrix to a vector using = O(*N*log*N*) operations.

It follows that if we acquire a compressible signal with a structured sampling matrix, we can perform signal recovery in time O(*N*log^{2}*N*). This runtime bound is essentially optimal when the sparsity level is roughly the same order as the dimension (which is the setting in most practical problems).

The `CoSaMP`

algorithm is, at heart, a greedy iterative method for reconstructing a signal from compressive samples. This section provides an overview of the algorithm and its implementation. We also explain the basic idea behind the analysis of the algorithm, which demonstrates that each iteration reduces the error in the current signal approximation.

**3.1. Description of algorithm**

The input to the algorithm consists of (access to) the sampling operator , the samples ** u**, the target sparsity level

- The sparsity level
*s*is fixed, and the*m × N*sampling operator has restricted isometry constant_{4s}0.1. - The signal
**x***C*^{N}is arbitrary, except where noted. The noise vector**e***C*^{m}is also arbitrary. - The vector of samples
.*u = x + e*

The algorithm is initialized with a trivial signal estimate, meaning that the initial residual is the entire unknown target signal. Each iteration then consists of five major steps.

**Identification**. Using the current samples, the algorithm computes a vector that is highly correlated with the signal, called the signal proxy. From the proxy, components of the signal that carrry a lot of energy are located.**Support Merger**. The set of newly identified components is united with the set of components that appear in the current approximation.**Estimation**. The algorithm solves a least-squares problem to approximate the target signal on the merged set of components.**Pruning**. The algorithm produces a new approximation by retaining only the largest entries in this leastsquares signal approximation.**Sample Update**. Finally, the samples are updated so that they reflect the residual, the part of the signal that has not been approximated.

These five steps are repeated until the halting criterion is satisfied. In our analysis, we assume that the method uses a fixed number of iterations, but Needell and Tropp^{18, 19} discuss other alternatives. The `CoSaMP`

algorithm can be summarized by the pseudocode presented as Algorithm 1.

REMARK 1. *We emphasize that the method we present is a specific example from a more general* framework. *The articles ^{18, 19} discuss a number of variations. In particular, note that the term 2s in the identification step can be replaced by s for other values of . This type of tuning is valuable in practice, but it makes the proof more complicated*.

REMARK 2. *Some knowledge about the sparsity level is required as input to the algorithm. There are several strategies for estimating this parameter if it is not known a priori. One such method would be to use the number m of measurements to deduce the sparsity level. Since m* 2*s* log *N is often sufficient, the estimate s m*/2 log *N is reasonable. A second approach would be to run the* `CoSaMP`

*algorithm using a variety of sparsity levels and select the best approximation a obtained using the empirical error ||a - u||_{2}. If one varies s according to a geometric progression (e.g., s = 1, 2, 4, 8, ..., m), this variation increases the runtime by at most* O(log

**3.2. Implementation**

`CoSaMP`

was designed to be a practical algorithm for signal recovery, so it is worth spending some time on implementation issues. The least-squares step in the algorithm presents the largest contribution to the runtime. Fortunately, we have constructed _{T} to ensure it is well conditioned, so we can apply its pseudoinverse quickly using an iterative method. Section 5 of Needell and Tropp^{19} contains a thorough analysis of iterative least-squares methods as modules in `CoSaMP`

. This analysis shows that the cost of solving the least-squares problem is O(), where bounds the cost of a matrixvector multiply with _{T} or .

The remaining steps of the algorithm involve standard techniques, and we can estimate the operation counts as follows.

**Proxy**. Forming the proxy is dominated by the cost of the matrixvector multiply **v*.

**Identification**. We can locate the largest 2*s* entries of a vector in time O(*N*) using the approach in Cormen et al.^{7, Ch. 9}. In practice, it may be faster to use an O(*N* log *N*) sorting algorithm (e.g., quicksort, mergesort, heapsort^{7, Sec. 11}) on the entries of the signal and then select the first 2*s* of them.

**Support Merger**. We can merge two support sets of size O(*s*) in expected time O(*s*) via randomized hashing methods,^{7, Ch. 11} or by sorting both sets first and using the elementary merge procedure^{7, p. 29} for a total cost O(*s* log *s*).

**LS estimation**. We can use the conjugate gradient method or the `LSQR`

algorit (see e.g. Paige and Saunders^{22}) to compute ** u**. Initializing the least-squares algorithm requires a matrixvector multiply with , and each iteration requires one matrixvector multiply each with

**Pruning**. As in the identification step, pruning can be implemented in time O(*s*), but it may be preferable to sort the components of the vector and then select the first *s* at a cost of O(*s* log *s*).

**Sample Update**. This step is dominated by the cost of the multiplication of with the *s*-sparse vector * a^{k}*.

Table 1 summarizes the cost of each step for the cases in which the standard and fast multiply is used.

Finally, we may analyze the storage requirements of the algorithm. Aside from the sampling matrix storage requirement, the algorithm constructs only one vector of length *N*, the signal proxy. The sample vectors * u* and

The following result summarizes these observations.

THEOREM 1 (RESOURCE REQUIREMENTS). *Each iteration of* `CoSaMP`

*requires* O() *time, where bounds the cost of a multiplication with the matrix or *. The algorithm uses working storage* O(*N*).

**3.3. Error reduction bound**

The theoretical analysis of the `CoSaMP`

algorithm is based on an estimate for how much the algorithm reduces the approximation error in each iteration. This result demonstrates that the algorithm makes significant progress whenever the error is substantially larger than a certain baseline value. We call this baseline the *unrecoverable energy v* in the signal:

The unrecoverable energy is due to the noise in the samples and the fact that the signal is not exactly sparse. For a detailed discussion, see Section 2.5 of Needell and Tropp.^{19}

The main technical result is the following iteration invariant, which we establish in Section 5.

THEOREM 2 (ERROR REDUCTION). *For each iteration k 0, the signal approximation a^{k} is s-sparse and*

*In particular*,

The main consequence of Theorem 2 is the fact that, after log_{2}(||** x**||

The statement of Theorem A depends on a simple upper bound for the _{2} term in the unrecoverable energy:

This estimate is a consequence of Lemma 7 of Gilbert.^{15}

Finally, we can replace the assumption that _{4s} 0.1 by a stronger bound on _{2s} because of Corollary 3.4 in Needell and Tropp,^{19} which states that _{cr} *c* · _{2r} for any positive integers *c* and *r*.

`CoSaMP`

is inspired by algorithmic ideas and analytic techniques that have appeared previously. We briefly summarize the major algorithmic approaches and compare them with `CoSaMP`

. This work can be classified in three rough categories: convex relaxation,^{4, 11} greedy pursuits,^{12, 20, 24} and combinatorial algorithms.^{8, 1416}

The convex optimization methods^{1, 10} attempt to reconstruct signals by solving the mathematical program

where we assume that ||* e*||

whenever has restricted isometry constant _{4s} 0.2. These restricted isometry constant requirements continue to be improved.^{2, 13} The error bound for `CoSaMP`

is equivalent with Equation 7, up to the precise value of the constants.

Many algorithms have been proposed to optimize Equation 6. In particular, interior-point methods^{1, 17} are guaranteed to solve the problem to a fixed precision in O(*m*^{2}*N*^{1.5}) time.^{21} The runtime for `CoSaMP`

is much better than these interior-point methods.

Greedy algorithms such as OMP,^{24} StOMP,^{12} and ROMP^{20} are iterative methods for approximating a signal from compressive samples. In each iteration, they use a greedy rule to identify new elements in the support of the signal. These methods provide fast runtimes. On the other hand, to the extent that their behavior is understood, greedy algorithms require more measurements or produce worse errors than the convex optimization Equation 6.

Tropp and Gilbert proposed the greedy algorithm *orthogonal matching pursuit* (OMP) for signal recovery.^{24} OMP is similar to `CoSaMP`

in that it uses a signal proxy to select large components of the target signal, but it selects one such component at each iteration. However, it does not provide the same uniform guarantees as convex relaxation, and it is unknown whether it succeeds in the noisy setting.

Other algorithms based on OMP have been formulated, such as *regularized OMP*, or ROMP,^{20} developed by Needell and Vershynin. ROMP is noteworthy because the authors establish that under the restricted isometry property, the algorithm can approximately recover compressible signals from noisy samples. The error bound and measurement requirements provided by these results are analogous with those of the convex optimization method, modulo parasitic logarithmic terms. Our results for `CoSaMP`

remove all the extraneous factors, so the performance of `CoSaMP`

is essentially optimal.

Finally, we mention that there is a class of algorithms for signal recovery that have sublinear (in the signal dimension) runtime. Some examples are the Fourier sampling algorithm (FS) of Gilbert et al.,^{16} chaining pursuit,^{14} and HHS pursuit.^{15} These methods are very fast, but they require highly structured measurements.

Table 2 summarizes the behavior of the above algorithms in terms of the following five criteria.

**General samples**. Does the algorithm succeed for a variety of sampling schemes, or does it require structured samples? The designation "RIP" implies that a bound on a restricted isometry constant suffices. "Gauss" means that the algorithm succeeds when the sampling matrix has iid subgaussian entries.

**Optimal number of samples**. Can the algorithm reconstruct *s*-sparse signals from a near-optimal number of measurements, *m* = O(*s* log *N*)? Or are its sampling requirements higher (by logarithmic factors)?

**Uniformity**. Given a fixed sampling matrix, does the algorithm recover all signals? Or do the results hold with high probability only for each fixed signal?

**Stability**. Does the algorithm succeed when the signal is compressible (but not sparse) and the samples are noisy?

**Running time**. In the worst case (without fast multiplies), how long does it take the algorithm to recover a real-valued *s*-sparse signal within a fixed relative precision, given a sampling matrix with no special structure? The designation LP(*N, m*) represents the cost of solving a linear program with *N* variables and *m* constraints (O(*m*^{2}*N*^{1.5}) using an interior-point method).

Under all of these metrics, `CoSaMP`

achieves the best performance out of the linear and superlinear methods. Of course, `CoSaMP`

is slower than the sublinear algorithms, but it allows more general sampling matrices and demands fewer samples, which makes it more adaptable to practical applications. `CoSaMP`

delivers the same guarantees as the best convex optimization approaches as well as rigorous bounds on computational cost and storage that are comparable with faster greedy methods. Thus, `CoSaMP`

is essentially optimal in every important respect.

The `CoSaMP`

algorithm uses an approach inspired by the restricted isometry property to identify the largest *s* components of the target signal. Owing to the restricted isometry property, each set of *s* components of the proxy vector * y* = *

The algorithm repeats this idea at each iteration, updating the samples to reflect the residual (the part of the signal yet to be approximated). We use least squares to estimate the signal on this support set and repeat this process until the recoverable energy in the signal has been found.

We now outline the proof of Theorem 2.

**5.1. Consequences of the RIP**

We begin with some important consequences of the restricted isometry property. Omitted proofs appear in Needell and Tropp.^{19}

PROPOSITION 1 (CONSEQUENCES). *Suppose has restricted isometry constant _{r}. Let T be a set of r indices or fewer. Then*

*where the last two statements contain an upper and lower bound, depending on the sign chosen*.

A corollary of these bounds is the fact that disjoint sets of columns from the sampling matrix span nearly orthogonal subspaces.

PROPOSITION 2 (ORTHOGONALITY). *Suppose has restricted isometry constant _{r}. Let S and T be disjoint sets of indices whose combined cardinality does not exceed r. Then*

We apply this result through a corollary.

COROLLARY 1. *Suppose has restricted isometry constant _{r}. Let T be a set of indices, and let*

The last consequence of the restricted isometry property that we will need measures how much the sampling matrix inflates nonsparse vectors. Its proof uses arguments from functional analysis.

PROPOSITION 3 (ENERGY BOUND). *Suppose verifies the upper inequality of* (3), *viz*.

*Then, for every signal x*,

**5.2. Iteration invariant: sparse case**

In proving Theorem 2, we first operate under the assumption that the signal * x* is exactly

THEOREM 3 (SPARSE ERROR REDUCTION). *Assume x is s-sparse. For each k 0, the signal approximation a^{k} is s-sparse, and*

*In particular*,

REMARK 3. *This bound assumes the least-squares step in the algorithm is performed without error. In Section 5 of Needell and Tropp, ^{19} we study how many iterations of a least-squares solver are needed to make the resulting error negligible. We show that, if we use conjugate gradient for the estimation step of*

`CoSaMP`

The proof of the iteration invariant, Theorem 3 involves a sequence of short lemmas, one for each step in the algorithm. We fix an iteration *k*, and let * a = a^{k-1}* be the signal approximation at the beginning of the iteration. We define the residual as

The first step in the algorithm is the identification step, in which a set of components is found corresponding to locations where the residual signal has a lot of energy. This is summarized by the following lemma which is proven in Needell and Tropp.^{19}

LEMMA 1 (IDENTIFICATION). *The set* W = supp(**y**_{2s}), *where y = *v is the signal proxy, contains at most* 2

Next, the algorithm merges the support of the current estimate * a* with the new selection of components. The following simple lemma shows that components of the signal

LEMMA 2 (SUPPORT MERGER). *Let* W *be a set of at most* 2*s indices. The set T* = W supp(* a*)

The estimation step in the algorithm obtains values for coefficients in the set *T* by solving a least-squares problem. The next result bounds the error of this approximation.

LEMMA 3 (ESTIMATION^{19}). *Let T be a set of at most* 3*s indices, and define the least-squares signal estimate b by the formulae*

*where u* =

*Proof*. Note first that

Using the expression * u* =

The cardinality of *T* is at most 3*s*, and * x* is

Combine the bounds to reach

Finally, invoke the hypothesis that _{3s} _{4s} 0.1.

Lastly, the algorithm prunes the intermediate estimation to its largest *s* terms. The triangle inequality bounds the error in this pruned approximation.

LEMMA 4 (PRUNING^{19}). *The pruned approximation b_{s} satisfies*

At the end of the iteration, the new approximation is formed by setting * a^{k}* =

To obtain the second bound in Theorem 3, we solve the error recursion and observe that

This point completes the argument.

**5.3. Extension to general signals**

We now turn to the proof of the main result for `CoSaMP`

, Theorem A. We must remove the assumption that the signal ** x** is sparse. Although this job seems difficult at first, it turns out to have an elegant solution. We can view the noisy samples of a generic signal as samples of a sparse signal corrupted by additional noise that reflects the tail of the signal. We have the following reduction to the sparse case, of which Theorem 2 is a consequence.

LEMMA 5 (REDUCTION TO SPARSE CASE^{19}). *Let x be an arbitrary vector in*

This reduction will now allow us to prove the iteration invariant for general signals, Theorem 2. Let * x* be a general signal. By Lemma 5, we can write the noisy vector of samples as . Applying the sparse iteration invariant, Theorem 3, we obtain

We then invoke the lower and upper triangle inequalities to obtain

Finally, recalling the estimate for from Lemma 5 and simplifying, we reach

where *v* is the unrecoverable energy (Equation 4).

1. Candès E, Romberg J., Tao T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete Fourier information. *IEEE Trans. Info. Theory 52*, 2 (Feb. 2006), 489509.

2. Candès, E. J. The restricted isometry property and its implications for compressed sensing. *C. R. Math. Acad. Sci. Paris, Serie I 346* (2008), U589U592.

3. Candès, E. J., Romberg, J., Tao, T. Stable signal recovery from incomplete and inaccurate measurements. *Commun. Pur. Appl. Math. 59*, 8 (2006), 12071223.

4. Candès, E. J., Tao, T. Decoding by linear programming. *IEEE Trans. Info. Theory 51* (2005), 42034215.

5. Candès, E. J., Tao, T. Near optimal signal recovery from random projections: Universal encoding strategies? *IEEE Trans. Info. Theory 52*, 12 (Dec. 2006), 54065425.

6. Cohen, A., Dahmen, W., DeVore, R. Compressed sensing and best k-term approximation. *J. Am. Math. Soc. 22*, 1 (2009) 211231.

7. Cormen, T., Lesierson, C., Rivest, L., Stein, C. *Introduction to Algorithms*, 2nd edition, MIT Press, Cambridge, MA, 2001.

8. Cormode, G., Muthukrishnan, S. Combinatorial algorithms for compressed sensing. Technical report, DIMACS, 2005.

9. Dai, W., Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. *IEEE Trans. Info. Theory 55*, 5 (2009).

10. Donoho, D. L. Compressed sensing. *IEEE Trans. Info. Theory 52*, 4 (Apr. 2006), 12891306.

11. Donoho, D. L., Stark, P. B. Uncertainty principles and signal recovery. *SIAM J. Appl. Math. 49*, 3 (June 1989) 906931.

12. Donoho, D. L., Tsaig, Y., Drori, I., Starck, J.-L. Sparse solution of underdetermined linear equations by stagewise Orthogonal Matching Pursuit (StOMP). Submitted for publication (2007)

13. Foucart, S. A note on guaranteed sparse recovery via _{1}-minimization. *Appl. Comput. Harmon. Anal*. (2010). To appear.

14. Gilbert, A., Strauss M., Tropp J., Vershynin R. Algorithmic linear dimension reduction in the _{1} norm for sparse vectors. In *Proceedings of the 44th Annual Allerton Conference on Communication, Control and Computing* (Sept. 2006).

15. Gilbert, A., Strauss, M., Tropp, J., Vershynin, R. One sketch for all: fast algorithms for compressed sensing. In *Proceedings of the 39th ACM Symposium. Theory of Computing* (San Diego, June 2007).

16. Gilbert, A. C., Guha, S., Indyk, P., Muthukrishnan, S., Strauss, M. J. Near-optimal sparse Fourier representations via sampling. In *Proceedings of the 2002 ACM Symposium on Theory of Computing STOC* (2002), 152161.

17. Kim, S.-J., Koh, K., Lustig, M., Boyd, S., Gorinevsky, D. An interiorpoint method for large-scale _{1}-regularized least squares. *IEEE J. Sel. Top. Signa. 1*, 4 (2007) 606617.

18. Needell, D., Tropp, J. A. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. ACM Technical Report 200801, California Institute of Technology, Pasadena, July 2008.

19. Needell, D., Tropp, J. A. CoSaMP: Iterative signal recovery from noisy samples. *Appl. Comput. Harmon. Anal. 26*, 3 (2008), 301321.

20. Needell, D., Vershynin, R. Signal recovery from incomplete and inaccurate measurements via regularized orthogonal matching pursuit. *IEEE J. Sel. Top. Signa*. (2007). To appear.

21. Nesterov, Y. E., Nemirovski, A. S. *Interior Point Polynomial Algorithms in Convex Programming*. SIAM, Philadelphia, 1994.

22. Paige, C. C., Saunders, M. A. LSQR: An algorithm for sparse linear equations and sparse least squares. *TOMS 8*, 1 (1982), 4371.

23. Rudelson, M., Vershynin, R. Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements. In *Proceedings of the 40th Annual Conference on Info. Sciences and Systems*, Princeton, Mar. 2006.

24. Tropp, J. A., Gilbert, A. C. Signal recovery from random measurements via Orthogonal Matching Pursuit. *IEEE Trans. Info. Theory 53*, 12 (2007), 46554666.

The original version of this paper appeared in *Applied and Computational Harmonic Analysis 26*, 3 (2008), 301321.

DOI: http://doi.acm.org/10.1145/1859204.1859229

**©2010 ACM 0001-0782/10/1200 $10.00**

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.

No entries found