Compressive sampling (CoSa) is a new paradigm for developing data sampling technologies. It is based on the principle that many types of vector-space data are *compressible*, which is a term of art in mathematical signal processing. The key ideas are that randomized dimension reduction preserves the information in a compressible signal and that it is possible to develop hardware devices that implement this dimension reduction efficiently. The main computational challenge in CoSa is to reconstruct a compressible signal from the reduced representation acquired by the sampling device. This extended abstract describes a recent algorithm, called, `CoSaMP`

, that accomplishes the data recovery task. It was the first known method to offer near-optimal guarantees on resource usage.

### 1. What is Compressive Sampling?

In many applications, the ambient dimension of a data vector does not reflect the actual number of degrees of freedom in that vector. In mathematical signal processing, this property is captured by the notion of a *compressible signal*. Natural images provide a concrete example of compressible signals because we can approximate them accurately just by summarizing the solid areas (local averages) and the edges (local differences). This representation allows us to compress images dramatically without a substantial loss of visual quality—an idea implemented in the JPEG image coders that we use every day.

*Sampling* is a generic term for the process of acquiring data about an object, either in the physical or the virtual world. There are many different technologies that we use to collect samples. A ubiquitous piece of sampling hardware is the CCD array inside a digital camera, which measures the average light intensity at each small square in the focal plane of the camera. Other sampling equipment includes X-ray machines, magnetic resonance imaging (MRI), and analog-to-digital converters.

In most cases, we use sampling to acquire as much information as possible about an object of interest. Indeed, when purchasing a digital camera, we often think that more pixels are better. But as soon as the image is captured, the camera invokes compression algorithms to reduce the amount of information it needs to store. This fact suggests an awkward question: if there is a limited amount of information in the object, why are we taking so many samples? Is there some way to obtain measurements that automatically sieve out the information in the object?

One way to exploit the gap between nominal dimension and degrees of freedom is to perform *dimension reduction*. This idea has a long tradition in computer science. Indeed, to solve geometric problems involving high-dimensional data, we often map the data to a lower dimension while approximately preserving the geometric properties that are relevant to the computation. One standard method for achieving this goal is to use a random projection to the data.

The main idea in *compressive sampling* (CoSa) is to develop sampling technologies based on dimension reduction. The mathematical foundation for this approach is the fact that an appropriate random projection of a compressible signal retains most of the information in that signal. The coordinates of this randomly projected signal are interpreted as samples, and the number of samples we need is comparable with the information content of the object we are sampling. Because we have a lot of flexibility when designing projections, we have flexibility in engineering the sampling hardware.

An intriguing example of a CoSa sensing device is the Rice University single-pixel camera. This camera uses a digital micromirror device (DMD) to selectively black out a random subset of pixels from an image and to measure the total intensity of the remaining pixels. By applying many random masks in sequence, we obtain a collection of samples that captures the information in the image.

Another key fact about CoSa is that we cannot use simple linear techniques to reconstruct the data vector from the random samples. Instead, we must develop more sophisticated algorithms. One approach that has received a lot of attention is convex programming based on
_{1} minimization. This abstract describes an algorithm, called `CoSaMP`

, that is based on the greedy pursuit schema.^{18, 19} `CoSaMP`

was the first computational technique that provably achieved near-optimal guarantees on resource usage. Using the techniques from our paper, researchers later demonstrated that other algorithms offer comparable performance guarantees. Another algorithm similar to `CoSaMP`

, called Subspace Pursuit,^{9} was published at the same time, but the accompanying analysis was less complete.

The remainder of the article is organized as follows. Section 2 describes the mathematical framework for CoSa as well as the theoretical results for the `CoSaMP`

algorithm. Section 3 presents a description of the algorithm and its implementation. Section 4 discusses related work and a comparison to our method. We conclude in Section 5 with the analysis of the algorithm and sketch the proof of the main result.

### 2. The Mathematics of CoSa

We continue with a description of the mathematical model for signals, sampling technologies, and signal reconstruction algorithms. The main theoretical results for the `CoSaMP`

algorithm appear at the end of the section.

Let us instate some notation that is used throughout the paper. A *signal* is a vector **x***C*^{N}. For *p*
[1, ∞], we write ||·||_{p} for the usual
_{p} vector norm:
. We reserve ||·|| for the spectral norm. For **x***C*^{N}, we write * x_{r}* for the signal in

*C*

^{N}that is formed by restricting

*to its*

**x***r*largest-magnitude components (with ties broken lexicographically).

For *T* ⊂ {1, 2, …, *N*}, we denote the restriction of the signal to the set *T* as * x*|

_{T}, which is the vector equal to

*on*

**x***T*and 0 elsewhere. We occasionally abuse the notation and treat

*|*

**x**_{T}as an element of

*C*

^{T}. We also define the restriction Φ

_{T}of the sampling matrix Φ to be the column submatrix whose columns are listed in the index set

*T*. Finally, we define the pseudoinverse of a tall, full-rank matrix

*by the formula*

**A**

*A*^{†}= (

**)**

*A*A*^{−1}

**.**

*A**
**2.2. Sparse and compressible signals**

We say that a signal is *s-sparse* when it has *s* ≪ *N* nonzero entries:

This definition encapsulates the idea that the signal contains far less information than its ambient dimension suggests. Observe that it takes roughly
bits to write down the locations of the nonzero entries in an *s*-sparse signal in *C*^{N}.

While sparse signals are important for theory, they rarely arise in practice, so we consider the larger class of *compressible signals*. For *p*
(0, 1), we say that * x* is

*p-compressible*with magnitude

*R*if its components taken in sorted order obey

Compressible signals are well approximated by sparse signals. The smaller the number *p*, the better the approximation:

Let us emphasize that we do not know in advance *which* coordinates of a compressible signal are significant—only that few coordinates matter.

We limit our attention to signals that are sparse or compressible in the standard coordinate basis, although the ideas apply equally well to signals that are sparse or compressible with respect to other orthonormal bases. The generalization is important in applications. For instance, natural images are usually not compressible in the standard basis but they are compressible in an appropriate wavelet basis.

**2.3. Modeling the sampling operation**

We can model each of the sampling technologies mentioned in the introduction as a linear map from a signal of interest to a collection of samples. We use an *m × N* matrix Φ to describe the sampling operation. Thus, we write ** u = Φx** to summarize the process of collecting a vector

**of**

*u**m*samples from an

*N*-dimensional signal

**.**

*x***Restricted Isometry Constants:** In this section, to build intuition, we focus on the class of *s*-sparse signals. It is important to understand what properties of the sampling matrix Φ ensure that *s*-sparse signals can be distinguished based on their samples. In other words, we need to make sure that Φ is a linear embedding (i.e., a bijective structure-preserving map) of the set of *s*-sparse signals into *C*^{m}. A necessary and sufficient condition is that each 2*s*-column submatrix of the sampling matrix Φ forms a linearly independent set. Certain Vandermonde matrices with *m* = 2*s* rows, arising from ReedSolomon codes, have this property. Unfortunately, these matrices contain nearly singular 2*s*-column submatrices, so the sampling matrix is not stably invertible on the image of the set of *s*-sparse signals. Noise and signal perturbations become a serious problem.

To avoid this difficulty, Candès and Tao^{4} insist that each 2*s*-column submatrix of the measurement matrix should be well conditioned. More formally, they define the *r*th *restricted isometry constant* of the matrix Φ to be the least number δ_{r} such that

The condition δ_{2s} ≪ 1 implies that the measurement matrix Φ preserves the geometry (with respect to the Euclidean metric) of the set of all *s*-sparse signals. This condition is sufficient to ensure that sampling is stably invertible. It has become standard practice in CoSa to assume that the measurement matrix satisfies a condition of this type.

**How Many Samples?**: The next question is *how many* samples *m* do we need to ensure that the sampling map embeds the set of *s*-sparse signals into *C*^{m}. Because an *s*-sparse signal contains only about *s* log(*N/s*) bits of information, we hope that *m* = O(*s* log(*N/s*)) samples suffice for the embedding. In fact, many random sampling schemes allow us to achieve this sampling rate, or one that is just slightly worse. (It is *necessary* in some sense to take at least this many samples, so these sampling schemes are optimal.) Furthermore, there are technologies that implement these sampling schemes. Here are two examples.

A *random sign matrix* has independent, identically distributed entries that take the values ±1 with equal probability. Its restricted isometry constant^{5} satisfies

In words, we can embed the set of *s*-sparse signals into *m* dimensions, where *m* is proportional to the sparsity and *logarithmic* in the ambient dimension. A random sign matrix can be used to construct the random masks required by the Rice single-pixel camera.

The *subsampled Fourier matrix* consists of a collection of *m* rows chosen uniformly at random from the discrete Fourier transform matrix. The best theoretical bound for the restricted isometry constant^{23} is

It is conjectured that the actual scaling is
log(*N*). The subsampled Fourier matrix describes measurements that can be acquired by an MRI machine. Note that a subsampled Fourier matrix can be applied to a vector using O(*N* log *N*) arithmetic operations, and it requires only O(*m* log *N*) bits of storage. These computational properties are very important in practice, as we will see.

Certain types of deterministic matrices also satisfy bounds on their restricted isometry constants. At present, all known examples require that the number *m* of samples satisfy
. Randomness seems to provide a significant advantage when designing schemes for CoSa.

**2.4. Signal reconstruction via CoSaMP**

When the sampling matrix Φ stably embeds the set of *s*-sparse signals, it is possible in principle to recover an arbitrary *s*-sparse signal from its samples. The major computational question in CoSa is to determine how to reconstruct the sparse signal *using a tractable algorithm*. For practical applications, we also need to understand how the reconstruction is affected when the signal is merely compressible and the samples are contaminated by noise. The following issues are crucial:

- Can the algorithm recover the signal under a variety of sampling schemes? In other words, is the class of measurement maps for which the algorithm can succeed large?
- Can the algorithm reconstruct the signal from a minimal number
*m*of samples? - Is signal recovery robust when samples are contaminated with noise or the signal is not exactly sparse? Is the error bound optimal for each target signal?
- Does the algorithm offer provably efficient computational cost and storage?

This abstract discusses a reconstruction algorithm called Compressive Sampling Matching Pursuit (`CoSaMP`

) that satisfies all of the foregoing criteria. Our original publications^{18, 19} were the first to prove that a CoSa signal reconstruction algorithm could achieve essentially optimal resource usage.

**Properties of the CoSaMP Algorithm:** Let us state and explain our main result. We postpone the details of the algorithm and the analysis until later.

THEOREM A (`CoSaMP`

). *Suppose that Φ is an m × N sampling matrix with restricted isometry constant δ _{2s} ≤ c. Let u = Φx + e be a vector of samples of an arbitrary signal, contaminated with arbitrary noise*.

*The* `CoSaMP`

*algorithm takes as input a routine to multiply Φ and Φ* by a vector, the vector u of samples, the sparsity parameter s, and a precision parameter η. The output is an s-sparse signal approximation a that satisfies*

*The running time is* O(ℒ .log(||** x**||

_{2}/η)),

*where ℒ bounds the cost of a matrixvector multiply with Φ or Φ*. Working storage is*O(

*N*).

*The constants c*< 1

*and C*> 1

*are absolute*.

How does this result deliver the desired performance?

- The algorithm requires only that the matrix satisfy a bound on the restricted isometry constant. Many types of random matrices (and certain deterministic matrices) have this property. As a consequence, the algorithm can be used with many measurement technologies.
- The number
*m*of samples required to obtain the bound δ_{2s}≤*c*depends on the type of sampling matrix. For many types of random matrices,*m*= O(*s*log) for a small integer^{α}N*α*. Since this value is optimal up to the constants, the algorithm can recover signals from a minimal number of samples. - The error bound demonstrates that the algorithm succeeds for all signals, even when there is noise in the samples. As we expect, the algorithm performs best for sparse and compressible signals. Indeed, Equation 2 shows that
`CoSaMP`

produces an approximationof a**a***p*-compressible signalthat satisfies**x**

which is the best possible approximation error we could hope for in general.^{6} - The computational cost of the algorithm depends on the type of signals we are approximating. For a compressible signal, we may take the precision parameter
*η = R · s*^{1/2-1/p}to see that the number of iterations required to produce a minimal approximation error is at most O(log*s*). See Section 1.2 of Needell and Tropp^{19}for details.

The running time also depends on the type of sampling matrix we use. In particular, we can apply a subsampled Fourier matrix to a vector using ℒ = O(*N*log*N*) operations.

It follows that if we acquire a compressible signal with a structured sampling matrix, we can perform signal recovery in time O(*N*log^{2}*N*). This runtime bound is essentially optimal when the sparsity level is roughly the same order as the dimension (which is the setting in most practical problems).

### 3. The CoSaMP Algorithm

The `CoSaMP`

algorithm is, at heart, a greedy iterative method for reconstructing a signal from compressive samples. This section provides an overview of the algorithm and its implementation. We also explain the basic idea behind the analysis of the algorithm, which demonstrates that each iteration reduces the error in the current signal approximation.

The input to the algorithm consists of (access to) the sampling operator Φ, the samples ** u**, the target sparsity level

*s*, and a halting criterion. We impose the following hypotheses:

- The sparsity level
*s*is fixed, and the*m × N*sampling operator Φ has restricted isometry constant δ_{4s}≤ 0.1. - The signal
**x***C*^{N}is arbitrary, except where noted. The noise vector**e***C*^{m}is also arbitrary. - The vector of samples
.*u = Φx + e*

The algorithm is initialized with a trivial signal estimate, meaning that the initial residual is the entire unknown target signal. Each iteration then consists of five major steps.

**Identification**. Using the current samples, the algorithm computes a vector that is highly correlated with the signal, called the signal proxy. From the proxy, components of the signal that carrry a lot of energy are located.**Support Merger**. The set of newly identified components is united with the set of components that appear in the current approximation.**Estimation**. The algorithm solves a least-squares problem to approximate the target signal on the merged set of components.**Pruning**. The algorithm produces a new approximation by retaining only the largest entries in this leastsquares signal approximation.**Sample Update**. Finally, the samples are updated so that they reflect the residual, the part of the signal that has not been approximated.

These five steps are repeated until the halting criterion is satisfied. In our analysis, we assume that the method uses a fixed number of iterations, but Needell and Tropp^{18, 19} discuss other alternatives. The `CoSaMP`

algorithm can be summarized by the pseudocode presented as Algorithm 1.

REMARK 1. *We emphasize that the method we present is a specific example from a more general* framework. *The articles ^{18, 19} discuss a number of variations. In particular, note that the term 2s in the identification step can be replaced by αs for other values of α. This type of tuning is valuable in practice, but it makes the proof more complicated*.

REMARK 2. *Some knowledge about the sparsity level is required as input to the algorithm. There are several strategies for estimating this parameter if it is not known a priori. One such method would be to use the number m of measurements to deduce the sparsity level. Since m*
2*s* log *N is often sufficient, the estimate s
m*/2 log *N is reasonable. A second approach would be to run the* `CoSaMP`

*algorithm using a variety of sparsity levels and select the best approximation a obtained using the empirical error ||Φa – u||_{2}. If one varies s according to a geometric progression (e.g., s = 1, 2, 4, 8, …, m), this variation increases the runtime by at most* O(log

*m*).

*See Appendix A of Needell and Tropp*.

^{19}for more details`CoSaMP`

was designed to be a practical algorithm for signal recovery, so it is worth spending some time on implementation issues. The least-squares step in the algorithm presents the largest contribution to the runtime. Fortunately, we have constructed Φ_{T} to ensure it is well conditioned, so we can apply its pseudoinverse quickly using an iterative method. Section 5 of Needell and Tropp^{19} contains a thorough analysis of iterative least-squares methods as modules in `CoSaMP`

. This analysis shows that the cost of solving the least-squares problem is O(ℒ), where ℒ bounds the cost of a matrixvector multiply with Φ_{T} or
.

The remaining steps of the algorithm involve standard techniques, and we can estimate the operation counts as follows.

**Proxy**. Forming the proxy is dominated by the cost of the matrixvector multiply Φ**v*.

**Identification**. We can locate the largest 2*s* entries of a vector in time O(*N*) using the approach in Cormen et al.^{7, Ch. 9}. In practice, it may be faster to use an O(*N* log *N*) sorting algorithm (e.g., quicksort, mergesort, heapsort^{7, Sec. 11}) on the entries of the signal and then select the first 2*s* of them.

**Support Merger**. We can merge two support sets of size O(*s*) in expected time O(*s*) via randomized hashing methods,^{7, Ch. 11} or by sorting both sets first and using the elementary merge procedure^{7, p. 29} for a total cost O(*s* log *s*).

**LS estimation**. We can use the conjugate gradient method or the `LSQR`

algorit (see e.g. Paige and Saunders^{22}) to compute
** u**. Initializing the least-squares algorithm requires a matrixvector multiply with
, and each iteration requires one matrixvector multiply each with Φ

_{T}and . Since Φ

_{T}is a submatrix of Φ, the matrixvector multiplies can also be obtained from multiplication with the full matrix. We show in Section 5 of Needell and Tropp

^{19}that a constant number of least-squares iterations suffices for our results to hold.

**Pruning**. As in the identification step, pruning can be implemented in time O(*s*), but it may be preferable to sort the components of the vector and then select the first *s* at a cost of O(*s* log *s*).

**Sample Update**. This step is dominated by the cost of the multiplication of Φ with the *s*-sparse vector * a^{k}*.

Table 1 summarizes the cost of each step for the cases in which the standard and fast multiply is used.

Finally, we may analyze the storage requirements of the algorithm. Aside from the sampling matrix storage requirement, the algorithm constructs only one vector of length *N*, the signal proxy. The sample vectors * u* and

*have length*

**v***m*, and thus require O(

*m*) storage. Since the signal approximations are sparse, they can be stored using sparse data structures which require at most O(

*s*log

*N*) storage. Similarly, the index sets that appear require only O(

*s*log

*N*) storage. The total working storage is therefore O(

*N*).

The following result summarizes these observations.

THEOREM 1 (RESOURCE REQUIREMENTS). *Each iteration of* `CoSaMP`

*requires* O(ℒ) *time, where ℒ bounds the cost of a multiplication with the matrix Φ or Φ*. The algorithm uses working storage* O(*N*).

The theoretical analysis of the `CoSaMP`

algorithm is based on an estimate for how much the algorithm reduces the approximation error in each iteration. This result demonstrates that the algorithm makes significant progress whenever the error is substantially larger than a certain baseline value. We call this baseline the *unrecoverable energy v* in the signal:

The unrecoverable energy is due to the noise in the samples and the fact that the signal is not exactly sparse. For a detailed discussion, see Section 2.5 of Needell and Tropp.^{19}

The main technical result is the following iteration invariant, which we establish in Section 5.

THEOREM 2 (ERROR REDUCTION). *For each iteration k ≥ 0, the signal approximation a^{k} is s-sparse and*

*In particular*,

The main consequence of Theorem 2 is the fact that, after log_{2}(||** x**||

_{2}/η) iterations, the approximation error is no greater than η + 20

*v*. See Needell and Tropp

^{18, 19}for more discussion.

The statement of Theorem A depends on a simple upper bound for the
_{2} term in the unrecoverable energy:

This estimate is a consequence of Lemma 7 of Gilbert.^{15}

Finally, we can replace the assumption that δ_{4s} ≤ 0.1 by a stronger bound on δ_{2s} because of Corollary 3.4 in Needell and Tropp,^{19} which states that δ_{cr} ≤ *c* · δ_{2r} for any positive integers *c* and *r*.

### 4. Related Work

`CoSaMP`

is inspired by algorithmic ideas and analytic techniques that have appeared previously. We briefly summarize the major algorithmic approaches and compare them with `CoSaMP`

. This work can be classified in three rough categories: convex relaxation,^{4, 11} greedy pursuits,^{12, 20, 24} and combinatorial algorithms.^{8, 1416}

The convex optimization methods^{1, 10} attempt to reconstruct signals by solving the mathematical program

where we assume that ||* e*||

_{2}≤ ε. The intuition behind minimizing the

_{1}norm is that this norm promotes sparsity, and so the solution to this program should approximate a compressible signal well. Candès et al.

^{3}establish that each minimizer

*of Equation 6 satisfies*

**a**whenever Φ has restricted isometry constant δ_{4s} ≤ 0.2. These restricted isometry constant requirements continue to be improved.^{2, 13} The error bound for `CoSaMP`

is equivalent with Equation 7, up to the precise value of the constants.

Many algorithms have been proposed to optimize Equation 6. In particular, interior-point methods^{1, 17} are guaranteed to solve the problem to a fixed precision in O(*m*^{2}*N*^{1.5}) time.^{21} The runtime for `CoSaMP`

is much better than these interior-point methods.

Greedy algorithms such as OMP,^{24} StOMP,^{12} and ROMP^{20} are iterative methods for approximating a signal from compressive samples. In each iteration, they use a greedy rule to identify new elements in the support of the signal. These methods provide fast runtimes. On the other hand, to the extent that their behavior is understood, greedy algorithms require more measurements or produce worse errors than the convex optimization Equation 6.

Tropp and Gilbert proposed the greedy algorithm *orthogonal matching pursuit* (OMP) for signal recovery.^{24} OMP is similar to `CoSaMP`

in that it uses a signal proxy to select large components of the target signal, but it selects one such component at each iteration. However, it does not provide the same uniform guarantees as convex relaxation, and it is unknown whether it succeeds in the noisy setting.

Other algorithms based on OMP have been formulated, such as *regularized OMP*, or ROMP,^{20} developed by Needell and Vershynin. ROMP is noteworthy because the authors establish that under the restricted isometry property, the algorithm can approximately recover compressible signals from noisy samples. The error bound and measurement requirements provided by these results are analogous with those of the convex optimization method, modulo parasitic logarithmic terms. Our results for `CoSaMP`

remove all the extraneous factors, so the performance of `CoSaMP`

is essentially optimal.

Finally, we mention that there is a class of algorithms for signal recovery that have sublinear (in the signal dimension) runtime. Some examples are the Fourier sampling algorithm (FS) of Gilbert et al.,^{16} chaining pursuit,^{14} and HHS pursuit.^{15} These methods are very fast, but they require highly structured measurements.

Table 2 summarizes the behavior of the above algorithms in terms of the following five criteria.

**General samples**. Does the algorithm succeed for a variety of sampling schemes, or does it require structured samples? The designation “RIP” implies that a bound on a restricted isometry constant suffices. “Gauss” means that the algorithm succeeds when the sampling matrix Φ has iid subgaussian entries.

**Optimal number of samples**. Can the algorithm reconstruct *s*-sparse signals from a near-optimal number of measurements, *m* = O(*s* log *N*)? Or are its sampling requirements higher (by logarithmic factors)?

**Uniformity**. Given a fixed sampling matrix, does the algorithm recover all signals? Or do the results hold with high probability only for each fixed signal?

**Stability**. Does the algorithm succeed when the signal is compressible (but not sparse) and the samples are noisy?

**Running time**. In the worst case (without fast multiplies), how long does it take the algorithm to recover a real-valued *s*-sparse signal within a fixed relative precision, given a sampling matrix with no special structure? The designation LP(*N, m*) represents the cost of solving a linear program with *N* variables and *m* constraints (O(*m*^{2}*N*^{1.5}) using an interior-point method).

Under all of these metrics, `CoSaMP`

achieves the best performance out of the linear and superlinear methods. Of course, `CoSaMP`

is slower than the sublinear algorithms, but it allows more general sampling matrices and demands fewer samples, which makes it more adaptable to practical applications. `CoSaMP`

delivers the same guarantees as the best convex optimization approaches as well as rigorous bounds on computational cost and storage that are comparable with faster greedy methods. Thus, `CoSaMP`

is essentially optimal in every important respect.

### 5. Proof of Results

The `CoSaMP`

algorithm uses an approach inspired by the restricted isometry property to identify the largest *s* components of the target signal. Owing to the restricted isometry property, each set of *s* components of the proxy vector * y* = Φ*Φ

*approximates the energy in the corresponding*

**x***s*components of

*. Since the samples are of the form*

**x****= Φ**

*u**, we can obtain the proxy by applying the matrix Φ* to the samples*

**x***. Once the set*

**u***T*of significant locations is identified, the signal coefficients can be recovered using .

The algorithm repeats this idea at each iteration, updating the samples to reflect the residual (the part of the signal yet to be approximated). We use least squares to estimate the signal on this support set and repeat this process until the recoverable energy in the signal has been found.

We now outline the proof of Theorem 2.

We begin with some important consequences of the restricted isometry property. Omitted proofs appear in Needell and Tropp.^{19}

PROPOSITION 1 (CONSEQUENCES). *Suppose Φ has restricted isometry constant δ _{r}. Let T be a set of r indices or fewer. Then*

*where the last two statements contain an upper and lower bound, depending on the sign chosen*.

A corollary of these bounds is the fact that disjoint sets of columns from the sampling matrix span nearly orthogonal subspaces.

PROPOSITION 2 (ORTHOGONALITY). *Suppose Φ has restricted isometry constant δ _{r}. Let S and T be disjoint sets of indices whose combined cardinality does not exceed r. Then*

We apply this result through a corollary.

COROLLARY 1. *Suppose Φ has restricted isometry constant δ _{r}. Let T be a set of indices, and let*

**x***be a vector. Provided that r ≥|T*supp(

**)|,**

*x*The last consequence of the restricted isometry property that we will need measures how much the sampling matrix inflates nonsparse vectors. Its proof uses arguments from functional analysis.

PROPOSITION 3 (ENERGY BOUND). *Suppose Φ verifies the upper inequality of* (3), *viz*.

*Then, for every signal x*,

**5.2. Iteration invariant: sparse case**

In proving Theorem 2, we first operate under the assumption that the signal * x* is exactly

*s*-sparse. We remove this assumption later.

THEOREM 3 (SPARSE ERROR REDUCTION). *Assume x is s-sparse. For each k ≥ 0, the signal approximation a^{k} is s-sparse, and*

*In particular*,

REMARK 3. *This bound assumes the least-squares step in the algorithm is performed without error. In Section 5 of Needell and Tropp, ^{19} we study how many iterations of a least-squares solver are needed to make the resulting error negligible. We show that, if we use conjugate gradient for the estimation step of*

`CoSaMP`

*, then initializing the least-squares algorithm with the current approximation*.

**a**^{k-1}, then Theorem 3 still holdsThe proof of the iteration invariant, Theorem 3 involves a sequence of short lemmas, one for each step in the algorithm. We fix an iteration *k*, and let * a = a^{k-1}* be the signal approximation at the beginning of the iteration. We define the residual as

*, which must be 2*

**r = x – a***s*-sparse since both

*and*

**a***are*

**x***s*-sparse. We also observe that the vector

*of updated samples can be interpreted as noisy samples of the residual:*

**v**The first step in the algorithm is the identification step, in which a set of components is found corresponding to locations where the residual signal has a lot of energy. This is summarized by the following lemma which is proven in Needell and Tropp.^{19}

LEMMA 1 (IDENTIFICATION). *The set* W = supp(**y**_{2s}), *where y = Φ*v is the signal proxy, contains at most* 2

*s indices, and*

Next, the algorithm merges the support of the current estimate * a* with the new selection of components. The following simple lemma shows that components of the signal

*outside this set have small energy.*

**x**LEMMA 2 (SUPPORT MERGER). *Let* W *be a set of at most* 2*s indices. The set T* = W
supp(* a*)

*contains at most*3

*s indices, and*

The estimation step in the algorithm obtains values for coefficients in the set *T* by solving a least-squares problem. The next result bounds the error of this approximation.

LEMMA 3 (ESTIMATION^{19}). *Let T be a set of at most* 3*s indices, and define the least-squares signal estimate b by the formulae*

*where u* = Φ

*.*

**x + e***Then*

*Proof*. Note first that

Using the expression * u* = Φ

*and the fact Φ*

**x + e**_{T}=

**I**

_{T}, we calculate that

The cardinality of *T* is at most 3*s*, and * x* is

*s*-sparse, so Proposition 1 and Corollary 1 imply that

Combine the bounds to reach

Finally, invoke the hypothesis that δ_{3s} ≤ δ_{4s} ≤ 0.1.

Lastly, the algorithm prunes the intermediate estimation to its largest *s* terms. The triangle inequality bounds the error in this pruned approximation.

LEMMA 4 (PRUNING^{19}). *The pruned approximation b_{s} satisfies*

At the end of the iteration, the new approximation is formed by setting * a^{k}* =

*. The sequence of lemmas above allows us to prove the iteration invariant for the sparse case, Theorem 3. Indeed, we have:*

**b**_{s}To obtain the second bound in Theorem 3, we solve the error recursion and observe that

This point completes the argument.

**5.3. Extension to general signals**

We now turn to the proof of the main result for `CoSaMP`

, Theorem A. We must remove the assumption that the signal ** x** is sparse. Although this job seems difficult at first, it turns out to have an elegant solution. We can view the noisy samples of a generic signal as samples of a sparse signal corrupted by additional noise that reflects the tail of the signal. We have the following reduction to the sparse case, of which Theorem 2 is a consequence.

LEMMA 5 (REDUCTION TO SPARSE CASE^{19}). *Let x be an arbitrary vector in*

*C*

^{N}.

*The sample vector*

**u**= Φ**x + e**can also be expressed as whereThis reduction will now allow us to prove the iteration invariant for general signals, Theorem 2. Let * x* be a general signal. By Lemma 5, we can write the noisy vector of samples as
. Applying the sparse iteration invariant, Theorem 3, we obtain

We then invoke the lower and upper triangle inequalities to obtain

Finally, recalling the estimate for from Lemma 5 and simplifying, we reach

where *v* is the unrecoverable energy (Equation 4).

## Join the Discussion (0)

## Become a Member or Sign In to Post a Comment