The search for efficient image denoising methods is still a valid challenge at the crossing of functional analysis and statistics. In spite of the sophistication of the recently proposed methods, most algorithms have not yet attained a desirable level of applicability. All show an outstanding performance when the image model corresponds to the algorithm assumptions but fail in general, creating artifacts or removing fine image structures. The main focus of this paper is, first, to define a general mathematical and experimental methodology to compare and classify classical image denoising algorithms and, second, to describe the nonlocal means (NL-means) algorithm^{6} introduced in 2005 and its more recent extensions. The mathematical analysis is based on the analysis of the “method noise,” defined as the difference between a digital image and its denoised version. NL-means, which uses image self-similarities, is proven to be asymptotically optimal under a generic statistical image model. The denoising performance of all considered methods is compared in four ways: mathematical, the asymptotic order of magnitude of the method noise under regularity assumptions; perceptual-mathematical, the algorithms' artifacts and their explanation as a violation of the image model; perceptual-mathematical, the analysis of algorithms when applied to noise samples; and quantitative experimental, tables of *L*^{2} distances of the denoised version to the original image.

### 1. INTRODUCTION

The goal of image denoising methods is to recover the original image from a noisy measurement,

$$v(i) = u(i) + n(i),$$

where *v*(*i*) is the observed value, *u*(*i*) is the “true” value, and *n*(*i*) is the noise perturbation at a pixel *i*. The simplest way to model the effect of noise on a digital image is to add a Gaussian white noise. In that case, the *n*(*i*) are i.i.d. Gaussian values with zero mean and variance σ^{2}.
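This additive model is straightforward to simulate. The sketch below (NumPy; the image size and σ are illustrative choices, not from the paper) builds a noisy observation *v* from a synthetic *u*:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" image u: a smooth horizontal ramp with values in [0, 255].
u = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))

# i.i.d. Gaussian white noise n with zero mean and variance sigma^2.
sigma = 20.0
n = rng.normal(0.0, sigma, size=u.shape)

# Observed image: v(i) = u(i) + n(i).
v = u + n
```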

Several methods have been proposed to remove the noise and recover the true image *u*. Even though they may be very different in their tools, it must be emphasized that a wide class shares the same basic principle: denoising is achieved by averaging. This averaging may be performed locally: the Gaussian smoothing model,^{27} the anisotropic filtering,^{2,35} and the neighborhood filtering;^{40,42,46} by the calculus of variations: the total variation minimization;^{39} or in the frequency domain: the empirical Wiener filters^{46} and wavelet thresholding methods.^{12,17}

Formally, we define a denoising method *D*_{h} as a decomposition

$$v = D_h v + n(D_h, v),$$

where *v* is the noisy image and *h* is a filtering parameter, which usually depends on the standard deviation of the noise. Ideally, *D*_{h}*v* is smoother than *v* and *n*(*D*_{h}, *v*) looks like the realization of a white noise.

The denoising methods should not alter the original image *u*. However, most denoising methods degrade or remove the fine details and texture of *u*. In order to better understand this removal, we shall introduce and analyze the *method noise*, defined as the difference between the original (always slightly noisy) image *u* and its denoised version.

The denoising methods should not introduce visual artifacts. The *noise-to-noise* principle requires that a denoising algorithm transforms a white noise into white noise. This paradoxical requirement seems to be the best way to characterize artifact-free algorithms.

We also propose and analyze the *NL-means* algorithm, which is defined by the simple formula

$$NL[u](\mathbf{x}) = \frac{1}{C(\mathbf{x})} \int_{\Omega} e^{-\frac{(G_a * |u(\mathbf{x}+\cdot) - u(\mathbf{y}+\cdot)|^2)(0)}{h^2}}\, u(\mathbf{y})\, d\mathbf{y},$$

where $C(\mathbf{x}) = \int_{\Omega} e^{-\frac{(G_a * |u(\mathbf{x}+\cdot) - u(\mathbf{z}+\cdot)|^2)(0)}{h^2}}\, d\mathbf{z}$ is a normalizing constant, *G*_{a} is a Gaussian kernel, and *h* acts as a filtering parameter. This formula amounts to saying that the denoised value at **x** is a mean of the values of all points whose Gaussian neighborhood looks like the neighborhood of **x**. The main difference of the NL-means algorithm with respect to local filters or frequency domain filters is the systematic use of all possible self-predictions the image can provide, in the spirit of Efros and Leung.^{19} Section 2 gives formulas for the method noise of several classic algorithms. Section 3 describes NL-means. Section 4 gives its consistency results. A substantial experimental Section 5 compares several classic algorithms by the newly introduced criteria, and Section 6 reviews many recent extensions, applications, and variants.

### 2. METHOD NOISE

DEFINITION 1 (METHOD NOISE). *Let u be an image and D*_{h} *a denoising operator depending on a filtering parameter h. Then, we define the* method noise *as the image difference*

$$n(u, D_h) = u - D_h u.$$

The application of a denoising algorithm should not alter the non-noisy images. So the method noise should be very small when some kind of regularity for the image is assumed. If a denoising method performs well, the method noise must look like a noise even with non-noisy images and should contain as little structure as possible. Since even good-quality images have some noise, it makes sense to evaluate any denoising method in that way, without the traditional “add noise and then remove it” trick. We shall list formulas that permit computing and analyzing the method noise for several classical local smoothing filters: the Gaussian filtering,^{27} the anisotropic filtering,^{2,35} the total variation minimization,^{39} and the neighborhood filtering.^{46} The formal analysis of the method noise for the frequency domain filters falls out of the scope of this paper. These method noises can also be computed, but their interpretation depends on the particular choice of the wavelet or Fourier basis.
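Definition 1 translates directly into code. The sketch below is a hypothetical example: it uses a crude 3×3 box average as the denoising operator *D*_{h} (any of the filters listed above could be substituted) and computes the method noise of a non-noisy, piecewise-constant image:

```python
import numpy as np

def box_denoise(u):
    """A crude stand-in for D_h: 3x3 box average with periodic boundaries."""
    acc = np.zeros_like(u)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            acc += np.roll(np.roll(u, dx, axis=0), dy, axis=1)
    return acc / 9.0

# A non-noisy test image: flat except for one vertical edge.
u = np.zeros((32, 32))
u[:, 16:] = 100.0

# Method noise n(u, D_h) = u - D_h u.
method_noise = u - box_denoise(u)

# It vanishes in flat regions and concentrates at the edge, revealing
# exactly the structure this filter degrades.
```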

**The Gaussian Filtering**: The image isotropic linear filtering boils down to the convolution of the image by a linear symmetric kernel. The paradigm of such kernels is of course the Gaussian kernel

$$G_h(\mathbf{x}) = \frac{1}{4\pi h^2}\, e^{-\frac{|\mathbf{x}|^2}{4h^2}}.$$

In that case, *G*_{h} has standard deviation *h* and it is easily seen that

THEOREM 1 (GABOR 1960). *The image method noise of the Gaussian convolution satisfies*

$$u - G_h * u = -h^2\, \Delta u + o(h^2).$$

The Gaussian method noise is zero in harmonic parts of the image and very large near edges or texture, where the Laplacian cannot be small. As a consequence, the Gaussian convolution is optimal in flat parts of the image but edges and texture are blurred.
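Theorem 1 can be checked numerically. The sketch below assumes periodic boundaries so that the convolution with *G*_{h} can be applied exactly as a Fourier multiplier, and compares the Gaussian method noise with −*h*²Δ*u* on a smooth image (for *u*(*x*, *y*) = sin *x* cos *y*, Δ*u* = −2*u*):

```python
import numpy as np

N, L = 256, 2.0 * np.pi
x = np.linspace(0.0, L, N, endpoint=False)
X, Y = np.meshgrid(x, x)
u = np.sin(X) * np.cos(Y)   # smooth test image with Laplacian -2u

h = 0.1
# Fourier multiplier of G_h(x) = (1 / 4 pi h^2) exp(-|x|^2 / 4 h^2).
k = 2.0 * np.pi * np.fft.fftfreq(N, d=L / N)
KX, KY = np.meshgrid(k, k)
Gh_hat = np.exp(-h**2 * (KX**2 + KY**2))

Gu = np.real(np.fft.ifft2(np.fft.fft2(u) * Gh_hat))   # G_h * u
method_noise = u - Gu
prediction = 2.0 * h**2 * u                           # -h^2 * Laplacian(u)

# The o(h^2) remainder makes the two agree to about 1% for h = 0.1.
```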

**The Anisotropic Filtering:** The anisotropic filter (*AF*) attempts to avoid the blurring effect of the Gaussian by convolving the image *u* at **x** only in the direction orthogonal to *Du*(**x**). The idea of this filter goes back to Perona and Malik^{35} and is interpreted in Alvarez et al.^{2} It is defined by

$$AF_h u(\mathbf{x}) = \int G_h(t)\, u\!\left(\mathbf{x} + t\, \frac{Du(\mathbf{x})^{\perp}}{|Du(\mathbf{x})|}\right) dt$$

for **x** such that *Du*(**x**) ≠ 0 and where (*x*, *y*)^{⊥} = (−*y*, *x*) and *G*_{h} is the one-dimensional Gauss function with variance *h*^{2}. If one assumes that the original image *u* is twice continuously differentiable (*C*^{2}) at **x**, it is easily shown by a second-order Taylor expansion that

THEOREM 2. *The image method noise of AF*_{h} *satisfies (for Du*(**x**) ≠ 0)

$$u(\mathbf{x}) - AF_h u(\mathbf{x}) = -\tfrac{1}{2}\, h^2\, |Du(\mathbf{x})|\, \mathrm{curv}(u)(\mathbf{x}) + o(h^2).$$

By *curv*(*u*)(**x**), we denote the curvature, that is, the signed inverse of the radius of curvature of the level line passing through **x**. This method noise is zero wherever *u* behaves locally like a straight line and large on curved edges or texture (where the curvature and gradient operators take high values). As a consequence, straight edges are well restored while flat and textured regions are degraded.

**Total Variation Minimization:** The total variation minimization was introduced by Rudin et al.^{39} Given a noisy image *v*(**x**), these authors proposed to recover the original image *u*(**x**) as the solution of the minimization problem

$$\mathrm{TVF}_{\lambda}(v) = \arg\min_{u}\; TV(u) + \lambda \int |v(\mathbf{x}) - u(\mathbf{x})|^2\, d\mathbf{x},$$

where *TV*(*u*) denotes the total variation of *u* and λ is a given Lagrange multiplier. The minimum of the above minimization problem exists and is unique. The parameter λ is related to the noise statistics and controls the degree of filtering of the obtained solution.

THEOREM 3. *The method noise of the total variation minimization is*

$$u(\mathbf{x}) - \mathrm{TVF}_{\lambda}(u)(\mathbf{x}) = -\frac{1}{2\lambda}\, \mathrm{curv}(\mathrm{TVF}_{\lambda}(u))(\mathbf{x}).$$

As in the anisotropic case, straight edges are maintained because of their small curvature. However, details and texture can be oversmoothed if λ is too small.
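A minimal sketch of this minimization, assuming a smoothed total variation (a small ε avoids dividing by a zero gradient) and plain gradient descent on the functional rather than the authors' original scheme; λ, the step size τ, and the iteration count are illustrative choices:

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundaries."""
    return np.roll(u, -1, 0) - u, np.roll(u, -1, 1) - u

def div(px, py):
    """Backward-difference divergence (adjoint of -grad)."""
    return (px - np.roll(px, 1, 0)) + (py - np.roll(py, 1, 1))

def tv_energy(u, v, lam, eps=1e-3):
    gx, gy = grad(u)
    return float(np.sum(np.sqrt(gx**2 + gy**2 + eps**2)) + lam * np.sum((u - v) ** 2))

def tv_denoise(v, lam=0.05, eps=1e-3, tau=0.1, iters=300):
    u = v.copy()
    for _ in range(iters):
        gx, gy = grad(u)
        norm = np.sqrt(gx**2 + gy**2 + eps**2)
        # Descend the Euler-Lagrange gradient:
        # -div(grad u / |grad u|) + 2 * lam * (u - v).
        u -= tau * (-div(gx / norm, gy / norm) + 2.0 * lam * (u - v))
    return u

rng = np.random.default_rng(0)
clean = np.zeros((64, 64)); clean[:, 32:] = 100.0      # one straight edge
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
denoised = tv_denoise(noisy)
```

As predicted by Theorem 3, the straight edge survives while the noise (high curvature everywhere) is removed.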

**Neighborhood Filtering:** The previous filters are based on a notion of spatial neighborhood or proximity. *Neighborhood filters* instead take into account gray-level values to define neighboring pixels. In the simplest and most extreme case, the denoised value at pixel *i* is an average of values at pixels that have a gray-level value close to *u*(*i*). The gray-level neighborhood is therefore

$$B(i, h) = \{ j \in I \mid u(i) - h < u(j) < u(i) + h \}.$$

This is a fully nonlocal algorithm, since pixels belonging to the whole image are used for the estimation at pixel *i*. This algorithm can be written in a more continuous form,

$$NF_h u(\mathbf{x}) = \frac{1}{C(\mathbf{x})} \int_{\Omega} u(\mathbf{y})\, e^{-\frac{|u(\mathbf{y}) - u(\mathbf{x})|^2}{h^2}}\, d\mathbf{y},$$

where Ω ⊂ ℝ² is an open and bounded set, and $C(\mathbf{x}) = \int_{\Omega} e^{-\frac{|u(\mathbf{y}) - u(\mathbf{x})|^2}{h^2}}\, d\mathbf{y}$ is the normalization factor.
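The continuous formula has a direct discrete counterpart. The sketch below is a hypothetical small-scale implementation: it materializes the full all-pairs weight matrix, which is only practical for small images, but it makes the fully nonlocal character of the filter explicit:

```python
import numpy as np

def neighborhood_filter(v, h):
    """Fully nonlocal gray-level neighborhood filter NF_h (small images only)."""
    flat = v.ravel().astype(float)
    # w[x, y] = exp(-|v(y) - v(x)|^2 / h^2) for every pair of pixels in the image.
    w = np.exp(-((flat[:, None] - flat[None, :]) ** 2) / h**2)
    # Weighted average of all pixels, normalized by C(x) = sum of weights.
    return (w @ flat / w.sum(axis=1)).reshape(v.shape)

rng = np.random.default_rng(0)
clean = np.zeros((32, 32)); clean[16:, :] = 128.0   # two well-separated gray levels
noisy = clean + rng.normal(0.0, 5.0, clean.shape)
restored = neighborhood_filter(noisy, h=15.0)

# Each pixel is averaged only with pixels of a similar gray level, wherever
# they are in the image, so the two regions are denoised without mixing.
```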

The Yaroslavsky neighborhood filters^{46} consider mixed neighborhoods *B*(*i*, *h*) ∩ *B*_{ρ}(*i*), where *B*_{ρ}(*i*) is a ball of center *i* and radius ρ. So the method takes an average of the values of pixels that are close both in gray level and in spatial distance. This filter can easily be written in a continuous form as

$$YNF_{h,\rho}\, u(\mathbf{x}) = \frac{1}{C(\mathbf{x})} \int_{B_{\rho}(\mathbf{x})} u(\mathbf{y})\, e^{-\frac{|u(\mathbf{y}) - u(\mathbf{x})|^2}{h^2}}\, d\mathbf{y},$$

where $C(\mathbf{x}) = \int_{B_{\rho}(\mathbf{x})} e^{-\frac{|u(\mathbf{y}) - u(\mathbf{x})|^2}{h^2}}\, d\mathbf{y}$ is the normalization factor. More recent versions, namely the *SUSAN filter*^{40} and the *bilateral filter*,^{42} weigh the distance to the reference pixel **x** instead of considering a fixed spatial neighborhood.

In the next theorem, we compute the asymptotic expansion of the Yaroslavsky neighborhood filter when ρ, *h* → 0 have the same order of magnitude. The proof and generalizations of this result can be found in Buades et al.^{7}

THEOREM 4. *Suppose u* ∈ *C*^{2}(Ω), *and let* ρ, *h*, α > 0 *be such that* ρ, *h* → 0 *and h* = *O*(ρ). *Then there exist explicit continuous weighting functions* $\tilde{f}$ *and* $\tilde{g}$ *(their closed forms are given in Buades et al.*^{7}*) such that, for* **x** ∈ Ω,

$$u(\mathbf{x}) - YNF_{h,\rho}\, u(\mathbf{x}) \simeq -\frac{\rho^2}{8}\left[\tilde{f}\big(\alpha\, |Du(\mathbf{x})|\big)\, u_{\zeta\zeta}(\mathbf{x}) + \tilde{g}\big(\alpha\, |Du(\mathbf{x})|\big)\, u_{\eta\eta}(\mathbf{x})\right].$$

According to Theorem 4 the Yaroslavsky neighborhood filter acts as an evolution PDE with two terms. The first term is proportional to the second derivative of *u* in the direction ζ, which is tangent to the level line passing through **x**. The second term is proportional to the second derivative of *u* in the direction η, which is orthogonal to the level line passing through **x**.

The weighting coefficient of the tangent diffusion, *u*_{ζζ}, is given by $\tilde{f}$. This function is positive and decreasing; thus, there is always diffusion in that direction. The weight of the normal diffusion, *u*_{ηη}, is given by $\tilde{g}$. As $\tilde{g}$ takes both positive and negative values (see Figure 1), the filter behaves as a filtering/enhancing algorithm in the normal direction depending on |*Du*|. The intensity of the filtering in the tangent direction and of the enhancing in the normal direction tends to zero when the gradient tends to infinity. Thus, points with a very large gradient are not altered.

The neighborhood filter asymptotically behaves as the Perona-Malik equation,^{35} also creating shocks inside smooth regions (see Buades et al.^{7} for more details on this comparison).

### 3. NL-MEANS ALGORITHM

Given a discrete noisy image *v* = {*v*(*i*) | *i* ∈ *I*}, the estimated value *NL*[*v*](*i*), for a pixel *i*, is computed as a weighted average of all the pixels in the image,

$$NL[v](i) = \sum_{j \in I} w(i, j)\, v(j),$$

where the family of weights {*w*(*i*, *j*)}_{j} depends on the similarity between the pixels *i* and *j* and satisfies the usual conditions 0 ≤ *w*(*i*, *j*) ≤ 1 and $\sum_{j} w(i, j) = 1$.

The similarity between two pixels *i* and *j* depends on the similarity of the intensity gray-level vectors *v*(*N*_{i}) and *v*(*N*_{j}), where *N*_{k} denotes a square neighborhood of fixed size centered at a pixel *k*. This similarity is measured as a decreasing function of the weighted Euclidean distance $\|v(N_i) - v(N_j)\|_{2,a}^2$, where *a* > 0 is the standard deviation of the Gaussian kernel. The expectation of the Euclidean distance of the noisy neighborhoods is

$$E\|v(N_i) - v(N_j)\|_{2,a}^2 = \|u(N_i) - u(N_j)\|_{2,a}^2 + 2\sigma^2.$$

This equality shows the robustness of the algorithm, since in expectation the Euclidean distance preserves the order of similarity between pixels.
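This identity is easy to verify by simulation. The sketch below uses an unweighted (uniform) patch distance rather than the Gaussian-weighted one, and arbitrary patch values; it estimates the expected squared distance between two noisy patches by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 5.0

# Fixed noise-free patches u(N_i) and u(N_j) (9 pixels each, arbitrary values).
u_i = np.zeros(9)
u_j = np.linspace(0.0, 8.0, 9)

trials = 20000
v_i = u_i + rng.normal(0.0, sigma, (trials, 9))   # noisy realizations of u(N_i)
v_j = u_j + rng.normal(0.0, sigma, (trials, 9))   # noisy realizations of u(N_j)
d2 = np.mean((v_i - v_j) ** 2, axis=1)            # per-trial normalized distance

estimate = float(d2.mean())
# The identity predicts the clean distance shifted by the constant 2 * sigma^2.
expected = float(np.mean((u_i - u_j) ** 2) + 2.0 * sigma**2)
```

Since the shift 2σ² is the same for every pair of patches, it does not change which neighborhoods are ranked as most similar.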

The pixels with a neighborhood similar to *v*(*N*_{i}) have larger weights in the average; see Figure 2. These weights are defined by

$$w(i, j) = \frac{1}{Z(i)}\, e^{-\frac{\|v(N_i) - v(N_j)\|_{2,a}^2}{h^2}},$$

where *Z*(*i*) is the normalizing constant $Z(i) = \sum_{j} e^{-\frac{\|v(N_i) - v(N_j)\|_{2,a}^2}{h^2}}$ and the parameter *h* controls the decay of the weights as a function of the Euclidean distances. NL-means compares not only the gray level at a single point but the geometrical configuration in a whole neighborhood. This fact allows a more robust comparison than neighborhood filters (see Figure 2). This figure illustrates how NL-means implicitly chooses a best averaging neighborhood depending on the image structure in a wide neighborhood.
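The complete pixelwise scheme can be sketched as follows. This is a didactic implementation with two simplifications flagged as assumptions: the patch distance is uniform rather than Gaussian-weighted (no *G*_{a}), and the search is restricted to a window as in Section 5; all parameter values are illustrative:

```python
import numpy as np

def nl_means(v, f=1, s=5, h=10.0):
    """Basic NL-means: patch radius f, search radius s, filtering parameter h."""
    H, W = v.shape
    pad = np.pad(v.astype(float), f, mode="reflect")
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            p = pad[i:i + 2 * f + 1, j:j + 2 * f + 1]       # patch v(N_i)
            acc, wsum = 0.0, 0.0
            for k in range(max(0, i - s), min(H, i + s + 1)):
                for l in range(max(0, j - s), min(W, j + s + 1)):
                    q = pad[k:k + 2 * f + 1, l:l + 2 * f + 1]  # patch v(N_j)
                    d2 = np.mean((p - q) ** 2)              # patch distance
                    w = np.exp(-d2 / h**2)                  # unnormalized w(i, j)
                    acc += w * v[k, l]
                    wsum += w
            out[i, j] = acc / wsum                          # divide by Z(i)
    return out

rng = np.random.default_rng(0)
clean = np.full((24, 24), 64.0)
noisy = clean + rng.normal(0.0, 10.0, clean.shape)
denoised = nl_means(noisy, f=1, s=5, h=30.0)
```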

In Section 2, we computed explicitly the method noise of the local smoothing filters. These formulas are corroborated by the visual experiments of Figure 3. This figure displays the method noise for the standard image Boat, that is, the difference *u* − *D*_{h}(*u*), where the parameter *h* has been fixed in order to remove a noise with standard deviation 2.5. The method noise helps us in understanding the performance and limitations of the denoising algorithms, since removed details or texture correspond to a large method noise. We see in Figure 3 that the NL-means method noise does not present noticeable geometrical structures. Figure 4 explains this property, since it shows how the NL-means algorithm chooses a weighting configuration adapted to the local and nonlocal geometry of the image.

### 4. NL-MEANS CONSISTENCY

Under stationarity assumptions, for a pixel *i*, the NL-means algorithm converges to the conditional expectation of the value at *i* given an observed neighborhood of it. In this case, the stationarity conditions amount to saying that, as the size of the image grows, we can find many similar patches for all the details of the image.

Let *V* be a random field and suppose that the noisy image *v* is a realization of *V*. Let *Z* denote the sequence of random variables *Z*_{i} = {*Y*_{i}, *X*_{i}}, where *Y*_{i} = *V*(*i*) is real valued and *X*_{i} = *V*(*N*_{i}\{*i*}) is ℝ^{p} valued. The NL-means is an estimator of the conditional expectation *r*(*i*) = *E*[*Y*_{i} | *X*_{i} = *v*(*N*_{i}\{*i*})].

THEOREM 5 (CONDITIONAL EXPECTATION THEOREM). *Let Z* = {*V*(*i*), *V*(*N*_{i}\{*i*})} *for i* = 1, 2, … *be a strictly stationary and mixing process. Let NL*_{n} *denote the NL-means algorithm applied to the sequence* {*Z*_{i}}_{i=1}^{n}. *Then, for j* ∈ {1, …, *n*},

$$|NL_n(j) - r(j)| \to 0 \quad \text{almost surely, as } n \to \infty.$$

The full statement of the hypothesis of the theorem and its proof can be found in a more general framework in Roussas.^{38} This theorem tells us that the NL-means algorithm corrects the noisy image rather than trying to separate the noise (oscillatory) from the true image (smooth).

In the case that an additive white noise model is assumed, the next result shows that the conditional expectation is the function of *V* (*N*_{i}\{*i*}) that minimizes the mean square error with the true image *u*.

THEOREM 6. *Let V, U, N be random fields on I such that V* = *U* + *N, where N is a signal-independent white noise. Then, the following statements hold:*

(i) *E*[*V*(*i*) | *X*_{i} = *x*] = *E*[*U*(*i*) | *X*_{i} = *x*] *for all i.*

(ii) *The random variable E*[*U*(*i*) | *V*(*N*_{i}\{*i*})] *is the function of V*(*N*_{i}\{*i*}) *that minimizes the mean square error E*[*U*(*i*) − *g*(*V*(*N*_{i}\{*i*}))]^{2} *among all measurable functions g.*

Similar optimality theoretical results have been obtained in Ordentlich et al.^{34} and presented for the denoising of binary images.

### 5. DISCUSSION AND EXPERIMENTATION

In this section, we compare the local smoothing filters, the wavelet thresholding algorithms,^{17} sliding DCT Wiener filter,^{46} and the NL-means algorithm. The wavelet thresholding and the sliding DCT algorithms yield state-of-the-art results among frequency domain filters.

For computational efficiency, the search for similar windows in the NL-means algorithm is restricted to a “search window” of size *S* × *S* pixels. In all experiments, the search window has 21 × 21 pixels, and the similarity square neighborhood is 3 × 3 pixels for color images and 5 × 5 pixels for gray images. When denoising a color image, the whole color patch containing the red, green, and blue values is compared. Thus, a single weight is obtained for each pair of pixels and used for the denoising of the three channels at this pixel.

The filtering parameter *h* has been fixed to 0.8σ when a noise of standard deviation σ is added. Due to the fast decay of the exponential kernel, large Euclidean distances lead to nearly zero weights acting as an automatic threshold (Figure 4).

**Visual Comparison:** Visual criteria remain essential to decide whether the quality of the image has been improved by the denoising method. We display some denoising experiments comparing the NL-means algorithm with local smoothing and frequency domain filters. All experiments have been simulated by adding a Gaussian white noise of standard deviation σ to the true image. The objective is to compare the visual quality of the restored images, the absence of artifacts, and the correct reconstruction of edges, texture, and details.

Due to the nature of the algorithm, the most favorable case for NL-means is the textured or periodic case. In this situation, for every pixel *i,* we can find a large set of samples with a very similar configuration. See Figure 4f for an example of the weight distribution of the NL-means algorithm for a periodic image. Figure 5 illustrates the performance of the NL-means for a natural texture and Figure 6 for an artificial periodic pattern.

Natural images also have enough redundancy to be restored by NL-means, see Figure 7. Flat zones present a huge number of similar configurations lying inside the same object, see Figure 4a. Straight or curved edges have a complete line of pixels with similar configurations, see Figure 4b, c. In addition, natural images allow us to find many similar configurations in far away pixels, as Figure 4f shows. Figure 7 shows an experiment on a natural image. This experiment should be compared with Figure 3, where we display the method noise of the original image. The blurred or degraded structures of the restored images coincide with the noticeable structures of their method noise. Figure 8 shows that the frequency domain filters are well adapted to the recovery of oscillatory patterns. Although some artifacts are noticeable in both solutions, the stripes are well reconstructed. The DCT transform seems better adapted to this type of texture, and the stripes are reconstructed a little better. NL-means also performs well on this type of texture, due to its high degree of redundancy.

**Noise-to-Noise Criterion:** The noise-to-noise principle requires that a denoising algorithm transforms a white noise into white noise. This paradoxical requirement seems to be the best way to characterize artifact-free algorithms. The transformation of a white noise into any correlated signal creates structure and artifacts. Only white noise is perceptually devoid of visual structure, a principle first stated by Attneave.^{4} Figure 9 shows how denoising methods transform a white noise.

The convolution with a Gauss kernel is equivalent, in the Fourier domain, to multiplication by a Gauss kernel of inverse standard deviation. Therefore, convolving the noise with such a kernel reinforces the low frequencies and cancels the high ones. The filtered noise thus shows large grains due to its prominent low frequencies.
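The correlation introduced by Gaussian smoothing can be measured directly. The sketch below applies the convolution periodically as a Fourier multiplier (the kernel width *s* is an arbitrary choice) and compares the lag-1 autocorrelation of a white noise before and after filtering:

```python
import numpy as np

def lag1_corr(a):
    """Horizontal lag-1 autocorrelation of a field (periodic boundaries)."""
    a = a - a.mean()
    return float(np.mean(a * np.roll(a, 1, axis=1)) / np.var(a))

rng = np.random.default_rng(0)
n = rng.normal(0.0, 20.0, (256, 256))   # white noise image

# Gaussian smoothing with standard deviation s pixels, via its Fourier multiplier.
s = 2.0
k = 2.0 * np.pi * np.fft.fftfreq(256)
KX, KY = np.meshgrid(k, k)
smooth = np.real(np.fft.ifft2(np.fft.fft2(n) * np.exp(-0.5 * s**2 * (KX**2 + KY**2))))

# The white noise is essentially uncorrelated; the filtered noise is strongly
# correlated between neighbors, which the eye perceives as grain and structure.
```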

Noise filtered by a wavelet thresholding is no more a white noise. The few coefficients with a magnitude larger than the threshold are spread all over the image. The pixels that do not belong to the support of one of these coefficients are set to zero. The visual result is a constant image with superposed wavelets as displayed in Figure 9. It is easy to prove that the denoised noise is spatially highly correlated.

Given a noise realization, the filtered value by the neighborhood filter at a pixel *i* only depends on its value *n*(*i*) and the parameters *h* and ρ. The neighborhood filter averages noise values at a distance from *n*(*i*) less than or equal to *h*. Thus, when the size ρ of the neighborhood increases, by the law of large numbers the filtered value tends to the expectation of the Gauss distribution restricted to the interval (*n*(*i*) − *h*, *n*(*i*) + *h*). This filtered value is therefore a deterministic function of *n*(*i*) and *h*. Independent random variables are mapped by a deterministic function onto independent variables. Thus, the *noise-to-noise* requirement is asymptotically satisfied by the neighborhood filter. NL-means satisfies the *noise-to-noise* principle in a similar manner. However, a mathematical statement and proof of this property are more intricate and we shall skip them.

**Numerical Comparison:** Table 1 displays the mean square error for the denoising experiments given in the paper. This numerical measurement is the most objective one, since it does not rely on any visual interpretation. However, this error is not computable in a real problem and a small mean square error does not assure a high visual quality. So all above-discussed criteria seem necessary to compare the performance of denoising algorithms.

### 6. EXTENSIONS SINCE 2005

Our conclusions on the better denoising performance of nonlocal methods with respect to former state-of-the-art algorithms such as the total variation or the wavelet thresholding have been widely accepted. The “method noise” methodology to compare the denoising performance has been adopted ever since.

The NL-means algorithm is easily extended to the denoising of image sequences and video. The denoising algorithm indiscriminately involves pixels belonging not only to the same frame but to all frames in the sequence. The algorithm favors pixels with a similar local configuration; as the similar configurations move, so do the weights. Thus, the algorithm is able to follow similar configurations when they move, without any explicit motion computation (see Figure 10). This is not the case for classical movie denoising algorithms, which are motion compensated (see Buades et al.^{9} for more details on this discussion). The very same idea on movie denoising can be applied to *super-resolution,* an image zooming method by which several frames from a video, or several low-resolution photographs, can be fused into a larger image.^{20,37}

Improvements or adaptations of the NL-means algorithm have been proposed for the denoising of several types of data: fluorescence microscopy,^{5} cryo-electron microscopy,^{15} magnetic resonance imaging (MRI),^{31} and 3D point data sets.^{47}

The most successful improvements of NL-means combine the nonlocal principle with former classic algorithms and have indeed shown an improved denoising performance. Probably the best-performing methods so far are the hybrid method BM3D proposed in Dabov et al.^{14} and the NL-PCA proposed in Zhang et al.^{48} Both algorithms combine block-matching, a linear transform thresholding, and Wiener filtering.

The NL-means algorithm has also been extended to most image processing tasks: *demosaicking,* the operation that transforms the “R or G or B” raw image captured by a camera into an “R and G and B” image;^{10,30} *movie colorization*;^{21,26} *image inpainting,* through a nonlocal variational framework with a unified treatment of geometry and texture^{3} (see also Wong and Orchard^{44}); *zooming,* by a fractal-like technique where examples are taken from the image itself at different scales;^{18} and *movie flicker stabilization,*^{16} which compensates spurious oscillations in the colors of successive frames.

NL-means is a computationally demanding algorithm. Several papers have proposed fast and extremely fast (linear) implementations: by block preselection,^{29} by Gaussian KD-trees to classify image blocks,^{1} by SVD,^{33} by using the FFT to compute the correlation between blocks,^{43} and by statistical arguments.^{13} The statistical validity of the NL-means algorithm remains wide open. See Ebrahimi and Vrscay,^{18} Kervrann et al.,^{24} and Thacker et al.^{41} (where a Bayesian interpretation is proposed) or Xu et al.^{45} (where a bias of NL-means is corrected).

The relationship of neighborhood filters to classic local PDEs has been discussed in Buades et al.,^{7,8} leading to an adaptation of NL-means that avoids the staircasing effect. Yet the main interest has shifted to defining *nonlocal PDEs.* The extension of the NL-means method to define nonlocal image-adapted differential operators and nonlocal variational methods starts with Kindermann et al.,^{25} who propose to perform denoising and deblurring by nonlocal functionals. Several articles have followed this variational line, for deblurring,^{22,23} image segmentation,^{32} and deconvolution and tomographic reconstruction (Lou et al.^{28}).

A particular notion of nonlocal PDE has emerged, whose coefficients are actually image dependent. For instance, in Elmoataz et al.,^{21} image colorization is viewed as the minimization of a discrete partial differential functional on the weighted block graph. Thus, it can be seen either as a nonlocal heat equation on the image or as a local heat equation on the space of image patches.

The exploration of image redundancy and its application to image restoration has led to new attempts at sparse image representations by block dictionaries.^{24} Algorithms and application have been developed by Chatterjee and Milanfar,^{11} Mairal et al.,^{30} and Protter and Elad.^{36}

### Acknowledgment

This work was partially financed by MCYIT grant number TIN200804752.

### Figures

Figure 1. Magnitude of the tangent diffusion (continuous line) and normal diffusion (dashed line) of Theorem 4.

Figure 2. *q*1 and *q*2 have a large weight in NL-means because their similarity windows are similar to that of *p*, while the weight *w*(*p*, *q*3) is much smaller.

Figure 3. Image method noise. From left to right and from top to bottom: original image, Gaussian convolution, anisotropic filtering, total variation minimization, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filter, and the NL-means algorithm. The parameters have been set for each method to remove a method noise with variance σ^{2} = 2.5^{2}.

Figure 4. On the right-hand side of each pair, we display the weight distribution used to estimate the central pixel of the left image by the NL-means algorithm. (a) In flat zones, the weights are distributed as a convolution filter (as a Gaussian convolution). (b) On straight edges, the weights are distributed in the direction of the edge (like for AF). (c) On curved edges, the weights favor pixels belonging to the same contour or level line, which is a strong improvement with respect to AF. (d) In a flat neighborhood, the weights are distributed in a gray-level neighborhood (like for a neighborhood filter). In the cases of (e) and (f), the weights are distributed across the more similar configurations, even though they are far away from the observed pixel. This shows a behavior similar to a nonlocal neighborhood filter or to an ideal Wiener filter.

Figure 5. NL-means denoising experiment with a Brodatz texture image. Left: Noisy image with standard deviation 30. Right: NL-means restored image. The Fourier transform of the noisy and restored images show how main features are preserved even at high frequencies.

Figure 6. Denoising experience on a periodic image. From left to right and from top to bottom: noisy image (standard deviation 35), total variation minimization, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filtering, and NL-means.

Figure 7. Denoising experience on a natural image. From left to right and from top to bottom: noisy image (standard deviation σ = 20), Gaussian convolution (*h* = 1.8), anisotropic filter (*h* = 2.4), total variation (λ = 0.04), the Yaroslavsky neighborhood filter (ρ = 7, *h* = 28), and the NL-means algorithm. Parameters have been set for each algorithm so that the removed quadratic energy is equal to the energy of the added noise.

Figure 8. Denoising experience on a natural image. From left to right and from top to bottom: noisy image (standard deviation 35), total variation minimization, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filter, and NL-means.

Figure 9. The noise-to-noise criterion. From left to right and from top to bottom: original noise image of standard deviation 20, Gaussian convolution, anisotropic filtering, total variation, neighborhood filter, translation invariant wavelet thresholding, DCT sliding window Wiener filter, and NL-means. Parameters have been fixed for each method so that the noise standard deviation is reduced by a factor 4.

Figure 10. Weight distribution of NL-means applied to a movie. In (a), (b), and (c) the first row shows a five frames image sequence. In the second row, the weight distribution used to estimate the central pixel (in white) of the middle frame is shown. The weights are equally distributed over the successive frames, including the current one. They actually involve all the candidates for the motion estimation instead of picking just one per frame. The aperture problem can be taken advantage of for a better denoising performance by involving more pixels in the average.
