# Communications of the ACM

Research highlights

# Technical Perspective: Robust Statistics Tackle New Problems

View as: Print Mobile App ACM Digital Library Full Text (PDF) In the Digital Edition Share: Send by email Share on reddit Share on StumbleUpon Share on Hacker News Share on Tweeter Share on Facebook

The following paper represents the beginning of a long and productive line of work on robust statistics in high dimensions. While robust statistics has long been studied, going back at least to Tukey,6 the recent revival centers on algorithmic questions that were largely unaddressed by the earlier statistical work.

Robust statistics centers on the question of how to extract information from data that may have been corrupted in some way. The most common form of robustness, also considered here, is robust to outliers: some fraction of the data has been removed and replaced with arbitrary, erroneous points. A familiar instance of robust statistics is using the median instead of the mean, since the median is less sensitive to extreme points, while in contrast a single overly large value could completely skew the mean.

There are several generalizations of the median to higher dimensions, each with their own robustness properties. These include the geometric median (which finds the point with minimum average Euclidean distance to the input dataset) and the Tukey median (which finds the "deepest" point in the dataset). However, the geometric median is fragile when the dimension is large, and the Tukey median is NP-hard to compute. In general, in the older statistics literature there were no estimators that were both efficient to compute and worked well in high-dimensional settings, although Donoho1 and Donoho and Gasko2 did construct several estimators toward this general end whose ideas are still relevant today.

This brings us to the current paper by Diakonikolas et al., which constructs efficient robust estimators for several problems including robust mean estimation of a Gaussian. For this problem, all prior robust estimators either had unfavorable error in high dimensions or could not be computed in polynomial time (concurrent work by Lai et al.5 also develops a polynomial-time estimator with slightly worse bounds; and earlier work by Klivans et al.4 introduces related ideas for robust classification).

While there are a number of variations, there are two key techniques in this paper that underlie many of the results. The first, which has been most commonly used in subsequent work, is the "filtering algorithm." The basic idea is to check if your distribution has some desirable property (such as bounded covariance), such that if it did we could perform robust recovery straightforwardly. If the property holds, then we are done; otherwise, we hope that a certificate of violation of the property will allow us to "filter out" bad points and then try again. In this way, we must eventually succeed: the desired property will eventually hold since there are only a finite number of bad points.

The second technique constructs an "approximate separation oracle" for a certain convex program, such that approximately solving the convex program leads to an accurate robust estimate. The twist is the convex program depends on the quantity that we are supposed to estimate in the first place! But this is where the approximate separation oracle comes in—it turns out to be possible to construct this separation oracle without knowing the quantity itself. This particular technique has not been as widely employed as the filtering algorithm, but is spiritually related to later ideas such as the non-convex analysis of Zhu et al.7

The following paper constructs efficient robust estimators for several problems including robust mean estimation of a Gaussian.

Since these early papers, researchers have explored a number of algorithmic questions including robust classification and regression, robustness in list-decoding models, sum-of-squares proofs for robustness, and non-convex optimization for robust recovery. The recent work has also moved beyond algorithmic questions to unearth new statistical insights, such as new estimators that work under fairly weak statistical assumptions, connections to minimum distance functionals introduced by Donoho and Liu,3 and robustness to new forms of corruptions defined by transportation metrics.

For those interested in learning more, there are now several tutorials, expositions, and courses on these topics, including the thesis of one of the authors, a related course at University of Washington, and my own lecture notes.

### References

1. Donoho, D.L. Breakdown properties of multivariate location estimators. Ph.D. qualifying paper, 1982.

2. Donoho, D.L. and Gasko, M. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Annals of Statistics 20, 4 (1992), 1803–1827.

3. Donoho, D.L. and Liu, R.C. The "automatic" robustness of minimum distance functionals. Annals of Statistics 16, 2 (1988), 552–586.

4. Klivans, A.R., Long, P.M. and Servedio, R.A. Learning halfspaces with malicious noise. J. ML Research 10 (2009), 2715–2740.

5. Lai, K.A., Rao, A.B. and Vempala, S. Agnostic estimation of mean and covariance. Foundations of Computer Science, 2016.

6. Tukey, J.W. A survey of sampling from contaminated distributions. Contributions to Probability and Statistics 2 (1960), 448–485.

7. Zhu, B., Jiao, J. and Steinhardt, S. Robust estimation via generalized quasi-gradients. arXiv preprint, 2020.

### Author

Jacob S. Steinhardt is an assistant professor at the University of California Berkeley, CA, USA.

### Footnotes

To view the accompanying paper, visit doi.acm.org/10.1145/3453935

Request permission to (re)publish from the owner/author