
Communications of the ACM

Research highlights

Technical Perspective: Image Processing Goes Back to Basics



In recent years, the image sensors in digital cameras have improved in many ways. The increases in spatial resolution are well known. Equally important, but less obvious, are improvements in noise level and dynamic range. At this point digital cameras have gotten so good it is challenging to display the full richness of their image data. A low noise imager can capture subtly varying detail that can only be seen by turning up the display contrast unnaturally high. A high dynamic range (HDR) imager presents the opposite problem: its data cannot be displayed without making the contrast unnaturally low. To convey visual information to a human observer, it is often necessary to present an image that is not physically correct, but which reveals all the visually important variations in color and intensity. A discipline known as computational photography has emerged at the intersection of photography, computer vision, and computer graphics, and the twin problems of detail enhancement and HDR range compression (also called tone mapping) have become recognized as important topics.

Given an individual image patch, it is not difficult to find display parameters that will effectively convey the local visual information. The problem is this patch must coexist with all the other image patches around it, and these must join into a single, globally coherent image. Many techniques have been proposed to find an image that simultaneously displays everything clearly, while still looking like a natural image. In struggling to bring about a global compromise between all the local constraints, these techniques tend to introduce visually disturbing artifacts, such as halos around strong edges, or distortions of apparent contrast, sharpness, and position of local features.

Performance has improved through the use of increasingly sophisticated image processing techniques, which can manipulate information smoothly across multiple spatial scales, while preserving the integrity of sharp edges. Recent progress in "edge-aware" processing builds on a foundation of work in such topics as anisotropic diffusion, regularization, and sparse image coding. New classes of edge-aware filters have been devised, utilizing ideas from robust estimation. Novel forms of wavelet decomposition have been introduced, specifically to deal with the challenges of processing sharp edges within a multiscale representation. However, none of the methods has proven entirely satisfactory, and some of them are quite complex.

In the following paper, Paris et al. make a surprising move. They choose to build a system on the Laplacian pyramid, which is a very simple multiscale representation that predates wavelets. It lacks an impressive mathematical pedigree, but is still widely used because of its simplicity and reliability; it serves as a basic building block for many image-processing schemes. At the same time, the Laplacian pyramid seems ill suited to any task involving specialized processing near edges. Its basis functions are smooth, overlapping, and non-oriented, whereas edges are sharply localized and oriented.
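
For readers unfamiliar with the representation, the following is a minimal sketch of a Laplacian pyramid on a one-dimensional signal. It is not the method of Paris et al.; it only illustrates the pyramid itself. The function names, the 5-tap binomial kernel, and the edge-clamping border rule are all assumptions made for this illustration. Each level stores the difference between a signal and a blurred, subsampled, then re-expanded copy of itself, so the levels can be summed back to recover the input.

```python
# Illustrative 1D Laplacian pyramid in pure Python.
# Assumptions (not from the article): 5-tap binomial kernel, borders handled
# by clamping indices to the valid range, hypothetical function names.

KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def blur(signal):
    """Convolve with the binomial kernel, clamping indices at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def downsample(signal):
    """Blur, then keep every second sample."""
    return blur(signal)[::2]


def upsample(coarse, n):
    """Expand to length n: insert zeros, blur, and rescale by 2."""
    expanded = [0.0] * n
    for i, v in enumerate(coarse):
        expanded[2 * i] = v
    return [2.0 * v for v in blur(expanded)]


def build_laplacian_pyramid(signal, levels):
    """Return [detail_0, ..., detail_{levels-1}, lowpass_residual]."""
    pyramid = []
    current = [float(v) for v in signal]
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse, len(current))
        pyramid.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the pyramid by expanding and adding details, coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        predicted = upsample(current, len(detail))
        current = [d + p for d, p in zip(detail, predicted)]
    return current


if __name__ == "__main__":
    step_edge = [0.0] * 8 + [1.0] * 8
    pyr = build_laplacian_pyramid(step_edge, 3)
    print([len(level) for level in pyr])
    print(reconstruct(pyr))
```

Because every detail level is defined as a residual against the same expansion used during reconstruction, the round trip recovers the input up to floating-point rounding regardless of the kernel chosen. The smooth, overlapping kernel is also why a sharp step spreads across several coefficients at each level, which is the mismatch with edges described above.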

The authors also eschew a wide range of modern techniques. Indeed, the most striking thing about the paper is what is missing: There are no statistical image models, no machine learning, no PDEs, no fancy wavelets, and no objective functions. Instead, the authors return to an old-fashioned style rarely seen today: carefully considering a problem at the level of pixels and patches, and specifying the requirements in the most direct possible way. It should be noted that these authors are fully capable of developing elaborate machinery when they need it, but they have chosen to avoid it here. They want to rethink the problem from the ground up, setting out basic principles about the behavior they desire with edges, textures, and smooth regions.

Their new direction is quite unexpected. To make an analogy, it is almost as if some experts in 3D manufacturing decided to abandon their CAD systems and 3D printers in order to sculpt marble with a hammer and chisel. Sometimes the fancy tools get in the way, and the best thing is to get back in direct contact with the material.

The results in this case are stunning. The authors are able to achieve extreme levels of detail enhancement and HDR range compression. There are almost no visible artifacts. It is difficult to believe anyone can do much better, and in that sense one could say the problems have been solved.

So, is this paper the last word? No, because beautiful pictures are not enough. It is still important to situate the work intellectually within the greater worlds of image processing and computational photography. How do these techniques relate to the many other approaches to detail enhancement and HDR range compression? How can the insights from this paper be integrated into methods that are couched in other languages, such as wavelets or image statistics? More generally, what does this paper teach us about the underlying problems of edge-aware image processing? There is already progress on these questions, as noted in the revised research that Paris et al. present here. We can expect more insights to follow, as people digest the results of this refreshingly original paper.


Author

Edward Adelson (adelson@csail.mit.edu) is the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences at MIT, Cambridge, MA.


Footnotes

To view the accompanying paper, visit doi.acm.org/10.1145/2723694


Copyright held by author.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.


 
