Smarter Photography

Figure. Lytro Inc.’s Light Field Engine software travels with an image, enabling users to selectively focus on any part of the image by clicking on it.

For most of its first 10 years, digital photography largely mimicked film photography. Of course, Photoshop and other software made image processing easier, but digital photographers worked in much the same way as before and were often beset by similar problems. Now, however, digital photography is advancing to a fundamentally new level of capability, aided by technology that would have been unimaginable a decade ago.

For example, in the days of film photography, a photographer finding an unwanted object in a picture had few options. He or she might have cropped it out, “burned” the unwanted object in the darkroom to make it less conspicuous, or tried to take a better photo the next time. Soon after 2000, however, a digital photographer possessed a new option with photo-editing software’s clone tool, which could copy over the unwanted object with a more desirable part of the image.

Going one step further, computer scientists at Carnegie Mellon University have demonstrated prototype software that searches millions of photos on the Internet, locates one similar to yours, and downloads an image patch to paste into your photo. “The whole idea seems a little ludicrous,” says Alexei Efros, leader of the team that invented the technique. “It doesn’t seem like this should work.”

Advancements in computational photography are dramatically increasing both the ease of taking pictures and the quality of those pictures. The advances can be classified as those occurring in camera hardware, in post-processing software, in prompts and automated assists to the user taking a picture, and in powerful tools for users retrieving and viewing digital images.

Lytro Inc.’s recently introduced light field camera uses a unique new sensor and software to, in essence, capture everything in sharp focus. The magic is that, unlike a conventional sensor that records only the color and intensity of light, the Lytro’s image sensor also records the direction, or angle, at which a ray of light strikes it. This additional data enables algorithms in the camera’s software to reconstruct a sharp view of any part of the photograph. User software, called the Light Field Engine, travels with the image so viewers can selectively focus by clicking on any part of the image.
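One common way to illustrate refocusing from directional data is "shift-and-add": each directional view is shifted in proportion to its angle, and the views are then averaged. The sketch below is a toy one-dimensional version under that assumption. It is not Lytro's actual algorithm, which the article does not describe, and the names make_views and refocus are hypothetical.

```python
def make_views(width, point_x, disparity, offsets=(-1, 0, 1)):
    """Build toy 1D views of a single bright point, one per viewing angle.

    Assumption: in the view with angular offset u, the point appears
    shifted by u * disparity pixels.
    """
    views = {}
    for u in offsets:
        row = [0.0] * width
        x = point_x + u * disparity
        if 0 <= x < width:
            row[x] = 1.0
        views[u] = row
    return views


def refocus(views, disparity):
    """Undo a shift of u * disparity in each view, then average the views."""
    width = len(next(iter(views.values())))
    out = [0.0] * width
    for u, row in views.items():
        for x in range(width):
            src = x + u * disparity
            if 0 <= src < width:  # samples outside the sensor count as zero
                out[x] += row[src]
    return [v / len(views) for v in out]


views = make_views(width=9, point_x=4, disparity=2)
sharp = refocus(views, disparity=2)   # focal plane matches the point's depth
blurry = refocus(views, disparity=0)  # focal plane somewhere else
```

When the chosen disparity matches the point's depth, the three views line up and the point is reconstructed at full brightness. With any other setting, its energy is spread across several pixels, which is what an out-of-focus point looks like.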

“Being able to snap the shutter without picking a thing of interest, or focal point, and being able to refocus it later has huge potential,” says Richard Koci-Hernandez, assistant professor at the University of California, Berkeley’s Graduate School of Journalism and a beta user of the Lytro camera. Koci-Hernandez recalls his days as a film photographer at the San Jose Mercury News, when he often had to make split-second decisions about where in the viewfinder to focus. This is not necessary with the Lytro camera, he notes.

While the Lytro camera greatly expands the in-focus depth of field of an image, a similar concept expands the dynamic range of exposure in a photo. Many cellphone cameras can now create these high dynamic range images by combining the best-exposed portions of multiple images, which are taken in split seconds without user awareness.
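The idea of combining the best-exposed portions of several frames can be sketched as a per-pixel weighted average in which pixels near mid-gray count the most. This is a simplified illustration, not the method used by any particular phone. The weighting function and the names well_exposedness and fuse_exposures are assumptions made for the example.

```python
def well_exposedness(value):
    """Weight a pixel (0..1 scale) by its closeness to mid-gray (0.5).

    Assumption: a simple triangular weight; real pipelines use other curves.
    """
    return max(1e-6, 1.0 - abs(value - 0.5) * 2.0)


def fuse_exposures(frames):
    """Blend aligned frames pixel by pixel, favoring well-exposed values."""
    fused = []
    for pixels in zip(*frames):
        weights = [well_exposedness(p) for p in pixels]
        total = sum(weights)
        fused.append(sum(w * p for w, p in zip(weights, pixels)) / total)
    return fused


# Two exposures of the same three-pixel scene (values are made up).
short_exposure = [0.02, 0.10, 0.45]  # shadows crushed, highlights preserved
long_exposure = [0.40, 0.95, 1.00]   # shadows visible, highlights clipped
fused = fuse_exposures([short_exposure, long_exposure])
```

In the fused result, the shadow pixel is taken mostly from the long exposure and the highlight pixel mostly from the short one.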

A different approach to capturing useful information with an image is taken by Shree Nayar, chairman of the computer science department at Columbia University and a pioneer in computational imaging, who combines algorithms and innovative optics to produce images with more than one billion pixels. Nayar and his students have developed computational cameras that can achieve such pixel-rich images without the very large and expensive lenses they would normally require. The difficulty in attaining these images lies not in the sensor, as gigapixel sensors are available today. Rather, it lies in the limits on resolving power, due to geometric aberrations, that are inherent in conventional optics.

Nayar’s lenses, which are spherical in shape, produce coded output that is manipulated by algorithms to remove any aberrations. The most advanced lens design has a number of smaller “relay lenses” positioned on the surface of a ball lens to focus small portions of the overall image onto megapixel sensors positioned in a hemispherical array around the lenses. Compared with other gigapixel cameras, this camera is small, less than 10 cm in diameter, and relatively inexpensive to produce, Nayar says.

Nayar’s work is partly funded by the U.S. Defense Advanced Research Projects Agency, and has obvious defense and security applications. “If I can recognize your face from a mile away, that’s a game changer,” says Nayar.

Post-Processing Software

Adobe Systems, maker of the flagship Photoshop image-processing software, has conducted substantial research in plenoptics, in which multiple images or perspectives are generated with a single push of the shutter button and then combined. Plenoptics enables the variable focusing of the Lytro light field camera, the variable brightness or exposure of high dynamic range imaging, and variable perspective. “All of these can be done after the fact in software,” says Bryan O’Neil Hughes, senior product manager for Photoshop.

Although Adobe does not sell cameras, it developed a lens, similar to an insect eye with multiple lenses, which consists of hundreds of glued-together micro-lenses, each of which can sample incoming light from a different perspective. With an array of new algorithms and a 60 megapixel sensor, the Adobe lens was able to achieve variable focus after the fact, as the Lytro camera does.

Adobe has also developed experimental, computationally intensive software that can provide “deblurring” capabilities by working on traditional images. Part of Adobe’s trick is to distinguish blur caused by camera movement from blur caused by subject movement. Unlike the plenoptic research involving multiple images, Adobe’s deblurring works with a single image.

Meanwhile, scientists at Microsoft Research have used both hardware and software to fix the blurring caused by camera movement. Their experimental system, which operates in-camera, uses inexpensive inertial sensors, like those in many cell phones, to estimate a point-spread function (PSF) from a single image capture. The PSF is essentially a representation of the amount and direction of blur. The captured image is then deconvolved with the PSF to produce a blur-free image.
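To make the role of the PSF concrete, the following sketch blurs a one-dimensional signal with a known PSF and then recovers it by regularized division in the frequency domain. It is a textbook-style illustration under strong assumptions (a known PSF, circular blur, no noise), not the Microsoft Research system. The PSF values and the function names are invented for the example.

```python
import cmath


def dft(signal, inverse=False):
    """Naive discrete Fourier transform, adequate for short signals."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def blur(signal, psf):
    """Circular convolution of the signal with the point-spread function."""
    n = len(signal)
    return [sum(psf[j] * signal[(i - j) % n] for j in range(len(psf)))
            for i in range(n)]


def deconvolve(blurred, psf, eps=1e-9):
    """Divide out the PSF in the frequency domain, with a small regularizer."""
    n = len(blurred)
    padded_psf = list(psf) + [0.0] * (n - len(psf))
    b_freq = dft(blurred)
    h_freq = dft(padded_psf)
    # Regularized inverse filter: conj(H) / (|H|^2 + eps) avoids division by zero.
    s_freq = [b * h.conjugate() / (abs(h) ** 2 + eps)
              for b, h in zip(b_freq, h_freq)]
    return [v.real for v in dft(s_freq, inverse=True)]


sharp = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0]
psf = [0.5, 0.3, 0.2]  # hypothetical motion blur smearing each pixel to the right
blurred = blur(sharp, psf)
recovered = deconvolve(blurred, psf)
```

In practice the hard part is estimating the PSF, which is where the inertial sensor data comes in. Once the PSF is known, deconvolution is comparatively mechanical, although real images also require careful handling of noise.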

Software alone can produce some deblurring, as Adobe has demonstrated, but adding the inertial data helps the software do a better job, says Richard Szeliski, director of the Interactive Visual Media Group at Microsoft Research. “This is multisensor fusion—the inertial sensors plus the visual [optical] sensors.”

Szeliski’s method works with a single image, but he says a trend for future cameras will be very rapidly capturing multiple images of the same scene, then merging them in different ways to achieve various objectives. That will require very short exposures, which are individually noisy but can be scrubbed of noise through averaging. “All this requires faster silicon and faster digital signal processors,” he says.
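The claim that short, noisy exposures can be cleaned up by averaging is easy to check numerically. The sketch below simulates a burst of frames with independent Gaussian noise, an assumption made for simplicity, and compares the error of one frame with the error of the averaged burst. All names and values are illustrative.

```python
import math
import random


def capture_noisy_frame(scene, sigma, rng):
    """Simulate one short exposure: the true scene plus Gaussian noise."""
    return [p + rng.gauss(0.0, sigma) for p in scene]


def average_frames(frames):
    """Merge aligned frames by taking the per-pixel mean."""
    return [sum(px) / len(px) for px in zip(*frames)]


def rms_error(image, reference):
    """Root-mean-square difference between an image and the true scene."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(image, reference))
                     / len(reference))


rng = random.Random(0)  # fixed seed so the run is repeatable
scene = [float(i % 50) for i in range(200)]
frames = [capture_noisy_frame(scene, sigma=10.0, rng=rng) for _ in range(64)]

single_error = rms_error(frames[0], scene)
merged_error = rms_error(average_frames(frames), scene)
```

For independent noise, averaging N frames reduces the noise by roughly a factor of the square root of N, so 64 frames should cut the error to about one-eighth of a single frame's error. The sketch assumes the frames are already aligned; in a handheld burst, alignment is a significant part of the work.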

Helping Photographers

Sensors, insect-eye lenses, lightning-fast exposures, and smart algorithms have provided significant advantages, but why not help photographers avoid some of the age-old pitfalls in shooting? That is the approach taken by computer scientist Stephen Brewster and colleagues at the University of Glasgow. Their work centers on the cameras in Android smartphones because they are more advanced than conventional digital cameras in many ways and because of their open architecture, fast processors, and increasing popularity.

One of Brewster’s experimental cameras uses a smartphone’s accelerometer to warn the photographer that the camera is moving too much to get a sharp picture. It does this by displaying a warning in the camera’s image display or, for people who are holding the phone away from their head, by an audible or vibrational cue. And for users who do not understand the luminance histogram that appears on the rear of many digital cameras, Brewster’s team has created a camera that emits a low tone when an image is underexposed and a high tone when it is overexposed. Another set of techniques uses the face-detection software built into Android smartphones to help the photographer improve composition.
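The two cues described above can be sketched as simple threshold checks. The thresholds, the function names, and the use of mean luminance and acceleration spread are assumptions for illustration. The article does not say how the Glasgow prototypes make these decisions.

```python
import math


def exposure_cue(luminances, dark_limit=60, bright_limit=195):
    """Return which tone to play for a preview frame (0..255 luminance).

    Assumption: mean luminance stands in for a full histogram analysis.
    """
    mean = sum(luminances) / len(luminances)
    if mean < dark_limit:
        return "low_tone"   # underexposed
    if mean > bright_limit:
        return "high_tone"  # overexposed
    return None             # exposure looks acceptable


def is_too_shaky(accel_samples, threshold=0.5):
    """Flag camera shake from recent accelerometer readings (x, y, z in m/s^2).

    Assumption: shake shows up as variation in acceleration magnitude.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel_samples]
    mean = sum(magnitudes) / len(magnitudes)
    return max(abs(m - mean) for m in magnitudes) > threshold


steady = [(0.0, 0.0, 9.8)] * 5
jolted = [(0.0, 0.0, 9.8), (0.0, 0.0, 12.0), (0.0, 0.0, 9.8)]
```

Here exposure_cue([10, 20, 30]) selects the low tone, while is_too_shaky(jolted) is true and is_too_shaky(steady) is false.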

For the user who may regard these multiple warnings as information overload, the University of Glasgow researchers have taken all of the camera’s measurements about exposure, subject, camera movement, and composition and merged them into a summary Picture Quality Index. The Picture Quality Index is updated once per second when the shutter button is half-depressed and helpfully displays its evaluation—red for poor, amber for fair, and green for good—in the viewfinder image.
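One plausible way to merge several measurements into a single traffic-light indicator is to average normalized scores and bucket the result. The equal weighting, the cutoffs of 0.4 and 0.7, and the function name below are assumptions. The article does not describe how the actual Picture Quality Index is computed.

```python
def picture_quality_index(exposure, subject, movement, composition):
    """Combine four scores (each 0..1, higher is better) into an index and color.

    Assumption: an unweighted mean with fixed cutoffs; the real index may
    weight or combine its inputs differently.
    """
    index = (exposure + subject + movement + composition) / 4.0
    if index < 0.4:
        color = "red"    # poor
    elif index < 0.7:
        color = "amber"  # fair
    else:
        color = "green"  # good
    return index, color


index, color = picture_quality_index(
    exposure=0.9, subject=0.9, movement=0.9, composition=0.9
)
```

A camera app following this pattern would recompute the index on a timer while the shutter button is half-pressed and tint the viewfinder overlay accordingly.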

Image Viewing and Retrieval

The Lytro light field camera is about getting objects in focus, but it is also about giving photographers multiple choices long after the picture is taken. Microsoft and others are taking that trend further, partly to overcome limitations in current displays. The average computer monitor lacks the contrast ratio and resolution to properly view images with billions of pixels and a very large range of luminance.

Microsoft’s HD View software allows smooth panning and zooming across gigapixel images, including panoramas. It also adjusts the dynamic range of the portion of the photo being viewed by the user to the much more limited luminance range of his or her monitor so that, for example, the viewer can see good detail when looking at a bright sky, but also when looking into the dark shadow under a tree, when both details appear in the same photograph.
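The idea of fitting the dynamic range of only the visible region to the display can be sketched as a rescaling that uses the minimum and maximum of the current viewport rather than of the whole image. This is a deliberately simple linear mapping with a hypothetical function name. It is not HD View's actual tone-mapping method, which the article does not detail.

```python
def tone_map_viewport(luminance, top, left, height, width, display_max=255):
    """Linearly rescale the visible window of an image to the display range.

    luminance: 2D list of scene luminance values (arbitrary linear units).
    Returns the window as integers in 0..display_max.
    """
    window = [row[left:left + width] for row in luminance[top:top + height]]
    lo = min(min(row) for row in window)
    hi = max(max(row) for row in window)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat window
    return [[round((v - lo) / span * display_max) for v in row]
            for row in window]


# Toy scene: a bright sky in the top row, deep shadow in the bottom row.
scene = [
    [50000.0, 75000.0, 100000.0],
    [1.0, 5.5, 10.0],
]
sky_view = tone_map_viewport(scene, top=0, left=0, height=1, width=3)
shadow_view = tone_map_viewport(scene, top=1, left=0, height=1, width=3)
```

Each viewport uses the display's full range even though the two regions differ in brightness by several orders of magnitude. Mapping the whole scene at once would instead crush the shadow row to near black.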

Empowering users after a photo is taken is at the core of the research done by Carnegie Mellon’s Alexei Efros. His remote cloning technique works surprisingly well because photographers tend to shoot similar things over and over, he says. But, more important, the remote cloning works because it employs a relatively new approach to modeling in which very simple machine-learning techniques are applied to huge databases. Large amounts of data can overcome weaknesses in algorithms, Efros says. “The standard view in computer science has been that the most important thing is the algorithm, then you have the representation, then you find some data,” he says. “But it’s actually the other way around: The most important thing is the data. Then it’s the representation, and only then comes the algorithm.”


Efros and his colleagues are applying this principle to a new technique that finds visually similar images even if they are quite different at the pixel level and are not effectively matched by conventional techniques. It uses a statistical technique called a support vector machine to estimate the relative importance of different features in a query image. “Our approach shows good performance on a number of difficult cross-domain visual tasks by, for example, matching paintings or sketches to real photographs,” he says.

The approach by Efros’ team could create “a new age in visual expression,” says Columbia’s Shree Nayar. Existing image-editing tools use classical image processing techniques, he says, “but now you have the opportunity to use lots of data and machine learning.”

Efros says his ultimate goal is to create a “visual memex”—a model for linking images not by categories, such as car, person, or city, but by zeroing in on what is unusual or unique in an image. These visual characteristics would become, in effect, hyperlinks. In a YouTube video titled “Data-driven Visual Similarity for Cross-domain Image Matching,” Efros downloads 200 images from Flickr based on a simple keyword search for “Medici Fountain Paris.” From them he builds a “visual memex graph” whose nodes are images and parts of images and whose edges are various types of associations, such as visual similarity and context. He goes on to show how his algorithms use the graph to find far better matches to a test image of the fountain than traditional searching techniques. By zeroing in on relatively small but unique elements in his test image, the technique avoids the superficial but incorrect matches based on similar skies or foregrounds that occupy large portions of the pictures but which are irrelevant to the fountain.

Where might these types of techniques lead? “My dream is to make kind of a World Wide Web for visual data,” says Efros. “That’s a 10- to 15-year project.”

Further Reading

Cossairt, O., Miau, D., and Nayar, S.
Gigapixel computational imaging, IEEE International Conference on Computational Photography, Pittsburgh, PA, April 8-10, 2011.

Kee, E., Paris, S., Chen, S., and Wang, J.
Modeling and removing spatially-varying optical blur, IEEE International Conference on Computational Photography, Pittsburgh, PA, April 8-10, 2011.

Nack, J.
Adobe demos refocusable images, http://blogs.adobe.com/jnack/2010/09/adobe-demos-refocusable-images.html, Sept. 25, 2010.

Shrivastava, A., Malisiewicz, T., Gupta, A., and Efros, A.
Data-driven visual similarity for cross-domain image matching, Proceedings of the 2011 SIGGRAPH Asia Conference, Hong Kong, Dec. 12-15, 2011.

Szeliski, R.
Computer Vision: Algorithms and Applications, Springer, New York, NY, 2011.
