The Edge of Computational Photography

Since their introduction more than a decade ago, smartphones have been equipped with cameras, allowing users to capture images and video without carrying a separate device. Thanks to computational photography technologies, which use algorithms to optimize photographic parameters for specific situations, users with little or no photographic training can often achieve excellent results.

The boundaries of computational photography are not clearly defined, though there is some agreement that the term refers to using hardware such as lenses and image sensors to capture image data, and then applying software algorithms that automatically adjust image parameters to yield an image. Examples of computational photography technology found in most recent smartphones and some standalone cameras include high dynamic range imaging (HDR), autofocus (AF), image stabilization, shot bracketing, and the ability to apply various filters, among many other features. These features allow amateur photographers to produce pictures that can, at times, rival photographs taken by professionals using significantly more expensive equipment.

One of the key computational photography techniques employed in both smartphones and standalone cameras is HDR, which is designed to reproduce a greater dynamic range of luminosity, or brightness, than is possible with standard digital imaging or photographic techniques.

The human eye constantly adapts to the broad range of luminance in the environment via changes in the iris, and the brain continuously processes this data so a viewer can see in a wide range of lighting conditions. Today’s complementary metal oxide semiconductor (CMOS) image sensors can capture a high dynamic range (both bright and dark areas) from a single exposure, or from multiple frames of the same scene taken within milliseconds of each other. Tuned algorithms then process and combine this data so the final image displays a wider dynamic range without requiring any image compression. Furthermore, most smartphones can now turn HDR on automatically.
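To make the multi-frame merge concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not any phone maker's pipeline: it assumes a perfectly linear sensor, a static scene, perfectly aligned frames, and exactly known exposure times, and it treats an "image" as a plain list of pixel values. Each frame's pixel is divided by its exposure time to estimate scene radiance, and a triangular "hat" weight (a common textbook choice, assumed here) trusts mid-tones most and ignores clipped pixels. To show the result on an ordinary screen, the sketch then applies a simple global tone curve, L/(1+L). All function names are hypothetical.

```python
# Minimal HDR merge sketch (assumes a linear sensor, static scene, aligned frames).

def capture(radiance, exposure_time):
    """Simulate a linear sensor: value = radiance * time, clipped to [0, 1]."""
    return [min(max(r * exposure_time, 0.0), 1.0) for r in radiance]


def hat_weight(z):
    """Trust mid-tones most; fully black (0) or clipped white (1) pixels get zero weight."""
    return 1.0 - abs(2.0 * z - 1.0)


def merge_hdr(frames, exposure_times):
    """Weighted average of the per-frame radiance estimates (z / t) for each pixel."""
    merged = []
    for pixels in zip(*frames):
        num = den = 0.0
        for z, t in zip(pixels, exposure_times):
            w = hat_weight(z)
            num += w * (z / t)
            den += w
        if den > 0.0:
            merged.append(num / den)
        else:
            # Every frame was clipped or black here; fall back to a plain average.
            merged.append(sum(z / t for z, t in zip(pixels, exposure_times)) / len(pixels))
    return merged


def tone_map(radiance):
    """Simple global tone curve L / (1 + L) that maps [0, inf) into [0, 1)."""
    return [L / (1.0 + L) for L in radiance]


# Usage: a dim, a mid-tone, and a bright pixel, bracketed at three exposure times.
scene = [0.05, 0.5, 3.0]
times = [0.25, 1.0, 4.0]
frames = [capture(scene, t) for t in times]
hdr = merge_hdr(frames, times)
display = tone_map(hdr)
print(frames[1])  # the middle exposure alone clips the bright pixel to 1.0
print(hdr)        # approximately [0.05, 0.5, 3.0]
```

No single frame records all three pixels faithfully, but the weighted merge recovers each one from whichever exposures kept it in range.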

“The value and the benefit to the user is that [they] don’t need to turn this mode on; the software just takes care of it for them,” says Josh Haftel, principal product manager with Adobe Systems, which makes image-processing software including Adobe Lightroom (a family of image organization and image manipulation software). Haftel observes that HDR was one of the first computational photography technologies to truly resonate with the public, because it provided real value by letting users produce brilliant-looking pictures without making any significant decisions.

Another technology related to HDR, incorporated into Google’s Pixel smartphone, is Night Sight, a feature of the Pixel Camera app that lets users take photographs in dimly lit or dark settings and actually renders those scenes brighter than they are in reality, without graininess or blurriness in the background. Before a picture is taken, the software uses motion metering to account for camera movement, the movement of objects in the scene, and the amount of available light, and decides how many exposures to take and how long each should be. Night Sight then splits the exposure into a burst of consecutively shot frames, which are reassembled into a single image; an algorithm trained to discount the tints cast by artificial light allows objects’ colors to be reproduced properly. Google also adjusted the software’s tone mapping to bring out colors that the human eye cannot perceive in low light. The results are hyper-real images that preserve the dark background of the surroundings but feature more brilliant colors and detail than the human eye can perceive in the actual scene.
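Why does splitting one exposure into a burst help? A key reason is that averaging N frames with independent noise cuts the noise by roughly a factor of the square root of N. The Python sketch below demonstrates only that statistical effect under simplifying assumptions: a static scene, frames that are already perfectly aligned, and independent Gaussian noise. Night Sight itself also aligns frames and copes with motion, which this sketch does not attempt. The scene values, noise level, frame count, and function names are all hypothetical.

```python
import math
import random
import statistics

# Sketch: merging a burst of short, noisy exposures by averaging.
# Assumes a static scene and perfectly aligned frames with independent Gaussian noise.

def capture_noisy_frame(scene, noise_sigma, rng):
    """One short exposure: the true signal plus independent sensor noise."""
    return [p + rng.gauss(0.0, noise_sigma) for p in scene]


def merge_burst(frames):
    """Average the frames pixel by pixel."""
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]


def residual_noise(image, scene):
    """Standard deviation of the error against the true scene."""
    return statistics.pstdev([a - b for a, b in zip(image, scene)])


rng = random.Random(42)    # fixed seed so the run is repeatable
scene = [0.02] * 10_000    # a dim, uniform patch (hypothetical values)
sigma = 0.05               # per-frame noise, larger than the signal itself
n_frames = 16

frames = [capture_noisy_frame(scene, sigma, rng) for _ in range(n_frames)]
merged = merge_burst(frames)

single = residual_noise(frames[0], scene)
combined = residual_noise(merged, scene)
print(f"one frame: {single:.4f}  merged: {combined:.4f}  "
      f"ratio: {combined / single:.3f}  expected about {1 / math.sqrt(n_frames):.3f}")
```

With 16 frames the noise falls to about a quarter of a single frame's, which is why a burst of short frames can approach the cleanliness of one long exposure while each frame stays short enough to limit blur.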

“Google has recently been doing a great job promoting their Night Sight mechanism,” Haftel says. “That’s a big problem that customers have, which is ‘how do I take a photo at nighttime that’s neither grainy nor blurry, [while also ensuring] I can see people’s faces?'”

Another key technology is autofocus (AF), which uses sophisticated pattern, color, brightness, and distance detection to recognize subjects and track them. The goal of AF is to help the camera identify these subjects and then adjust its focus automatically and quickly as they move, yielding faster and more accurate focus tracking. “[Autofocus makes] focus easier for everything from sports to weddings to parents wanting to shoot their toddlers and kids,” says Rishi Sanyal, science editor at Digital Photography Review. Camera makers are going further, he notes: “They’re even using machine learning to teach their AF systems to recognize faces, eyes, animals, and [objects] like trains and motorcycles.”
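Modern AF systems combine several cues, including the machine-learned subject recognition Sanyal describes. The Python sketch below shows just one classic building block, contrast-detection autofocus: step the lens through its range and keep the position that maximizes an image-sharpness score. Everything here is an illustrative assumption rather than any manufacturer's method: the 1-D scene, the optics model (blur that grows with distance from a hidden in-focus position), the 0 to 50 lens range, and the coarse-then-fine sweep.

```python
# Sketch of contrast-detection autofocus on a 1-D "image" (illustrative only).

def box_blur(signal, radius):
    """Average each sample with its neighbors; radius 0 leaves the signal unchanged."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def sharpness(signal):
    """Contrast metric: sum of squared differences between neighboring samples."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))


def render_at(scene, lens_pos, true_focus):
    """Hypothetical optics: blur grows with distance from the in-focus lens position.
    Only this simulator knows true_focus; the search below sees sharpness scores only."""
    return box_blur(scene, abs(lens_pos - true_focus))


def contrast_autofocus(scene, true_focus, max_pos=50, coarse_step=5):
    def score(pos):
        return sharpness(render_at(scene, pos, true_focus))

    # Coarse pass: sample the lens range sparsely and keep the sharpest position.
    best = max(range(0, max_pos + 1, coarse_step), key=score)
    # Fine pass: try every position between the coarse winner's neighbors.
    lo = max(0, best - coarse_step + 1)
    hi = min(max_pos, best + coarse_step - 1)
    return max(range(lo, hi + 1), key=score)


# A scene with a bright bar (two edges) on a dark background.
scene = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
print(contrast_autofocus(scene, true_focus=23))  # 23
```

The coarse-to-fine sweep trades a handful of extra measurements for speed; tracking a moving subject amounts to repeating a small local search around the last good position.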

Computational photography can also combine data from a camera’s sensors to produce a photo that would be impossible to capture with more conventional tools. Examples include capturing multiple frames or multiple camera inputs and then fusing them into a single image, yielding a crisper or richer picture from a single shot. Elements from a wide shot and a telephoto shot can also be combined automatically into a synthetic zoom view that looks nearly as good as one produced by the traditional external lens used on professional cameras.
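Before a wide and a telephoto frame can be fused, the software has to work out how much of the requested view each camera can supply. The Python sketch below covers only that field-of-view bookkeeping, assuming both cameras share an optical center and aspect ratio (so there is no parallax) and ignoring the alignment and blending that real systems also perform. The focal lengths in the example and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ZoomPlan:
    wide_upscale: float   # digital magnification applied to the wide frame
    tele_scale: float     # resampling of the tele frame (< 1 means it is downsampled)
    tele_coverage: float  # fraction of the output's width (and height) the tele frame fills


def plan_synthetic_zoom(zoom, wide_focal_mm, tele_focal_mm):
    """Field-of-view bookkeeping for fusing a wide and a telephoto frame.
    Assumes both cameras share an optical center and aspect ratio (no parallax)."""
    if zoom < 1.0:
        raise ValueError("zoom is relative to the wide camera and must be >= 1")
    k = tele_focal_mm / wide_focal_mm  # how much narrower the tele view is
    # The output shows the central 1/zoom of the wide view; the tele camera sees the
    # central 1/k, so it fills (1/k) / (1/zoom) = zoom / k of the output, at most all of it.
    return ZoomPlan(wide_upscale=zoom, tele_scale=zoom / k, tele_coverage=min(1.0, zoom / k))


plan = plan_synthetic_zoom(1.5, wide_focal_mm=28, tele_focal_mm=56)
print(plan)  # ZoomPlan(wide_upscale=1.5, tele_scale=0.75, tele_coverage=0.75)
```

In this example, at 1.5x the telephoto camera can supply detail for the central 75% of the frame's width (about 56% of its area), while the border must come from the upscaled wide frame; from 2x upward the telephoto camera covers the whole view.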

“You could take a photo with 100 people in the picture, but if you want to take your friend in the center of the picture and are using a telephoto sensor, her face or his face will be very clear,” explains Zack Zhou, senior director of engineering at Qualcomm, Inc.

This type of computational technology has become somewhat commonplace in the market today. “There are many smartphone OEMs that are using multiple sensors with actual multiple depth-of-field lenses to get just a few different planes of focus, up to many, many planes of focus so that you could refocus [the photo] after the fact,” says Judd Heape, senior director of product management for cameras, computer vision, and video at Qualcomm. Qualcomm supplies the Snapdragon Mobile Platform, a hardware platform that supports a wide range of computational photography techniques and technologies and is used in many smartphones (though not in Apple’s iPhones).
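Refocusing after the fact can be illustrated with a focal stack: several aligned frames, each sharp at a different depth. A minimal approach, sketched below in Python, is to pick the frame with the highest local contrast at the point the user taps; taking that choice at every pixel yields an all-in-focus composite ("focus stacking"). This is a simplified illustration, not Qualcomm's or any OEM's implementation: it assumes the frames are already aligned, uses 1-D signals, and uses squared neighbor differences as a hypothetical focus measure.

```python
# Sketch: refocusing "after the fact" from a focal stack (aligned 1-D frames).

def box_blur(signal, radius):
    """Average each sample with its neighbors (stands in for defocus)."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def local_contrast(frame, i, radius=2):
    """Focus measure: squared neighbor differences in a window around pixel i."""
    lo, hi = max(0, i - radius), min(len(frame) - 1, i + radius)
    return sum((frame[j + 1] - frame[j]) ** 2 for j in range(lo, hi))


def refocus_at(stack, i):
    """Return the index of the frame that is sharpest at the tapped pixel i."""
    return max(range(len(stack)), key=lambda k: local_contrast(stack[k], i))


def all_in_focus(stack):
    """Focus stacking: take every pixel from whichever frame is sharpest there."""
    return [stack[refocus_at(stack, i)][i] for i in range(len(stack[0]))]


# Hypothetical scene: fine texture; the left half is "near", the right half "far".
scene = [float(i % 2) for i in range(40)]
blurred = box_blur(scene, 2)
near_frame = scene[:20] + blurred[20:]   # focused on the near half
far_frame = blurred[:20] + scene[20:]    # focused on the far half
stack = [near_frame, far_frame]

print(refocus_at(stack, 5), refocus_at(stack, 30))  # 0 1
```

More frames in the stack means more planes of focus to choose from, which is the progression Heape describes from "a few different planes" to "many, many planes."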

Still, the best computational techniques are not yet able to outperform top professional photographers using professional-level digital single-lens reflex cameras (DSLRs). With their larger lenses and better sensors, DSLRs capture more light, and so still yield better large-format images than consumer cameras or smartphone cameras. Nevertheless, there is broad recognition that computational photography is here to stay and will be a focus of technological investment and improvement in the years to come, in both smartphones and standalone camera bodies.
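The light-gathering gap can be put in numbers. At the same f-number, shutter speed, and scene, the total light a sensor collects scales with its area, so the difference in exposure stops is the base-2 logarithm of the area ratio. The sketch below assumes a full-frame (36 x 24 mm) DSLR sensor and a hypothetical 6 x 4.5 mm phone sensor; real phone sensor sizes vary.

```python
import math


def stops_more_light(area_a_mm2, area_b_mm2):
    """Exposure stops separating two sensors' total light collection
    (same f-number, shutter speed, and scene): log2 of the area ratio."""
    return math.log2(area_a_mm2 / area_b_mm2)


full_frame = 36.0 * 24.0   # 864 mm^2, the standard 35mm full-frame size
phone = 6.0 * 4.5          # 27 mm^2, a hypothetical small phone sensor
print(stops_more_light(full_frame, phone))  # 5.0
```

Five stops is a factor of 32, so under these idealized assumptions the phone would need to combine roughly 32 frames' worth of light to collect as much as one full-frame exposure, one reason multi-frame stacking matters so much on small sensors.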

One example is the Light L16, a standalone multi-lens, multi-sensor camera released in July 2017. When the user shoots a picture using the Light L16, the camera captures 10 or more images simultaneously, each with a slightly different perspective of the same scene. The L16 uses algorithms to choose a combination of its 28mm, 70mm, and 150mm modules to use in each shot, depending on the level of zoom. These individual shots are then computationally fused together to create a high-resolution 52-megapixel (MP) photograph.
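Light has not published its selection algorithm here, so the Python sketch below uses a purely hypothetical rule to show the kind of decision involved: for a requested framing (expressed as an equivalent focal length), use the longest module that still covers the whole frame, plus the next longer module, whose narrower view could add detail to the center. Only the three focal lengths come from the article; the rule and names are assumptions.

```python
# Hypothetical module-selection rule inspired by the L16's three focal lengths.
MODULE_FOCAL_LENGTHS_MM = (28, 70, 150)


def select_modules(requested_mm, modules=MODULE_FOCAL_LENGTHS_MM):
    """Pick the longest module that still covers the whole requested frame, plus the
    next longer module (if any), whose narrower view can add detail to the center."""
    covering = [m for m in modules if m <= requested_mm]
    if not covering:
        raise ValueError("requested framing is wider than the widest module")
    longer = [m for m in modules if m > requested_mm]
    return [max(covering)] + ([min(longer)] if longer else [])


for zoom_mm in (28, 100, 150):
    print(zoom_mm, select_modules(zoom_mm))
# 28 [28, 70]
# 100 [70, 150]
# 150 [150]
```

A module with a longer focal length than the requested framing sees only part of the scene, so under this rule it contributes detail rather than the overall frame.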

Though a Light representative did not wish to comment on the camera or anticipated future developments, the company’s November 2018 press release indicated improvements to the L16 were imminent, such as allowing the user to adjust the aperture, or depth effect, after a photo has been captured, using images from five camera modules. The L16 is also being updated to allow video recording at 1080p resolution and 30 frames per second.

“You have companies like Light who are going out there and utilizing multiple lenses to try and overcome the idea that you can’t have a long lens on a smartphone because of physics,” says Adobe’s Haftel. He notes that sensors, mirrors, and algorithms are used to mimic the look and feel of an image taken with a high-end professional camera, with the subject in focus and the background out of focus, an effect known as bokeh.
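One common way to imitate a shallow depth of field computationally is to blur each pixel by an amount that grows with its distance from the chosen focal plane, using a per-pixel depth map (which multi-camera systems can estimate from parallax). The Python sketch below is a deliberately simplified 1-D illustration: the depth map is given rather than estimated, the blur is a plain box average, and background pixels next to the subject are allowed to mix in subject pixels, which production portrait modes work to avoid. The strength, maximum radius, and names are hypothetical.

```python
# Sketch: synthetic shallow depth of field (bokeh) from a per-pixel depth map.

def synthetic_bokeh(image, depth, focus_depth, strength=2.0, max_radius=6):
    """Blur each pixel by an amount that grows with its distance from the focal plane.
    Pixels at focus_depth get radius 0 and are left unchanged."""
    n = len(image)
    out = []
    for i in range(n):
        radius = min(max_radius, round(strength * abs(depth[i] - focus_depth)))
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(image[lo:hi]) / (hi - lo))
    return out


# Hypothetical 1-D scene: textured background at depth 4 m, subject at 1 m in the middle.
image = [float(i % 2) for i in range(30)]
depth = [4.0] * 10 + [1.0] * 10 + [4.0] * 10
result = synthetic_bokeh(image, depth, focus_depth=1.0)
print(result[10:20] == image[10:20])  # True: the subject stays sharp
```

The subject keeps its full-contrast texture while the background texture is smeared toward a uniform gray, which is the visual signature of a wide-aperture lens.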

Some traditional high-end standalone cameras have also incorporated computational photography features. The Canon 5D Mark IV includes a DIGIC 6+ Image Processor, which uses a noise-processing algorithm to keep noise to a minimum at high ISO settings; an automatic AF selection mode; and a Digital Lens Optimizer that can automatically apply a variety of aberration and diffraction corrections, as well as other corrective measures specific to the lens in use.
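The article does not describe how Canon's Digital Lens Optimizer works internally, but one widely used lens-specific correction is undoing radial distortion with a polynomial model, in which a point at radius r from the image center is displaced by a factor of 1 + k1·r² + k2·r⁴. The Python sketch below shows that general technique, assuming coordinates normalized so (0, 0) is the image center; the coefficients stand in for a hypothetical lens profile, and inverting the model by fixed-point iteration is one common approach among several.

```python
# Sketch: correcting radial lens distortion with a polynomial model.
# Coordinates are normalized so (0, 0) is the image center; k1, k2 are hypothetical.

def distort(x, y, k1, k2):
    """Forward model: a point at radius r is displaced by the factor 1 + k1*r^2 + k2*r^4."""
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * factor, y * factor


def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the forward model by fixed-point iteration: x = xd / factor(x)."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        factor = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / factor, yd / factor
    return x, y


K1, K2 = -0.10, 0.01                       # mild barrel distortion (hypothetical lens profile)
observed = distort(0.5, 0.3, K1, K2)       # where the lens actually puts the point
corrected = undistort(*observed, K1, K2)   # back to approximately (0.5, 0.3)
print(observed, corrected)
```

Because the coefficients belong to a specific lens, a camera can store a profile per lens model and apply the matching correction automatically, which is the sense in which such corrections are "specific to the lens in use."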

Meanwhile, Nikon announced its CoolPix B600 camera in January. It also includes computational photography-based features, such as 19 scene modes: the user only needs to select the most appropriate mode for the scene, and the camera automatically applies the appropriate settings. The CoolPix B600’s Creative mode offers 36 effects, designed to provide optimal combinations of exposure, contrast, and color reproduction.

In 2018, high-end lens maker ZEISS introduced the ZX1, a camera built on an Android platform using the Qualcomm Snapdragon processor and outfitted with features typically found on smartphones: plenty of RAM, a graphics processing unit (GPU), and large built-in storage. The camera was slated to be available by early this year and could be a game changer because, unlike the Light L16, the ZX1 features a high-performance 35mm f/2 ZEISS Distagon lens in a mirrorless design, as well as classic dials to control aperture, shutter speed, and sensitivity, providing a traditional, comfortable way for photographers to adjust settings.

“ZEISS is the first company that we’ve seen do this in a rather meaningful way,” Haftel says, noting that the ZX1’s combination of classic features and computational photography elements allows photographers to “do the similar kinds of computational photography that you’re seeing with handset manufacturers.”

Ultimately, many of the technical improvements on the horizon will focus on combining multiple lenses with image stacking and super-resolution techniques to provide a wide zoom range, Sanyal says. Dedicated cameras, which can shoot at 20 to 60 frames per second, will also enable many sophisticated image stacking possibilities, given the large amount of high-quality image data that high frame rates provide. In addition, high-definition images fed into specific algorithms can help render photographs with a surprising amount of depth, such as Facebook’s three-dimensional (3D) photos.
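Image stacking can raise resolution, not just reduce noise, when frames are offset from one another by fractions of a pixel. The Python sketch below shows the idealized "shift and add" form of multi-frame super-resolution in 1-D: each low-resolution frame samples the fine grid at a different offset, and placing the samples back at their known offsets rebuilds the fine detail. It assumes the shifts are known exactly and fall on the fine grid, with no blur or noise; real pipelines must estimate shifts (for instance, from natural hand shake between frames) and interpolate. All names and values are hypothetical.

```python
# Sketch: multi-frame super-resolution by "shift and add" (idealized, 1-D).

def capture_low_res(high_res, factor, shift):
    """One frame from a sensor `factor` times coarser, offset by `shift` fine-grid samples."""
    return high_res[shift::factor]


def shift_and_add(frames, shifts, factor):
    """Place each frame's samples back on the fine grid and average where frames overlap.
    Grid positions that no frame sampled are left as None."""
    n = len(frames[0]) * factor
    total = [0.0] * n
    count = [0] * n
    for frame, shift in zip(frames, shifts):
        for j, value in enumerate(frame):
            idx = j * factor + shift
            if idx < n:
                total[idx] += value
                count[idx] += 1
    return [t / c if c else None for t, c in zip(total, count)]


fine_scene = [float((i * i) % 7) for i in range(12)]   # hypothetical detail pattern
factor = 3
shifts = [0, 1, 2]   # assumed known offsets between the frames
frames = [capture_low_res(fine_scene, factor, s) for s in shifts]
restored = shift_and_add(frames, shifts, factor)
print(restored == fine_scene)  # True
```

Each frame alone holds only a third of the samples; together, three frames at distinct offsets fill in the full-resolution grid. A higher frame rate supplies more such offsets in a short burst, which is one way the abundant data from fast dedicated cameras could be put to use.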

“I think a lot of the OEMs are working with software partners out there that are dreaming up all kinds of computational use cases, many of which we probably haven’t even thought of yet,” Qualcomm’s Heape says. He highlights work on processing multiple planes of focus at once; freezing part of an image while other parts remain in motion; and segmentation techniques that let users highlight certain parts of an image in color while the rest turns black and white or dissolves into bokeh. “These are some common use cases, but I think there will be even more in the coming couple years that we haven’t even really thought of yet.”
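The segmentation use case Heape mentions, keeping one region in color while the rest goes black and white, reduces to a simple per-pixel operation once a mask is available. In the Python sketch below the subject mask is assumed to come from some segmentation model (hypothetical here, supplied by hand), and grayscale is computed with the standard Rec. 601 luma weights; the function names are illustrative.

```python
# Sketch: "color pop": keep a segmented subject in color, render the rest in grayscale.

def luma(rgb):
    """Grayscale value using the Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def color_pop(pixels, subject_mask):
    """subject_mask[i] is True where a segmentation step found the subject."""
    out = []
    for rgb, is_subject in zip(pixels, subject_mask):
        if is_subject:
            out.append(rgb)
        else:
            y = luma(rgb)
            out.append((y, y, y))
    return out


pixels = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.2, 0.4, 0.6)]
mask = [True, False, False]   # hypothetical output of a segmentation model
print(color_pop(pixels, mask))
```

The hard part in practice is producing an accurate mask; swapping the grayscale branch for a depth-dependent blur would give the "goes into bokeh" variant instead.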

Further Reading