Architecture and Hardware News

Holograms on the Horizon?

Machine learning drives toward 3D imaging on the move.

Researchers at the Massachusetts Institute of Technology (MIT) have used machine learning to reduce the processing power needed to render convincing holographic images, making it possible to generate them in near-real time on consumer-level computer hardware. Such a method could pave the way to portable virtual-reality systems that use holography instead of stereoscopic displays.

Stereo imagery can present the illusion of three-dimensionality, but users often complain of dizziness and fatigue after long periods of use because there is a mismatch between where the brain expects to focus and the flat focal plane of the two images. Switching to holographic image generation overcomes this problem; it uses interference in the patterns of many light beams to construct visible shapes in free space that present the brain with images it can more readily accept as three-dimensional (3D) objects.

“Holography in its extreme version produces a full optical reproduction of the image of the object. There should be no difference between the image of the object and the object itself,” says Tim Wilkinson, a professor of electrical engineering at Jesus College of the U.K.’s University of Cambridge.

Conventional holograms based on photographic film can capture interference patterns that work over a relatively wide viewing range, but cannot support moving images. A real-time hologram uses a spatial-light modulator (SLM) to alter either the amplitude or phase of light, generally provided by one or more lasers, passing through it on a pixel-by-pixel basis. Today’s SLMs are nowhere near large or detailed enough to create holographic images that can be viewed at a distance, but they are just good enough right now to create near-eye images in headsets and have been built into demonstrators such as the HoloLens prototype developed by Andrew Maimone and colleagues at Microsoft Research.

A major obstacle to a HoloLens-type headset lies in the computational cost of generating a hologram. Three algorithms are used today to generate dynamic holograms, each of which has drawbacks. One separates the field of view into layers, which helps reduce computation time but lacks the ability to fine-tune depth. A scheme based on triangular meshes, like those used by games software that renders 3D scenes onto a conventional two-dimensional (2D) display, helps cut processing time, although without modifications to handle textures it lacks realism. The point-cloud method offers the best potential for realism, although at the expense of consuming more cycles. In its purest form, an algorithm traces the light emanating from each point to each pixel in the SLM’s replay field. “Light from a single point can diverge to a very wide area. Every single point source creates a sheet of refractions in the replay field,” says Wilkinson.
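To see why the pure point-cloud method is so expensive, it helps to write it down. The sketch below is only an illustration, not code from any of the groups quoted here: it assumes each scene point radiates an ideal spherical wave, ignores occlusion entirely, and uses hypothetical names (`point_cloud_hologram`, `pitch`, `wavelength`) chosen for this example.

```python
import cmath
import math


def point_cloud_hologram(points, width, height, pitch, wavelength):
    """Sum a spherical wave from every scene point at every SLM pixel.

    points: list of (x, y, z, amplitude) tuples in metres, where z > 0 is
            the distance of the point from the SLM plane (an assumption of
            this sketch, as is the absence of any occlusion handling).
    Returns a height x width grid of complex field values.
    """
    k = 2.0 * math.pi / wavelength  # wavenumber
    field = [[0j] * width for _ in range(height)]
    for row in range(height):
        py = (row - (height - 1) / 2.0) * pitch  # pixel centre, metres
        for col in range(width):
            px = (col - (width - 1) / 2.0) * pitch
            total = 0j
            # Every point contributes to every pixel: O(pixels * points).
            for x, y, z, amp in points:
                r = math.sqrt((px - x) ** 2 + (py - y) ** 2 + z ** 2)
                total += (amp / r) * cmath.exp(1j * k * r)
            field[row][col] = total
    return field


def to_phase(field):
    """Keep only the phase, as a phase-modulating SLM would display it."""
    return [[cmath.phase(value) for value in row] for row in field]
```

The triple loop is the point: a megapixel SLM and a scene of a few hundred thousand points already imply hundreds of billions of complex exponentials per frame, before any occlusion checks are added.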

A drawback of the point cloud is that light from every point will not reach every pixel in the target hologram, because it will be blocked by objects in front of it. That calls for software to remove the paths that should be occluded, which increases the number of branches in the code. Though it removes the need to map the light from every point onto every pixel in the SLM, the checks and branches slow down execution. Photorealistic holograms intended for use as codec test images, created using a method developed by David Blinder, a post-doctoral researcher at Belgium’s Vrije Universiteit Brussel, and colleagues, take more than an hour to render using an nVidia Titan RTX graphics processing unit. However, numerous optimizations have been proposed that reduce arithmetic precision and the steps required, with some loss of quality, to achieve real-time performance on accelerated hardware.

The MIT approach uses several approximations and optimizations built around a deep neural network (DNN) made up of multiple convolutional layers that generate the image from many subholograms. This involves far fewer calculations than trying to map a complete point cloud directly to a final complete hologram. In conventional optimizations, lookup tables of diffraction patterns can help build those subholograms more quickly, but it is still an intensive process.
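The lookup-table idea mentioned above can be sketched in a few lines. This is a generic illustration of the conventional optimization, not the MIT network: it assumes depths are quantized to a small set of values, that every subhologram is a fixed-size square patch, and that the paraxial (Fresnel) phase is an adequate model. The function names and the `half_size` parameter are hypothetical.

```python
import cmath
import math


def fresnel_patch(z, half_size, pitch, wavelength):
    """Precompute the paraxial phase pattern of one point at depth z."""
    k = 2.0 * math.pi / wavelength
    size = 2 * half_size + 1
    patch = [[0j] * size for _ in range(size)]
    for j in range(size):
        dy = (j - half_size) * pitch
        for i in range(size):
            dx = (i - half_size) * pitch
            # Fresnel approximation; constant phase and 1/z falloff omitted.
            patch[j][i] = cmath.exp(1j * k * (dx * dx + dy * dy) / (2.0 * z))
    return patch


def build_lut(depths, half_size, pitch, wavelength):
    """One precomputed patch per quantized depth (assumed depth levels)."""
    return {z: fresnel_patch(z, half_size, pitch, wavelength) for z in depths}


def stamp_points(points, lut, width, height, half_size):
    """Accumulate subholograms by copying patches instead of recomputing.

    points: list of (col, row, depth, amplitude); depth must be a LUT key.
    """
    field = [[0j] * width for _ in range(height)]
    for col, row, z, amp in points:
        patch = lut[z]
        for j in range(-half_size, half_size + 1):
            y = row + j
            if not 0 <= y < height:
                continue  # patch clipped at the SLM edge
            for i in range(-half_size, half_size + 1):
                x = col + i
                if 0 <= x < width:
                    field[y][x] += amp * patch[j + half_size][i + half_size]
    return field
```

Each point now touches only the pixels under its patch rather than the whole SLM, and the exponentials are computed once per depth level. The accumulation loop still runs for every point, which is why the process remains intensive.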

The DNN allows a more progressive approach to assembling the final image, which results in fewer calculations, particularly as the network can handle occlusion. The team trained the model on images of partially occluded objects and their subhologram patterns. The resulting algorithm can deliver images at a rate of just over 1 Hz using the A13 Bionic accelerators in the iPhone 11 Pro. Without the computational optimizations provided by the DNN, the researchers suggest processing would take at least two orders of magnitude longer.

The MIT work underscores the need for good data in machine learning. The team looked at existing datasets for generating the required data, but all of them missed key components that made it impossible to train an effective model. One issue Ph.D. student Liang Shi and coworkers on the MIT project found is that existing datasets have objects clustered either at close range or far away from the viewer, with relatively few objects in the middle ground. This work needed a more consistent set of examples to avoid biases in the model that would lead to unwanted artefacts appearing in rendered scenes. Shi points out that the RGB image and depth data also need to be well aligned to ensure the DNN handles occlusions well. “This prohibits the use of real-world captured datasets, which often have undefined depth regions or misaligned depth values,” he notes.

Wilkinson argues machine learning used in this way is unlikely to fit well with holographic displays that need to employ more extensive calculations of photon interference. These typically use Fourier transforms rather than an approximation of diffraction based on Fresnel optics, which underpin the subhologram algorithms.

“Machine learning is generally a one-to-one or many-to-one translation process. Holography, because it’s Fourier-based, is a one-to-many process. Each point can have an effect on every other,” Wilkinson says. He points to the fact that the patterns seen in SLMs for full holograms tend to look like “random mush, though what you get out at the end is a lovely hologram. In these types of machine learning system, if you look at what they display on the SLM, you see a partially diffracted version of the real image.”

Blinder says the approaches taken by MIT and others may not scale well if SLMs evolve to deliver larger fields of view. “The method is probably not suitable for holographic television with multiple viewers.”

In the near term, this may not be an issue. The pixel density and resolution limitations of SLMs restrict the effective viewing angle that can be supported, as well as the size of the “eyebox,” which is the region in which an observer will be able to see any of the hologram. Eye tracking coupled with rapid re-rendering can potentially compensate for these limitations in headsets and avoid the need to implement algorithms that can handle wider viewing ranges.
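The link between pixel density and viewing angle follows from the grating equation: the finest pattern an SLM can display has a period of two pixels, which sets the steepest angle at which it can deflect light. The sketch below is a back-of-the-envelope estimate under that assumption only. The function names and the example values of 8 µm pitch and 532 nm wavelength are illustrative, not figures from the article, and the cone width is at best a rough proxy for the eyebox in a simple viewing geometry.

```python
import math


def max_diffraction_half_angle(pitch, wavelength):
    """Steepest deflection (radians) for a grating of period 2 * pitch."""
    return math.asin(wavelength / (2.0 * pitch))


def cone_width_at(pitch, wavelength, distance):
    """Width of the diffraction cone at a given distance from the SLM.

    Ignores the SLM's own aperture and any relay optics (assumptions of
    this sketch), so treat the result as an order-of-magnitude guide.
    """
    half_angle = max_diffraction_half_angle(pitch, wavelength)
    return 2.0 * distance * math.tan(half_angle)


if __name__ == "__main__":
    half = max_diffraction_half_angle(pitch=8e-6, wavelength=532e-9)
    print(f"full viewing angle: {math.degrees(2 * half):.2f} degrees")
    width = cone_width_at(pitch=8e-6, wavelength=532e-9, distance=0.05)
    print(f"cone width at 5 cm: {width * 1e3:.2f} mm")
```

With pixels several micrometres across, the full angle comes out at only a few degrees and the cone is a few millimetres wide at headset distances, which is why eye tracking is attractive as a workaround.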

Machine learning also could help improve the perceived quality of the display’s output. Gordon Wetzstein, an assistant professor of electrical engineering at Stanford University, says SLMs and the other optics in holographic displays are difficult to control, which leads to degraded image quality in experiments. “They almost never behave in exactly the way you simulate them. Machine learning can compensate for this difference by learning proxy models of the hardware,” he says.

Wetzstein and coworkers used a camera-in-the-loop system to help train models to compensate for the optical imperfections and improve perceived image quality. Shi says the MIT team is working on similar approaches built on top of the DNN-based rendering system. “We have done follow-up work that takes into account both SLM deficiencies and a user’s vision aberrations and compensate for both in the hologram computation.”

Wilkinson reckons machine learning may be overkill in correction, at least for consumer displays. “Aberration is often quite a low-order problem, though there are some applications where it isn’t, such as free-space optical communications. I would not be surprised if machine learning were ultimately used there.”

The open question is whether machine learning will become a mainstay of holographic rendering, or whether work on algorithms will result in similar or even greater computational efficiencies that can be used in commercial holographic displays or projectors.

Wilkinson says opportunities remain for deterministic, rather than AI-based, techniques that are optimized for performance. In much of the computational holography work so far, he says there is a tendency to stick to known solutions for calculating holograms. “Today, there are just three algorithms we use for holography. That can’t be right. We must be missing something here. We tend to find a solution that works and just use that. We don’t think outside the box too much. I think that’s a mistake.”

One issue is that it is difficult to determine how well an algorithm performs in terms of image quality. This, says Wilkinson, is where machine learning’s use of error minimization and norms may prove useful by providing automated ways of evaluating how close images are to a golden reference.
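As a concrete instance of the error norms Wilkinson refers to, mean squared error and peak signal-to-noise ratio are two common ways to score a rendered image against a reference. The sketch below assumes both images are same-sized grids of intensities in the range 0 to 1. The metrics are standard, but the article does not say which metrics any of the groups use.

```python
import math


def mean_squared_error(image, reference):
    """Average squared pixel difference between two same-sized images."""
    total = 0.0
    count = 0
    for image_row, reference_row in zip(image, reference):
        for a, b in zip(image_row, reference_row):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(image, reference, peak=1.0):
    """Peak signal-to-noise ratio in decibels; higher means closer."""
    mse = mean_squared_error(image, reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10((peak * peak) / mse)
```

Scores of this kind are easy to automate, though they do not necessarily track what a viewer perceives, so they complement rather than replace visual inspection.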

Blinder says holographic displays may take a similar path to systems such as nVidia’s Deep Learning Super Sampling, which employs machine learning to interpolate higher-resolution imagery from low-resolution, partially rendered data. “Given the generality of DNNs, I think that hybrid systems will be the most likely future outcome. But this may be more challenging to achieve in holography since information is not well-localized spatially.”

One possible direction for machine learning for holography may be in combining the output from multiple SLMs to try to build larger-scale projectors rather than headsets, Wilkinson says.

SLM size, resolution, and switching performance remain obstacles to delivering viable headsets, but work on computational holography has led to manufacturers taking more of an interest in these applications. “We are starting to see custom silicon appear that shows the manufacturers are taking holograms seriously,” Wilkinson says.

With improvements in both hardware and algorithms, virtual reality may be able to move away from stereoscopic displays and the usability problems that go with them.

Further Reading

Maimone, A., Georgiou, A., and Kollin, J.S.
Holographic near-eye displays for virtual and augmented reality, ACM Transactions on Graphics, Vol. 36, No. 4, Article 85 (2017).

Shi, L., Li, B., Kim, C., Kellnhofer, P., and Matusik, W.
Towards real-time photorealistic 3D holography with deep neural networks, Nature, Vol. 591, p. 234 (11 March 2021).

Chang, C., Bang, K., Wetzstein, G., Lee, B., and Gao, L.
Toward the next-generation VR/AR optics: a review of holographic near-eye displays from a human-centric perspective, Optica, Vol. 7, No. 11 (2020).

Blinder, D., Ahar, A., Bettens, S., Birnbaum, T., Symeonidou, A., Ottevaere, H., Schretter, C., and Schelkens, P.
Signal processing challenges for digital holographic video display systems, Signal Processing: Image Communication 70 (2019) 114-130.
