Communications of the ACM

ACM News

Google's AI Upscaling Technology Outperforms Previous Models

Examples of Image Super-Resolution via Iterative Refinement.

On the left, original 64x64 images; on the right, the same images upscaled to 1,024x1,024 resolution via Super-Resolution via Repeated Refinement (SR3).

Credit: Google AI Blog

Until now, if you wanted to use a photo that only existed at a low resolution, you'd have no choice but to put up with the low image quality. We all know that the "zoom and enhance" features seen in movies for decades have been merely science fiction. That could soon change, thanks to recent artificial intelligence (AI) breakthroughs from Google.

In July, the company's Google Brain team published the results of research into different techniques for "image super-resolution," or using AI-powered machine learning models to turn low-resolution images into high-resolution ones.

Using two new techniques, Super-Resolution via Repeated Refinement (SR3) and Cascaded Diffusion Models (CDM), the Google Brain team has created high-resolution images from low-resolution ones at a quality level that surpasses previous methods for the task.

The process is called "upscaling," and Google's two new models work together to take blurry portraits and make them photo-realistic.

SR3 is a "diffusion model" that turns low-resolution images into high-resolution ones. During training, diffusion models progressively add Gaussian (random) noise to images until nothing but noise remains. A neural network is then trained to reverse those noise additions step by step, which lets SR3 start from pure noise and, guided by a low-resolution image, build up a high-resolution version of it. CDM is a "class-conditional" diffusion model that generates a cascade of the same image at progressively higher resolutions.
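The noising half of that process is simple enough to sketch. The Python below is a minimal illustration, not Google's code: the linear noise schedule, the 1,000 steps, the function names, and the 16-value stand-in "image" are all assumptions chosen for the example. It shows how an image keeps most of its signal after a few steps and almost none by the last one.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def signal_fraction(betas):
    """Running product of (1 - beta): the share of signal variance left after each step."""
    remaining, fractions = 1.0, []
    for beta in betas:
        remaining *= 1.0 - beta
        fractions.append(remaining)
    return fractions

def add_noise(pixels, alpha_bar, rng):
    """Jump straight to one noise level: sqrt(alpha_bar) * pixel + sqrt(1 - alpha_bar) * noise."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)                 # fixed seed so runs are repeatable
betas = linear_beta_schedule(1000)     # hypothetical 1,000-step schedule
alpha_bars = signal_fraction(betas)

image = [0.5] * 16                     # hypothetical flattened 4x4 "image"
slightly_noisy = add_noise(image, alpha_bars[9], rng)      # after 10 steps
almost_pure_noise = add_noise(image, alpha_bars[-1], rng)  # after all steps
```

The trained network's job is the opposite direction, removing the noise one step at a time; that part is omitted here because it requires a learned model.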

Together, the two models can be stacked to accomplish "super-resolution" tasks, such as taking an image from 64x64 resolution to a whopping 1,024x1,024 resolution.
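The arithmetic behind that jump: 64 to 1,024 is a 16x increase per side, or 256 times as many pixels. The sketch below assumes the cascade is split into two 4x stages (64 to 256, then 256 to 1,024) and uses plain pixel repetition as a stand-in for each learned model. Both the stage split and the helper names are assumptions for illustration.

```python
def upscale_nearest(image, factor):
    """Placeholder stage: repeat each pixel `factor` times in both directions."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def run_cascade(image, factors):
    """Apply the stages in order, recording the side length after each one."""
    sizes = [len(image)]
    for factor in factors:
        image = upscale_nearest(image, factor)
        sizes.append(len(image))
    return image, sizes

# Hypothetical 64x64 grayscale input.
low_res = [[(r + c) % 256 for c in range(64)] for r in range(64)]

high_res, sizes = run_cascade(low_res, [4, 4])  # assumed two 4x stages
pixel_ratio = (len(high_res) * len(high_res[0])) // (64 * 64)
```

Pixel repetition adds no detail, which is the point of the comparison: a real cascade replaces each placeholder stage with a model that has to invent plausible detail for the 255 out of every 256 output pixels that have no direct counterpart in the input.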

The techniques produce better results than generative adversarial networks (GANs), the models previously considered cutting-edge for super-resolution tasks. GANs are called "adversarial" because they rely on two models working against each other: one that generates examples (the generator) and one that tries to classify examples as real or generated (the discriminator). The two models compete, and that competition trains the network. Until recently, GANs were used for super-resolution tasks, as well as for image transformation and generation.
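That tug-of-war can be written as two loss functions. The sketch below is a common textbook formulation (binary cross-entropy with the "non-saturating" generator loss), not a description of any particular super-resolution GAN; the function names and sample scores are hypothetical. Each score is the discriminator's estimated probability that an image is real.

```python
import math

def discriminator_loss(scores_on_real, scores_on_fake):
    """Low when real images score near 1 and generated images score near 0."""
    real_term = sum(-math.log(s) for s in scores_on_real) / len(scores_on_real)
    fake_term = sum(-math.log(1.0 - s) for s in scores_on_fake) / len(scores_on_fake)
    return real_term + fake_term

def generator_loss(scores_on_fake):
    """Low when the discriminator is fooled into scoring generated images near 1."""
    return sum(-math.log(s) for s in scores_on_fake) / len(scores_on_fake)

# Hypothetical discriminator outputs in two situations.
d_winning = discriminator_loss([0.9, 0.9], [0.1, 0.1])  # fakes are caught
g_caught = generator_loss([0.1, 0.1])

d_fooled = discriminator_loss([0.9, 0.9], [0.9, 0.9])   # fakes pass as real
g_fooling = generator_loss([0.9, 0.9])
```

Whatever lowers one model's loss raises the other's, so neither loss can be minimized on its own.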

Google's diffusion models seem likely to replace them, because they avoid some of the problems GANs run into.

The adversarial training used by GANs leads to problems, says Minguk Kang, an image recognition researcher in the Graduate School of Artificial Intelligence of South Korea's Pohang University of Science and Technology.

"Training between the generator and discriminator in GANs easily breaks down due to the nature of adversarial learning, where the generator and discriminator have to fool each other, called mode-collapsing," says Kang. GANs also suffer from mode-dropping, in which the generated images lack diversity.

"Diffusion models, however, can generate high-quality and diverse images very efficiently," says Kang.

There are a number of significant applications for super-resolution upscaling like Google's, says Grigorios Chrysos, a computer vision researcher at Switzerland's École Polytechnique Fédérale de Lausanne.

Super-resolution upscaling can be used to turn older photos into high-quality images. It can deblur or upscale photos and videos that were taken imperfectly. It can even be used in biomedical applications, like medical imaging, to increase the resolution of important images. It also has implications for the physical world: if upscaling works well enough, do we even need to invest in more expensive sensors or cameras?

"I feel we have only scratched the surface with applications outside of the traditional machine learning field," says Chrysos.

As of this writing, Google's diffusion models have not been used in commercial applications. The company also has not said which users of its products might eventually get access to the technology.

That may be because the technology still has limitations.

"One significant limitation of the proposed models is that they need substantial resources to train them, and it remains uncertain if the same results can be obtained with less computational resources," says Chrysos.

The resource needs of these models also make them slow, says Kang. Some diffusion models can take dramatically longer than GANs to generate high-resolution images—think hours instead of minutes.

"Although the synthesized outputs are promising, I think it is essential to resolve the speed for deployment in real-world applications," he says.

There remain unresolved issues with image generation models at large, including Google's. There has not been adequate study of how image recognition and generation models can resist implicit and explicit bias, or of how to patch their vulnerability to non-random data that skews results.

"The questions that we are asking as a research community should be extended beyond photo-realistic quality," says Chrysos. "We should be asking whether these methods are production-ready, accounting for their robustness to noise, fairness guarantees, and explainability."

It seems that, while Google has significantly moved the field forward, AI photo upscaling still has a long way to go.

Logan Kugler is a freelance technology writer based in Tampa, FL, USA. He has written for over 60 major publications.

