Relying on BONES to Enhance Video Streaming

High-definition (HD, UHD, 3D) video streaming works only over broadband Internet connections, and even the most expensive of these cannot always keep up: live feeds can drop enough frames to interrupt a broadcast. Even the largest buffer memories in a video receiver are not always enough, so they are typically complemented with adaptive bitrate (ABR) algorithms that fall back to lower resolutions when too many high-resolution frames are lost.

To solve this problem, artificially intelligent (AI) neural networks are being adapted to upgrade lower-resolution video frames to higher resolutions in the stream’s buffer before they are displayed to the viewer. The first of these algorithms to perform neural-enhanced streaming (NES) with a mathematically provable performance guarantee, according to its inventors at the University of Massachusetts, is called Buffer-Occupancy-based Neural-Enhanced Streaming (BONES).

“BONES advances video streaming towards neural-enhanced streaming for large scale video consumption in network-constrained environments with insufficient and unstable bandwidth (especially in 3G and 4G networks),” explained Klara Nahrstedt, director of the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign, who was not involved in the research. “BONES is based on an advanced parallel buffer structure and its corresponding resource managing service, supported by innovative algorithmic and protocol solutions. BONES’ results show the greatly improved performance in comparison to state-of-the-art control approaches, and the NES controls advances in enhancing video quality, managing video quality switches and re-buffering, and improving overall Quality of Experience [QoE].”

The University of Massachusetts recently demonstrated BONES’ performance empirically by “outperforming existing methods by 5-20% in Quality of Experience test scores,” said streaming video expert, lead researcher, and co-inventor of BONES Lingdong Wang of the University of Massachusetts. “QoE combines higher visual quality, less playback stalls, and less quality switches into a single metric,” Wang said, adding, “Compared to the reference implementation of the MPEG-DASH standard supported by big companies like YouTube, BONES can largely improve the QoE by 7.33%.”
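
As a rough illustration of how three such factors can be folded into one number, the sketch below uses a linear score of the kind common in adaptive-streaming research: total visual quality, minus a penalty for stall time, minus a penalty for quality switches. It is not the metric used in the BONES evaluation; the function name qoe_score, the per-segment quality scale, and both weights are assumptions made here for illustration.

```python
from typing import Sequence


def qoe_score(
    qualities: Sequence[float],
    stall_seconds: Sequence[float],
    stall_weight: float = 4.0,   # hypothetical penalty per second of stalling
    switch_weight: float = 1.0,  # hypothetical penalty per unit of quality change
) -> float:
    """Toy linear QoE: quality minus stall penalty minus switch penalty.

    qualities[i] is the perceived quality of segment i (arbitrary units) and
    stall_seconds[i] is how long playback froze while waiting for segment i.
    """
    if len(qualities) != len(stall_seconds):
        raise ValueError("one stall value is required per segment")

    total_quality = sum(qualities)
    total_stall = sum(stall_seconds)
    # A switch is any change in quality between consecutive segments.
    total_switch = sum(abs(b - a) for a, b in zip(qualities, qualities[1:]))

    return total_quality - stall_weight * total_stall - switch_weight * total_switch


# A steady stream versus one that dips in quality and stalls briefly.
steady = qoe_score([3, 3, 3], [0, 0, 0])
bumpy = qoe_score([3, 1, 3], [0, 0.5, 0])
print(steady, bumpy)
```

Under these assumed weights, the steady stream scores higher than the bumpy one even though two of its three segments are identical, because the dip is penalized twice (once going down, once coming back up) on top of the stall.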

“This work has high relevance for video streaming systems that incorporate deep learning techniques,” said Michael Zink, professor of electrical and computer engineering in the College of Engineering at the University of Massachusetts Amherst, who was not involved with this research. “Advancements in deep learning allow the enhancement of degraded video content, which is often described as ‘super resolution.’ This approach allows the streaming of low-quality video in scenarios where the bandwidth between server and client is constrained or fluctuating. At the client, super-resolution algorithms are applied to improve the quality of the video before displaying it to the viewer. This approach involves a trade-off between bandwidth consumption and local compute resources at the client.

“BONES makes an important contribution to deep learning-enhanced video streaming by solving an optimization problem for this trade-off which provides mathematical performance guarantees,” Zink said.

How Does It Work?

BONES is designed to address the fastest-growing bottleneck on the Internet, according to Lyn Cantor, chief executive officer of Internet intelligence company Sandvine Inc. (Waterloo, Ontario, Canada). “Total usage across the world’s 6.4 billion mobile subscriptions and 1.4 billion fixed connections is approximately 33 exabytes per day, with per-subscriber usage of about 4.2 GB per day,” said Sandvine. Of that traffic, “video downstream now dominates,” according to Sandvine, with approximately 70% taken up by on-demand video, streaming video, and videoconferencing or person-to-person video calls.

The ability to download lower-quality videos, then enhance them to higher definition while still in the player’s buffer, without interrupting the viewer’s stream, will give much-sought-after leeway to all the streaming video giants struggling to build out enough capacity to keep up with demand. That, according to Sandvine, includes (in approximate order of size) YouTube, Netflix, TikTok, Amazon Prime, and Disney+, as well as video embedded in Facebook Messenger, Snapchat, Instagram, and specialty channels like Peacock and Hulu.

“In these traditional video streaming systems, the only way to access high-quality content is to download it,” said Wang. “With BONES using deep learning algorithms, we open up the possibility of routinely downloading low-quality content, and then recovering visual quality in the user’s playback device via neural enhancement techniques.”

The BONES algorithm itself, however, does not contain the deep learning algorithms, but instead manages the real-time control of switching from downloaded to neural-enhanced content without interrupting the stream. BONES’ open secret to success is its use of twin buffers: one with the highest-resolution downloaded content, the other with enhanced-resolution content ready to go the instant bad frames are detected in the high-resolution buffer.
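
The kind of joint decision described above can be pictured with a deliberately simple sketch. It is not the BONES control rule, which the article does not spell out; it is a greedy stand-in that, for each segment, picks the highest-quality choice whose download and (optional) enhancement would both finish before the buffered video runs out. The Option fields, the choose_option function, the 4-second segment length, and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Option:
    bitrate_kbps: int       # bitrate of the segment to download
    enhance: bool           # whether to run neural enhancement on it
    quality: float          # resulting perceived quality (hypothetical units)
    enhance_seconds: float  # compute time enhancement would take (0 if none)


def choose_option(
    options: List[Option],
    download_buffer_s: float,  # seconds of video already downloaded and unplayed
    enhance_backlog_s: float,  # seconds of enhancement work already queued
    throughput_kbps: float,    # estimated network throughput
    segment_s: float = 4.0,    # assumed segment duration
) -> Option:
    """Greedy toy rule; assumes at least one option has enhance=False."""
    feasible = []
    for opt in options:
        download_s = opt.bitrate_kbps * segment_s / throughput_kbps
        if download_s > download_buffer_s:
            continue  # the buffer would drain before the download finishes
        if opt.enhance:
            # Enhancement starts once the download is done and the
            # enhancement queue is free, and must end before playback
            # reaches this segment.
            start_s = max(download_s, enhance_backlog_s)
            if start_s + opt.enhance_seconds > download_buffer_s:
                continue
        feasible.append(opt)

    if feasible:
        return max(feasible, key=lambda o: o.quality)
    # Nothing fits: fall back to the cheapest plain download.
    plain = [o for o in options if not o.enhance]
    return min(plain, key=lambda o: o.bitrate_kbps)


options = [
    Option(1000, False, 1.0, 0.0),
    Option(1000, True, 2.5, 3.0),
    Option(4000, False, 3.0, 0.0),
]
# Fast network: download the high bitrate directly.
print(choose_option(options, 10.0, 0.0, 8000.0))
# Slow network, healthy buffer: download low quality and enhance it.
print(choose_option(options, 10.0, 0.0, 1200.0))
```

The point of the sketch is only that network time and compute time draw on the same budget, the video already waiting in the buffer, so the two resources have to be scheduled together rather than one after the other.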

“Despite BONES’ guaranteed performance improvement, our strategy raises the real-life challenge of jointly scheduling network and computational resources, namely when and how to quickly switch to neural-enhanced streaming,” said Wang. “Previous attempts at NES have relied on black-box algorithms or heuristic approaches, which leads to unbounded and inferior results.”

BONES uses explainable parameters, which Wang claims are easy to implement, and which are both mathematically proven to improve quality and empirically shown to outperform existing methods in large-scale simulations and real-world tests.

“Our algorithm will promote media delivery systems to embrace the potential of deep learning, helping address our existing Internet communication bottleneck with computational power,” said Wang.

“We believe that BONES is destined to become the new benchmark for adaptive neural enhancement of video streams,” said Ramesh Sitaraman, Distinguished University Professor and Associate Dean for Educational Programs and Teaching in the College of Information and Computer Sciences at the University of Massachusetts Amherst.

As they were not motivated by profit, Wang, Sitaraman, and their colleagues made BONES open-source. In that way, they felt they were helping to marry widely available deep learning-based neural enhancement modules with real-time video streaming to help improve every user’s experience.

“We reasoned that deep learning is expensive and slow, so it was important to determine when and how to perform these enhancements in the most efficient manner,” said Wang. “BONES itself is a general solution agnostic to neural enhancement methods and media formats.”

Today, BONES can be implemented on a user’s video playback hardware with no changes to the streaming source. Wang et al. are, however, working on server-side modifications that could further enhance video streaming in the future, even for existing video players not upgraded with BONES software.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.