TUTORIALS
List of Tutorials
- Generative Face Video Coding for Video Conferencing: Introduction, Performance and Challenges
- Theory and Applications of Graph-based Nearest Neighbor Search
- Film Grain Coding for Versatile Video Coding Systems: Techniques, Challenges, and Applications
- Low-level Image Processing with Diffusion Models
1. Generative Face Video Coding for Video Conferencing: Introduction, Performance and Challenges
Speakers:
Prof. Anthony Trioux, Xidian University, China
Dr. Giuseppe Valenzise, CNRS, Univ. Paris-Saclay, France
Prof. Shiqi Wang, City University of Hong Kong, China
Prof. Goluck Konuko, CNRS, Univ. Paris-Saclay, France
Prof. Fuzheng Yang, Xidian University, China
Abstract:
Video conferencing applications constitute an important portion of Internet video traffic, which has increased significantly in the past few years with the global pandemic. Current video conferencing systems rely on conventional video compression standards such as H.264, HEVC, or VVC. However, despite over three decades of refinement and optimization, these codecs still struggle to deliver satisfactory performance at extremely low bitrates. In scenarios where bandwidth is severely constrained, such as congested networks or areas with weak radio coverage, the resulting video quality becomes unacceptable (e.g., loss of facial details), significantly degrading the video conferencing experience.
Generative Face Video Coding (GFVC) architectures, enabled by recent advances in deep learning, have demonstrated high potential to address these issues. Such architectures process facial video data efficiently by employing generative adversarial networks (GANs) to represent and reconstruct facial video content in a compact form. This drastically reduces bandwidth requirements while enhancing visual quality, ultimately improving the user experience in video conferencing applications. This tutorial will give a complete overview of GFVC schemes, covering recent research advances in the literature as well as current standardization activities.
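To make the ultra-low-bitrate argument concrete, the back-of-the-envelope sketch below estimates the raw payload of a keypoint-based motion stream. The figures (10 facial keypoints with 2x2 local Jacobians, in the spirit of first-order-motion-model style animation, 16-bit values, 30 fps) are illustrative assumptions, not the parameters of any specific GFVC codec.

```python
# Rough GFVC bandwidth estimate (illustrative assumptions, no specific codec).
def keypoint_payload_bits(num_keypoints=10, values_per_kp=2 + 4, bits_per_value=16):
    """Bits per frame for compact facial motion parameters:
    (x, y) coordinates plus a 2x2 local affine Jacobian per keypoint."""
    return num_keypoints * values_per_kp * bits_per_value

fps = 30
bits_per_frame = keypoint_payload_bits()                  # 10 * 6 * 16 = 960 bits
print(f"Motion payload: {bits_per_frame * fps / 1000:.1f} kbps")  # ~28.8 kbps

# Even before quantization and entropy coding, this motion stream sits far
# below typical conventional-codec bitrates; the decoder combines it with one
# intra-coded reference frame and a generative model that re-synthesizes each
# face frame from the motion parameters.
```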
2. Theory and Applications of Graph-based Nearest Neighbor Search
Speaker:
Prof. Yusuke Matsui, The University of Tokyo, Japan
Abstract:
Neural search, i.e., efficiently retrieving similar items in a deep embedding space, is the most fundamental technique for handling large multimodal collections. With the advent of powerful technologies such as foundation models, efficient neural search is becoming increasingly important. For example, multimodal encoders such as CLIP allow us to convert various problems into simple embedding-and-search. Despite this attention, it is not obvious how to design a search algorithm for given data. We outline the theory and applications of graph-based nearest neighbor search methods. Graph-based methods are the current de facto standard for in-memory (million-scale) search, but they are difficult to understand because of their complex structure with many heuristics. We will explain their basic mathematical concepts, summarize recent improvements, and provide practical guidelines for choosing an algorithm.
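As a concrete illustration of the core routine, the sketch below implements greedy best-first (beam) search over a brute-force kNN graph, in the style of the search layer used by HNSW/NSG-type indexes. The data, graph degree, and the ef beam-width parameter are illustrative choices, not tied to any particular library.

```python
# Minimal sketch of greedy best-first search on a proximity graph.
import heapq
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32)).astype(np.float32)

def knn_graph(x, k=8):
    """Brute-force kNN graph, purely for illustration (real indexes build
    sparser, better-connected graphs with pruning heuristics)."""
    sq = (x ** 2).sum(1)
    d = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)   # pairwise squared distances
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

neighbors = knn_graph(data)

def greedy_search(query, entry=0, ef=16):
    """Expand the closest unexplored node until no frontier candidate can
    improve the current top-ef result set."""
    dist = lambda i: float(((data[i] - query) ** 2).sum())
    d0 = dist(entry)
    visited = {entry}
    candidates = [(d0, entry)]        # min-heap: frontier, closest first
    results = [(-d0, entry)]          # max-heap (negated): best ef so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:        # frontier cannot improve results: stop
            break
        for nb in neighbors[node]:
            nb = int(nb)
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop current worst result
    return sorted((-nd, i) for nd, i in results)   # (sq. distance, id) ascending

query = rng.standard_normal(32).astype(np.float32)
print(greedy_search(query)[:5])
```

The ef parameter is the usual accuracy/speed knob: a larger beam explores more of the graph and raises recall at the cost of more distance evaluations.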
3. Film Grain Coding for Versatile Video Coding Systems: Techniques, Challenges, and Applications
Speakers:
Dr. Vignesh V Menon, Fraunhofer HHI, Germany
Dr. Philippe de Lagrange, InterDigital, France
Abstract:
In recent years, the proliferation of high-definition video content across various platforms has led to an increased focus on optimizing video coding standards to achieve higher compression efficiency and improved visual quality. The Versatile Video Coding (VVC) standard, developed by the Joint Video Experts Team (JVET) of the International Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG), represents the latest advancement in video compression technology. VVC offers significant improvements over its predecessors, such as High Efficiency Video Coding (HEVC), in terms of compression efficiency, flexibility, and support for emerging multimedia applications. A critical aspect of video coding is handling film grain, which refers to the random noise inherent in film-based video content. Film grain plays a crucial role in the visual aesthetic of many movies and television shows, contributing to the overall texture and atmosphere of the scenes. However, accurately preserving and reproducing film grain during encoding poses several challenges, even for new video coding standards. These challenges stem from the unique nature of film grain, its varying characteristics across different films and scenes, and the need to balance preservation with compression efficiency.
As the VVC standard approaches widespread adoption, film grain handling within its open-source implementations has garnered significant attention from researchers and industry practitioners in video coding. Critical components like the Film Grain Analysis (FGA) module in VVC encoders and the Film Grain Synthesis (FGS) module in VVC decoders play pivotal roles in analyzing and synthesizing film grain within encoded video sequences. Given the imminent adoption of VVC, understanding the methodologies and techniques employed in these modules becomes crucial for optimizing film grain handling in VVC implementations and elevating the visual quality of encoded video content. However, addressing film grain in VVC poses unique challenges. The complex nature of film grain patterns, varying film stocks, and diverse film grain intensities present formidable obstacles in accurately analyzing and synthesizing grain within encoded videos. Moreover, reconciling the fidelity of film grain reproduction with the efficiency demands of modern video compression adds another layer of complexity. Thus, while integrating film grain handling in VVC promises enhanced visual fidelity, navigating these challenges requires a nuanced understanding of film grain characteristics and video coding principles. This tutorial aims to provide participants with comprehensive insights into the nuances of film grain coding for video systems, spanning fundamental concepts, advanced techniques, challenges, and practical applications, to empower them in effectively addressing these complexities.
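To give a flavor of what an FGS module does at the decoder, the sketch below implements a simplified frequency-filtering grain synthesis loop: block-wise white noise is band-limited in the frequency domain and blended back with a luma-dependent gain. The block size, cutoffs, and gain model are illustrative assumptions, not the normative film grain characteristics SEI procedure.

```python
# Simplified frequency-filtered grain synthesis (illustrative, non-normative).
import numpy as np

def synthesize_grain(shape, h_cutoff=8, v_cutoff=8, block=16, seed=0):
    """Band-limited pseudo-random grain: white noise per block, low-passed
    by zeroing frequency coefficients above the horizontal/vertical cutoffs."""
    rng = np.random.default_rng(seed)
    h, w = shape
    grain = np.zeros(shape, np.float32)
    mask = np.zeros((block, block))
    mask[:v_cutoff, :h_cutoff] = 1.0          # keep low frequencies only
    for y in range(0, h, block):
        for x in range(0, w, block):
            noise = rng.standard_normal((block, block))
            filtered = np.real(np.fft.ifft2(np.fft.fft2(noise) * mask))
            grain[y:y + block, x:x + block] = filtered[:h - y, :w - x]
    return grain

def add_grain(decoded, strength=8.0):
    """Scale grain with a luma-dependent gain and add it to the decoded frame
    (a real FGS module derives the gain from signaled scaling parameters)."""
    grain = synthesize_grain(decoded.shape)
    gain = strength * (decoded / 255.0)       # brighter areas get more grain here
    return np.clip(decoded + gain * grain, 0, 255)

frame = np.full((64, 64), 128.0)              # flat gray stand-in for a decoded frame
print(add_grain(frame).std())                 # nonzero: grain was re-synthesized
```

The key property this mimics is that the encoder removes grain (which compresses poorly) and signals only a compact parametric description, leaving the decoder to regenerate statistically similar grain.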
4. Low-level Image Processing with Diffusion Models
Speakers:
Dr. Xin Li, University of Science and Technology of China, China
Prof. Zhibo Chen, University of Science and Technology of China, China
Abstract:
In this tutorial, we will introduce recent progress in low-level image processing with diffusion models, i.e., using this new generative paradigm to tackle several pivotal challenges in low-level image processing, such as perceptual quality, information compensation, and multi-modality interaction. The field has drawn increasing attention and developed tremendously in recent years because of its wide range of applications in image/video restoration, enhancement, ultra-low-bitrate compression, etc. We therefore cover the definition, analysis, methodology, and related applications of diffusion models in low-level image processing, including image restoration and compression, to clarify the main progress the community has made. In addition, recent results on diffusion model-based low-level image processing have revealed that this new paradigm can potentially improve perceptual quality even for extremely low-quality images/videos, and further reduce the cost of image/video compression.
Specifically, we first briefly introduce the definition of diffusion models and summarize their recent progress in applications across various fields. Second, we shed light on advances in diffusion model-based image restoration from three distinct directions: (i) supervised diffusion model-based image restoration, (ii) zero-shot diffusion model-based image restoration, and (iii) diffusion model-based image restoration under more practical and challenging scenarios, i.e., blind/real-world degradation. Third, we provide a systematic review of the advancement of diffusion models in image compression through two representative tasks: (i) diffusion models for ultra-low-bitrate image compression and (ii) diffusion models for multi-modal image compression. These three parts provide a comprehensive summary of recent advancements in diffusion model-based low-level image processing. Last but not least, we will discuss the challenges and potential directions of low-level image processing with diffusion models from the perspectives of sampling efficiency, model compression, distortion-invariant learning, and framework design. In summary, our tutorial will cover the latest works and progress in the community, helping audiences with different backgrounds better understand recent progress in this emerging research area.
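As background for the methods surveyed above, the sketch below shows a single DDPM-style reverse (denoising) step, the iteration at the heart of diffusion-based restoration and generation. Here predict_noise is a hypothetical stub standing in for a trained noise-prediction network, and the schedule values are the common linear defaults.

```python
# One DDPM-style reverse step (sketch; predict_noise is a placeholder stub).
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)     # standard linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)        # cumulative products, a-bar_t

def predict_noise(x_t, t):
    """Placeholder for a trained epsilon-predictor eps_theta(x_t, t)."""
    return np.zeros_like(x_t)          # hypothetical stub, not a real model

def reverse_step(x_t, t, rng):
    """Sample x_{t-1}: subtract the predicted noise contribution from the
    posterior mean, then re-inject scaled Gaussian noise (except at t=0)."""
    eps = predict_noise(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))      # start from pure Gaussian noise
for t in reversed(range(T)):           # iterate the reverse chain to t = 0
    x = reverse_step(x, t, rng)
```

Restoration variants covered in the tutorial differ mainly in how this loop is conditioned on the degraded observation (e.g., as extra network input or via guidance at each step), while compression variants transmit a compact conditioning signal and let the reverse process synthesize the details.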