VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE

BACHELOR THESIS

SPATIO-TEMPORAL RE-RENDERING FOR FACIAL VIDEO RESTORATION

Bachelor of Computer Science (Honors degree)

NGO HUU MANH KHANH - 19520125
NGO QUANG VINH - 19520354

Supervised by Dr. Nguyen Vinh Tiep

HO CHI MINH CITY, 2023

THESIS DEFENSE COMMITTEE

The graduation thesis evaluation committee was established under Decision No. ... of the Rector of the University of Information Technology.

Acknowledgements

The successful completion of this thesis is the result of the invaluable support and assistance provided by many individuals. We are deeply grateful for their insightful feedback.
First of all, we would like to express our gratitude to our supervisor, Dr. Nguyen Vinh Tiep, for his dedicated direction, enthusiastic guidance, and instruction throughout this research. His advice and support were instrumental in helping us navigate the research process and complete this thesis.

We would also like to express our sincere thanks to the Dean and all the teachers of the Faculty of Computer Science, University of Information Technology, for their support and for equipping us with the knowledge needed to complete this thesis.
We are also grateful to the Multimedia Laboratory (MMLab-UIT) for providing us with a conducive research environment and state-of-the-art equipment for this research. Furthermore, we would like to extend our appreciation to the researchers of MMLab for their valuable feedback and critical questions, which helped us identify and correct mistakes and improve the quality of this thesis.

Abstract

Old films featuring faces are a source of great historical value, giving us a vivid picture of significant figures of the past. However, because they were captured with the camera technology of their time, old films are low-quality and exhibit visual artifacts such as pepper noise and stripes.
In addition, old films can be damaged by poor storage conditions. As a result, they are difficult or impossible to watch. There is a demand to restore and preserve these old films so that future generations can enjoy them. Beyond restoring old films, facial restoration can also be used for security purposes.
More specifically, surveillance cameras are installed in many public places to prevent crime, but their recordings are often low-quality due to limited camera resolution and poor lighting, making it difficult to identify people. Facial video restoration is a solution to this problem: it improves the quality of the faces in the video and makes it easier to identify the people in it. Although the related problem of facial image restoration has been researched for a long time, facial video restoration remains less explored. Current facial image restoration models achieve impressive performance, and they can be applied directly to video by restoring each frame individually.
Nonetheless, this approach suffers from flickering, since these models are designed for image restoration and do not take temporal information into account. In this thesis, we propose Spatio-temporal Re-rendering for Antique Facial Video Restoration (STERR-GAN), a facial video restoration model that exploits both temporal and spatial information. Our experiments show that the model addresses the flickering problem and yields better results. In addition, to the best of our knowledge, datasets for facial image restoration and for general video restoration are available, but a dataset for the facial video restoration domain is still unavailable. We therefore introduce the VAR dataset (Video dataset for Antique Restoration), a new video restoration dataset for the facial domain.
We expect this dataset to become a valuable resource for measuring the performance of future models and advancing research in this area.

Table of contents

List of figures
List of tables
1 Introduction
  1.1 Overview
  1.3 Objectives
  Recurrent Neural Networks (RNN)
  Bidirectional Recurrent Neural Networks
  Handcrafted Features for Optical Flow Estimation
  RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
4 Method
  4.2 Spatio-temporal Re-rendering for Antique Facial Video Restoration
5 Experiment
  Peak Signal-to-Noise Ratio (PSNR)
  Structural Similarity Index Measure (SSIM)
  Frechet Video Distance (FVD)
  6.2 Future Work
References

List of figures

Example of the Facial Video Restoration and some old films deterioration
Story of counterfeit money
Approximate the data distribution of GAN
Backpropagation in Generator training
Backpropagation in Discriminator training
The progress in generating face images using GANs model
The architecture of StyleGAN generator
Illustrative example with two factors of variation
Example of water droplet-like artifacts in StyleGAN images
The architecture of StyleGAN2
Example of "phase" artifacts
Some alternative network architectures of StyleGAN2
Illustration of Recurrent Neural Networks
Illustration of Bidirectional Recurrent Neural Networks
The architecture of RAFT
Example of optical flow estimation
Illustration of GFP-GAN framework
Overview of the DeepRemaster
Illustration of the source-reference attention layer
Illustration of framework proposed by Wan et al.
Overview of Wan et al.
The process of collecting the Video dataset for Antique Restoration (VAR)
Some samples from VAR
Visualization of STERR-GAN framework
List of tables

5.1 Quantitative result of STERR-GAN, GFP-GAN and DeepRemaster
5.2 Ablation study of STERR-GAN

Chapter 1 Introduction

1. Practical Context

Motion pictures were first introduced in the late 19th century. Since then, a surprising number of films have been recorded and released. However, due to the technology of the time, these films were low-quality and exhibited visual artifacts such as pepper noise and stripes.
In addition, old films have degraded due to poor storage conditions. Together, these factors mean that the significant historical value of old videos can be lost. Although film restoration techniques have been developed to bring these antique films back to life, the process is laborious. Nowadays, video restoration is typically conducted digitally, with artists manually retouching each frame to remove blemishes, fix flickering, and perform colorization.
However, this process is extremely time-consuming and expensive, as it requires examining and repairing every single frame of the old film. As a result, there is a desire for an algorithm that can automate these tedious tasks, allowing old films to be restored and given a more modern appearance at a lower cost. Old film restoration, or more generally video restoration, has many real-life applications.

Preserving historical video footage

Preserving historical video footage is an essential application of video restoration technology. Historical video footage refers to videos that capture important events, people, or cultural artifacts from the past. These recordings can be a valuable source of information and cultural heritage, and it is essential to preserve them for future generations. However, video recordings often degrade over time due to factors such as wear and tear and exposure to heat and moisture. This can make the recordings difficult to view or use, as they may be blurry, distorted, or otherwise of poor quality.
In addition, many historical video recordings are stored in formats that are no longer widely used, such as VHS tapes or film reels, making the footage difficult to access or view. Video restoration techniques can be used to preserve and restore historical video footage, improving its quality and making it possible to view and study these recordings in greater detail. This can involve various techniques, such as noise reduction, color correction, and image enhancement. By improving the quality of these recordings, it is possible to preserve and share these essential pieces of history with future generations.
Enhancing the clarity of surveillance footage

Surveillance footage is typically captured by cameras placed in strategic locations to monitor and record activity in a particular area. This footage is used for purposes such as security, crime prevention, and investigation. However, surveillance footage is often of low quality due to factors such as poor lighting, camera movement, and noise. This makes it difficult to identify people and objects in the footage, which reduces its usefulness for its intended purpose.
Video restoration techniques such as noise reduction, color correction, and image enhancement can be used to improve the clarity of surveillance footage. For example, some video restoration models have been proposed to remove noise or blur from the footage, making it easier to see details such as facial features or license plate numbers. These techniques improve the effectiveness of surveillance footage by making it easier to identify people and objects in the video, which is useful for security, crime prevention, and investigation purposes.

1.2 Problem Definition

Facial Video Restoration is a subfield of video restoration that aims to restore high-quality faces from low-quality counterparts with various kinds of deterioration, such as low resolution, noise, blur, and compression artifacts. Figure 1.1 illustrates an example of facial video restoration.

• Input: a sequence of frames from an old film, containing a complex mixture of degradations such as film grain noise (blue box) or scratches (red arrow).
• Output: the corresponding high-quality color video.

Fig. 1.1 Example of Facial Video Restoration. The first row shows various frames from the input video, and the second row shows the restored frames, where T is the frame index in the video. Old movies suffer from a plethora of deterioration issues, such as scratches (a) and film grain noise (b), which make them challenging to restore to their original quality.

1.3 Challenges

Besides the common challenges of computer vision tasks, facial video restoration has its own difficulties:

• Lack of dataset. The training dataset is one of the primary difficulties we face in this work. No paired dataset is available for our problem, and previous work [51] uses a synthetic dataset.
However, to the best of our knowledge, existing datasets for facial video restoration remain insufficient.
• Keeping facial detail. The face contains many subtle details that are important for conveying emotions and expressions. It can be challenging to restore a video in a way that preserves these details while still improving the overall image quality. In addition, the appearance of the face can be affected by complex lighting conditions, such as shadows, highlights, and reflections, which makes it challenging to correct color and exposure issues in the facial region.
• Flickering problem. Flickering refers to unwanted changes in brightness or color across the frames of a restored video sequence. It can be particularly noticeable in high-motion or low-light scenes and can be distracting and unpleasant for viewers.
• Requires high computational resources. Complex image processing techniques must be applied to a large number of video frames, which is computationally intensive, especially for high-resolution videos.
• Old films contain a complex mixture of degradations. Due to poor storage conditions and old capture techniques, antique videos often contain many kinds of distortion. Comprehensively mitigating all of these issues in a single deep neural network is therefore difficult.

Motivation

Our survey found many studies on related problems, such as Video Restoration [56, 37, 5] and Facial Image Restoration [27, 51]. On the other hand, although facial video restoration has many practical applications in old film preservation, security, and crime prevention, it remains less explored. Therefore, we choose Facial Video Restoration as the research topic of this thesis.