VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

NGUYEN MINH HOA

MOTION ANALYSIS FROM ENCODED VIDEO BITSTREAM

Major: Computer Science

MASTER’S THESIS

Supervisor: Dr. Do Van Nguyen
Co-Supervisor: Dr. Tran Quoc Long

HA NOI - 2018

AUTHORSHIP

“I hereby declare that the work contained in this thesis is of my own and I have not submitted this thesis at any other institution in order to obtain a degree. To the best of my knowledge and belief, the thesis contains no materials previously published or written by another person other than those listed in the bibliography and identified as references.”

Signature: ………………………………………………

SUPERVISOR’S APPROVAL

“I hereby approve that the thesis in its current form is ready for committee examination as a requirement for the Master of Computer Science degree at the University of Engineering and Technology.”

Signature: ………………………………………………
Signature: ………………………………………………

ACKNOWLEDGMENTS

First of all, I would like to express special gratitude to my supervisors, Dr. Do Van Nguyen and Dr. Tran Quoc Long, for their enthusiastic instruction, technical explanations, and advice during this project. I also want to give sincere thanks to Assoc. Ha Le Thanh and Assoc. Nguyen Thi Thuy for their instruction as well as the background knowledge for this thesis. I would also like to thank my teachers and my friends in the Human Machine Interaction Lab for their support. I thank my friends and colleagues in the project "Nghiên Cứu Công Nghệ Tóm Tắt Video" and the project “Multimedia application tools for intangible cultural heritage conservation and promotion”, project number ĐTDL.CN-34/16, for their work and support. Last but not least, I want to thank my family and all of my friends for their motivation and support as well.
They stand by me and inspire me whenever I face tough times.

TABLE OF CONTENTS

AUTHORSHIP
TABLE OF CONTENTS
List of Figures
List of Tables
Moving object detection in the pixel domain
Moving object detection in the compressed domain
Motion vector approaches
Size of Macroblock approaches
Video compression standard H264
Process video bitstream
Macroblock-based Segmentation
Object-based Segmentation
The moving object detection application
The process of application
The motion information
Synthesizing movement information
Storing Movement Information
List of author’s publications related to thesis

ABBREVIATIONS

MB      Macroblock
MV      Motion vector
NALU    Network Abstraction Layer Unit
RBSP    Raw Byte Sequence Payload
SODB    String Of Data Bits

List of Figures
The process of moving object detection with data in the pixel domain
The process of moving object detection with data in the compressed domain
The structure of an H264 file
The motion vector of a macroblock
The process of the moving object detection method
(a) Outdoor and indoor frames, (b) the "size-map" of the frames, (c) the "motion-map" of the frames
Example of the “consistency” of motion vectors
The implementation process of the approach
Data structure for storing motion information
Example frames of test videos
Example frames and their ground truth
An example frame of Pedestrians (a) and ground truth image (b)

List of Tables

The information of test videos
The information of test sequences in group 1
The performance of the two approaches on Pedestrians, PETS2006, Highway, and Office
The experimental results of Poppe’s approach on the 2nd group
The experimental results of the proposed method on the 2nd group

INTRODUCTION

Today, video content is used extensively in many areas of life, such as indoor monitoring and traffic monitoring. The number of videos shared over the Internet at any given time is also extremely large.
According to statistics, hundreds of hours of video are uploaded to YouTube every minute [1]. In addition, surveillance cameras are increasingly installed in homes for monitoring and security purposes. These cameras normally operate and store their surveillance videos automatically. Only when a special situation or event occurs do people go back and review the video data.
The problem is how such a large volume of video can be evaluated in a short amount of time. For example, when a burglary or an intrusion occurs, we cannot spend hours checking every stored video. A tool that determines the moments at which an object is moving in a long video is therefore essential for reducing the time and effort of searching.

Normally, to reduce the size of videos for transmission or storage, video compression is performed at the surveillance camera. The compressed information, in the form of a bitstream, is then stored or transmitted to a server for analysis.
The video analysis process needs many features that describe different aspects of the visual content. Typically, these features are extracted from the pixel values of each video frame after fully decompressing the bitstream. Decompression requires a device with high computing capacity. However, with the trend of the "Internet of Things", there are many devices with low processing capacity that cannot perform full video decompression at high speed.
It is therefore difficult to run an approach that requires a lot of computing power in real time on such devices. Another way to extract features from the video is to use the data available in the compressed video. These data include transform coefficients, motion vectors, quantization steps, quantization parameters, etc. By processing and analyzing these data, we can handle some important computer vision tasks, including moving object detection, human action detection, face recognition, and moving object tracking.
This thesis proposes a new method for detecting moving objects by exploring and applying motion estimation techniques in the compressed video domain. The method is then used to build an application that supports searching for movement in home surveillance videos. The compression format of the videos in the thesis is the H264 compression standard (MPEG-4 Part 10), a popular video compression standard today.

Aims

The goal of the thesis is to propose a method for determining moving objects in the compressed domain of a video.
I then build an application that uses the method to support searching for the moments in a video that contain moving objects.

Object and Scope of the study

Within the framework of the thesis, I study algorithms for determining moving objects in video, especially algorithms that operate in the compressed domain. The video compression standard used in the thesis is H264/AVC. The theory of video compression and computer vision is taken from scientific articles on video analysis and on the determination of motion in the compressed domain.
The videos for testing and experiments are obtained from both indoor and outdoor surveillance cameras.

Method and procedures

- Research on existing motion analysis and evaluation systems for compressed video, and on scientific articles related to the analysis and evaluation of motion in compressed video.
- Experimental research: conduct experimental setups for each theoretical part, such as extracting video data, compiling data, and evaluating motion based on the obtained data.
- Experimental evaluation: each experiment is conducted independently on each module, and the modules are then integrated and deployed.

Contributions

The thesis proposes a new moving object detection method for surveillance video encoded with the H264 compression standard, using the motion vectors and the sizes of macroblocks.

Thesis structure

Apart from the introduction, the conclusion, and the references, this thesis is organized into 3 chapters with the following main contents:

Chapter 1 is the literature review. This chapter presents work related to the thesis, including moving object detection methods in the pixel domain and moving object detection methods in the compressed domain.

Chapter 2 covers basic knowledge about the H264 video compression standard, such as the H264 file structure, macroblocks, and motion vectors, and describes the moving object detection method in detail, including video bitstream processing, the macroblock-based segmentation phase, the object-based segmentation phase, and the object refinement phase.

Chapter 3 presents the results of the method, including an application that uses the proposed method and the experimental results.

LITERATURE REVIEW

Today, surveillance cameras are used extensively around the world. The volume of surveillance video has also grown tremendously. Common problems with surveillance video include event searching, motion tracking, abnormal behavior detection, etc.
To handle these tasks, a method is needed that can determine the moments in each video at which movement occurs. Usually, video is compressed for storage and transmission. Previous moving object detection methods usually use data from the pixel images, such as color values, edges, etc. To obtain images that can be displayed or processed, the system must fully decode the video.
This consumes a large amount of the device's computing resources, time, and memory. I propose a method that can quickly determine the moving objects in high-resolution videos. The data used in the method are taken from the compressed video domain and include the motion vector and the size of each macroblock (in bits) after encoding. This considerably reduces the processing time compared to methods that use data in the pixel domain.
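
As a rough illustration of how these two cues might be combined, the sketch below flags a macroblock as a motion candidate when its motion vector is long enough or when its encoded size in bits is large. The `Macroblock` record, its field names, the OR rule, and both threshold values are assumptions made only for this example; they are not the segmentation rules of the method proposed in this thesis.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class Macroblock:
    """Hypothetical per-macroblock record read from the bitstream."""
    mv_x: float      # horizontal motion vector component
    mv_y: float      # vertical motion vector component
    size_bits: int   # number of bits spent on this macroblock after encoding


def is_candidate(mb: Macroblock, mv_thresh: float = 1.0, size_thresh: int = 200) -> bool:
    """Return True if either cue suggests motion (illustrative thresholds)."""
    return hypot(mb.mv_x, mb.mv_y) >= mv_thresh or mb.size_bits >= size_thresh


def motion_mask(frame: List[List[Macroblock]]) -> List[List[bool]]:
    """Build a per-macroblock boolean map for one frame."""
    return [[is_candidate(mb) for mb in row] for row in frame]


# A toy 2x2 grid of macroblocks for a single frame.
frame = [
    [Macroblock(0, 0, 12), Macroblock(3, 4, 40)],
    [Macroblock(0, 0, 350), Macroblock(0.5, 0, 20)],
]
mask = motion_mask(frame)
```

Because only two numbers per macroblock are inspected, a check of this kind needs no pixel reconstruction, which is the source of the speed advantage described above.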

The problem of motion detection in video has long been studied. It is the first step in a series of computer vision problems such as object tracking, object detection, abnormal movement detection, etc. There are usually two approaches to this problem: using fully decoded video data (pixel domain data) or using data taken directly from the undecoded video (compressed domain data). The following sections outline the studies based on these two approaches.

Moving object detection in the pixel domain

Typically, to reduce the size of the video for transmission, video encoding is performed inside the surveillance camera, and the compressed information is transmitted as a bitstream to a server for video analysis. Common video compression standards in use today include MPEG-4, H264, and H265. To be viewable, these compressed videos need to be decoded into image frames. We call these image frames the pixel domain, and the data obtained from them the data in the pixel domain. Figure 1.1 describes the process of moving object detection methods in the pixel domain.
The data in the pixel domain include the color values of the pixels, the number of color channels of each pixel, the edges, etc.

Figure 1.1. The process of moving object detection with data in the pixel domain

To determine moving objects in the pixel domain, background subtraction algorithms are commonly used. Many research results on this topic were introduced long ago. These methods usually use the relationship between frames in a time series as data.
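
The basic idea of background subtraction can be sketched in a few lines. The example below is a minimal sketch, assuming grayscale frames stored as nested lists, a running-average background update with rate `alpha`, and a fixed difference threshold. The function names and parameter values are illustrative assumptions and do not reproduce any particular published method.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Blend the new frame into the background model (running average)."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 25.0) -> List[List[bool]]:
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Toy 2x2 example: one pixel changes strongly, one changes slightly.
background = [[10.0, 10.0], [10.0, 10.0]]
frame = [[10.0, 200.0], [12.0, 10.0]]

mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

Every pixel of every decoded frame is visited on each step, which is why approaches of this kind depend on full decoding and on the computing capacity of the device.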

Background subtraction is defined in [2] as: “Background subtraction is a widely used approach for detecting moving objects in videos from static cameras. The rationale in the approach is that of detecting the moving objects from the difference between the current frame and a reference frame, often called the “background image”, or “background model”. As a basic, the background image must be a representation of the scene with no moving objects and must be kept regularly updated so as to adapt to the varying luminance conditions and geometry settings.”

Research results include methods that use a Gaussian average, such as the method of Wren et al. [3] and the method of Koller et al. [4]; methods that use a temporal median filter, such as the method of Lo and Velastin [5] and the method of Cucchiara et al.