Research on the Visual Based Video Coding Technology

Tutor: ZhaoDeBin
School: Harbin Institute of Technology
Course: Computer Science and Technology
Keywords: HVS, foveation, content adaptive, motion attention, multiple fixations
CLC: TN919.81
Type: Master's thesis
Year: 2008
With the arrival of the information age, society is entering a new era of networked multimedia. Multimedia coding is one of the most active areas of computer research, and it continues to develop in response to application demands. Image and video compression have made great progress in recent years. However, the ultimate recipient of image and video information is the human viewer. Today's compression methods exploit only the statistical redundancy among image pixels and entirely neglect perceptual redundancy. Compression schemes and vision systems essentially face a similar problem: how to represent visual objects efficiently. It is therefore possible to apply vision techniques in a compression system so that it targets perceptual fidelity rather than pixel-wise fidelity. This thesis studies the characteristics of the human visual system (HVS) and proposes a video compression scheme based on properties of the human eye such as luminance adaptation and spatial and temporal masking.

In video compression, the signal that the viewer actually receives is the decompressed image, so the quality assessment of the reconstructed image is a major concern. Building on several representative video quality metrics, this thesis analyzes and implements the structural similarity (SSIM) metric, which measures distortion in terms of structural similarity. The SSIM value is then used to evaluate the performance of the proposed video coding method.

The distribution of cone receptors and ganglion cells on the retina is highly non-uniform. Their density is highest at the fovea and drops rapidly with increasing eccentricity. As a result, when a human observer gazes at a point in an image or video in the real world, the human visual system captures it at a spatially varying resolution. Exploiting this characteristic of the human eye, this thesis proposes a video compression scheme based on a content-adaptive foveation model.
First, the fixation point is set at the center of the current image or video frame, and the resolution of each region is then adjusted according to the content of the signal. Experimental results indicate that this method achieves high compression efficiency.

When observers watch a video, their focus of attention changes constantly, and different observers often attend to different regions. This thesis therefore proposes a video compression method with multiple fixation points, which are obtained from a dedicated motion attention model. After the motion vector field of the current frame is obtained, the motion intensity and the coherence of the motion vectors are used to measure how much attention each region attracts; the resulting value is called the attention map value. The multiple fixation points for foveated video coding are then derived from the attention map. Experimental results indicate that this method achieves results similar to those of a method whose fixation points are marked manually.
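
The attention-map step can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the formulas, so everything below is an assumption made for illustration. Coherence is measured as the length of the mean unit direction vector in a small window, the attention value is the product of intensity and coherence, and fixations are picked greedily with a minimum spacing. The function names (box_mean, attention_map, pick_fixations) and the parameters (radius, count, min_dist) are hypothetical.

```python
import numpy as np


def box_mean(a, radius):
    """Mean of `a` over a (2*radius+1)^2 window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def attention_map(mv, radius=1, eps=1e-9):
    """Per-block attention value from a motion vector field.

    mv: array of shape (H, W, 2) holding one (dx, dy) vector per block.
    Assumed model (not from the thesis): attention = intensity * coherence.
    """
    mag = np.hypot(mv[..., 0], mv[..., 1])
    # Motion intensity: magnitude normalized by the frame maximum.
    intensity = mag / (mag.max() + eps)
    # Unit direction vectors; static blocks contribute (0, 0).
    ux = mv[..., 0] / (mag + eps)
    uy = mv[..., 1] / (mag + eps)
    # Coherence: length of the mean direction vector in the window,
    # close to 1 when neighbours move the same way, close to 0 otherwise.
    coherence = np.hypot(box_mean(ux, radius), box_mean(uy, radius))
    return intensity * coherence


def pick_fixations(att, count=2, min_dist=3):
    """Greedily pick up to `count` peaks, suppressing a square
    neighbourhood of half-width `min_dist` around each chosen peak."""
    att = att.astype(float).copy()
    fixations = []
    for _ in range(count):
        y, x = np.unravel_index(np.argmax(att), att.shape)
        if att[y, x] <= 0:
            break  # no moving region left
        fixations.append((int(y), int(x)))
        y0, y1 = max(0, y - min_dist), y + min_dist + 1
        x0, x1 = max(0, x - min_dist), x + min_dist + 1
        att[y0:y1, x0:x1] = 0.0
    return fixations
```

In a codec, the motion vector field would come from the encoder's block motion estimation, and the returned block coordinates would serve as the fixation points of the foveation model. The minimum spacing keeps two fixations from landing on the same moving object.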
