1.5  Thesis outline: the multi-view video system


Besides the introductory chapters (Chapters 1 and 2) and the conclusion (Chapter 8), the core part of this thesis is divided into five chapters, each focusing on a sub-system of the proposed multi-view video system. Figure 1.5 shows the proposed system architecture, which is composed of a depth-estimation sub-system (Chapter 3), a multi-view texture and depth video coder/decoder (Chapters 5, 6 and 7), and a 3D-video rendering engine (Chapter 4). In the following, we outline each of these sub-systems.

Figure 1.5 Overview of the proposed multi-view video system, including the acquisition of 3D video (depth estimation), the coding and decoding of the multiple texture and depth image sequences, and the rendering sub-system.


A. 3D/multi-view video acquisition


Chapter 2 contains introductory material that specifies the geometry of multiple views. In particular, we describe a method for calculating the internal and external parameters of the multiple cameras, a task known as camera calibration. The principal result of the chapter is the specification of the projection matrix and its usage for depth estimation and image rendering.
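As a brief illustration of how such a projection matrix is used, the following Python sketch projects a world point into pixel coordinates; the intrinsic matrix K, rotation R and translation t are hypothetical calibration values chosen for illustration, not parameters from the thesis.

    import numpy as np

    # Hypothetical calibration values (for illustration only).
    K = np.array([[800.0,   0.0, 320.0],    # focal lengths and principal point
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    R = np.eye(3)                           # rotation (world -> camera)
    t = np.array([[0.1], [0.0], [0.0]])     # translation (10-cm offset)

    P = K @ np.hstack([R, t])               # 3x4 projection matrix P = K [R | t]

    X = np.array([0.5, 0.2, 3.0, 1.0])      # homogeneous world point (metres)
    x = P @ X                               # homogeneous image coordinates
    u, v = x[0] / x[2], x[1] / x[2]         # dehomogenize to pixel coordinates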
Chapter 3 concentrates on the problem of acquiring a 3D geometric description of the video scene, i.e., depth estimation. First, we focus on depth estimation using two views and present the basic geometric model that enables the triangulation of corresponding pixels across these two views. We review two calculation/optimization strategies for determining corresponding pixels: a local depth calculation and a one-dimensional optimization strategy. Second, we generalize the two-view geometric model to estimate the depth using all views simultaneously. Based on this geometric model, we propose a multi-view depth-estimation technique that employs a one-dimensional optimization strategy to reduce the noise level in the estimated depth images and to enforce consistent depth images across the views.
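To make the two-view case concrete, the sketch below implements a local depth calculation for a rectified stereo pair: for each pixel, the best-matching disparity is found along the horizontal epipolar line and triangulated into depth via Z = f * B / d. The block size and disparity range are illustrative assumptions, not values from the thesis.

    import numpy as np

    def local_depth(left, right, f, baseline, block=5, max_disp=64):
        # For each pixel, search the horizontal epipolar line for the
        # disparity minimizing the sum of absolute differences (SAD),
        # then triangulate the depth as Z = f * baseline / disparity.
        h, w = left.shape
        r = block // 2
        depth = np.zeros((h, w))
        for y in range(r, h - r):
            for x in range(r + max_disp, w - r):
                patch = left[y - r:y + r + 1, x - r:x + r + 1]
                sad = [np.abs(patch - right[y - r:y + r + 1,
                                            x - d - r:x - d + r + 1]).sum()
                       for d in range(1, max_disp)]
                d = 1 + int(np.argmin(sad))       # best-matching disparity
                depth[y, x] = f * baseline / d    # triangulated depth
        return depth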

B. Multi-view depth-image-based rendering


Chapter 4 addresses the problem of multi-view image rendering. We commence by reviewing two different rendering techniques: 3D image warping and mesh-based rendering. We show that each of these techniques suffers from either low rendering quality or high computational complexity. To circumvent these issues, we propose two image-based rendering algorithms: an alternative formulation of the relief-texture algorithm and an inverse-mapping rendering technique. Experimental comparisons with 3D image warping show an improvement in rendering quality of 3.8 dB for the relief-texture mapping and 3.0 dB for the inverse-mapping rendering technique.
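For reference, a minimal sketch of the 3D image-warping principle: a source pixel is back-projected to a world point using its depth value, and the world point is then re-projected into the destination view. The camera matrices are assumed calibration inputs following the convention x_cam = R * x_world + t, and occlusion handling is omitted.

    import numpy as np

    def warp_pixel(u, v, depth, K_src, R_src, t_src, K_dst, R_dst, t_dst):
        # Back-project pixel (u, v) of the source view to a 3D point
        # at the given depth, then re-project into the destination view.
        ray = np.linalg.inv(K_src) @ np.array([u, v, 1.0])
        X_cam = ray * (depth / ray[2])        # point in source camera frame
        X_world = R_src.T @ (X_cam - t_src)   # camera -> world coordinates
        x = K_dst @ (R_dst @ X_world + t_dst) # world -> destination pixel
        return x[0] / x[2], x[1] / x[2]       # dehomogenized coordinates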

C. Multi-view video coding


In the presented multi-view video system, the video-coding blocks comprise the encoding and decoding of both the multi-view texture and depth videos.
Chapter 5 addresses the problem of encoding the multi-view depth and texture video, using an extended version of a standard H.264/MPEG-4 AVC video encoder. The concept is based on exploiting the correlation between neighboring views. To this end, two view-prediction tools are used in parallel by the encoder: a block-based disparity-compensated prediction and a View Synthesis Prediction (VSP) scheme. Our encoder adaptively selects the most appropriate prediction scheme, using a rate-distortion criterion for optimal prediction-mode selection. We present experimental results for several texture and depth multi-view sequences, yielding a quality improvement of up to 0.6 dB for the texture and 3.2 dB for the depth, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. In this proposal, the use of the VSP scheme for multi-view depth encoding is original and particularly attractive, because the scheme does not rely on additional side information. To preserve random-access capabilities to any user-selected view, we employ a prediction structure that requires only two reference cameras.
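The adaptive selection between the two prediction tools can be illustrated with a Lagrangian rate-distortion cost, as commonly used in H.264/MPEG-4 AVC mode decision. The sketch below is a schematic outline only; the per-mode evaluation functions are placeholders for the actual coding passes.

    def select_prediction_mode(block, candidates, lagrangian_multiplier):
        # `candidates` maps a mode name (e.g. 'disparity_compensated',
        # 'view_synthesis') to a placeholder function returning the
        # (distortion, rate) pair obtained by coding `block` in that mode.
        best_mode, best_cost = None, float('inf')
        for mode, evaluate in candidates.items():
            distortion, rate = evaluate(block)
            cost = distortion + lagrangian_multiplier * rate  # J = D + lambda*R
            if cost < best_cost:
                best_mode, best_cost = mode, cost
        return best_mode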

Chapter 6 focuses on the compression of the depth signal. We present a novel depth-image coding algorithm that concentrates on the special characteristics of depth images: smooth regions delineated by sharp edges. The algorithm models the smooth regions using piecewise-linear functions and the sharp edges using straight lines. Since this approach was new and no alternative depth encoder had been published at the time of this research, the proposed algorithm was compared with a JPEG-2000 encoder, which is the de facto standard for digital cinema. A comparison with an encoder based on H.264/MPEG-4 AVC is also discussed. For typical bit rates (between 0.01 bit/pixel and 0.25 bit/pixel), experiments have revealed that the proposed encoder outperforms a JPEG-2000 encoder by 0.6-3.0 dB.
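As an illustration of the piecewise-linear modeling idea, the sketch below fits a linear function f(x, y) = a + b*x + c*y to one region of a depth block by least squares; the region mask produced by the straight dividing edge is assumed to be given, and the function name is hypothetical.

    import numpy as np

    def fit_linear_model(block, mask=None):
        # Fit f(x, y) = a + b*x + c*y to the masked pixels of a depth
        # block; each region on either side of the straight edge gets
        # its own model, so only a few parameters need to be encoded.
        h, w = block.shape
        ys, xs = np.mgrid[0:h, 0:w]
        if mask is None:
            mask = np.ones((h, w), dtype=bool)
        A = np.column_stack([np.ones(mask.sum()), xs[mask], ys[mask]])
        coeffs, *_ = np.linalg.lstsq(A, block[mask], rcond=None)
        return coeffs  # (a, b, c)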

Chapter 7 discusses a novel joint depth/texture bit-allocation algorithm for the joint compression of texture and depth images. The algorithm combines the depth and texture Rate-Distortion (R-D) curves into a single R-D surface, which allows the quantization parameters to be optimized such that the rendering quality is maximized. Next, we discuss a hierarchical algorithm that enables fast optimization by exploiting the smooth, monotonic properties of the R-D surfaces. Experimental results show an estimated gain of up to 1 dB compared to compression performed without joint bit-allocation optimization, using an encoder based on H.264/MPEG-4 AVC.
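One possible reading of such a hierarchical search is a coarse-to-fine grid refinement over the texture/depth quantizer pair, sketched below. The functions render_quality and bitrate are placeholder measurements of rendering quality and total bit rate at a given quantizer pair, and the quantizer range follows the H.264/MPEG-4 AVC convention (0 to 51).

    def hierarchical_search(render_quality, bitrate, budget,
                            qp_range=(0, 51), step=16):
        # Coarse-to-fine search on the joint R-D surface: evaluate a
        # sparse grid of (texture QP, depth QP) pairs, keep the feasible
        # point with the best rendering quality, then refine the grid
        # around it, relying on the smooth monotonic surface shape.
        lo, hi = qp_range
        best = (lo, lo)
        while step >= 1:
            grid = [(qt, qd)
                    for qt in range(max(lo, best[0] - step),
                                    min(hi, best[0] + step) + 1, step)
                    for qd in range(max(lo, best[1] - step),
                                    min(hi, best[1] + step) + 1, step)]
            feasible = [c for c in grid if bitrate(*c) <= budget]
            if feasible:
                best = max(feasible, key=lambda c: render_quality(*c))
            step //= 2
        return best  # quantizer pair maximizing quality within the budget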