7.2  Joint depth/texture bit allocation


In this section, we first formulate the joint bit-allocation problem for depth and texture. Afterwards, we provide an experimental analysis of the two-dimensional R-D model, which enables a fast search algorithm for estimating the optimal quantization parameters.

7.2.1  Formulation of the joint bit-allocation problem


Let us consider the problem of jointly coding a texture and depth image at a maximum rate Rmax with minimum rendering distortion Drender. The rate Rmax and distortion Drender functions can be defined as follows. First, the maximum rate value Rmax is decomposed into the sum of the individual rates required for texture and depth coding. Because the texture and depth images are coded with two different quantizer settings (denoted qt and qd, respectively), the texture and depth rate functions can be written as Rt(qt) and Rd(qd), respectively. The joint rate function can therefore be specified as
Rmax(qt,qd) = Rt(qt) + Rd(qd).
(7.1)

Second, the rendering distortion function Drender depends on the image rendering algorithm. The rendering algorithm relies on the quality of the input texture and depth images and therefore on the quantization parameters qt and qd. Consequently, as there is one rendering quality, we define a joint rendering distortion as Drender(qt,qd).

The goal of the joint bit-allocation optimization is to determine the optimal quantization parameters (qtopt,qdopt) for coding the depth and texture images, such that the rendering distortion is minimized. The optimization problem can now be formulated as finding the minimum of the rendering distortion, hence

(qtopt,qdopt) = argmin_{qt,qd ∈ Q} Drender(qt,qd),
(7.2)

under the constraint that the joint bit rate is bounded by Rmax, so that

Rt(qtopt) + Rd(qdopt) ≤ Rmax,
(7.3)

where Q denotes the set of all possible quantizer settings. Without prior assumptions, solving Equation (7.2) involves an exhaustive search over Q to find the quantizer setting with minimum distortion. Fortunately, a more efficient search can be performed by exploiting special properties of the R-D function. For example, assuming a smooth, monotonic R-D surface model, hierarchical optimization techniques can be employed to find the best setting. Therefore, prior to investigating fast search algorithms, we provide a performance-point analysis of the R-D function to validate the smoothness of the surface.
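As an illustration, the exhaustive search implied by Equations (7.2) and (7.3) can be sketched as follows. This is a minimal sketch in Python; the callables `rate_t`, `rate_d` and `distortion` are placeholders for measured R-D data and are not part of the original formulation.

```python
import itertools

def exhaustive_search(q_set, rate_t, rate_d, distortion, r_max):
    """Solve Eq. (7.2) under constraint (7.3) by brute force.

    rate_t(qt), rate_d(qd) and distortion(qt, qd) are assumed callables
    that return the measured rate and rendering-distortion values.
    """
    best, best_d = None, float("inf")
    for qt, qd in itertools.product(q_set, q_set):
        if rate_t(qt) + rate_d(qd) > r_max:   # constraint (7.3)
            continue
        d = distortion(qt, qd)                # D_render(qt, qd)
        if d < best_d:
            best_d, best = d, (qt, qd)
    return best, best_d
```

The search visits all |Q|^2 quantizer pairs, which motivates the faster search discussed below.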

7.2.2  R-D surface analysis


To analyze the R-D function, we construct a surface from an input data set composed of multi-view images and their corresponding depth images. The rendering algorithm is based on relief texture mapping (see Chapter 4). We generate the R-D surface by measuring the rendering distortion for all quantizer pairs (qt,qd) within a search range qmin ≤ qt,qd ≤ qmax. In total, k = qmax − qmin + 1 compressed versions of each of the texture and depth images are generated, which yields k × k R-D performance points. In our specific case, we employ an H.264/MPEG-4 AVC encoder to compress the reference texture and depth images. However, since the proposed joint bit-allocation method is generic, any texture and depth encoder can be employed, as long as it provides a controllable quantizer.

To measure the rendering distortion, one solution is to warp a coded reference image using the corresponding depth image. The rendering distortion is evaluated by calculating the Mean Squared Error (MSE) between the rendered image and the corresponding image captured at the same location and orientation (see Figure 7.1).
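The MSE computation between the rendered and captured images can be sketched as below. The optional `hole_mask` argument is an assumption on our part: whether disoccluded pixels left unfilled by the warping are excluded depends on the evaluation protocol.

```python
import numpy as np

def rendering_mse(captured, rendered, hole_mask=None):
    """MSE between a captured view and the view rendered at the same
    camera position and orientation.

    hole_mask (optional, boolean array): pixels marked True are
    excluded from the average, e.g. disocclusion holes.
    """
    captured = np.asarray(captured, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    err = (captured - rendered) ** 2
    if hole_mask is not None:
        err = err[~np.asarray(hole_mask)]
    return float(err.mean())
```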


Figure 7.1 The rendering distortion is obtained by rendering a synthetic image at the position of a neighboring camera. The rendering distortion is then evaluated by calculating the MSE between the original captured image and the rendered view.


Therefore, considering an N-view data set and a selected quantizer pair (qt,qd), N − r distortion measures can be obtained (excluding the r reference images). To obtain a single rendering distortion measurement, the N − r measures are then averaged. The pseudo-code of the R-D surface construction algorithm is summarized in Algorithm 5.



Algorithm 5 R-D surface construction algorithm
Require: A set of multiple texture and depth image pairs.
  Initialize a 2D array RDSurface[.][.].
  for (qt = qmin; qt <= qmax; qt++) do
    Encode the reference texture image at QP = qt.
    for (qd = qmin; qd <= qmax; qd++) do
      Encode the reference depth image at QP = qd.
      for each non-reference view Vi do
        Render an image at the position and orientation of the view Vi.
        Calculate the MSE mi between the captured and rendered image.
      end for
      m̄ = average of the MSE values mi;
      RDSurface[qt][qd] = m̄;
    end for
  end for
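The loop structure of Algorithm 5 can be sketched in Python as follows. Here `encode_texture`, `encode_depth` and `render` are placeholders for the actual codec and the relief-texture renderer, and `captured` is the list of non-reference camera images; these names are illustrative assumptions, not part of the original algorithm.

```python
import numpy as np

def build_rd_surface(captured, encode_texture, encode_depth, render,
                     q_min, q_max):
    """Construct the k x k R-D surface of Algorithm 5.

    captured: list of non-reference camera images (numpy arrays).
    encode_texture(qt) / encode_depth(qd): decoded reference image
    after coding at the given QP. render(tex, dep, v): image warped
    to the position of non-reference view v.
    """
    k = q_max - q_min + 1
    surface = np.empty((k, k))
    for i, qt in enumerate(range(q_min, q_max + 1)):
        tex = encode_texture(qt)              # reference texture at QP = qt
        for j, qd in enumerate(range(q_min, q_max + 1)):
            dep = encode_depth(qd)            # reference depth at QP = qd
            mses = [np.mean((img - render(tex, dep, v)) ** 2)
                    for v, img in enumerate(captured)]
            surface[i, j] = np.mean(mses)     # average MSE -> D_render(qt, qd)
    return surface
```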

Figure 7.2 shows the resulting R-D surfaces of the two multi-view sequences “Ballet” and “Breakdancers”. To generate the presented surfaces, the first depth and texture images of the reference views are encoded with an H.264/MPEG-4 AVC encoder in intra-mode.



Figure 7.2 (a) R-D surface for the depth and texture of the “Breakdancers” sequence. (b) R-D surface for the depth and texture of the “Ballet” sequence. Note the difference in scale of the rendering quality.

Considering Figure 7.2, it is readily observed that both R-D surfaces exhibit smooth, monotonic behavior. So far, we have only established an empirical validation of the monotonic nature of the R-D surface. Assuming that the rendering distortion function is indeed monotonic, a fast quantizer-setting search algorithm can be employed.
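To give an intuition of how monotonicity helps, one possible fast search is a coarse-to-fine grid refinement, sketched below. This is an illustrative assumption on our part; the actual fast algorithm investigated in this chapter may differ. A smooth, monotonic surface lets a coarse grid locate the neighborhood of the optimum, which is then refined with a halved step size.

```python
def coarse_to_fine_search(distortion, joint_rate, r_max,
                          q_min, q_max, step=8):
    """Illustrative coarse-to-fine search over the quantizer grid.

    distortion(qt, qd) and joint_rate(qt, qd) are placeholders for the
    measured D_render and Rt + Rd values.
    """
    lo_t, hi_t = q_min, q_max
    lo_d, hi_d = q_min, q_max
    best = None
    while step >= 1:
        best, best_d = None, float("inf")
        for qt in range(lo_t, hi_t + 1, step):
            for qd in range(lo_d, hi_d + 1, step):
                if joint_rate(qt, qd) > r_max:    # rate constraint
                    continue
                d = distortion(qt, qd)
                if d < best_d:
                    best_d, best = d, (qt, qd)
        if best is None:                          # no feasible point found
            return None
        # shrink the search window around the current best, halve the step
        lo_t = max(q_min, best[0] - step); hi_t = min(q_max, best[0] + step)
        lo_d = max(q_min, best[1] - step); hi_d = min(q_max, best[1] + step)
        step //= 2
    return best
```

Compared with the exhaustive search, each refinement level evaluates only a small, fixed number of grid points, so the total number of rendered distortion measurements grows logarithmically rather than quadratically with the quantizer range.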