6.2  Geometric modeling of depth images

6.2.1  Overview of the depth image coding algorithm

In this section, we present a novel approach for depth image coding using the piecewise-linear functions mentioned in Section 6.1. The concept followed is to approximate the image content with modeling functions. In our framework, we use two classes of modeling functions: a class of piecewise-constant functions and a class of piecewise-linear functions. For example, flat surfaces that show smooth regions in the depth image can be approximated by a piecewise-constant function. Similarly, planar surfaces of the scene like the ground plane and walls, appear as regions of gradually changing grey levels in the depth image. Hence, such a planar region can be approximated by a single linear function. To identify the location of these surfaces in the image, we employ a quadtree decomposition (see Figure 6.2), which recursively divides the image into variable-size blocks, i.e., nodes of different size, according to the degree of decomposition.

Figure 6.2 Example of quadtree decomposition. Each block, i.e., nodes, of the quadtree is approximated by one modeling function.

In some cases, the depth image within one block can be approximated with one modeling function. If no suitable approximation can be determined for the block, it is subdivided into four smaller blocks. To prevent that too many small blocks are required along a discontinuity, we divide the block into two regions separated by a straight line. Each of these two regions is coded with an independent function. Consequently, the algorithm chooses between four modeling functions for each leaf as follows.

The selection of a particular modeling function is based on a rate-distortion criterion that is described in Section 6.3.1. This criterion trades-off the distortion and rate of the individual functions.

6.2.2  Modeling-function definitions

In this section, we define the four different modeling functions  ˆ
f j, for j ∈{1,2,3,4}, used to approximate each quadtree block.

To model the depth signal within each block, we focus on finding a compression scheme that can compactly represent smooth blocks as well as boundaries (contours) of objects. First, to approximate smooth blocks, we employ a simple constant function fˆ 1(x,y) = α0. Second, to approximate gradually changing object surfaces, we use a piecewise-linear function  ˆ
f 2, which defined by

f2(x,y) = (β0 + β1x + β2y), for (x,y) ∈ S.

Third, to properly approximate blocks containing sharp edges, we segment those blocks into two regions A and B divided by a straight line of arbitrary direction. Each region is subsequently approximated by one modeling function. To define the position and direction of the subdivision-line, we connect two pixels, P1 and P2, located at two different sides of the block. For coding the key parameters of this line, instead of saving the coordinates of both marker points, we use two indexes i1 and i2 addressing the location of P1 and P2. The index is defined such that all pixels directly surrounding the block are scanned in a clockwise fashion (see Figure 6.3).

Figure 6.3 An example edge line that divides a quadtree block into two regions A and B, using pixels P1 (index 3) and P2 (index 15) as marker points.

To determine if a pixel belongs to region A or B, we use a numerical test known as the orientation test in computational geometry applications.

The two resulting regions A and B can be approximated by either a wedgelet function, or a platelet function. A wedgelet function is composed of two piecewise-constant functions over two regions A and B, separated by a straight line, and can be defined as

ˆ           ˆf3A(x,y) = γ0A,  for (x, y) ∈ A,
f3(x,y) =   ˆf3B(x,y) = γ0B, for (x, y) ∈ B.

Similarly, a platelet function is composed of two planar surfaces covering two different regions and separated by a straight edge (waterfall), and is defined by

         {  ˆ
ˆf4(x,y) =   f4A(x,y) = θ0A + θ1Ax + θ2Ay, for (x,y) ∈ A,
            ˆf4B(x,y) = θ0B + θ1Bx + θ2By,  for (x,y) ∈ B.

Figure 6.4 depicts example patterns for each modeling function ˆf 1, fˆ 2, ˆf 3 and fˆ4.


Figure 6.4 (a)-(d) Examples of functions ˆf 1, ˆf 2, fˆ 3 and fˆ 4, respectively.

Let us now specify the computation of the function coefficients.

6.2.3  Estimation of model coefficients

The objective of the estimation step is to calculate the coefficients of the modeling functions subject to a constraint. We employ a constraint that minimizes the approximation error between the original depth signal in a block and the corresponding approximation. Specifically, the approximation error is the Mean Squared Error (MSE) over the block where the model applies to. Let us divide the estimation of the modeling function into two sections: one for the constant functions, and one for the linear functions.

A. Coefficient estimation for modeling the constant functions  ˆ
f 1 and  ˆ
f 2.

For  ˆ
f1(x,y) = α0, only one coefficient α0 has to be computed. Practically, the coefficient α0 that minimizes the MSE between f and ˆf 1 simply corresponds to the mean value of the original data block, which leads to the computation of α0 = n×1n--x,y∈Sf(x,y).

For fˆ 2, the function should approximate blocks that contain a gradient. In order to determine the three coefficients β0, β1 and β2 of the linear function fˆ2(x,y) = β0 + β1x + β2y, a least-squares optimization is used. This optimization minimizes the sum of squared differences between the depth image f(x,y) and the proposed linear model. Accordingly, coefficients β0, β1 and β2 are determined such that the error

                n  n
               ∑  ∑                          2
e2(β0,β1,β2) =       (β0 + β1x+ β2y - f(x,y))

is minimized, where n denotes the node size expressed in pixels. This error function e2012) is minimal when the gradient function crosses zero, hence ||∇e2|| = 0. When taking the partial derivatives with respect to β01 and β2 of this equation, we find a set of linear equations 1 specified by

(  t  u   v )  ( β1 )    (  ∑nx=1∑ny=1 xf(x,y) )
(  u  t   v )  ( β  )  = (  ∑n   ∑n    yf(x,y) ) ,
           2       2        ∑nx=1 ∑ny=1
   v  v  n       β0            x=1   y=1 f(x,y)


t = n2(n+1)(2n+1),  u = n2(n+1)2, and v = n2(n+1).
         6               4                2

Since the matrix at the left side of the equation system is constant, it is possible to pre-compute and store the inverse for each size of the square (quadtree node). This enables to compute the coefficients β012 with a simple matrix multiplication.

B. Coefficient estimation for modeling the linear functions ˆf 3 and fˆ 4.

For the wedgelet ˆf 3 and platelet fˆ 4 functions, we have to determine not only the model coefficients, but also the separation line. For each model, we aim at approximating the depth value f(x,y) with two functions in the supporting areas A and B. Each area is delineated by a subdividing line P1P2, as defined by points P1 and P2 (see Figure 6.3). The best set of coefficients for the wedgelet fˆ3 and platelet ˆf 4 functions are those coefficient values that minimize the approximation errors
          ˆ   2        ˆ  2                   ˆ   2        ˆ   2
e3 = ||f - f3A || + ||f - f3B|| and   e4 = ||f - f4A|| + ||f - f4B || ,

respectively. These minimizations cannot be solved directly for the following reason. Without knowing the subdivision boundary between the two regions, it is not possible to determine the model coefficients. On the other hand, it is not possible to identify the subdivision line as long as the region coefficients are unknown. To overcome this mutual dependence, we first identify the separation line. For this reason, the coefficient estimation is initialized by testing every possible line that divides the block into two areas. This step provides a candidate subdivision line P1cP2c and two candidate regions Ac and Bc. Subsequently, the wedgelet and platelet coefficients are computed over the candidate regions using the average pixel value and a least-squares minimization, respectively. At the end of the exhaustive search for each function fˆ 3 and fˆ 4, the best fitting model is selected. The major advantage of this technique is that it leads to the optimal set of coefficients that minimizes the least-squares error between the model and the image block.

Table 6.1 shows a summary of possible modeling functions of the previous subsections to approximate each block of the quadtree and their corresponding coefficient-estimation techniques.

Modeling function

Coefficient-estimation technique

fˆ 1(x,y) = α0

Average value of pixel (x,y) ∈ S.

fˆ 2(x,y) = β0 + β1x + β2y

Compute least-squares minimization.

fˆ 3(x,y) = {
  fˆ3A(x,y) = γ0A  (x,y) ∈ A
  fˆ3B(x,y) = γ0B  (x,y) ∈ B

Average pixel values over regions A and B, where A and B are obtained through an exhaustive search.

f 4(x,y) = {
  fˆ4A(x,y) = θ0A + θ1Ax + θ2Ay  (x,y) ∈ A
  fˆ4B(x,y) = θ0B + θ1Bx +  θ2By   (x,y) ∈ B

Compute least-squares minimization over regions A and B, where A and B are obtained through an exhaustive search.

Table 6.1 Summary of modeling functions and their corresponding coefficient-estimation techniques.

1To derive this equation, the following properties of series are needed: ∑ i=1ni = n(n+1)
  2 and ∑ i=1ni2 = n(n+1)(2n+1)
     6, where i takes the parameter x or y.