6.3  Bit-allocation strategy

This section provides details about the bit-allocation strategy that optimizes the coding in a rate-distortion sense. Lossy compression is a trade-off between rate and distortion. Considering our lossy encoder/decoder framework, our aim is to optimize the compression. This is realized by specifying an objective Rate-Distortion (R-D) constraint. There are three coding and modeling parameters that influence this trade-off.

Concluding, the primary problem is to adjust each of the above parameters such that the objective R-D constraint is satisfied. To optimize these parameters in an R-D sense, a possible approach is to define a cost function that combines both rate Ri and distortion Di of the image i. Typically, for this problem, the Lagrangian cost function J(Ri) is used with

J(Ri) = Di(Ri)+ λRi,

where Ri and Di represent the rate and distortion of the image, respectively, and λ is a weighting factor that controls the rate-distortion trade-off. For example, for λ = 0, we obtain a signal reconstructed at the lowest distortion and highest bit rate achievable. In the opposite case, for λ = ∞, the signal is compressed at the lowest rate and highest distortion. To achieve an optimal encoding, the Lagrangian cost function J(Ri) should be minimized. This cost function is minimized when the derivative is set to 0, which yields

λ = - ∂D-(Ri).

Thus, the optimal encoding is obtained when compressing the image at an operation point that has a constant slope -λ at the convex hull of the R-D points (see Figure 6.5).

Figure 6.5 Rate-distortion plane and convex hull of the R-D points.

Additionally, it can be shown that for an optimal encoding, not only the image must be encoded at constant R-D slope -λ, but also for each independent sub-image [81]. As a consequence, the key principle followed in the discussed bit-allocation strategy is to adjust coding and modeling parameters (i.e., quadtree decomposition, model selection and quantizer step size), such that the image and also each quadtree block is encoded at a constant slope -λ.

However, optimizing all parameters in a single iteration would be highly complicated. For example, without a given quadtree segmentation, it is not possible to select an appropriate modeling function for each quadtree block. To come to a feasible solution for this optimization problem, we divided the optimization procedure in three steps. We first assume that an optimal quadtree segmentation and coefficient quantizer setting is provided. Using this information, the selection of modeling functions for each quadtree block can be performed. Then, we describe a method to optimize the quadtree decomposition and finally, we optimize the quantizer setting. The complete solution will be developed later.

6.3.1  Modeling function selection

In Section 6.2.2, it was explained that a quadtree block can be approximated by four different modeling functions fˆ 1, ˆf 2, fˆ3 and fˆ 4. The objective is not only to select the most appropriate modeling function, but also to incorporate this choice in the overall R-D constraint. The function mentioned in Equation (6.8) represents the global cost function for the image i. However, the objective is now to select for each block the modeling function fˆ j, j ∈{1,2,3,4}, that minimizes the global coding cost Di(Ri) + λRi.

Since the rate and distortion are additive functions over all blocks, an independent optimization can be performed for each block. This statement is valid because the modeling is performed for each block individually without a relationship to neighboring blocks. Therefore, for each block, the algorithm selects the modeling function that minimizes the Lagrangian R-D cost function according to

˜                    ˆ         ˆ
f = ˆ arˆgmˆinˆ ˆ (Dm (fj)+  λRm (fj)),

where Rm(fˆ j) and Dm(ˆf j) represent the rate and distortion resulting from using one modeling function fˆ j. In the implementation, the applied distortion measure is the squared error and the rate for each modeling function is derived from the predetermined quantizer setting.

6.3.2  Quadtree decomposition

Let us now provide more details about optimizing the quadtree decomposition. The objective is to obtain an optimal quadtree decomposition of an image in the R-D sense. Because the quadtree segmentation is not known in advance, the algorithm first segments the image into a full quadtree decomposition up to the pixel level. From this full tree, the algorithm aims at deriving a sub-tree that optimally segments and approximates the image in the R-D sense. To obtain an optimal tree decomposition of the image, a well-known approach is to perform a so-called bottom-up tree-pruning technique [15]. The concept is based on parsing the initial full tree from bottom to top and recursively prune nodes (i.e., merge blocks) of the tree according to a decision criterion. From earlier work, we have decided to adopt the already specified Lagrangian cost function for this purpose.

The tree-pruning algorithm is illustrated by Figure 6.6 and can be described as follows.

Figure 6.6 Illustration of a bottom-up tree-pruning procedure.

Consider four children nodes denoted by N1, N2, N3 and N4, that have a common parent node which is represented by N0. For each node k, a Lagrangian coding cost (DNk + λRNk), k ∈{0,1,2,3,4}, can be calculated. Using the Lagrangian cost function, the four children nodes should be pruned whenever the sum of the four coding cost functions is higher than the cost function of the parent node. More formally written, the children nodes are pruned whenever the following condition holds

   (DNk + λRNk ) > (DN0 + λRN0 ).

When the children nodes are not pruned, the algorithm assigns the sum of the coding costs of the children nodes to the parent node. Subsequently, this tree-pruning technique is recursively performed in a bottom-up fashion. It has been proven [15] that such a bottom-up tree pruning leads to an optimally pruned tree, thus in our case to an R-D optimal quadtree decomposition of the image.

To estimate the bit rate at block level, we also include the side information required to code the node of the quadtree, the modeling function index and the bit rate resulting from using the modeling function. We address all three elements briefly. For the side information, we assume that the quadtree structure can be coded using one bit per node (one bit set to “1” or “0” to indicate that the node is subdivided or not). For the index, two bits per leaf are necessary to indicate which modeling function is used. The previous two aspects contribute to the bit rate. Hence, the rate RNk that is required to code one given node Nk can be written as

RNk =  1+ 2 + Rm (˜f),

where Rm(f˜ ) is the bit rate needed to code the best modeling function ˜f (see Equation (6.10)). The distortion DNk for one node Nk corresponds to the distortion introduced by the selected modeling function ˜f .

6.3.3  Quantizer selection

A criterion that selects the optimal quantizer is now described.

Up till now, we have discussed a scheme that subdivides the image into a full quadtree, optimizes the decomposition of the quadtree and optimally selects the modeling function for each node of tree. These optimizations were carried out under a predetermined quantizer setting. Now, the problem is to select the optimal quantizer, denoted ˜q , that yields the desired R-D trade-off. We propose to select the quantizer ˜q out of a given set of possible scalar quantizers {q2,q3,q4,q5,q6,q7,q8}, operating at 2, 3, 4, 5, 6, 7 and 8 bits per level, respectively. To optimally select the quantizer, we re-use the application of the Lagrangian cost function and select the quantizer ˜q that minimizes the Lagrangian coding cost of the image i according to

˜q =      argmin       Di(Ri,ql) + λRi(ql).

Here, Ri(ql) and Di(Ri,ql) correspond to the global rate Ri and distortion Di(Ri), in which the parameter ql is added to indicate the influence of the quantizer selection. To solve the optimization problem of Equation (6.13), the image is encoded using all possible quantizers {q2,q3,q4,q5,q6,q7,q8} and the quantizer ˜q is selected that yields the lowest coding cost for Ji(Ri,˜q ) = Di(Ri,˜q ) + λRi(˜q ).

6.3.4  Summary of the complete algorithm

Since we have now defined how to select the modeling function, perform quadtree decomposition and know how to choose the quantizer setting, a complete algorithm for the joint optimization can be described. The algorithm requires as an input the depth image and a weighting factor λ that controls the R-D trade-off. The selection of the weighting factor λ that yields the desired bit rate can be calculated using a simple bisection search [78]. The image is first recursively subdivided into a full quadtree decomposition. All nodes of the tree are then approximated by four modeling functions ˆf 0, fˆ 1, ˆf 2, ˆf 3. In a second step, the coefficients of the modeling functions are quantized using one scalar quantizer ql. For each node of the full tree, an optimal modeling function ˜
f can now be selected using Equation (6.10). Employing the tree-pruning technique described in Section 6.3.2, the full tree is then pruned in a bottom-up fashion. Then, the previous step of the algorithm is repeated for all quantizers ql ∈{q2,q3,q4,q5,q6,q7,q8} and the quantizer that leads to the lowest global coding cost is selected (see Equation (6.13)). Subsequently, for each leaf of the tree, the quantized zero-order coefficients (α000A0B0A0B) are saved and because they are uniformly distributed, they are fixed-length coded. However, the first-order coefficients (β1, β2, θ1A, θ2A, θ1B, θ2B) satisfy a Laplacian distribution. For this reason, we encode these coefficients using an adaptive arithmetic encoder. The pseudo-code of the complete algorithm is summarized in Algorithm 4.

Algorithm 4 Encode depth image - algorithm summary
Require:  A depth image and a factor λ to control the R-D trade-off.   Step 1: 1.  Recursively subdivide image into a full quadtree decomposition. 2.  Approximate each node of the full tree by four modeling functions ˆf
 3 (see Table 6.1).
  Initialize l ← 2, ˜q←q2.
  Step 2:
1.  For each node of the full tree, quantize coefficients of the modeling functions with quantizer ql, and select a best modeling function ˜f using the criterion of Equation (6.10).
2.  Prune the tree in a bottom-up fashion.
3.  If the global coding cost Ji(Ri,ql) ≤Ji(Ri,˜q ), then update the best R-D quantizer, i.e., ˜q←ql.
  Step 3: ll+1, as long as l ≤ 8, go to Step 2.