6.3 Bit-allocation strategy
This section describes the bit-allocation strategy that optimizes the coding in a rate-distortion sense. Lossy compression involves a trade-off between rate and distortion, so within our lossy encoder/decoder framework, the aim is to optimize the compression by specifying an objective Rate-Distortion (R-D) constraint. Three coding and modeling parameters influence this trade-off.
- Quadtree decomposition. To accurately approximate a block that shows fine details, the quadtree decomposition recursively subdivides the block into smaller blocks. However, additional subdivisions lead to a higher bit rate. Therefore, an appropriate subdivision criterion is required that prevents the creation of unnecessarily many small blocks.
- Selection of modeling functions. There are four different modeling functions f_{j}, j ∈ {1,2,3,4}, to approximate a depth pixel block. Each modeling function has a different approximation accuracy, but also a different complexity (and thus rate). Therefore, a model has to be selected that balances both rate and model distortion.
- Quantization step size. The quantization process in the encoder uses a variable quantization accuracy for the model coefficients: each function coefficient is quantized such that the model approximates the depth with a predetermined accuracy. In our case, the involved quantization step size is adapted to the required R-D constraint.
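To make the third parameter concrete, below is a minimal sketch of uniform scalar quantization of a single model coefficient. The function name, the assumed coefficient range [lo, hi], and the midpoint reconstruction rule are illustrative assumptions, not the thesis implementation.

```python
def quantize(coeff, num_bits, lo=0.0, hi=255.0):
    """Uniformly quantize a model coefficient to 2**num_bits levels
    over an assumed coefficient range [lo, hi] (hypothetical values)."""
    levels = 2 ** num_bits
    step = (hi - lo) / levels                 # quantization step size
    index = min(int((coeff - lo) / step), levels - 1)   # index to transmit
    reconstruction = lo + (index + 0.5) * step          # midpoint reconstruction
    return index, reconstruction
```

A larger `num_bits` shrinks the step size, which lowers the model distortion at the price of a higher coefficient rate; this is exactly the knob that the R-D constraint adapts.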
Concluding, the primary problem is to adjust each of the above parameters such that the objective R-D constraint is satisfied. To optimize these parameters in an R-D sense, a possible approach is to define a cost function that combines both the rate R_{i} and distortion D_{i} of the image i. Typically, for this problem, the Lagrangian cost function J_{i}(R_{i}) is used, with
J_{i}(R_{i}) = D_{i}(R_{i}) + λR_{i},   (6.8)
where R_{i} and D_{i} represent the rate and distortion of the image, respectively, and λ is a weighting factor that controls the rate-distortion trade-off. For example, for λ = 0, we obtain a signal reconstructed at the lowest distortion and highest bit rate achievable. In the opposite case, for λ = ∞, the signal is compressed at the lowest rate and highest distortion. To achieve an optimal encoding, the Lagrangian cost function J_{i}(R_{i}) should be minimized. This cost function is minimized where its derivative equals 0, which yields
dD_{i}(R_{i})/dR_{i} = -λ.   (6.9)
Thus, the optimal encoding is obtained when compressing the image at an operating point that has a constant slope -λ on the convex hull of the R-D points (see Figure 6.5).
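The role of λ can be illustrated with a small sketch: given a set of hypothetical (rate, distortion) operating points, minimizing the Lagrangian cost J = D + λR selects one point, and sweeping λ traces out the convex hull. The function name and the example points are assumptions for illustration only.

```python
def best_operating_point(rd_points, lam):
    """Pick the R-D point minimizing the Lagrangian cost J = D + lam * R.
    rd_points: list of (rate, distortion) pairs (hypothetical values)."""
    return min(rd_points, key=lambda rd: rd[1] + lam * rd[0])
```

For λ = 0 the point with the lowest distortion wins, while a very large λ selects the lowest-rate point, matching the two extreme cases discussed above.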
Additionally, it can be shown that for an optimal encoding, not only must the image be encoded at a constant R-D slope -λ, but so must each independent sub-image [81]. As a consequence, the key principle of the discussed bit-allocation strategy is to adjust the coding and modeling parameters (i.e., quadtree decomposition, model selection and quantizer step size) such that the image, and also each quadtree block, is encoded at a constant slope -λ.
However, optimizing all parameters in a single iteration would be highly complicated. For example, without a given quadtree segmentation, it is not possible to select an appropriate modeling function for each quadtree block. To arrive at a feasible solution for this optimization problem, we divide the optimization procedure into three steps. We first assume that an optimal quadtree segmentation and coefficient-quantizer setting are provided. Using this information, the selection of modeling functions for each quadtree block can be performed. Then, we describe a method to optimize the quadtree decomposition and, finally, we optimize the quantizer setting. The complete solution will be developed later.
6.3.1 Modeling function selection
In Section 6.2.2, it was explained that a quadtree block can be approximated by four different modeling functions f_{1}, f_{2}, f_{3} and f_{4}. The objective is not only to select the most appropriate modeling function, but also to incorporate this choice in the overall R-D constraint. The function in Equation (6.8) represents the global cost function for the image i. However, the objective is now to select, for each block, the modeling function f_{j}, j ∈ {1,2,3,4}, that minimizes the global coding cost D_{i}(R_{i}) + λR_{i}.
Since the rate and distortion are additive functions over all blocks, an independent optimization can be performed for each block. This statement is valid because the modeling is performed for each block individually without a relationship to neighboring blocks. Therefore, for each block, the algorithm selects the modeling function that minimizes the Lagrangian R-D cost function according to
f̂ = arg min_{f_{j}, j ∈ {1,2,3,4}} ( D_{m}(f_{j}) + λR_{m}(f_{j}) ),   (6.10)
where R_{m}(f_{j}) and D_{m}(f_{j}) represent the rate and distortion resulting from using the modeling function f_{j}. In the implementation, the applied distortion measure is the squared error, and the rate for each modeling function is derived from the predetermined quantizer setting.
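The per-block selection of Equation (6.10) can be sketched as follows. The data layout, mapping a model index j to a hypothetical (rate in bits, squared error) pair, is an assumption for illustration; in practice these values come from fitting and quantizing each model on the block.

```python
def select_model(candidates, lam):
    """Equation (6.10) sketch: choose the modeling function index j that
    minimizes the per-block Lagrangian cost D_m(f_j) + lam * R_m(f_j).
    candidates: dict {j: (rate_bits, squared_error)} with illustrative values."""
    return min(candidates, key=lambda j: candidates[j][1] + lam * candidates[j][0])
```

Because rate and distortion are additive over blocks and blocks are modeled independently, running this selection per block minimizes the global cost, as argued above.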
6.3.2 Quadtree decomposition
Let us now provide more details about optimizing the quadtree decomposition. The objective is to obtain a quadtree decomposition of the image that is optimal in the R-D sense. Because the quadtree segmentation is not known in advance, the algorithm first segments the image into a full quadtree decomposition down to the pixel level. From this full tree, the algorithm aims at deriving a sub-tree that optimally segments and approximates the image in the R-D sense. To obtain such a tree decomposition, a well-known approach is to perform a so-called bottom-up tree-pruning technique [15]. The concept is based on parsing the initial full tree from bottom to top and recursively pruning nodes (i.e., merging blocks) of the tree according to a decision criterion. Following earlier work, we adopt the already specified Lagrangian cost function for this purpose.
The tree-pruning algorithm is illustrated by Figure 6.6 and can be described as follows.
Consider four child nodes, denoted N_{1}, N_{2}, N_{3} and N_{4}, that have a common parent node N_{0}. For each node N_{k}, a Lagrangian coding cost D_{Nk} + λR_{Nk}, k ∈ {0,1,2,3,4}, can be calculated. Using the Lagrangian cost function, the four child nodes should be pruned whenever the sum of their four coding costs is higher than the cost of the parent node. More formally, the child nodes are pruned whenever the following condition holds:
D_{N0} + λR_{N0} ≤ Σ_{k=1}^{4} ( D_{Nk} + λR_{Nk} ).   (6.11)
When the child nodes are not pruned, the algorithm assigns the sum of the coding costs of the child nodes to the parent node. Subsequently, this tree-pruning technique is performed recursively in a bottom-up fashion. It has been proven [15] that such bottom-up tree pruning leads to an optimally pruned tree, thus in our case to an R-D optimal quadtree decomposition of the image.
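The bottom-up pruning just described can be sketched as a recursion over a small tree structure. The `Node` class and the cost bookkeeping are illustrative assumptions; the pruning condition follows Equation (6.11), and the cost propagation follows the rule of assigning the children's summed cost to an unpruned parent.

```python
class Node:
    def __init__(self, rate, dist, children=None):
        self.rate, self.dist = rate, dist   # cost of coding this block as a leaf
        self.children = children or []      # empty list means leaf

def prune(node, lam):
    """Bottom-up tree pruning sketch (Eq. 6.11): return (rate, dist) of the
    optimally pruned subtree rooted at `node`, merging the four children
    whenever the parent's Lagrangian cost does not exceed their summed cost."""
    if not node.children:
        return node.rate, node.dist
    child_costs = [prune(c, lam) for c in node.children]
    r_sum = sum(r for r, _ in child_costs)
    d_sum = sum(d for _, d in child_costs)
    if node.dist + lam * node.rate <= d_sum + lam * r_sum:
        node.children = []                  # prune: keep the parent as a leaf
        return node.rate, node.dist
    return r_sum, d_sum                     # keep subdivision; propagate summed cost
```

Note how the decision changes with λ: a small λ favors distortion and keeps fine subdivisions, while a large λ favors rate and prunes aggressively.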
To estimate the bit rate at block level, we include the side information required to code the node of the quadtree, the modeling-function index, and the bit rate resulting from using the modeling function. We address all three elements briefly. For the side information, we assume that the quadtree structure can be coded using one bit per node (set to “1” or “0” to indicate whether the node is subdivided or not). For the index, two bits per leaf are necessary to indicate which modeling function is used. These two aspects contribute to the bit rate. Hence, the rate R_{Nk} required to code one given node N_{k} can be written as
R_{Nk} = R_{m}(f̂) + 1 + 2,   (6.12)
where R_{m}(f̂) is the bit rate needed to code the best modeling function f̂ (see Equation (6.10)). The distortion D_{Nk} for one node N_{k} corresponds to the distortion introduced by the selected modeling function f̂.
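The per-node rate bookkeeping amounts to a fixed three-bit overhead on top of the coefficient bits (one quadtree subdivision bit plus two model-index bits, as stated above); a trivial sketch, with the function name being a hypothetical label:

```python
def node_rate(model_rate_bits):
    """Per-leaf rate: coefficient bits of the chosen model, plus 1 bit for the
    quadtree subdivision flag and 2 bits for the modeling-function index."""
    return model_rate_bits + 1 + 2
```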
6.3.3 Quantizer selection
A criterion that selects the optimal quantizer is now described.
Up to now, we have discussed a scheme that subdivides the image into a full quadtree, optimizes the decomposition of the quadtree, and optimally selects the modeling function for each node of the tree. These optimizations were carried out under a predetermined quantizer setting. Now, the problem is to select the optimal quantizer, denoted q̂, that yields the desired R-D trade-off. We propose to select the quantizer out of a given set of possible scalar quantizers {q_{2},q_{3},q_{4},q_{5},q_{6},q_{7},q_{8}}, operating at 2, 3, 4, 5, 6, 7 and 8 bits per level, respectively. To optimally select the quantizer, we again apply the Lagrangian cost function and select the quantizer that minimizes the Lagrangian coding cost of the image i according to
q̂ = arg min_{q_{l}, l ∈ {2,...,8}} ( D_{i}(R_{i},q_{l}) + λR_{i}(q_{l}) ).   (6.13)
Here, R_{i}(q_{l}) and D_{i}(R_{i},q_{l}) correspond to the global rate R_{i} and distortion D_{i}(R_{i}), where the parameter q_{l} is added to indicate the influence of the quantizer selection. To solve the optimization problem of Equation (6.13), the image is encoded using all possible quantizers {q_{2},q_{3},q_{4},q_{5},q_{6},q_{7},q_{8}}, and the quantizer is selected that yields the lowest coding cost J_{i}(R_{i},q_{l}) = D_{i}(R_{i},q_{l}) + λR_{i}(q_{l}).
6.3.4 Summary of the complete algorithm
Now that we have defined how to select the modeling function, how to perform the quadtree decomposition and how to choose the quantizer setting, the complete joint-optimization algorithm can be described. The algorithm requires as input the depth image and a weighting factor λ that controls the R-D trade-off. The weighting factor λ that yields the desired bit rate can be found using a simple bisection search [78]. The image is first recursively subdivided into a full quadtree decomposition. All nodes of the tree are then approximated by the four modeling functions f_{1}, f_{2}, f_{3} and f_{4}. In a second step, the coefficients of the modeling functions are quantized using one scalar quantizer q_{l}. For each node of the full tree, an optimal modeling function can now be selected using Equation (6.10). Employing the tree-pruning technique described in Section 6.3.2, the full tree is then pruned in a bottom-up fashion. This step of the algorithm is repeated for all quantizers q_{l} ∈ {q_{2},q_{3},q_{4},q_{5},q_{6},q_{7},q_{8}}, and the quantizer that leads to the lowest global coding cost is selected (see Equation (6.13)). Subsequently, for each leaf of the tree, the quantized zero-order coefficients (α_{0},β_{0},γ_{0A},γ_{0B},θ_{0A},θ_{0B}) are saved and, because they are uniformly distributed, they are fixed-length coded. The first-order coefficients (β_{1}, β_{2}, θ_{1A}, θ_{2A}, θ_{1B}, θ_{2B}), however, follow a Laplacian distribution. For this reason, we encode these coefficients using an adaptive arithmetic encoder. The pseudo-code of the complete algorithm is summarized in Algorithm 4.
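The text notes that the λ yielding a desired bit rate can be found by a simple bisection search [78]. One possible sketch is given below; the oracle `rate_of(lam)`, assumed to return the bit rate obtained when encoding with that λ (monotonically decreasing in λ), and the bracket values are illustrative assumptions.

```python
def find_lambda(rate_of, target_rate, lo=1e-6, hi=1e6, iters=50):
    """Bisection-search sketch for the weighting factor lambda whose encoding
    meets a target bit rate. rate_of(lam) is assumed to return the rate
    obtained with that lambda and to decrease monotonically as lambda grows."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if rate_of(mid) > target_rate:
            lo = mid        # rate still too high -> increase lambda
        else:
            hi = mid        # rate at or below target -> decrease lambda
    return (lo + hi) / 2.0
```

In practice each evaluation of `rate_of` is one full encode pass, so the bisection wraps the whole of Algorithm 4.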
Algorithm 4 Encode depth image - algorithm summary
Require: A depth image and a factor λ to control the R-D trade-off.
Step 1:
1. Recursively subdivide image into a full quadtree decomposition.
2. Approximate each node of the full tree by the four modeling functions f_{1}, f_{2}, f_{3}, f_{4} (see Table 6.1).
Initialize l ← 2, q̂ ← q_{2}.
Step 2:
1. For each node of the full tree, quantize the coefficients of the modeling functions with quantizer q_{l}, and select the best modeling function using the criterion of Equation (6.10).
2. Prune the tree in a bottom-up fashion.
3. If the global coding cost J_{i}(R_{i},q_{l}) ≤ J_{i}(R_{i},q̂), then update the best R-D quantizer, i.e., q̂ ← q_{l}.
Step 3: l ← l+1; as long as l ≤ 8, go to Step 2.