6.4  Experimental results


In this section, we first present rate-distortion curves of the proposed depth image encoder. Second, the quality of rendered views synthesized using coded depth images is evaluated.

6.4.1  Coding experiments


Since this depth coding algorithm was completely new at design time and no direct alternative existed, its performance could be compared to a number of alternative image encoders, e.g., the Bandelet encoder [79, 80], the Contourlet encoder [22] or the Ridgelet encoder [21]. However, each of these alternatives itself employs a wavelet-based encoder as a reference. Because of the high coding performance of standard encoders such as H.264/MPEG-4 AVC and JPEG-2000, these standard coders are used for comparison instead.

A. Comparison with JPEG-2000


To evaluate the performance of the coding algorithm, experiments were carried out using the “Teddy”, “Cones”, “Breakdancers” and “Ballet” depth images (see footnote 2). The R-D performance of the proposed encoder was compared to that of a wavelet-based JPEG-2000 encoder [3]. Let us now examine the resulting rate-distortion curves of Figures 6.8(a), 6.8(b), 6.9(a) and 6.9(b). First, it can be observed that the proposed depth coder consistently outperforms the JPEG-2000 encoder. For example, for the “Teddy” depth image, coding improvements over JPEG-2000 are as high as 2.5 dB and 3.3 dB at 0.1 and 0.2 bit per pixel, respectively. For the “Cones” depth image, an image-quality improvement of up to 2.8 dB can be observed at 0.3 bit per pixel. For the “Breakdancers” and “Ballet” images, gains of 0.9 dB and 0.4 dB, respectively, are obtained at 0.04 bit per pixel.
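To give a feeling for the bit rates quoted above, the following minimal sketch converts between bit per pixel and coded size in bytes. The function names and the 1024x768 resolution are assumptions made for illustration only; they are not taken from the experiments.

```python
def bits_per_pixel(compressed_bytes, width, height):
    """Bit rate, in bit per pixel, of one coded image."""
    return 8.0 * compressed_bytes / (width * height)


def compressed_bytes_at(bpp, width, height):
    """Coded size in bytes that corresponds to a given bit rate."""
    return bpp * width * height / 8.0


# Hypothetical example: a 1024x768 depth image (resolution is an assumption)
# coded at 0.04 bit per pixel.
size_in_bytes = compressed_bytes_at(0.04, 1024, 768)
print(round(size_in_bytes))  # roughly 3.9 kB for the whole depth image
```

Under these assumptions, a complete depth image at 0.04 bit per pixel occupies only a few kilobytes, which is why small PSNR differences at such rates are worth reporting.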

Let us now examine the subjective coding artifacts of the proposed depth coder. First, experiments have revealed that the proposed algorithm can approximate large smooth areas as well as sharp edges with a single node (see Figure 6.7). As a result, sharp edges are compactly encoded and accurately represented. An example is the left vertical edge in Figure 6.7(c).



Figure 6.7 (a) The original depth image “Teddy”. (b) The corresponding depth image reconstructed using the complete algorithm. (c) The nodes of the quadtree superimposed on the picture in (b). Coding was achieved at a bit rate of 0.12 bit/pixel and a PSNR of 36.1 dB.


Perceptually, our algorithm reconstructs edges with higher quality than the JPEG-2000 encoder. For example, the edge along the dancer in Figure 6.10(b) is approximated more accurately than in the JPEG-2000 encoded signal (see Figure 6.10(c)), which is much more blurred.



Figure 6.8 Rate-distortion curves for the (a) “Teddy” and (b) “Cones” depth images, both for our algorithm and JPEG-2000.




Figure 6.9 Rate-distortion curves for the (a) “Breakdancers” and (b) “Ballet” depth images, both for our algorithm and JPEG-2000.




Figure 6.10 (a) “Ballet” depth image with a marked area. (b) The marked area coded with piecewise-linear functions at 35.8 dB PSNR. (c) Same area coded with JPEG-2000 at 35.5 dB PSNR. Both results are obtained at 0.038 bit/pixel.


6.4.2  Rendering experiments based on encoded depth


Let us now evaluate the quality of synthetic views rendered using depth images encoded by the complete algorithm. The experiment takes as input a depth image, the corresponding texture image and two neighboring texture views. To measure the rendering distortion, a synthetic view is rendered at the position of a neighboring camera (same procedure as in Section 4.5), and the PSNR between the original and the rendered view is calculated. In practice, this is done for both neighboring texture views and the two resulting PSNR measures are averaged.

In Section 3.4.5, we discussed the problem of selecting an appropriate objective rendering-quality metric. Specifically, two rendered synthetic images with very different subjective qualities may show a similar objective PSNR. This case is illustrated by Figure 6.11. Nevertheless, the PSNR was selected as the quality metric because it remains the metric most widely employed and accepted by the research community.

To evaluate the impact of depth compression on the rendering quality, the reference depth image is encoded at three different bit rates using two different depth coders based on (1) JPEG-2000 and (2) piecewise-linear functions. Because the experiment isolates the impact of depth compression, the reference texture image is not encoded. The employed rendering algorithm is the relief texture mapping algorithm presented in Section 4.2.3. Table 6.2 summarizes the rendering-quality measurements for the “Breakdancers” and “Ballet” sequences.
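The rendering-distortion measure described above can be sketched as follows. This is a minimal illustration, assuming 8-bit grayscale images stored as lists of rows and an arithmetic mean of the two PSNR values in dB; the function names are hypothetical and the actual evaluation code may differ (for instance, in how color channels are handled).

```python
import math


def mse(original, rendered):
    """Mean squared error between two equally sized images (lists of rows)."""
    if len(original) != len(rendered):
        raise ValueError("images must have the same number of rows")
    total, count = 0.0, 0
    for row_o, row_r in zip(original, rendered):
        if len(row_o) != len(row_r):
            raise ValueError("rows must have the same length")
        for p, q in zip(row_o, row_r):
            total += (p - q) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr(original, rendered, peak=255.0):
    """PSNR in dB; 'peak' is the maximum sample value (255 assumed for 8 bits)."""
    err = mse(original, rendered)
    if err == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / err)


def average_rendering_psnr(view_pairs, peak=255.0):
    """Average PSNR over (original view, rendered view) pairs,
    e.g. one pair for each of the two neighboring cameras."""
    values = [psnr(o, r, peak) for o, r in view_pairs]
    return sum(values) / len(values)


# Toy 2x2 example: every rendered sample is off by one gray level.
original = [[10, 20], [30, 40]]
rendered = [[11, 21], [29, 41]]
print(round(psnr(original, rendered), 2))
```

Averaging the dB values directly, rather than averaging the mean squared errors first, is a design choice assumed here; the two options give slightly different numbers when the two views differ in quality.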






Image            Bit rate     Rendering PSNR (dB)
                              Piecewise-linear functions   JPEG-2000

“Breakdancers”   0.030 bpp    33.4                         33.2
                 0.065 bpp    33.6                         33.5
                 0.09 bpp     33.6                         33.6

“Ballet”         0.058 bpp    29.3                         28.9
                 0.1 bpp      29.5                         29.3
                 0.14 bpp     29.6                         29.5





Table 6.2 Rendering-quality measurements obtained using two different depth coders based on: (1) piecewise-linear functions and (2) JPEG-2000.

Let us now discuss the obtained rendering quality. First, it can be observed that the depth coder based on piecewise-linear functions matches or outperforms the JPEG-2000 coder in terms of rendering quality at every tested bit rate. For example, image-quality improvements of up to 0.2 dB and 0.4 dB can be observed for the “Breakdancers” and “Ballet” sequences, respectively. As expected, the improvement is particularly visible at object borders. For example, the depth discontinuity along the edges of the lady dancer in the “Ballet” sequence is preserved (Figure 6.11(c)). In contrast, the JPEG-2000 coder deteriorates depth discontinuities, resulting in rendering artifacts (Figure 6.11(a)). Note that in this case the rendering PSNR values of the two images are comparable: 29.5 dB for the coder based on piecewise-linear functions and 29.3 dB for JPEG-2000. Hence, although the two images differ by only 0.2 dB in rendering PSNR, important visual differences can be observed. Similar conclusions can be drawn by observing synthetic images of the “Breakdancers” sequence (see Figure 6.12).
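The gains quoted above can be recomputed directly from Table 6.2. The short sketch below does so; the PSNR values are copied from the table, while the variable and function names are illustrative only.

```python
# Rendering PSNR values in dB, copied from Table 6.2.
# Each row: (bit rate in bpp, piecewise-linear coder, JPEG-2000 coder).
TABLE_6_2 = {
    "Breakdancers": [(0.030, 33.4, 33.2), (0.065, 33.6, 33.5), (0.09, 33.6, 33.6)],
    "Ballet": [(0.058, 29.3, 28.9), (0.1, 29.5, 29.3), (0.14, 29.6, 29.5)],
}


def rendering_gains(rows):
    """Per-bit-rate PSNR gain of the piecewise-linear coder over JPEG-2000.

    Values are rounded to one decimal, the precision of the table."""
    return [round(piecewise - jpeg2000, 1) for _, piecewise, jpeg2000 in rows]


def max_gain(rows):
    """Largest gain over the tested bit rates."""
    return max(rendering_gains(rows))


for sequence, rows in TABLE_6_2.items():
    print(sequence, rendering_gains(rows), "max:", max_gain(rows))
```

The largest gain occurs at the lowest bit rate for both sequences, and the gain shrinks (to zero for “Breakdancers” at 0.09 bpp) as the bit rate increases.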



Figure 6.11 (a) and (b) Synthetic image and corresponding magnified area rendered using a depth image encoded by a JPEG-2000 coder. (c) and (d) Synthetic image and corresponding magnified area rendered using a depth image encoded by a coder based on piecewise-linear functions. Both reference depth images are encoded at 0.1 bit per pixel.




Figure 6.12 (a) and (b) Synthetic image and corresponding magnified area rendered using a depth image encoded by a JPEG-2000 coder. (c) and (d) Synthetic image and corresponding magnified area rendered using a depth image encoded by a coder based on piecewise-linear functions. Both reference depth images are encoded at 0.03 bit per pixel.


6.4.3  Comparisons with H.264/MPEG-4 AVC and MVC


Besides the coding-performance comparisons with JPEG-2000, additional comparisons with two encoders based on H.264/MPEG-4 AVC were carried out in collaboration with P. Merkle from Fraunhofer HHI. In these experiments, Merkle carried out the compression and the rendering of the multi-view sequences using the H.264/MPEG-4 AVC and MVC encoders, while the author encoded the depth image sequences using the proposed platelet-based encoder (the coder referred to earlier as being based on piecewise-linear functions; in the remainder, we use the shorter term "platelet-based").

To evaluate the compression efficiency of the platelet-based encoder, two different reference encoders were employed. First, an H.264/MPEG-4 AVC encoder was used to compress the multi-view depth sequences. Note that the proposed platelet-based encoder exploits neither temporal nor inter-view correlations. Thus, to obtain similar compression settings, the GOP size of the H.264/MPEG-4 AVC encoder was set to 1 (intra-coding only). Second, an MVC encoder was used to compress the multi-view depth sequences. The MVC encoder was configured with CABAC and rate control enabled and with a motion search range of 96 pixels.

The first coding comparison [57] shows that the platelet-based encoder is outperformed by H.264/MPEG-4 AVC intra-coding. However, the experiments also revealed that a lower coding efficiency does not imply a lower rendering quality. Specifically, the rendering-quality evaluation shows that the platelet-based encoder achieves an equal or better rendering quality, although its coding efficiency is lower than that of H.264/MPEG-4 AVC intra-coding.

The second comparison [58] revealed that the MVC encoder also yields a higher coding efficiency than the platelet-based encoder. This higher coding efficiency is obtained because the MVC encoder exploits the temporal and inter-view correlations. However, for the “Breakdancers” sequence, the proposed depth encoder renders synthetic images with higher quality than an MVC depth encoder. For the “Ballet” sequence, the platelet-based encoder yields similar or slightly lower rendering quality. Nevertheless, subjective evaluations of the rendered pictures show that depth images encoded with the platelet-based encoder yield synthetic images with fewer rendering artifacts. For more details about the experimental settings and results, the reader is referred to the corresponding joint publications [57, 58].

2 “Breakdancers” and “Ballet”: depth image number 0 of camera 0. Note that the complexity of the depth images does not vary significantly over time or across the views, so that including more depth images would not change the results.