5.4  Experimental results


5.4.1  Conditions


To evaluate the performance of the proposed multi-view texture- and depth-coding algorithms, experiments were carried out using the “Ballet” and “Breakdancers” texture and depth multi-view sequences. The experiments investigate the impact of the prediction accuracy on the rate-distortion performance. For each rate-distortion curve, the multi-view images are compressed under one of three conditions, labeled as follows.
  1. “Simulcast”: the compression of multi-view depth and texture video is performed using a simulcast coding structure (see footnote 4), i.e., with limited random-access capabilities (Figure 5.1(a)).
  2. “DCP”: the prediction of depth and texture views is carried out using only the H.264/MPEG-4 AVC block-based disparity-compensated prediction. To enable better random-access capabilities, the coding structure illustrated by Figure 5.4 is employed.
  3. “DCP/VSP”: the prediction scheme (VSP or DCP) is carried out adaptively, using a rate-distortion criterion for an optimal prediction-mode selection. To enable better random-access capabilities, the coding structure illustrated by Figure 5.4 is employed.
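
The adaptive selection in the “DCP/VSP” condition can be illustrated with a minimal sketch. The text only states that a rate-distortion criterion is used; the Lagrangian cost J = D + λR below is an assumption (it is the usual form of such a criterion in H.264/MPEG-4 AVC encoders), and the names `Candidate` and `select_mode` as well as all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    mode: str          # prediction mode label, "DCP" or "VSP"
    distortion: float  # distortion of the block for this mode (e.g. SSD)
    rate_bits: float   # bits needed to code the block with this mode


def select_mode(candidates, lagrange_multiplier):
    """Return the candidate with the lowest assumed cost J = D + lambda * R."""
    return min(
        candidates,
        key=lambda c: c.distortion + lagrange_multiplier * c.rate_bits,
    )


# Placeholder values for one block, for illustration only (not measured data).
block = [
    Candidate("DCP", distortion=1200.0, rate_bits=96.0),
    Candidate("VSP", distortion=1500.0, rate_bits=40.0),
]

print(select_mode(block, lagrange_multiplier=10.0).mode)
print(select_mode(block, lagrange_multiplier=1.0).mode)
```

With these placeholder values, the cheaper-to-code mode wins when the multiplier is large (rate is weighted heavily) and the lower-distortion mode wins when it is small.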

For the coding experiments, we employed the open-source H.264/MPEG-4 AVC encoder x264 [4] with main-profile encoding. The arithmetic coding algorithm CABAC was enabled for all experiments and the motion search area was 32 × 32 pixels. The number of reference frames was set to 2: one reference for the block-based disparity prediction and a second for the view-synthesis prediction. The Group Of Pictures (GOP) size was set to 25 frames and the GOP structure is defined as IBBP (as proposed in the JVT document [1]). Thus, as opposed to the JVT input document [62], which investigates the compression performance of MVC for multi-view video plus depth, no hierarchical B-pictures are employed in the presented experiments.

Prior to rendering, the reference depth is encoded with quantizer setting QP = 29. An algorithm for optimally selecting this quantizer setting is proposed in Chapter 7. Note that depth images should be encoded at a relatively high quality to avoid ringing artifacts along object borders in the depth map, which would otherwise cause rendering artifacts in the synthesized view. This remark is consistent with the conclusions of recent depth compression results [6128].

Because depth data is necessary for 3D rendering in any case, it can be assumed that depth images are transmitted even when no view-synthesis prediction is employed. Hence, employing the view-synthesis prediction does not involve any bit-rate overhead, and the presented rate-distortion curves for texture sequences therefore do not include the bit rate of depth images.
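
As a small illustration of the IBBP structure with a GOP size of 25 frames, the sketch below lists the picture types of one GOP in display order. It assumes two non-hierarchical B-pictures between consecutive anchor pictures and a GOP that starts with its I-picture; the function name `gop_frame_types` and its parameters are hypothetical and do not correspond to any encoder API.

```python
def gop_frame_types(gop_size=25, b_frames=2):
    """Return the display-order picture types of one GOP as a string.

    Assumption: the GOP starts with an I-picture, followed by runs of
    `b_frames` B-pictures, each run closed by a P-picture. A shorter run
    is used at the end if the GOP size does not fit exactly.
    """
    types = ["I"]
    while len(types) < gop_size:
        remaining = gop_size - len(types)
        run = min(b_frames, remaining - 1)  # keep one slot for the P-picture
        types.extend(["B"] * run)
        types.append("P")
    return "".join(types)


print(gop_frame_types())
```

For 25 frames the pattern fits exactly: one I-picture followed by eight B-B-P groups.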

5.4.2  Experimental results for multi-view texture coding


The result of the comparison for multi-view texture compression is provided by Figure 5.7.

Comparison of “DCP” with respect to “DCP/VSP”.

First, it can be observed that the proposed “DCP/VSP” algorithm consistently outperforms the “DCP” scheme at low and medium bit rates. For example, observing the “Breakdancers” and “Ballet” rate-distortion curves (Figure 5.7(a) and Figure 5.7(b)), it can be seen that the “DCP/VSP” algorithm yields a quality improvement of 0.6 dB and 0.2 dB at a bit rate of 250 kbit/s and 500 kbit/s, respectively. However, at a high bit rate of about 1 Mbit/s, no coding improvement is obtained. In that case, the view synthesis does not provide a sufficiently accurate prediction, so that the disparity-compensated prediction mode is mostly selected. The proposed VSP-based predictor is therefore mostly efficient at low bit rates and yields limited coding improvements at high bit rates, because it cannot accurately predict very fine textures. In summary, the adaptive “DCP/VSP” scheme performs at least as well as the “DCP” scheme at all tested bit rates, and its gain is most visible at low bit rates.
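
Quality differences at a fixed bit rate, such as those quoted above, can be read from two rate-distortion curves by interpolation. The following sketch assumes linear interpolation between measured points and a quality value expressed in dB (presumably PSNR, which the text does not state explicitly). The function names and the sample points are hypothetical placeholders and are not taken from Figure 5.7.

```python
def quality_at(rate_kbps, rd_points):
    """Linearly interpolate the quality (dB) at rate_kbps.

    rd_points is a list of (rate_kbps, quality_db) pairs with distinct rates.
    """
    pts = sorted(rd_points)
    if len(pts) < 2:
        raise ValueError("at least two rate-distortion points are required")
    if not pts[0][0] <= rate_kbps <= pts[-1][0]:
        raise ValueError("target rate lies outside the measured range")
    for (r0, q0), (r1, q1) in zip(pts, pts[1:]):
        if r0 <= rate_kbps <= r1:
            return q0 + (q1 - q0) * (rate_kbps - r0) / (r1 - r0)


def gain_at(rate_kbps, curve, reference_curve):
    """Quality difference (dB) of curve over reference_curve at rate_kbps."""
    return quality_at(rate_kbps, curve) - quality_at(rate_kbps, reference_curve)


# Made-up sample points for illustration only (not the measured curves).
dcp = [(200, 34.0), (300, 35.0), (600, 37.0)]
dcp_vsp = [(200, 34.5), (300, 35.5), (600, 37.0)]

print(gain_at(250, dcp_vsp, dcp))
```

Linear interpolation is the simplest choice; a smoother fit over the logarithm of the bit rate would be an alternative when only a few points per curve are available.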

Comparison of “DCP/VSP” with respect to simulcast coding.

We evaluate in particular the differences in coding performance, while taking the random-access capabilities of the decoder into account. First, observing the rate-distortion curve of the “Breakdancers” sequence, it can be noted that the “DCP/VSP” coder yields a slightly lower coding performance than simulcast coding (0.2 dB at 250 kbit/s). In this case, the random-access capabilities are obtained at the cost of a limited coding-performance loss. For the “Ballet” sequence, it can be observed that random-access capabilities can be obtained only at the cost of a significant coding-performance loss. Such a large loss can be explained by examining the properties of the “Ballet” multi-view sequence. More specifically, the “Ballet” sequence was captured using a wide-baseline camera setup and depicts large foreground objects (the dancers). As a result, the sequence shows large occluded regions. Hence, the temporal correlation between consecutive frames is stronger than the spatial inter-view correlation. This experimental result confirms the statistical analysis of [44], as discussed in Section 5.2.1. As suggested in Section 5.2.4, a solution to increase the coding efficiency is to employ more reference main views. In the extreme case, N reference main views may be employed to obtain the highest coding gain, where N corresponds to the number of views. For the “Ballet” sequence, this extreme case corresponds to a simulcast compression of the views and yields a coding performance similar to that of an MVC encoder [62]. As a result, the system designer may employ the proposed coding structure, where the number of reference main views is a trade-off parameter that balances the bit rate against the delay for accessing a desired view. This trade-off between coding efficiency and random access was discussed in Section 5.2.4.
For each sequence, we summarize in Table 5.1 the corresponding scenarios discussed in Section 5.2.4 for selecting a prediction structure, such that the random access and coding efficiency are appropriately balanced.





High coding efficiency required
  “Breakdancers”: An MaDC prediction structure or an MVC encoder should be employed because both schemes yield the highest coding performance.
  “Ballet”: A simulcast coding structure should be employed because simulcast coding yields a coding performance similar to that of an MVC encoder, but does not involve a high number of coding dependencies.

Low-delay rendering required
  “Breakdancers”: The proposed coding structure illustrated by Figure 5.4 should be used. However, a lower delay (than MaDC or MVC) for accessing and rendering views is obtained at the cost of an image quality loss of 0.75 dB, when compared to an MaDC prediction structure or an MVC encoder.
  “Ballet”: The proposed coding structure should be used (see Figure 5.4). However, a lower delay for accessing and rendering views can only be obtained at the cost of a significant coding-efficiency loss. To prevent such a loss, a high number of reference main views should be used. The highest coding efficiency is obtained when the number of main views is equal to the number of views N, i.e., simulcast compression of the views.

Table 5.1 Scenarios for balancing the trade-off between coding efficiency and random access for the “Breakdancers” and “Ballet” multi-view texture sequences.


Figure 5.7 RD-curves of sequences (a) “Breakdancers” and (b) “Ballet” encoded using simulcast, “DCP” and “DCP/VSP” coders.


5.4.3  Experimental results for multi-view depth coding


Figure 5.8 depicts the comparison results for the compression of multi-view depth sequences.

Comparison of “DCP” with respect to “DCP/VSP”.

First, it can be observed that the proposed “DCP/VSP” algorithm consistently outperforms the “DCP” scheme. For example, observing the rate-distortion curve of the “Breakdancers” depth image sequence (Figure 5.8(a)), the “DCP/VSP” algorithm yields a quality improvement of 1.6 dB and 1.2 dB at a bit rate of 250 kbit/s and 500 kbit/s, respectively. For the “Ballet” depth image sequence (Figure 5.8(b)), the proposed algorithm provides an image-quality improvement of 2.7 dB and 3.2 dB at a bit rate of 250 kbit/s and 500 kbit/s, respectively. Therefore, the view-synthesis predictive-coding algorithm brings significant coding improvements over a disparity-compensated prediction scheme. These improvements can be explained by two factors. First, the view-synthesis algorithm correctly models the motion of objects in the video scene. As a result, a view-synthesis prediction is more accurate than a disparity-compensated prediction. Second, recall that consistency between depth images across the views was enforced during depth estimation (see Chapter 3). It can now be seen that, besides improving the depth-estimation accuracy, the inter-view consistency constraint for depth estimation also results in a coding-efficiency improvement.

Comparison of “DCP/VSP” with respect to simulcast coding.

For the “Breakdancers” depth image sequence, it is interesting to note that the proposed “DCP/VSP” algorithm now outperforms simulcast depth coding. The obtained improvement is 1.6 dB and 1.7 dB at 250 kbit/s and 500 kbit/s, respectively. In summary, for the “Breakdancers” sequence, the proposed “DCP/VSP” algorithm provides random-access capabilities while improving the coding efficiency. For the “Ballet” sequence, a marginal coding loss of only 0.2 dB is obtained at both 250 kbit/s and 500 kbit/s. Although no coding improvement can be reported for this particular case, the proposed algorithm still enables random access to different views as a remaining benefit.


Figure 5.8 RD-curves of depth multi-view sequences for (a) “Breakdancers” and (b) “Ballet”, encoded using simulcast, “DCP” and “DCP/VSP” coders.


4 At the time the presented experiments were performed and published [737270], simulcast compression constituted the anchor for coding-performance comparisons [1]. In 2007, the reference software JMVM became the anchor for comparisons [9513].