5.5  Conclusions

In this chapter, we have presented two predictive coding tools: disparity-compensated prediction and view-synthesis prediction. To increase the coding efficiency, these coding tools aim at exploiting the inter-view correlation by using various prediction structures. To this end, we have investigated the MaDC prediction structure, which is complex and involves a large number of dependencies between frames (see Figure 5.1(b)). We have therefore proposed an alternative coding structure that reduces the number of inter-frame dependencies, as shown in Figure 5.4. Related to this, we have discussed multiple scenarios for selecting an appropriate coding structure, fulfilling the requirement of either (1) high coding efficiency, or (2) low-delay access to a user-selected view.

Using the proposed prediction structure, we have presented a new algorithm for the predictive coding of multiple depth and texture camera views that employs two different view-prediction algorithms: a disparity-compensated prediction and a view-synthesis prediction. The advantages of the algorithm are that (1) the compression is robust against inaccurately estimated depth images and (2) the chosen prediction structure allows random access to different views. For each image block, the selection between the two prediction algorithms is carried out using a rate-distortion criterion. Furthermore, we have integrated the view-synthesis prediction scheme into an H.264/MPEG-4 AVC encoder, leading to an adaptive prediction that uses either disparity-compensated prediction or view-synthesis prediction. We have discussed that this integration can be performed smoothly and leads to clear advantages. The most relevant benefits are that the predictor selection is performed within the H.264/MPEG-4 AVC framework and that the coding mode is indicated by the reference frame index.
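The per-block selection described above can be sketched as follows. This is a minimal illustration only, not the actual encoder implementation: the function names, the Lagrangian cost form J = D + λ·R, and the one-bit rate term standing in for the reference-index signaling cost are assumptions made for the sake of the example.

```python
def select_predictor(block, dcp_pred, vsp_pred, lmbda=0.85):
    """Choose between DCP and VSP for one block via an RD criterion.

    Illustrative sketch: J = D + lambda * R, where D is the sum of
    squared differences between the block and its prediction, and R
    is a rough one-bit signaling cost per mode (mirroring the scheme
    in which the chosen mode is indicated by the reference frame
    index).  All parameter values here are hypothetical.
    """
    def ssd(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    costs = {
        "DCP": ssd(block, dcp_pred) + lmbda * 1,  # ref index 0
        "VSP": ssd(block, vsp_pred) + lmbda * 1,  # ref index 1
    }
    return min(costs, key=costs.get)
```

In a real H.264/MPEG-4 AVC encoder the rate term would be the actual bit cost of the chosen reference index and residual, but the principle is the same: the predictor with the lowest Lagrangian cost wins, block by block.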

Experimental results have shown that the view-synthesis predictor can improve the resulting texture-image quality by up to 0.6 dB for the “Breakdancers” sequence at low bit rates, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. However, we have found that the proposed coding structure yields a notable coding efficiency degradation for the “Ballet” texture sequence.

A major result of this chapter is that the view-synthesis prediction and its integration into the multi-view coding framework can be equally applied to the compression of multi-view depth images, thereby leading to a unified approach. The major difference between the texture and depth coding algorithms is that the depth prediction does not require any side information. As a bonus, the depth signals can be more easily predicted by the view-synthesis prediction because of their smoothness and inter-view consistency. Experimental results have revealed that the proposed “DCP/VSP” coder yields a significant depth-image quality improvement of up to 3.2 dB when compared to the “DCP” coder. Finally, considering the multi-view depth sequence, we have indicated that random-access capabilities can be obtained without a significant loss of coding efficiency, and in some cases even with coding gains.
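The reason depth prediction needs no side information is that the decoder can warp the already-decoded reference depth map into the target view by itself. A simplified sketch of such a forward warp is given below, assuming rectified cameras with a purely horizontal baseline; the function name, parameters, and the nearest-pixel rounding are illustrative assumptions, and a practical coder would additionally fill disocclusion holes.

```python
def synthesize_depth_view(ref_depth, width, height, focal, baseline):
    """Forward-warp a reference depth map into a neighboring view.

    Illustrative 1-D rectified-camera sketch: each pixel is shifted
    horizontally by disparity d = focal * baseline / depth.  Pixels
    left unassigned (value 0.0) are disocclusion holes that a real
    coder would inpaint.  Geometry parameters are hypothetical.
    """
    synth = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            z = ref_depth[y][x]
            if z <= 0:
                continue  # invalid depth sample
            d = int(round(focal * baseline / z))
            xt = x + d
            if 0 <= xt < width:
                # nearer surfaces (smaller z) occlude farther ones
                if synth[y][xt] == 0.0 or z < synth[y][xt]:
                    synth[y][xt] = z
    return synth
```

Because depth maps are piecewise smooth and consistent across views, the warped map is usually an accurate predictor, which is consistent with the larger VSP gains observed for depth than for texture.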

At this point, it is relevant to briefly compare the results of this chapter with the recent developments of the MVC framework. The discussed MaDC prediction structure has a performance comparable to the MVC framework. For both MaDC and MVC encoders, only a limited coding-performance improvement occurs (when compared to simulcast compression) for wide-baseline camera setups. For example, for the “Breakdancers” and “Ballet” multi-view sequences, an MVC encoder yields a compression improvement ranging from 0.05 dB to 0.25 dB at 500 kbit/s for the texture compression, and from −0.1 dB to 0.5 dB at 200 kbit/s for the depth compression. Bearing in mind that the MVC encoder has not provided a coding-efficiency improvement for these wide-baseline camera sequences, we have proposed to employ an appropriate number of reference main views that balances the coding efficiency and the random-access capabilities.

The presented view-synthesis prediction scheme exploits both the depth and texture inter-view redundancy. The main arguments for employing this scheme are that it facilitates random access to a user-selected view (depth and texture) and that it realizes a significant improvement in multi-view depth compression. In Chapter 6, it will be shown that the encoding of depth data is of crucial importance for rendering high-quality images. Subsequently, it will be described in Chapter 7 that the bit rate of the depth data constitutes a significant part of the total bit rate, so that improving the compression of depth signals also improves the compression of the total data set.