For estimating accurate depth images, we have proposed a depth-estimation technique that uses several views simultaneously and circumvents the need for image-pair rectification. The algorithm integrates two smoothness constraints, ensuring smooth depth transitions both across scan lines and across views. We have shown that both constraints can be efficiently integrated into a one-dimensional dynamic programming optimization. Note that this depth-image optimization is performed independently for each frame.
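The core of such a one-dimensional optimization can be illustrated with a minimal sketch. The code below selects a depth label per pixel along one scan line by dynamic programming, combining a hypothetical matching cost with a spatial smoothness penalty; the cost arrays, the absolute-difference penalty, and the function name are illustrative assumptions, not the exact cost model of the proposed algorithm.

```python
import numpy as np

def dp_depth_line(data_cost, smooth_weight=1.0):
    """Select a depth label per pixel of one scan line by dynamic programming.

    data_cost: (W, D) array of hypothetical matching costs for W pixels and
    D depth labels. The smoothness term penalizes label jumps between
    neighboring pixels (illustrative sketch, not the thesis' exact cost).
    """
    W, D = data_cost.shape
    labels = np.arange(D)
    # Pairwise smoothness penalty between any two depth labels.
    trans = smooth_weight * np.abs(labels[:, None] - labels[None, :])
    cost = data_cost[0].copy()
    back = np.zeros((W, D), dtype=int)
    for x in range(1, W):
        total = cost[:, None] + trans          # rows: previous label, cols: current label
        back[x] = np.argmin(total, axis=0)
        cost = total.min(axis=0) + data_cost[x]
    # Backtrack the optimal label path.
    path = np.empty(W, dtype=int)
    path[-1] = int(np.argmin(cost))
    for x in range(W - 1, 0, -1):
        path[x - 1] = back[x, path[x]]
    return path
```

A second smoothness term across views would enter the same recursion as an additional penalty on the transition cost, which is why both constraints fit naturally into one graph.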
To extend the above results and obtain a more accurate depth estimation, the proposed frame-based scheme can easily be modified for the video domain. Specifically, it can be assumed that the depth of objects varies smoothly along the temporal dimension because of the inertia of natural objects. Similar to the two previously mentioned smoothness constraints, a temporal smoothness constraint can be integrated into the dynamic programming optimization. This can be carried out by modeling the temporal variation of depth as a cost function that penalizes fast depth transitions across consecutive depth images, and by integrating this penalty into the dynamic programming graph as an edge transition cost. Such a constraint would ensure the temporal stability of the depth image and thus lead to temporally stable rendered images.
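One simple way to realize this idea is to augment the per-pixel matching cost with a term that grows with the label change relative to the previous frame, before running the same dynamic programming pass. The sketch below assumes a linear penalty and the variable names are hypothetical; the thesis does not prescribe this exact weighting.

```python
import numpy as np

def add_temporal_cost(data_cost, prev_labels, temporal_weight=0.5):
    """Augment the per-pixel matching cost with a temporal smoothness term.

    data_cost: (W, D) matching costs for one scan line; prev_labels: (W,)
    depth labels of the co-located pixels in the previous frame. The penalty
    grows with the label change between consecutive frames, making fast
    temporal depth transitions expensive (illustrative weighting only).
    """
    W, D = data_cost.shape
    labels = np.arange(D)
    temporal = temporal_weight * np.abs(labels[None, :] - prev_labels[:, None])
    return data_cost + temporal
```

Because the penalty is folded into the node costs, the unmodified dynamic programming recursion then optimizes the spatial and temporal constraints jointly.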
To enable high-quality rendering and avoid occluded regions, the proposed 3D video system is based on an N-depth/N-texture video representation format, where each camera view covers different regions of the video scene. First, to render synthetic images we have proposed a variant of relief texture mapping. The described rendering technique efficiently handles holes and occluded pixels in the rendered images and can also be executed efficiently by a Graphics Processing Unit (GPU). Second, we have proposed an image-rendering technique based on an inverse mapping method that allows a simple and accurate re-sampling of synthetic pixels. Additionally, the presented method handles occlusions through an elegant, unambiguous construction of a single synthetic image from the two neighboring views used for compositing. Experimental comparisons with 3D image warping show an improvement in rendering quality of up to 3.0 dB for the inverse mapping rendering technique.
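The compositing step can be sketched as follows: two neighboring views are warped to the virtual camera position, pixels visible in both views are blended according to the camera distance, and holes in one warped view (disoccluded regions) are filled from the other. This is a simplified illustration of the compositing idea, not the exact inverse-mapping resampling of the proposed method; the NaN-as-hole convention and the function name are assumptions.

```python
import numpy as np

def composite_views(warp_left, warp_right, alpha):
    """Blend two neighboring views warped to the virtual camera position.

    warp_left / warp_right: warped images where np.nan marks holes
    (occluded/disoccluded pixels); alpha in [0, 1] is the relative distance
    of the virtual camera to the left view. Pixels visible in both views
    are blended; holes in one view are filled from the other.
    """
    out = (1.0 - alpha) * warp_left + alpha * warp_right
    only_left = np.isnan(warp_right) & ~np.isnan(warp_left)
    only_right = np.isnan(warp_left) & ~np.isnan(warp_right)
    out[only_left] = warp_left[only_left]
    out[only_right] = warp_right[only_right]
    return out
```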
In future research, the rendering quality can be further increased by compositing from a larger number of reference source images. Specifically, it can be envisioned that compositing from multiple source images would enable the synthesis of super-resolution synthetic images. This would not only yield high-quality synthetic images, but also high-quality prediction views, and thus increase the coding efficiency.
We have presented an algorithm for the predictive coding of multiple depth and texture camera views that employs two different view-prediction algorithms in parallel: (1) a block-based motion prediction and (2) a view-synthesis prediction. The algorithm selects the most appropriate predictor on a block basis, using a rate-distortion criterion for optimal prediction-mode selection. The advantages of the complete algorithm are that the compression is robust against inaccurately estimated depth images and that the chosen prediction structure features random access to different views. Furthermore, we have integrated this prediction scheme into an H.264/MPEG-4 AVC encoder, such that the disparity-compensated prediction, derived from conventional motion compensation, is combined with the view-synthesis prediction. Experimental results have shown a modest gain for texture coding and a quality improvement of up to 3.2 dB for the depth signal, when compared to solely performing H.264/MPEG-4 AVC disparity-compensated prediction. A major advantage of the proposed multi-view depth encoder is that the depth prediction scheme does not require the transmission of any side information (such as motion vectors for motion-compensated prediction), because the prediction can be based on the depth signal itself.
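The per-block mode decision can be illustrated with a Lagrangian rate-distortion cost J = D + λ·R, evaluated for both predictors. The sketch below uses a plain SSD distortion and hand-supplied rate values; the function name and arguments are hypothetical, since the actual encoder measures distortion and entropy-coded rates internally.

```python
def select_prediction_mode(block, motion_pred, synth_pred,
                           rate_motion, rate_synth, lmbda):
    """Choose between block-based motion prediction and view-synthesis
    prediction for one block using the Lagrangian cost J = D + lambda * R.
    Illustrative sketch: rates are given as plain bit counts."""
    def ssd(a, b):
        # Sum of squared differences between the block and its prediction.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    j_motion = ssd(block, motion_pred) + lmbda * rate_motion
    j_synth = ssd(block, synth_pred) + lmbda * rate_synth
    return ("motion", j_motion) if j_motion <= j_synth else ("synthesis", j_synth)
```

Note how the example reflects the side-information advantage: view-synthesis prediction carries no motion-vector rate, so it can win even with a slightly higher distortion.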
The current algorithm requires the selection of reference views from which neighboring camera views are predicted. This provides a simple method for view prediction, but it requires a proper selection of the reference views. The automatic selection of reference views remains an interesting topic for future research. This automatic selection can be applied not only to the reference views, but also to the prediction structure. Specifically, depending on the properties of the video scene, the prediction structure that yields the minimum bit rate for the lowest distortion may be generated automatically to further improve compression.
In this chapter, we have proposed a novel depth image coding algorithm that exploits the properties of depth images, i.e., smooth regions delineated by sharp edges. The proposed coding algorithm models smooth regions by piecewise-linear functions and sharp edges by straight lines. The quadtree decomposition, the selection of the modeling function for each block, and the quantizer setting for the model coefficients are jointly optimized such that a global rate-distortion trade-off is realized. Comparisons with a JPEG-2000 encoder yielded favorable results in both coding and rendering quality.
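The interplay between quadtree decomposition and piecewise-linear modeling can be sketched as a recursive rate-distortion decision: each block is either approximated by one linear function, or split into four quadrants, whichever yields the lower cost J = SSD + λ·R. The fixed per-leaf bit budget `header_bits` and the single split-flag bit are simplifying assumptions, not the coefficient coding of the actual algorithm.

```python
import numpy as np

def fit_linear(block):
    """Fit the plane z = a + b*x + c*y to a depth block by least squares."""
    h, w = block.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([np.ones(h * w), xs.ravel(), ys.ravel()])
    coef, *_ = np.linalg.lstsq(A, block.ravel(), rcond=None)
    return (A @ coef).reshape(h, w)

def quadtree_code(block, lmbda, header_bits=32, min_size=2):
    """Recursively decide between modeling a block by a single
    piecewise-linear function and splitting it into four quadrants,
    minimizing J = SSD + lambda * rate (illustrative rate model)."""
    residual = block - fit_linear(block)
    j_leaf = float(np.sum(residual ** 2)) + lmbda * header_bits
    h, w = block.shape
    if h <= min_size or w <= min_size:
        return j_leaf, "leaf"
    hh, hw = h // 2, w // 2
    j_split = lmbda  # one split-flag bit
    for quad in (block[:hh, :hw], block[:hh, hw:],
                 block[hh:, :hw], block[hh:, hw:]):
        j_split += quadtree_code(quad, lmbda, header_bits, min_size)[0]
    return (j_leaf, "leaf") if j_leaf <= j_split else (j_split, "split")
```

A smooth ramp is kept as one leaf, whereas a sharp depth discontinuity forces a split, which mirrors the intended behavior on smooth regions delineated by sharp edges.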
Possible future research involves the extension of the proposed compression algorithm to the video domain. Following standardized video encoders, a conventional approach is to reduce the temporal redundancy between consecutive depth images. To this end, a simple method consists of encoding the motion-compensated differences between consecutive depth frames. Assuming that depth images can be modeled by piecewise-linear functions, it can be deduced that the same modeling can be applied to the residual difference of two depth images. Apart from the already realized gain, it can be expected that an additional coding gain would be obtained using motion compensation.
Finally, we have proposed a novel algorithm that concentrates on the joint compression of depth and texture images. Instead of compressing texture and depth independently, the presented algorithm jointly optimizes rate and distortion to obtain an optimal rendering quality. It selects the depth and texture quantization parameters using an iterative hierarchical search, similar to the well-known Three-Step Search in motion estimation.
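The search pattern can be illustrated as follows: starting from an initial (texture QP, depth QP) pair, a 3×3 neighborhood at the current step size is evaluated, the center moves to the best candidate, and the step size is halved, analogous to the Three-Step Search of motion estimation. The cost function, initial point, and step schedule below are illustrative assumptions; in the actual system the cost would be the measured rendering distortion at the resulting rate.

```python
def three_step_qp_search(cost, qp_init=(26, 26), step=8, qp_range=(0, 51)):
    """Hierarchical search for the (texture QP, depth QP) pair minimizing a
    joint cost, patterned after the Three-Step Search of motion estimation.

    cost: user-supplied function mapping a QP pair to a scalar cost.
    Illustrative sketch; not the exact search of the proposed algorithm.
    """
    lo, hi = qp_range
    best = qp_init
    best_cost = cost(best)
    while step >= 1:
        center = best
        # Evaluate the 3x3 neighborhood around the current center.
        for dt in (-step, 0, step):
            for dd in (-step, 0, step):
                cand = (min(max(center[0] + dt, lo), hi),
                        min(max(center[1] + dd, lo), hi))
                c = cost(cand)
                if c < best_cost:
                    best, best_cost = cand, c
        step //= 2
    return best
```

Compared with an exhaustive scan of all 52×52 QP pairs, this hierarchical pattern evaluates only a few dozen candidates, which is what makes the joint optimization tractable per frame.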
In the current scheme, the quantization parameters are optimized entirely at the first video frame and remain fixed for the remainder of the video sequence. Instead of using fixed quantization parameters, a possible way to reduce the complexity (per frame) is to employ the parameters of the previous video frame as an initialization for the search at the succeeding frame. This would lead to an adaptive method for optimizing the quantization parameters.