6.1  Introduction

Previous work on depth image coding has used transform-based algorithms derived from JPEG-2000 [4852] and MPEG encoders [29]. The key advantage of using a standardized video coding algorithm to compress depth images is backward compatibility with existing technology. However, transform coders have a significant shortcoming: at low bit rates, they cannot represent edges without deterioration. Perceptually, such a coder generates ringing artifacts along edges that lead to errors in pixel positions, which appear as a blurring of the depth signal along object borders. An alternative approach is to code depth maps using triangular meshes [10011]. This technique divides the image into triangular patches and approximates each patch by a linear function. If the data cannot be represented with a single linear function, smaller patches are used for that area. However, the placement of the patches is usually based on a regular grid, so that a large number of small patches is generated along edges.

The characteristics of depth maps differ significantly from those of normal textured images (see Figure 6.1). For example, since a depth map explicitly captures the 3-D structure of a scene, large parts of a typical depth image represent object surfaces. As a result, the input depth image contains various areas of smoothly changing grey levels. Furthermore, at object boundaries, the depth map exhibits step functions, i.e., sharp edges.

Figure 6.1 Example texture image (left) and the corresponding depth map (right). A typical depth image contains regions of linear depth changes bounded by sharp discontinuities.

Following these observations, we propose to model depth images by piecewise-linear functions that are separated by straight lines. We are therefore interested in an algorithm that efficiently extracts the sparse geometrical structures within the depth signal, yielding a compact representation and thus a low bit rate without significant distortion. Several contributions have considered the use of geometrical models to approximate images. One such modeling function, the “Wedgelet” function [23], is defined as two piecewise-constant functions separated by a straight line. This concept was originally introduced as a means for detecting and recovering edges from noisy images, and was later extended to piecewise-linear functions, called “Platelet” [104] functions. To define the area of support of each modeling function, a quadtree segmentation of the image is used. The concept is to recursively subdivide the image into variable-size blocks and to approximate each block with an appropriate model.
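To make the Wedgelet definition concrete, the following Python sketch approximates a single block by two constants separated by a straight line. It is an illustration only, not the method developed in this chapter: the function names (wedgelet_fit, best_wedgelet), the parameterization of the line by two border pixels, and the exhaustive search are all assumptions made for this example.

```python
import numpy as np

def wedgelet_fit(block, p0, p1):
    """Approximate `block` by two constants separated by the line p0 -> p1.

    p0 and p1 are (row, col) points; this parameterization is hypothetical.
    """
    rows, cols = np.indices(block.shape)
    # The sign of the 2-D cross product tells on which side of the line
    # each pixel lies (pixels on the line are assigned to the first side).
    side = ((p1[1] - p0[1]) * (rows - p0[0])
            - (p1[0] - p0[0]) * (cols - p0[1])) >= 0
    approx = np.empty(block.shape, dtype=float)
    for mask in (side, ~side):
        if mask.any():
            # The least-squares optimal constant for a region is its mean.
            approx[mask] = block[mask].mean()
    return approx

def best_wedgelet(block):
    """Exhaustively search all lines joining two border pixels (assumed search)."""
    h, w = block.shape
    border = [(r, c) for r in range(h) for c in range(w)
              if r in (0, h - 1) or c in (0, w - 1)]
    best, best_err = None, np.inf
    for i, p0 in enumerate(border):
        for p1 in border[i + 1:]:
            approx = wedgelet_fit(block, p0, p1)
            err = float(((block - approx) ** 2).sum())
            if err < best_err:
                best, best_err = approx, err
    return best, best_err

# Usage: a block containing a single vertical depth discontinuity.
block = np.full((8, 8), 50.0)
block[:, :4] = 10.0
approx, err = best_wedgelet(block)
```

For a block whose only structure is one straight edge, as in the usage lines above, a single Wedgelet reproduces the block without error, whereas a transform coder at low bit rate would smear the edge. Replacing the two region means by two least-squares planes would give the corresponding Platelet-style approximation.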

Within this chapter, depth-signal coding is discussed independently of the texture video. Hence, we attempt to find an algorithm that fits the characteristics of the depth signal well, and we assume that the texture video is coded with any suitable video coder, such as H.264/MPEG-4 AVC. Considering this compression framework, we have adopted the “Wedgelet” and “Platelet” image-modeling method. In this way, we follow the idea developed to code images using piecewise polynomials [91] as modeling functions. More specifically, we consider four different piecewise-linear functions for modeling. The first and second modeling functions, which are a constant and a linear function, respectively, have been found suitable for approximating smooth regions. The third and fourth modeling functions attempt to capture depth discontinuities, using two constant functions or two linear functions separated by a straight line. Together, the four modeling functions are used to approximate the entire image. To this end, the depth image is subdivided into variable-size blocks using a quadtree decomposition. An independent modeling function is subsequently selected for each node. The most appropriate function is selected using a cost function that balances rate and distortion. Similarly, an equivalent cost function determines the optimal block sizes within the quadtree. Our results show that the proposal yields a PSNR improvement of 1-3 dB over a JPEG-2000 encoder.
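The interplay between model selection and quadtree decomposition can be sketched in a few lines of Python. The sketch below is a simplified illustration under stated assumptions, not the algorithm of this chapter: it assumes a Lagrangian cost of the form J = D + lambda * R, uses only the first two modeling functions (constant and linear), and relies on hypothetical bit costs (BITS, SPLIT_BITS), a hypothetical multiplier (LAMBDA), and square blocks whose side is a power of two.

```python
import numpy as np

LAMBDA = 50.0                          # hypothetical Lagrange multiplier
BITS = {"constant": 8, "linear": 24}   # hypothetical rate per model, in bits
SPLIT_BITS = 1                         # hypothetical cost of a split/leaf flag

def fit_constant(block):
    # Least-squares constant approximation: the block mean.
    return np.full(block.shape, block.mean(), dtype=float)

def fit_linear(block):
    # Least-squares plane a*row + b*col + c over the block.
    rows, cols = np.indices(block.shape)
    A = np.column_stack([rows.ravel(), cols.ravel(), np.ones(block.size)])
    coef, *_ = np.linalg.lstsq(A, block.ravel().astype(float), rcond=None)
    return (A @ coef).reshape(block.shape)

def best_model(block):
    """Pick the model with the lowest assumed cost J = D + LAMBDA * R."""
    candidates = []
    for name, fit in (("constant", fit_constant), ("linear", fit_linear)):
        approx = fit(block)
        dist = float(((block - approx) ** 2).sum())
        candidates.append((dist + LAMBDA * BITS[name], name, approx))
    return min(candidates, key=lambda c: c[0])

def encode(block, min_size=2):
    """Return (cost, approximation, leaves) for the cheaper of leaf vs. split."""
    leaf_cost, name, approx = best_model(block)
    leaf_cost += LAMBDA * SPLIT_BITS
    n = block.shape[0]
    if n <= min_size:
        return leaf_cost, approx, [(n, name)]
    h = n // 2
    split_cost = LAMBDA * SPLIT_BITS
    split_approx = np.empty(block.shape, dtype=float)
    leaves = []
    for r in (0, h):
        for c in (0, h):
            cost, sub, sub_leaves = encode(block[r:r + h, c:c + h], min_size)
            split_cost += cost
            split_approx[r:r + h, c:c + h] = sub
            leaves += sub_leaves
    if split_cost < leaf_cost:
        return split_cost, split_approx, leaves
    return leaf_cost, approx, [(n, name)]

# Usage: a smooth depth ramp is cheapest as one linear leaf.
rows, cols = np.indices((8, 8))
ramp = (rows + 2 * cols).astype(float)
cost, approx, leaves = encode(ramp)
```

The same comparison of costs is applied twice: once among the candidate models of a block, and once between keeping a block as a leaf and splitting it into four children. Adding the two discontinuity models would only extend the candidate list in best_model.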

This chapter is organized as follows. Section 6.2 defines the piecewise-linear modeling functions introduced above. Section 6.3 then describes the bit-allocation strategy for mapping the model-function outputs to code words, using rate-distortion principles. Experimental results are provided in Section 6.4, and the chapter concludes with Section 6.5.