6.1 Introduction
Previous work on depth image coding has used transform-based algorithms derived from JPEG2000 [48, 52] and MPEG encoders [29]. The key advantage of using a standardized video coding algorithm to compress depth images is backward compatibility with existing technology. However, transform coders have a significant shortcoming in representing edges without deterioration at low bit rates. Perceptually, such a coder generates ringing artifacts along edges, which lead to errors in pixel positions and appear as blurring of the depth signal along object borders. An alternative approach is to use triangular meshes [100, 11] to code depth maps. This technique divides the image into triangular patches and approximates each patch by a linear function. If the data cannot be represented with a single linear function, smaller patches are used for that area. However, the placement of the patches is usually based on a regular grid, so that a large number of small patches are generated along edges.
The characteristics of depth maps differ significantly from those of normal texture images (see Figure 6.1). For example, since a depth map explicitly captures the 3D structure of a scene, large parts of a typical depth image represent object surfaces. As a result, the input depth image contains large areas of smoothly changing grey levels. Furthermore, at object boundaries, the depth map incorporates step functions, i.e., sharp edges.

Following these observations, we propose to model depth images by piecewise-linear functions that are separated by straight lines. For this reason, we are interested in an algorithm that efficiently extracts the sparse elements of the geometrical structures within the depth signal, yielding a compact representation and thus a low bit rate without significant distortion. Several contributions have considered the use of geometrical models to approximate images. One such modeling function, the "Wedgelet" function [23], is defined as two piecewise-constant functions separated by a straight line. This concept was originally introduced as a means for detecting and recovering edges from noisy images, and was later extended to piecewise-linear functions, called "Platelet" functions [104]. To define the area of support of each modeling function, a quadtree segmentation of the image is used. The concept is to recursively subdivide the image into variable-size blocks and approximate each block with an appropriate model.
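To illustrate the Wedgelet idea, the sketch below fits two constant values separated by a straight line to an image block by a brute-force search over candidate lines. This is only a minimal illustration, not the coder discussed in this chapter: the line parameterization (normal angle and offset) and the search granularity of 16 steps are assumptions chosen for simplicity, and a practical coder would quantize the line parameters for transmission.

```python
import numpy as np

def wedgelet_fit(block):
    """Fit a Wedgelet model: two constant values separated by a straight line.

    Candidate separating lines are parameterized by a normal angle theta and
    a signed offset rho (both swept on a coarse grid, an illustrative choice).
    Returns (mask, value_a, value_b, sse) for the best line found, where
    `mask` marks the pixels on one side of the line.
    """
    h, w = block.shape
    yy, xx = np.mgrid[0:h, 0:w]
    best = None
    for theta in np.linspace(0.0, np.pi, 16, endpoint=False):
        # Signed distance of each pixel to a line with normal (cos, sin).
        proj = np.cos(theta) * xx + np.sin(theta) * yy
        for rho in np.linspace(proj.min(), proj.max(), 16):
            mask = proj <= rho
            if mask.all() or not mask.any():
                continue  # line does not actually split the block
            a = block[mask].mean()       # constant value on one side
            b = block[~mask].mean()      # constant value on the other side
            approx = np.where(mask, a, b)
            sse = float(((block - approx) ** 2).sum())
            if best is None or sse < best[3]:
                best = (mask, a, b, sse)
    return best
```

On a block containing a single straight step edge, such as a depth discontinuity between two flat surfaces, the search recovers the two depth levels with near-zero residual error, which is exactly the situation where transform coders produce ringing.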
In the compression framework used within this chapter for 3D video signals, we discuss depth-signal coding independently from the texture video. Hence, we attempt to find an algorithm that fits well with the characteristics of the depth signal, and we assume that the texture video is coded with any suitable video coder, such as H.264/MPEG-4 AVC. Considering this compression framework, we have adopted the "Wedgelet" and "Platelet" image-modeling methods. In this way, we follow the idea of coding images using piecewise polynomials [91] as modeling functions. More particularly, we consider four different piecewise-linear functions for modeling. The first and second modeling functions, a constant and a linear function, respectively, have been found suitable to approximate smooth regions. The third and fourth modeling functions attempt to capture depth discontinuities, using two constant functions or two linear functions separated by a straight line. The proposed four modeling functions are used to approximate the entire image. To this end, the depth image is subdivided into variable-size blocks, using a quadtree decomposition. An independent modeling function is subsequently selected for each node. The selection of the most appropriate function is performed using a cost function that balances both rate and distortion. In a similar way, we employ an equivalent cost function that determines the optimal block sizes employed within the quadtree. Our results show that the proposal yields up to 1–3 dB PSNR improvement over a JPEG2000 encoder.
This chapter is organized as follows. In the next section, we define the previously introduced piecewise-linear modeling functions. Afterwards, Section 6.3 describes the bit-allocation strategy for mapping the model-function outputs to code words, using rate-distortion principles. Experimental results are provided in Section 6.4, and the chapter concludes with Section 6.5.