3.1 Introduction
In the previous chapter, we introduced the fundamentals of multiview geometry that model the projection of a 3D point onto the 2D image plane. In this chapter, we employ the geometry of multiple views to solve the inverse problem: estimating the 3D position (depth) of a point from multiple 2D images.
Depth estimation aims at recovering the structure and depth of objects in a scene from a set of multiple views or images. The core problem is to localize corresponding pixels across the views, i.e., point correspondences that identify the same 3D scene point. Once these point correspondences are found, the depth information can be derived by triangulation. When point correspondences are estimated for every non-occluded pixel, a so-called dense depth map, or depth image, can be constructed (see Figure 3.1).
Figure 3.1 The depth of a pixel can be derived by triangulating multiple views of the scene (here, two views (a) and (b)). (c) The scene structure represented in a so-called depth image.
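As the figure suggests, once a point correspondence and the camera projection matrices are known, the 3D point can be recovered by triangulation. The following sketch illustrates the idea with a standard linear (DLT) triangulation; it is a generic textbook method, not the specific algorithm developed in this chapter:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2 : 3x4 camera projection matrices.
    x1, x2 : (u, v) pixel coordinates of the same scene point.
    Returns the 3D point in non-homogeneous coordinates.
    """
    # Each view contributes two linear constraints on the
    # homogeneous 3D point X: x * (P[2] @ X) = P[0] @ X, etc.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector of A
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```

With noisy correspondences, the SVD yields the least-squares intersection of the two viewing rays rather than an exact one.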

One important application of depth maps is image rendering for free-viewpoint video and 3DTV systems. For such applications, we have seen in Section 1.3 that a 3D video scene can be represented efficiently by using one depth image for each texture view. In that case, multiview depth images need to be estimated. We define multiview depth images as a set of depth images corresponding to the multiple captured texture views. Considering the multiview video acquisition and compression framework, the multiview depth images should:
- be accurately estimated, so that high-quality images can be rendered,
- present smooth properties on object surfaces, so that a high compression ratio (of the depth images) can be obtained,
- have sharp discontinuities along object borders, so that a high rendering quality can be obtained,
- be consistent across the views, so that the multiview compression system can exploit the inter-view redundancy (between depth views).
At first glance, the second and third objectives may appear contradictory, but the reader should note that the properties of object surfaces differ from those of object borders. The proposed depth estimation algorithm handles this difference by adapting the smoothness requirement at object borders, i.e., at depth discontinuities.
In the first part of this chapter, we focus on the problem of estimating depth using two views. To simplify the exposition, we consider the restricted case of depth estimation from two rectified views. Under this assumption, we present an elementary depth image estimation algorithm and discuss the corresponding difficulties. To obtain more accurate depth estimates, we review two optimization strategies: a local and a one-dimensional optimization technique. We then present results obtained with these two techniques, including estimated depth images, and draw intermediate conclusions.
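In the rectified two-view case, triangulation reduces to a simple relation between the disparity d of a correspondence, the focal length f, and the camera baseline b: the depth is Z = f·b/d. A minimal sketch of this conversion follows; the parameter names and values are illustrative, not taken from the text:

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Convert a disparity map (in pixels) to a metric depth map.

    Assumes two rectified views with focal length `focal_px`
    (in pixels) and baseline `baseline_m` (in metres).
    Pixels with zero disparity correspond to points at infinity.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > 0
    # Z = f * b / d for each valid pixel.
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```

Note the inverse relation: depth resolution degrades quadratically with distance, since a one-pixel disparity error causes a larger depth error for far points than for near ones.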
In the second part, we extend the two-view depth-estimation algorithm to the case of multiple views. Based on this extension, we propose two new constraints that enforce consistent depth estimates across the scan lines (inter-line smoothness constraint) and across the views (inter-view smoothness constraint). We show that these additional constraints can be readily integrated into the optimization algorithm and significantly increase the accuracy of the depth estimates. As a result, both constraints yield smooth depth images that are consistent across the views, so that a high compression ratio can be obtained.
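Such constraints are typically integrated as additional penalty terms in an energy function that is minimized over the depth labels. A generic formulation of this kind is sketched below; the notation is illustrative and does not reproduce the exact terms developed later in the chapter:

\[
E(D) \;=\; \sum_{p} C(p, d_p)
\;+\; \lambda_{\mathrm{line}} \sum_{(p,q)} V(d_p, d_q)
\;+\; \lambda_{\mathrm{view}} \sum_{v'} \sum_{p} W\bigl(d_p, d_{\pi_{v'}(p)}\bigr),
\]

where \(C(p, d_p)\) is the matching cost of assigning depth \(d_p\) to pixel \(p\), \(V\) penalizes depth differences between pixels \(p\) and \(q\) on neighboring scan lines, \(W\) penalizes inconsistency with the depth estimated at the corresponding pixel \(\pi_{v'}(p)\) in a neighboring view \(v'\), and the weights \(\lambda_{\mathrm{line}}, \lambda_{\mathrm{view}}\) trade off data fidelity against inter-line and inter-view smoothness.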