In the previous chapter, we introduced the fundamentals of multi-view geometry, which model the projection of a 3D point onto the 2D image plane. In this chapter, we employ the geometry of multiple views to solve the inverse problem: estimating the 3D position (depth) of a point from multiple 2D images.
Depth estimation aims to recover the structure and depth of objects in a scene from a set of multiple views or images. The core problem is to localize corresponding pixels in the multiple views, i.e., point correspondences, that identify the same 3D scene point. Once these point correspondences are found, the depth information can be derived by triangulation. If point correspondences are estimated for every non-occluded pixel, a so-called dense depth map or depth image can be constructed (see Figure 3.1).
Figure 3.1 The depth of a pixel can be derived by triangulating multiple views of the scene (here, two views (a) and (b)). (c) The scene structure represented in a so-called depth image.
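For two rectified views, triangulation reduces to a particularly simple form: the depth of a point is inversely proportional to its horizontal disparity between the views. The following sketch illustrates this relation; the function name and the numeric values are illustrative, not taken from the text.

```python
def depth_from_disparity(f: float, b: float, d: float) -> float:
    """Depth of a 3D point triangulated from two rectified views.

    f: focal length in pixels, b: camera baseline in meters,
    d: horizontal disparity in pixels between the two views.
    For rectified cameras, z = f * b / d.
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * b / d

# Illustrative values: f = 1000 px, baseline 0.5 m, disparity 100 px -> depth 5 m.
print(depth_from_disparity(1000.0, 0.5, 100.0))  # 5.0
```

The inverse relation also explains why depth resolution degrades with distance: a one-pixel disparity error causes a much larger depth error for far points than for near ones.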
One important application of depth maps is image rendering for free-viewpoint video and 3D-TV systems. As discussed in Section 1.3, a 3D video scene can be represented efficiently for such applications by using one depth image for each texture view. In this case, multi-view depth images must be computed or estimated. We define multi-view depth images as a set of depth images corresponding to the multiple captured texture views. Considering the multi-view video acquisition and compression framework, the multi-view depth images should:
- be accurately estimated so that high-quality images can be rendered,
- present smooth properties on the object surfaces so that a high compression ratio (of depth images) can be obtained,
- have sharp discontinuities along object borders so that a high rendering quality can be obtained,
- be consistent across the views so that the multi-view compression system can exploit the inter-view redundancy (between depth views).
At first glance, the second and third objectives may appear contradictory, but the reader should note that the properties of object surfaces differ from those of object borders. The proposed depth estimation algorithm handles this difference by adapting the smoothness requirement at object borders, i.e., at depth discontinuities.
In the first part of this chapter, we focus on the problem of estimating depth from two views. To simplify the exposition, we consider the restricted case of depth estimation from two rectified views. Under this assumption, we present an elementary depth-image estimation algorithm and discuss the corresponding difficulties. To obtain more accurate depth estimates, we then review two optimization strategies: a local and a one-dimensional optimization technique. Estimated depth images obtained with both techniques are presented, together with intermediate conclusions.
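As a minimal sketch of such an elementary two-view estimator on rectified views, consider winner-take-all block matching: for each pixel in the left image, the disparity minimizing a sum-of-absolute-differences (SAD) cost along the same scanline in the right image is selected. This is an illustrative local-optimization baseline, not the chapter's full algorithm; images are plain lists of grayscale rows, and all names are assumptions.

```python
def block_match(left, right, max_disp, half=1):
    """Per-pixel disparity by winner-take-all SAD block matching.

    left, right: rectified grayscale images as lists of rows.
    max_disp: maximum disparity searched, half: window half-size.
    Border pixels (and pixels whose search window would leave the
    image) keep disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            best_cost, best_d = float("inf"), 0
            # Search only disparities whose right-image window stays in bounds.
            for d in range(min(max_disp, x - half) + 1):
                cost = sum(
                    abs(left[y + dy][x + dx] - right[y + dy][x - d + dx])
                    for dy in range(-half, half + 1)
                    for dx in range(-half, half + 1)
                )
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Because each pixel is decided independently, this local scheme is fast but noisy in textureless regions and near occlusions, which motivates the one-dimensional (scanline) optimization discussed next.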
In the second part, we extend the two-view depth-estimation algorithm to the case of multiple views. Based on this extension, we propose two new constraints that enforce consistent depth estimates across the scanlines (inter-line smoothness constraint) and across the views (inter-view smoothness constraint). We show that these additional constraints can be readily integrated into the optimization algorithm and significantly increase the accuracy of depth estimates. As a result, both constraints yield smooth and consistent depth images across the views, so that a high compression ratio can be obtained.